How do I compare WebGPU and WASM fairly?

Hold segment length, hardware, and browser version constant across runs. Report the median time per segment together with its standard deviation.

What segment length is best?

8–20 seconds balances stability and throughput for most scripts.

Why are cold starts slower?

Model download and compilation dominate cold start; warm caches are much faster.

Benchmarking Kokoro TTS in 2025: CPU vs. GPU, Chrome vs. Edge, and Segment Strategy

This study focuses on realistic, repeatable browser conditions. We compare WebGPU and WebAssembly paths on common hardware, highlight the effect of segmentation on latency and stability, and share profiles that creators and teams can adopt immediately.

Test Matrix

Browsers: Chrome 121, Edge 121.
Hardware: recent laptop GPU (WebGPU path) and 8‑core CPU (WASM path).
Segments: 5s, 12s, 20s; batch of 10 lines.

Key Results

WebGPU cuts median latency by roughly 1.5×–3× relative to WASM for 8–20s segments.
WASM with threads/SIMD remains competitive for ≤8s lines and offers broader compatibility.
Chrome and Edge results are within ±10% of each other on identical hardware, so choose whichever fits your ecosystem.
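
Given these results, a page might choose its backend at runtime. The following is a minimal sketch, not part of the benchmark harness: it relies only on the standard navigator.gpu.requestAdapter() call, and the names NavigatorLike, GpuLike, and pickBackend are hypothetical helpers introduced for illustration.

```typescript
// Minimal structural types so the sketch compiles without WebGPU typings.
// (Hypothetical names; only requestAdapter() mirrors the real WebGPU API.)
interface GpuLike {
  requestAdapter(): Promise<object | null>;
}
interface NavigatorLike {
  gpu?: GpuLike;
}

type Backend = "webgpu" | "wasm";

// Returns "webgpu" only when the browser exposes WebGPU and actually
// grants an adapter; in every other case it falls back to "wasm".
async function pickBackend(nav: NavigatorLike): Promise<Backend> {
  if (!nav.gpu) return "wasm";
  try {
    const adapter = await nav.gpu.requestAdapter();
    return adapter ? "webgpu" : "wasm";
  } catch {
    // Treat any failure while probing as "WebGPU not available".
    return "wasm";
  }
}
```

In a page this would be called as pickBackend(navigator as NavigatorLike). Checking for a granted adapter, rather than only for the presence of navigator.gpu, covers machines where the API exists but no suitable GPU is exposed.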

Segmentation Strategy

Segment by meaning, not time. Keep each unit under ~20 seconds, and avoid over‑long paragraphs. This approach keeps memory pressure low and improves perceived pacing.
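
One way to apply this is to split on paragraph and sentence boundaries first, then merge neighbouring sentences until an estimated duration cap is reached. The sketch below makes two loud assumptions: a speaking rate of 150 words per minute (measure your own voice and speed setting) and a deliberately naive sentence splitter. The function names are hypothetical.

```typescript
// ASSUMPTION: 150 words per minute. Calibrate against real output.
const WORDS_PER_SECOND = 150 / 60;

// Rough duration estimate based on word count only.
function estimateSeconds(text: string): number {
  const words = text.trim().split(/\s+/).filter(Boolean).length;
  return words / WORDS_PER_SECOND;
}

// Groups whole sentences into segments whose estimated duration stays at
// or below maxSeconds. Paragraph breaks are hard boundaries, so a segment
// never spans two paragraphs. A single sentence longer than the cap is
// kept whole rather than cut mid-thought.
function segmentByMeaning(script: string, maxSeconds = 20): string[] {
  const segments: string[] = [];
  for (const paragraph of script.split(/\n\s*\n/)) {
    // Naive split: text up to and including ., ! or ?
    // (abbreviations and decimals such as "3.5" will be split too).
    const sentences = (paragraph.match(/[^.!?]+[.!?]*/g) ?? [])
      .map((s) => s.trim())
      .filter(Boolean);
    let current = "";
    for (const sentence of sentences) {
      const candidate = current ? `${current} ${sentence}` : sentence;
      if (current && estimateSeconds(candidate) > maxSeconds) {
        segments.push(current);
        current = sentence;
      } else {
        current = candidate;
      }
    }
    if (current) segments.push(current);
  }
  return segments;
}
```

Because the cut points are always sentence or paragraph ends, the time cap acts as an upper bound rather than as the thing that decides where a segment ends.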

Practical Profiles

// Creator profile (WebGPU)
Chrome 121 + WebGPU → 10–15s lines → WAV export → Mastering (−16 LUFS) → Publish

// Managed laptop profile (WASM)
Edge 121 + WASM threads/SIMD → 8–12s lines → WAV export → ffmpeg join → Publish

Stability Tips

Keep one active tab for generation during long sessions.
Restart the tab every 30–40 segments to reclaim memory.
Export per segment to minimize redo cost on errors.
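
The restart and per-segment export tips combine naturally: if each finished segment is recorded as exported, a fresh tab can pick up where the previous one stopped. A minimal sketch, assuming segments are identified by index and that the set of exported indices is persisted somewhere by the caller (nextSession is a hypothetical helper, and the default of 30 is simply the low end of the range above):

```typescript
// Returns the segment indices to generate in the next tab session:
// the first `perSession` indices that have not been exported yet.
function nextSession(
  totalSegments: number,
  exported: ReadonlySet<number>,
  perSession = 30
): number[] {
  const pending: number[] = [];
  for (let i = 0; i < totalSegments && pending.length < perSession; i++) {
    if (!exported.has(i)) pending.push(i);
  }
  return pending;
}
```

When a session fails partway, only the segments that were never marked as exported are scheduled again.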

What to Measure

Per‑segment synthesis time and standard deviation.
First‑load vs cached model load time.
Audio artifacts after mastering (peak, LUFS, de‑esser activity).
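
For the timing metrics, the summary statistics are simple to compute from a list of per-segment times. The following is a small sketch rather than the harness used for this study; it assumes times are collected in milliseconds, and the function names are illustrative.

```typescript
// Median of a non-empty list; the input array is not modified.
function median(values: readonly number[]): number {
  if (values.length === 0) throw new RangeError("median of empty list");
  const sorted = values.slice().sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1
    ? sorted[mid]
    : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Sample standard deviation (divides by n - 1).
function sampleStdDev(values: readonly number[]): number {
  if (values.length < 2) throw new RangeError("need at least two samples");
  const mean = values.reduce((sum, v) => sum + v, 0) / values.length;
  const sumSq = values.reduce((sum, v) => sum + (v - mean) ** 2, 0);
  return Math.sqrt(sumSq / (values.length - 1));
}

// Ratio of medians: values above 1 mean the candidate is faster.
function speedup(baselineMs: readonly number[], candidateMs: readonly number[]): number {
  return median(baselineMs) / median(candidateMs);
}
```

The median is used instead of the mean so that one slow segment, for example the first one after a cold start, does not skew the comparison. The standard deviation then shows how stable each path is.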

Takeaways

Prefer WebGPU when available; WASM is a solid universal baseline.
Segmentation improves both stability and editorial quality.
Standardize loudness to deliver a consistent listening experience.

Author: Kokoro Web Team • Last updated 2025‑01‑15