Last updated: 2025‑01‑15

Benchmarking Kokoro TTS in 2025: CPU vs. GPU, Chrome vs. Edge, and Segment Strategy

This study focuses on realistic, repeatable browser conditions. We compare WebGPU and WebAssembly paths on common hardware, highlight the effect of segmentation on latency and stability, and share profiles that creators and teams can adopt immediately.

Test Matrix

  • Browsers: Chrome 121, Edge 121.
  • Hardware: recent laptop GPU (WebGPU path) and 8‑core CPU (WASM path).
  • Segments: 5s, 12s, 20s; batch of 10 lines.
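It is easy to miscount the combinations in a test matrix like this. As a sketch, the matrix above can be expanded into an explicit run list. The names (`RunConfig`, `expandMatrix`, `BackendPath`) are illustrative and are not part of any Kokoro API.

```typescript
// Hypothetical encoding of the test matrix; labels mirror the list above.
type BrowserName = "Chrome 121" | "Edge 121";
type BackendPath = "webgpu" | "wasm"; // webgpu = laptop GPU, wasm = 8-core CPU

interface RunConfig {
  browser: BrowserName;
  path: BackendPath;
  segmentSeconds: number;
  linesPerBatch: number;
}

const BROWSERS: BrowserName[] = ["Chrome 121", "Edge 121"];
const PATHS: BackendPath[] = ["webgpu", "wasm"];
const SEGMENT_SECONDS = [5, 12, 20];

// Cartesian product of browsers x backend paths x segment lengths.
function expandMatrix(linesPerBatch = 10): RunConfig[] {
  const runs: RunConfig[] = [];
  for (const browser of BROWSERS) {
    for (const path of PATHS) {
      for (const segmentSeconds of SEGMENT_SECONDS) {
        runs.push({ browser, path, segmentSeconds, linesPerBatch });
      }
    }
  }
  return runs;
}
```

With the defaults, `expandMatrix()` yields 2 × 2 × 3 = 12 configurations, or 120 synthesized lines per full pass.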

Key Results

  • WebGPU cuts median latency by a factor of ~1.5–3 versus WASM for 8–20s segments.
  • WASM with threads and SIMD remains competitive for lines of 8s or less and offers the broadest compatibility.
  • Chrome and Edge results are within ±10% of each other on identical hardware, so choose whichever browser ecosystem you prefer.
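One way to apply these results is to try WebGPU first and fall back to WASM. The sketch below assumes a hypothetical `BackendEnv` object that you fill in from browser probes (`navigator.gpu` plus `requestAdapter()`, `crossOriginIsolated`, a `WebAssembly.validate()` SIMD check, and `navigator.hardwareConcurrency`). The line-length ranges are copied from the Practical Profiles section, and `pickBackend` is illustrative rather than a Kokoro function.

```typescript
// Hypothetical snapshot of browser capabilities, filled in by your own probes.
interface BackendEnv {
  hasWebGPU: boolean;           // navigator.gpu present and requestAdapter() returned an adapter
  crossOriginIsolated: boolean; // needed for SharedArrayBuffer, and therefore WASM threads
  wasmSimd: boolean;            // outcome of a WebAssembly.validate() SIMD probe
  hardwareConcurrency: number;  // navigator.hardwareConcurrency
}

type Backend =
  | { kind: "webgpu"; lineSeconds: [number, number] }
  | { kind: "wasm"; threads: number; simd: boolean; lineSeconds: [number, number] };

function pickBackend(env: BackendEnv): Backend {
  // Prefer WebGPU; the 10-15 s line range mirrors the creator profile.
  if (env.hasWebGPU) {
    return { kind: "webgpu", lineSeconds: [10, 15] };
  }
  // Without cross-origin isolation there is no SharedArrayBuffer, so run single-threaded.
  const threads = env.crossOriginIsolated ? Math.max(1, env.hardwareConcurrency) : 1;
  // The 8-12 s line range mirrors the managed-laptop (WASM) profile.
  return { kind: "wasm", threads, simd: env.wasmSimd, lineSeconds: [8, 12] };
}
```

Keeping the probe results in a plain object separates browser-specific detection from the decision logic, so the fallback can be tested outside a browser.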

Segmentation Strategy

Segment by meaning, not time. Keep each unit under ~20 seconds, and avoid over‑long paragraphs. This approach keeps memory pressure low and improves perceived pacing.
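Here is a minimal sketch of meaning-first segmentation. It splits at sentence boundaries and falls back to clause punctuation only when a sentence is too long on its own. It then greedily merges neighboring units while the estimated duration stays under the cap. The duration estimate assumes a fixed speaking rate of about 150 words per minute, which you should calibrate against your own Kokoro output. All function names are hypothetical.

```typescript
// Assumed average speaking rate (about 150 words per minute); calibrate per voice.
const WORDS_PER_SECOND = 2.5;

function estimateSeconds(text: string): number {
  return text.split(/\s+/).filter((w) => w.length > 0).length / WORDS_PER_SECOND;
}

// Sentence split on . ! ? (naive: abbreviations and decimals will also split).
function splitSentences(text: string): string[] {
  return (text.match(/[^.!?]+[.!?]+|[^.!?]+$/g) ?? [])
    .map((s) => s.trim())
    .filter((s) => s.length > 0);
}

// Clause split on , ; : for sentences that are too long on their own.
function splitClauses(sentence: string): string[] {
  return (sentence.match(/[^,;:]+[,;:]*/g) ?? [])
    .map((s) => s.trim())
    .filter((s) => s.length > 0);
}

// Greedily join units while the estimated duration stays within maxSeconds.
function pack(units: string[], maxSeconds: number): string[] {
  const segments: string[] = [];
  let current = "";
  for (const unit of units) {
    const candidate = current ? `${current} ${unit}` : unit;
    if (current && estimateSeconds(candidate) > maxSeconds) {
      segments.push(current);
      current = unit;
    } else {
      current = candidate;
    }
  }
  if (current) segments.push(current);
  return segments;
}

// Sentence boundaries first, clause boundaries as a fallback.
// A single clause longer than maxSeconds is kept whole rather than cut mid-phrase.
function segmentByMeaning(text: string, maxSeconds = 20): string[] {
  const units: string[] = [];
  for (const sentence of splitSentences(text)) {
    if (estimateSeconds(sentence) <= maxSeconds) {
      units.push(sentence);
    } else {
      units.push(...pack(splitClauses(sentence), maxSeconds));
    }
  }
  return pack(units, maxSeconds);
}
```

The sentence splitter is intentionally naive, so "3.5" or "e.g." will also trigger a split. Where supported, `Intl.Segmenter` with `granularity: "sentence"` is one built-in alternative.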

Practical Profiles

// Creator profile (WebGPU)
Chrome 121 + WebGPU → 10–15s lines → WAV export → Mastering (−16 LUFS) → Publish

// Managed laptop profile (WASM)
Edge 121 + WASM threads/SIMD → 8–12s lines → WAV export → ffmpeg join → Publish
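Both profiles end in an ffmpeg step. As a sketch, the helpers below build argument lists for ffmpeg's concat demuxer, which joins per-segment WAVs, and for its `loudnorm` filter, which targets −16 LUFS integrated. The true-peak ceiling of −1.5 dBTP, the loudness range of 11 LU, and the 24 kHz output rate are assumptions, not values from these benchmarks. The function names are illustrative.

```typescript
// Concat-demuxer list: one `file '<path>'` line per segment, in playback order.
function concatList(files: string[]): string {
  // A single quote inside a path is written as '\'' (close, escaped quote, reopen).
  return files.map((f) => `file '${f.replace(/'/g, "'\\''")}'`).join("\n") + "\n";
}

// Join segment WAVs without re-encoding; assumes all segments share one sample format.
function joinArgs(listPath: string, output: string): string[] {
  return ["-f", "concat", "-safe", "0", "-i", listPath, "-c", "copy", output];
}

// Single-pass loudnorm to -16 LUFS integrated (assumed TP/LRA values).
// loudnorm can change the output sample rate, so -ar sets it explicitly;
// 24000 Hz is an assumed Kokoro output rate.
function masterArgs(input: string, output: string, sampleRate = 24000): string[] {
  return ["-i", input, "-af", "loudnorm=I=-16:TP=-1.5:LRA=11", "-ar", String(sampleRate), output];
}
```

In Node, write `concatList(...)` to the list file, then pass the arrays directly to `child_process.spawn("ffmpeg", args)`. Single-pass `loudnorm` is approximate. If you need tighter accuracy, ffmpeg also supports a two-pass approach: measure first, then apply the measured values.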

Stability Tips

  • Keep one active tab for generation during long sessions.
  • Restart the tab every 30–40 segments to reclaim memory.
  • Export per segment to minimize redo cost on errors.
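Per-segment export pays off when a restart is needed, because only the missing segments have to be re-synthesized. The sketch below assumes a hypothetical zero-padded file naming scheme and a set of filenames that have already been exported. It plans at most `perTab` segments per tab session, defaulting to 30, the low end of the range above.

```typescript
// Hypothetical naming scheme: one WAV per segment, zero-padded so files sort in order.
function segmentFileName(index: number): string {
  return `seg${String(index + 1).padStart(3, "0")}.wav`;
}

// Segments not yet exported; after a tab restart only these need re-synthesis.
function pendingSegments(total: number, exported: Set<string>): number[] {
  const pending: number[] = [];
  for (let i = 0; i < total; i++) {
    if (!exported.has(segmentFileName(i))) pending.push(i);
  }
  return pending;
}

// Work for the current tab: at most `perTab` segments before a planned reload.
function nextBatch(total: number, exported: Set<string>, perTab = 30): number[] {
  return pendingSegments(total, exported).slice(0, perTab);
}
```

After each reload, rebuild `exported` from wherever the exports are stored and call `nextBatch` again. The loop is finished when it returns an empty array.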

What to Measure

  • Per‑segment synthesis time: median and standard deviation.
  • Model load time on first load vs. from cache.
  • Audio artifacts after mastering (peak, LUFS, de‑esser activity).
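For the first metric, you can collect per-segment times with `performance.now()` around each synthesis call. The sketch below assumes you already have those times in milliseconds and reduces them to median, mean, and sample standard deviation. `medianSpeedup` expresses a result like the 1.5×–3× WebGPU figure as a ratio of medians. The function names are illustrative.

```typescript
interface Summary {
  median: number;
  mean: number;
  stdDev: number;
}

function summarize(samplesMs: number[]): Summary {
  if (samplesMs.length === 0) throw new Error("no samples");
  const sorted = [...samplesMs].sort((a, b) => a - b);
  const n = sorted.length;
  const mid = Math.floor(n / 2);
  // Middle value for odd n, mean of the two middle values for even n.
  const median = n % 2 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
  const mean = sorted.reduce((s, x) => s + x, 0) / n;
  // Sample standard deviation (divides by n - 1); zero for a single sample.
  const variance = n > 1 ? sorted.reduce((s, x) => s + (x - mean) ** 2, 0) / (n - 1) : 0;
  return { median, mean, stdDev: Math.sqrt(variance) };
}

// Ratio of medians, e.g. WASM times (baseline) over WebGPU times (candidate).
function medianSpeedup(baselineMs: number[], candidateMs: number[]): number {
  return summarize(baselineMs).median / summarize(candidateMs).median;
}
```

Run `summarize` separately on first-load and cached-load timings to compare the two conditions.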

Takeaways

  • Prefer WebGPU when available; WASM is a solid universal baseline.
  • Segmentation improves both stability and editorial quality.
  • Standardize loudness to deliver a consistent user experience.

Author: Kokoro Web Team • Last updated 2025‑01‑15