When should I use WebGPU vs WASM?

Use WebGPU for lowest latency on modern GPUs; use WASM for broad compatibility in managed or virtualized environments.

How do I keep long runs stable?

Split the script into sense units of ≤20 seconds, export each segment as it finishes, and relaunch the tab every 30–40 segments to reclaim memory.

Why is the first run slower?

On the first run the model has to be downloaded and compiled; once it is cached, later starts are faster.

WebGPU vs. WebAssembly for Kokoro TTS Inference: When to Use Which and Why

Kokoro Web supports both WebGPU and WebAssembly execution paths. This article gives you a practical framework to choose the right path based on latency, compatibility, and deployment constraints—backed by hands‑on tuning tactics for 2025 browsers.

1) Summary Framework

WebGPU: Lowest latency on modern GPUs; best for live preview and real‑time iteration.
WebAssembly: Broadest compatibility; great for managed desktops and VMs without GPU access.
Hybrid: Develop on WebGPU, ship with WebAssembly fallback for universal access.

2) Latency and Throughput

On comparable hardware, WebGPU typically cuts latency by a factor of 1.3 to 3 for medium segments (8–20 seconds) and scales predictably with batch size. WASM with threads and SIMD still performs well for single‑segment synthesis, especially on high‑end CPUs. Treat these ranges as a starting point and measure on your own target machines.
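One way to check these ranges on your own hardware is a small timing harness. The sketch below is a minimal example, not part of Kokoro Web: `synthesize` is a hypothetical callback that wraps whichever backend you are testing, and the run count is an arbitrary default.

```typescript
// Median of a non-empty list of numbers.
function median(values: number[]): number {
  const sorted = values.slice().sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1
    ? sorted[mid]
    : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Times repeated calls to `synthesize` (a hypothetical wrapper around the
// backend under test) and returns the median wall-clock time in milliseconds.
async function medianLatencyMs(
  synthesize: () => Promise<unknown>,
  runs = 5
): Promise<number> {
  // Warm-up call so model download and compilation are not counted.
  await synthesize();
  const samples: number[] = [];
  for (let i = 0; i < runs; i++) {
    const start = Date.now();
    await synthesize();
    samples.push(Date.now() - start);
  }
  return median(samples);
}
```

Run it once per backend with the same segment text, then compare the two medians rather than single runs, since individual timings vary with thermal state and background load.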

3) Compatibility and Policy Constraints

Enterprise devices: WebAssembly often wins due to locked‑down GPUs and driver policies.
Education or kiosk modes: WASM reduces operational surprises; fewer moving parts.
Creative teams: WebGPU accelerates iteration and subjective voice tuning.

4) Memory and Stability Tuning

The Kokoro model is compact enough for daily use but still benefits from careful memory hygiene in long sessions.

Segment scripts into sense units of ≤20 seconds to reduce peak memory.
Keep a single active tab for heavy generation; avoid parallel tabs sharing GPU context.
Prefer WAV export per segment; concatenate later with a DAW or ffmpeg.
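The segmentation step above can be sketched as a sentence-aware splitter. This is an illustration under stated assumptions: the speaking rate of 2.5 words per second is a guess used to turn the 20-second limit into a word budget, and `segmentScript` is a hypothetical helper, not a Kokoro Web function.

```typescript
// Assumed average speaking rate; measure it for the voice and speed you use.
const WORDS_PER_SECOND = 2.5;

// Groups whole sentences into segments whose estimated duration stays at or
// below `maxSeconds`. A single sentence longer than the budget is kept
// intact as its own segment rather than being cut mid-sentence.
function segmentScript(text: string, maxSeconds = 20): string[] {
  const maxWords = Math.floor(maxSeconds * WORDS_PER_SECOND);
  const sentences = text.match(/[^.!?]+[.!?]+|[^.!?]+$/g) ?? [];
  const segments: string[] = [];
  let current: string[] = [];

  for (const sentence of sentences) {
    const words = sentence.trim().split(/\s+/).filter(Boolean);
    if (words.length === 0) continue;
    // Close the current segment if this sentence would exceed the budget.
    if (current.length > 0 && current.length + words.length > maxWords) {
      segments.push(current.join(" "));
      current = [];
    }
    current.push(...words);
  }
  if (current.length > 0) segments.push(current.join(" "));
  return segments;
}
```

Splitting on sentence boundaries keeps each segment a self-contained sense unit, so joins between exported WAV files fall on natural pauses.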

5) Browser Feature Flags

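Use stable channels when possible. If testing on pre‑release builds, confirm that WebGPU and WASM threads/SIMD are enabled. WASM threads rely on shared memory (SharedArrayBuffer), which browsers expose only to cross‑origin isolated pages, so make sure the page meets the isolation requirements.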
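A quick runtime check for the shared-memory prerequisite might look like the sketch below. The `crossOriginIsolated` global and the two response headers are standard web platform features; the `IsolationEnv` interface and `canUseWasmThreads` helper are hypothetical names introduced for this example.

```typescript
// Subset of the global scope this check reads; declared locally so the
// function can be tested outside a browser.
interface IsolationEnv {
  crossOriginIsolated?: boolean;
  SharedArrayBuffer?: unknown;
}

// True when the page is cross-origin isolated and SharedArrayBuffer exists,
// which is what threaded WebAssembly needs for shared memory.
function canUseWasmThreads(env: IsolationEnv): boolean {
  return (
    env.crossOriginIsolated === true &&
    typeof env.SharedArrayBuffer === "function"
  );
}

// Response headers that opt a page into cross-origin isolation.
// The server hosting the app has to send both.
const ISOLATION_HEADERS: Record<string, string> = {
  "Cross-Origin-Opener-Policy": "same-origin",
  "Cross-Origin-Embedder-Policy": "require-corp",
};

// In a browser: canUseWasmThreads(globalThis as unknown as IsolationEnv)
```

If the check returns false, a single-threaded WASM build is the usual fallback; it is slower but needs no special headers.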

6) Developer Experience

WebGPU: Excellent for preview loops; faster feedback makes it easier to iterate on both the voice settings and the script itself.
WASM: Predictable deployment footprint; ideal when you need a single path that “just works”.

7) Security and Privacy

Both paths keep text and audio on the device; nothing is uploaded. Favor permissions‑light UX: explain model downloads, show progress, and disclose caching so users understand why later sessions are faster.

8) Example Pipelines

// WebGPU‑first pipeline (creator workstation)
Write script → Segment (≤20s) → Preview with WebGPU → Export WAV → Mastering → Publish

// WASM‑first pipeline (managed laptops)
Write script → Segment (≤20s) → Generate with WASM → Export WAV → Join with ffmpeg → Publish
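For the "Join with ffmpeg" step, one common approach is ffmpeg's concat demuxer, which reads a text file listing the inputs in order. The sketch below only builds that list; the `segment_001.wav` naming scheme and both helper functions are assumptions for illustration, and it presumes every segment shares the same sample rate and channel layout.

```typescript
// Hypothetical naming scheme: zero-padded, 1-based segment numbers.
function segmentFileName(index: number): string {
  return "segment_" + ("000" + (index + 1)).slice(-3) + ".wav";
}

// Builds the text of a concat-demuxer list file: one `file '<path>'` line
// per input, with single quotes inside a path escaped as '\''.
function buildConcatList(files: string[]): string {
  return (
    files
      .map((f) => "file '" + f.replace(/'/g, "'\\''") + "'")
      .join("\n") + "\n"
  );
}

// Example: list text for three exported segments.
const listText = buildConcatList([0, 1, 2].map(segmentFileName));
```

Save the result as `list.txt` next to the WAV files and run `ffmpeg -f concat -safe 0 -i list.txt -c copy joined.wav`; `-c copy` joins the segments without re-encoding.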

9) Choosing Under Constraints

Strict IT policies / VDI: WASM path.
Live sessions / workshops: WebGPU if available; otherwise pre‑render key lines.
Mixed fleet: Detect WebGPU → else WASM, but unify UX and export options.
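The "Detect WebGPU → else WASM" rule for mixed fleets can be sketched as follows. `navigator.gpu.requestAdapter()` is the standard WebGPU entry point; the `Backend` labels, the `GpuLike` interface, and `pickBackend` are hypothetical names for this example and do not come from Kokoro Web.

```typescript
// Illustrative backend labels, not Kokoro Web identifiers.
type Backend = "webgpu" | "wasm";

// Minimal shape of `navigator.gpu`, declared locally so the sketch compiles
// without the @webgpu/types package.
interface GpuLike {
  requestAdapter(): Promise<object | null>;
}

// Returns "webgpu" only when the API exists AND an adapter is actually
// granted; policy-locked or virtualized machines often expose the API
// but resolve to null, so both checks matter.
async function pickBackend(gpu: GpuLike | undefined): Promise<Backend> {
  if (!gpu) return "wasm";
  try {
    const adapter = await gpu.requestAdapter();
    return adapter ? "webgpu" : "wasm";
  } catch {
    return "wasm";
  }
}

// In a browser: pickBackend((navigator as any).gpu)
```

Making the decision once at startup and passing the result down keeps the rest of the UI and the export options identical on both paths.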

10) Takeaways

Use WebGPU to move fast and explore voice/style.
Use WASM to reach everyone with minimal surprises.
Design your workflow so either path can ship identical audio assets.

Author: Kokoro Web Team • Last updated 2025‑01‑15