In-browser Kokoro text-to-speech

Kokoro TTS Handbook

Kokoro TTS Complete Guide 2025

Kokoro Web puts a fast, private English voice generator inside your browser. This guide covers every step—from prepping scripts to mastering SSML—so you can create professional audio for tutorials, product updates, marketing videos, and more.

  • 82M: Lightweight Kokoro model
  • 0 uploads: Private, on-device rendering
  • Under 2s: Average render time per paragraph

Guide Overview

01. Getting Started with Kokoro Web

Open kokoroweb.app and press “Launch Kokoro TTS.” The app loads the 82M-parameter voice model locally, using WebGPU by default with a WASM fallback. Expect a one-time download of ~150MB; after that, rendering happens on your device with no uploads.

Quick setup checklist

  • Preload the voice model before meetings or recording sessions.
  • Wear headphones to monitor audio playback while editing.
  • Bookmark Kokoro Web to pin it next to your script editor.

02. Script Preparation Techniques

High-quality Kokoro TTS output starts with well-structured text. Keep sentences concise, use punctuation for pacing, and add stage cues to guide tone.

Formatting guidelines

  • Limit sentences to 20–25 words.
  • Use commas for short pauses and periods for longer breaks.
  • Spell names phonetically in parentheses if needed.
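
The sentence-length guideline above can be checked automatically before pasting a script into the app. The following is a minimal sketch, not part of Kokoro Web: the function name `long_sentences` and the naive sentence splitter are assumptions made for illustration, and abbreviations such as "e.g." will be split incorrectly.

```python
import re

MAX_WORDS = 25  # upper bound taken from the guideline above


def long_sentences(script: str, max_words: int = MAX_WORDS) -> list[tuple[int, str]]:
    """Return (word_count, sentence) for every sentence longer than max_words."""
    # Naive split: a sentence ends at '.', '!' or '?' followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", script.strip())
    flagged = []
    for sentence in sentences:
        count = len(sentence.split())
        if count > max_words:
            flagged.append((count, sentence))
    return flagged
```

Running it on a draft returns only the sentences that need trimming, so a clean script yields an empty list.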

Helpful macros

Create keyboard snippets for recurring phrases, such as “Thanks for tuning in to Kokoro TTS” or “Here’s what changed this sprint.” This keeps the Kokoro TTS keyword density consistent.

03. Voice Selection & Tuning

Kokoro Web includes three curated voices. Match the voice to your audience and adjust speed for natural delivery.

Voice           | Best For                                        | Suggested Speed | Notes
American Female | Product tutorials, e-learning, customer success | 0.98×–1.05×     | Warm, approachable tone
American Male   | Engineering updates, podcasts, newsletters      | 0.95×–1.00×     | Solid for technical narration
British Female  | Financial summaries, academic content           | 0.92×–0.98×     | Authoritative delivery
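
If you keep speed settings in your own notes or tooling, the suggested ranges above can be stored as a small lookup. This is an illustrative sketch only: `SPEED_RANGES` and `clamp_speed` are hypothetical names, and nothing here talks to Kokoro Web itself.

```python
# Suggested speed ranges copied from the voice table above.
SPEED_RANGES = {
    "American Female": (0.98, 1.05),
    "American Male": (0.95, 1.00),
    "British Female": (0.92, 0.98),
}


def clamp_speed(voice: str, requested: float) -> float:
    """Pull a requested speed back into the suggested range for a voice."""
    low, high = SPEED_RANGES[voice]
    return min(max(requested, low), high)
```

For example, `clamp_speed("American Male", 1.2)` gives `1.0`, the top of that voice's suggested range.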

04. SSML Essentials

Kokoro TTS respects a practical set of SSML tags. Use them to control pitch, pauses, and pronunciation without writing custom code.

<speak>
Welcome to <emphasis level="moderate">Kokoro Web</emphasis>.
<break time="250ms"/> 
We ship private, on-device voice synthesis with no uploads.
<say-as interpret-as="characters">AI</say-as> at your fingertips.
</speak>

Test SSML snippets in short paragraphs first. Overusing them can make narration feel robotic.
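
One way to test snippets safely is to generate them from plain sentences and confirm the markup is well-formed before pasting it in. The sketch below assumes you prepare scripts outside the app; `build_ssml` and `is_well_formed` are hypothetical helpers, and the check covers XML well-formedness only, not which tags Kokoro TTS supports.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def build_ssml(sentences: list[str], pause_ms: int = 250) -> str:
    """Wrap sentences in <speak>, with a <break> between consecutive sentences."""
    pause = f'<break time="{pause_ms}ms"/>'
    # escape() protects characters such as & and < that would break the XML.
    body = f"\n{pause}\n".join(escape(s) for s in sentences)
    return f"<speak>\n{body}\n</speak>"


def is_well_formed(ssml: str) -> bool:
    """Return True if the snippet parses as XML (every tag is closed)."""
    try:
        ET.fromstring(ssml)
        return True
    except ET.ParseError:
        return False
```

A snippet with an unclosed tag, such as `<speak><break></speak>`, fails the check, which catches the most common copy-paste mistake.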

05. Creative Workflows

06. Troubleshooting & FAQ

Audio sounds flat?

Add emphasis tags, split long sentences into shorter ones, or lower the speed slightly.

Mispronunciations?

Include phonetic hints or use <say-as> for acronyms and numbers.
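
Wrapping acronyms by hand gets tedious in long scripts. As a rough sketch, assuming acronyms are written as two to five capital letters, a regular expression can add the `<say-as>` tag for you; `spell_out_acronyms` is a hypothetical helper, not a Kokoro feature.

```python
import re

# Assumption: an acronym is a standalone run of 2 to 5 capital letters.
ACRONYM = re.compile(r"\b[A-Z]{2,5}\b")


def spell_out_acronyms(text: str) -> str:
    """Wrap each all-caps acronym in a say-as tag so it is read letter by letter."""
    return ACRONYM.sub(
        lambda m: f'<say-as interpret-as="characters">{m.group(0)}</say-as>',
        text,
    )
```

Acronyms that are pronounced as words (NASA, for instance) would also be spelled out, so review the result before rendering.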

Slow rendering?

Check whether WebGPU is enabled (open chrome://gpu in Chrome). Falling back to WASM on low-end hardware is normal, though slightly slower.

07. Automation Ideas

Because Kokoro TTS runs client-side, automation means prepping scripts and opening the app rather than sending API calls.
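
A minimal sketch of that kind of automation: split a long script into one text file per paragraph, then open the app in the default browser. The `https://` scheme, the file naming, and all function names here are assumptions for illustration.

```python
import re
import webbrowser
from pathlib import Path

APP_URL = "https://kokoroweb.app"  # assumed from the address given in section 01


def split_paragraphs(script: str) -> list[str]:
    """Split on blank lines and collapse whitespace inside each paragraph."""
    parts = re.split(r"\n\s*\n", script.strip())
    return [" ".join(p.split()) for p in parts if p.strip()]


def write_chunks(script: str, out_dir: Path, stem: str = "section") -> list[Path]:
    """Write each paragraph to out_dir as section-01.txt, section-02.txt, ..."""
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for i, para in enumerate(split_paragraphs(script), start=1):
        path = out_dir / f"{stem}-{i:02d}.txt"
        path.write_text(para + "\n", encoding="utf-8")
        paths.append(path)
    return paths


def open_app() -> bool:
    """Open Kokoro Web in the default browser."""
    return webbrowser.open(APP_URL)
```

Call `write_chunks(text, Path("chunks"))` and then `open_app()`; each file can be pasted and rendered as its own section.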

08. Export & Delivery Checklist

  1. Generate WAV: Export each section and name files clearly (e.g., update-2025-01.wav).
  2. Normalize: Use Audacity or ffmpeg to set loudness to -16 LUFS (voice-only) or -14 LUFS (voice + music).
  3. Convert: Save MP3 versions for lightweight sharing if needed.
  4. Distribute: Upload to your CMS, podcast host, or video timeline.
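
Steps 2 and 3 can be scripted with ffmpeg. The sketch below only builds the command lines; it uses ffmpeg's `loudnorm` filter and the `libmp3lame` encoder, while the true-peak and loudness-range values (`TP=-1.5`, `LRA=11`) and the MP3 quality setting are assumed defaults, not values from this guide. Single-pass `loudnorm` is approximate, so measure the result if the exact target matters.

```python
def normalize_cmd(src: str, dst: str, lufs: float = -16.0) -> list[str]:
    """ffmpeg command that normalizes src to the target integrated loudness."""
    # TP and LRA are assumed values; only the LUFS target comes from the checklist.
    return ["ffmpeg", "-i", src, "-af", f"loudnorm=I={lufs}:TP=-1.5:LRA=11", dst]


def mp3_cmd(src: str, dst: str) -> list[str]:
    """ffmpeg command that encodes src as a VBR MP3 (assumed quality level 2)."""
    return ["ffmpeg", "-i", src, "-codec:a", "libmp3lame", "-q:a", "2", dst]


# Usage (requires ffmpeg on PATH):
#   import subprocess
#   subprocess.run(normalize_cmd("update-2025-01.wav", "update-2025-01-norm.wav"), check=True)
#   subprocess.run(mp3_cmd("update-2025-01-norm.wav", "update-2025-01.mp3"), check=True)
```

Pass `lufs=-14.0` for mixes that combine voice and music, per step 2.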

Ready to build with Kokoro TTS?

Open Kokoro Web, paste your script, and hear what private, on-device voice synthesis sounds like. No logins, no fees, just instant audio.
