01. Getting Started with Kokoro Web
Open kokoroweb.app and press “Launch Kokoro TTS.” The app loads the 82M-parameter voice model locally (WebGPU by default, with a WASM fallback). Expect a one-time download of about 150 MB; after that, synthesis runs in the browser and your text is not uploaded to a server.
- Use Chrome 120+, Edge 120+, or Safari 17+ for best WebGPU performance.
- If WebGPU is unavailable, Kokoro TTS automatically switches to WASM.
- Keep your browser tab active during initial model load to avoid throttling.
Quick setup checklist
- Preload the voice model before meetings or recording sessions.
- Wear headphones to monitor audio playback while editing.
- Bookmark Kokoro Web to pin it next to your script editor.
02. Script Preparation Techniques
High-quality Kokoro TTS output starts with well-structured text. Keep sentences concise, use punctuation for pacing, and add stage cues to guide tone.
Formatting guidelines
- Limit sentences to 20–25 words.
- Use commas for short pauses; periods or … for longer breaks.
- Spell names phonetically in parentheses if needed.
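The sentence-length guideline can be checked automatically before you paste a script into the app. The following is a minimal sketch, not part of Kokoro Web: the function name `long_sentences` and the naive sentence splitter are assumptions made for illustration, and the 25-word limit is the upper bound from the list above.

```python
import re

MAX_WORDS = 25  # upper bound of the 20-25 word guideline


def long_sentences(script: str, max_words: int = MAX_WORDS) -> list[tuple[int, str]]:
    """Return (word_count, sentence) pairs for sentences longer than max_words."""
    # Naive split after ., !, ? or … followed by whitespace.
    # Abbreviations such as "e.g." will cause early splits.
    sentences = re.split(r"(?<=[.!?…])\s+", script.strip())
    flagged = []
    for sentence in sentences:
        count = len(sentence.split())
        if count > max_words:
            flagged.append((count, sentence))
    return flagged


if __name__ == "__main__":
    draft = "Welcome back. " + " ".join(["word"] * 30) + ". See you next week."
    for count, sentence in long_sentences(draft):
        print(f"{count} words: {sentence[:40]}...")
```

Flagged sentences are candidates for splitting at a comma or conjunction; the check only counts words and does not judge pacing.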
Helpful macros
Create keyboard snippets for recurring phrases, such as “Thanks for tuning in to Kokoro TTS” or “Here’s what changed this sprint.” This keeps recurring phrasing consistent from one recording to the next.
03. Voice Selection & Tuning
Kokoro Web includes three curated voices. Match the voice to your audience and adjust speed for natural delivery.
| Voice | Best For | Suggested Speed | Notes |
|---|---|---|---|
| American Female | Product tutorials, e-learning, customer success | 0.98×–1.05× | Warm, approachable tone |
| American Male | Engineering updates, podcasts, newsletters | 0.95×–1.00× | Solid for technical narration |
| British Female | Financial summaries, academic content | 0.92×–0.98× | Authoritative delivery |
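If you keep voice settings alongside your scripts, the table above can be written down as data. This is an illustrative sketch only: `VoicePreset`, `PRESETS`, and `clamp_speed` are hypothetical names, and the app itself does not read this file. The speed ranges are copied from the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VoicePreset:
    min_speed: float
    max_speed: float


# Suggested speed ranges, copied from the table above.
PRESETS = {
    "American Female": VoicePreset(0.98, 1.05),
    "American Male": VoicePreset(0.95, 1.00),
    "British Female": VoicePreset(0.92, 0.98),
}


def clamp_speed(voice: str, speed: float) -> float:
    """Pull a requested speed back into the suggested range for the voice."""
    preset = PRESETS[voice]
    return min(max(speed, preset.min_speed), preset.max_speed)


if __name__ == "__main__":
    print(clamp_speed("British Female", 1.2))
```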
04. SSML Essentials
Kokoro TTS respects a practical set of SSML tags. Use them to control pitch, pauses, and pronunciation without writing custom code.
<speak> Welcome to <emphasis level="moderate">Kokoro Web</emphasis>. <break time="250ms"/> We ship private, on-device voice synthesis with no uploads. <say-as interpret-as="characters">AI</say-as> at your fingertips. </speak>
Test SSML snippets in short paragraphs first. Overusing them can make narration feel robotic.
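When snippets are assembled by hand, unescaped characters such as `&` or a missing closing tag are easy to introduce. The sketch below builds the same three tags used in the example above and checks that the result is well-formed XML before you paste it. The helper names (`plain`, `emphasis`, `pause`, `spell_out`, `speak`, `is_well_formed`) are hypothetical and use only the Python standard library; whether Kokoro TTS accepts a given tag still has to be verified in the app.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr


def plain(text: str) -> str:
    """Escape &, < and > in ordinary narration text."""
    return escape(text)


def emphasis(text: str, level: str = "moderate") -> str:
    return f"<emphasis level={quoteattr(level)}>{escape(text)}</emphasis>"


def pause(ms: int) -> str:
    return f'<break time="{int(ms)}ms"/>'


def spell_out(text: str) -> str:
    return f'<say-as interpret-as="characters">{escape(text)}</say-as>'


def speak(*parts: str) -> str:
    """Join already-escaped parts inside a single <speak> root."""
    return "<speak>" + " ".join(parts) + "</speak>"


def is_well_formed(ssml: str) -> bool:
    """Check XML well-formedness only; this does not validate SSML semantics."""
    try:
        ET.fromstring(ssml)
        return True
    except ET.ParseError:
        return False


if __name__ == "__main__":
    ssml = speak(plain("Welcome to"), emphasis("Kokoro Web"), pause(250), spell_out("AI"))
    print(ssml, is_well_formed(ssml))
```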
05. Creative Workflows
- Weekly product updates: Reuse a script template, swap numbers and highlights, render, and publish within 10 minutes.
- Course narration: Break lessons into 2–3 minute segments, generate audio per section, and store WAV files in a version-controlled repo.
- Support announcements: Pair Kokoro TTS audio with email bulletins to make updates more accessible.
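For course narration, the 2–3 minute segments can be estimated from word counts before any audio is generated. The sketch below groups paragraphs until a word budget is reached. It assumes roughly 150 spoken words per minute at 1.0× speed, which is a placeholder figure and not a measured Kokoro TTS rate; time one of your own clips and adjust `words_per_minute`. The function name is hypothetical.

```python
def split_into_segments(
    script: str,
    max_minutes: float = 3.0,
    words_per_minute: int = 150,  # assumption: measure your own voice and speed
) -> list[str]:
    """Group blank-line-separated paragraphs into segments under a word budget."""
    budget = int(max_minutes * words_per_minute)
    segments: list[str] = []
    current: list[str] = []
    count = 0
    for paragraph in script.split("\n\n"):
        paragraph = paragraph.strip()
        if not paragraph:
            continue
        words = len(paragraph.split())
        # Close the current segment if this paragraph would exceed the budget.
        if current and count + words > budget:
            segments.append("\n\n".join(current))
            current, count = [], 0
        # A single paragraph longer than the budget becomes its own oversized segment.
        current.append(paragraph)
        count += words
    if current:
        segments.append("\n\n".join(current))
    return segments


if __name__ == "__main__":
    lesson = "\n\n".join(" ".join(["word"] * 200) for _ in range(5))
    for index, segment in enumerate(split_into_segments(lesson), start=1):
        print(index, len(segment.split()), "words")
```

Splitting on paragraph boundaries keeps each audio file aligned with a unit of the script, which makes re-rendering a single corrected section easier.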
06. Troubleshooting & FAQ
Audio sounds flat?
Add emphasis tags, shorten sentences, or lower the speed slightly.
Mispronunciations?
Include phonetic hints or use <say-as> for acronyms and numbers.
Slow rendering?
Check whether WebGPU is enabled (in Chrome, open chrome://gpu). Falling back to WASM on low-end hardware is normal but slightly slower.
07. Automation Ideas
Because Kokoro TTS runs client-side, automation means prepping scripts and opening the app rather than sending API calls.
- Trigger reminders in Notion or Linear after publishing release notes.
- Use Raycast or Alfred shortcuts to paste templated intros into Kokoro TTS.
- Store voice settings in a shared doc so teammates replicate your tone.
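Script prep of this kind can be done with a small template, in the same spirit as the snippet tools listed above. The following sketch uses Python's standard `string.Template`; the template wording and the field names `sprint` and `highlight` are made up for illustration. The output is plain text that you paste into the app yourself.

```python
from string import Template

# Hypothetical intro template; adapt the wording and fields to your own scripts.
SPRINT_INTRO = Template(
    "Thanks for tuning in to Kokoro TTS. "
    "Here's what changed in sprint $sprint: $highlight."
)


def render_intro(sprint: int, highlight: str) -> str:
    """Fill the template; substitute() raises KeyError if a field is missing."""
    return SPRINT_INTRO.substitute(sprint=sprint, highlight=highlight)


if __name__ == "__main__":
    print(render_intro(42, "faster model loading"))
```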
08. Export & Delivery Checklist
- Generate WAV: Export each section and name files clearly (e.g., update-2025-01.wav).
- Normalize: Use Audacity or ffmpeg to set loudness to -16 LUFS (voice-only) or -14 LUFS (voice + music).
- Convert: Save MP3 versions for lightweight sharing if needed.
- Distribute: Upload to your CMS, podcast host, or video timeline.
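The normalize and convert steps above can be scripted with ffmpeg. The sketch below only builds the command lines, so you can inspect them before running anything. It uses ffmpeg's `loudnorm` filter in single-pass mode, which gets close to the target but is less exact than a two-pass run. The true-peak and loudness-range values (`TP=-1.5`, `LRA=11`) and the MP3 quality setting (`-q:a 2`) are assumptions, not values from this guide, and the function names are hypothetical.

```python
import shlex


def loudnorm_command(src: str, dst: str, target_lufs: float = -16.0) -> list[str]:
    """Build an ffmpeg command that normalizes integrated loudness (single pass)."""
    # TP and LRA are assumed values; tune them for your material.
    audio_filter = f"loudnorm=I={target_lufs}:TP=-1.5:LRA=11"
    return ["ffmpeg", "-i", src, "-af", audio_filter, dst]


def mp3_command(src: str, dst: str) -> list[str]:
    """Build an ffmpeg command that encodes a WAV file to VBR MP3."""
    return ["ffmpeg", "-i", src, "-codec:a", "libmp3lame", "-q:a", "2", dst]


if __name__ == "__main__":
    # Voice-only target from the checklist; pass -14.0 for voice plus music.
    print(shlex.join(loudnorm_command("update-2025-01.wav", "update-2025-01-norm.wav")))
    print(shlex.join(mp3_command("update-2025-01-norm.wav", "update-2025-01.mp3")))
    # To execute: subprocess.run(command, check=True), with ffmpeg on your PATH.
```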
Ready to build with Kokoro TTS?
Open Kokoro Web, paste your script, and hear the difference of private, on-device voice synthesis. No logins, no fees—just instant audio.