1. Prepare Text for Kokoro TTS
Kokoro TTS responds best to clean inputs. Before you hit “Generate,” polish the script so the model understands rhythm, tone, and pronunciation.
Text cleanup checklist
- Fix punctuation: Add commas and periods where you expect natural pauses from Kokoro TTS.
- Spell phonetically: Use phonetic spellings (e.g., “proh-nuhn-see-ay-shun”) for tricky words.
- Insert stage cues: Brackets like [whisper] or (excited) guide the delivery.
- Group sentences: Break long paragraphs into two-sentence blocks to avoid breathless runs.
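If you prepare scripts outside the web UI, the grouping step can be automated. The following is a minimal sketch, not part of Kokoro itself: `split_into_blocks` is a hypothetical helper name, and the sentence splitter is deliberately naive (it breaks after `.`, `!`, or `?` followed by whitespace, so abbreviations such as "e.g." will be split incorrectly).

```python
import re


def split_into_blocks(text: str, sentences_per_block: int = 2) -> list[str]:
    """Group sentences into fixed-size blocks for shorter synthesis runs."""
    # Collapse runs of whitespace so stray line breaks do not affect splitting.
    cleaned = re.sub(r"\s+", " ", text).strip()
    # Naive split: break on whitespace that follows ., !, or ?
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", cleaned) if s]
    return [
        " ".join(sentences[i:i + sentences_per_block])
        for i in range(0, len(sentences), sentences_per_block)
    ]


script = "Kokoro runs locally. It needs clean text!  Does pacing matter? Yes. Keep blocks short."
for block in split_into_blocks(script):
    print(block)
```

Each printed block can then be pasted into the generator separately, which makes it easier to re-render only the passage that sounds off.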
Keep the Kokoro TTS keyword density between 3–4% in long-form scripts to maintain strong relevance without sounding forced.
2. Choose the Right Kokoro TTS Voice
Select a voice profile that matches your project. Kokoro Web includes American and British accents plus male and female registers.
- Kokoro TTS - American Female: Perfect for explainers, e-learning, and friendly marketing scripts.
- Kokoro TTS - American Male: Great for interviews, podcasts, and developer updates.
- Kokoro TTS - British Female: Adds an authoritative tone to financial or academic briefings.
After choosing, adjust the Kokoro TTS speed slider. Most scripts shine at 0.95–1.05x speed.
3. Control Pace & Pauses
Kokoro TTS respects punctuation and SSML breaks. Add short sentences for quick pacing or ellipses for suspenseful pauses.
Avoid robotic pacing
- Don’t string four or more clauses together without punctuation.
- Swap repeated commas for semicolons when a breath is needed.
- Use <break time="400ms"/> inside SSML blocks for deliberate pauses.
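For longer scripts, typing break tags by hand gets tedious. Here is a small sketch that joins sentences with a fixed pause, assuming the `<break>` tag is honored as described above. `join_with_breaks` is a hypothetical helper, not a Kokoro API; it only builds the markup string.

```python
from xml.sax.saxutils import escape


def join_with_breaks(sentences: list[str], pause_ms: int = 400) -> str:
    """Wrap sentences in <speak> and put a fixed pause between them."""
    if pause_ms < 0:
        raise ValueError("pause_ms must be non-negative")
    pause = f'<break time="{pause_ms}ms"/>'
    # Escape &, <, and > so the script text cannot break the markup.
    body = pause.join(escape(s.strip()) for s in sentences)
    return f"<speak>{body}</speak>"


ssml = join_with_breaks(["Save 20% today.", "Terms & conditions apply."])
print(ssml)
```

Escaping matters more than it looks: a bare `&` in the script would otherwise make the whole block invalid XML.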
4. Add Emphasis & Energy
Strategic emphasis keeps Kokoro TTS from sounding flat. Combine typography and SSML tags for best results.
Do this
- Capitalize important acronyms (e.g., “GPU”, “TTS”).
- Wrap key phrases in <emphasis level="moderate">.
- Use emojis sparingly to cue tone (👏, 😊) when appropriate.
Sample Kokoro TTS emphasis snippet
<speak>
<emphasis level="strong">Welcome to Kokoro Web.</emphasis>
This lightweight model delivers private, on-device speech in seconds.
</speak>
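If you template scripts, the emphasis wrapper can be applied programmatically. The sketch below is illustrative only: `emphasize` is a hypothetical helper, it wraps just the first occurrence of the phrase, and the level names are the ones defined by the SSML specification rather than a confirmed list of what Kokoro Web accepts.

```python
from xml.sax.saxutils import escape

# Emphasis levels defined by the SSML specification.
LEVELS = {"strong", "moderate", "none", "reduced"}


def emphasize(text: str, phrase: str, level: str = "moderate") -> str:
    """Return text as an SSML fragment with the first match of phrase emphasized."""
    if level not in LEVELS:
        raise ValueError(f"unknown emphasis level: {level}")
    if not phrase:
        raise ValueError("phrase must not be empty")
    before, found, after = text.partition(phrase)
    if not found:
        # Phrase is absent: return the escaped text unchanged.
        return escape(text)
    return (
        f"{escape(before)}"
        f'<emphasis level="{level}">{escape(found)}</emphasis>'
        f"{escape(after)}"
    )


line = emphasize("Welcome to Kokoro Web. It runs on-device.", "Kokoro Web", "strong")
print(f"<speak>{line}</speak>")
```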
5. Use SSML for Precision
Kokoro TTS supports a subset of SSML through Transformers.js. These tags help you steer pacing and pronunciation without writing code.
| SSML Tag | Purpose | Example |
|---|---|---|
| <break> | Insert a pause measured in ms | <break time="300ms"/> |
| <prosody> | Change rate or pitch | <prosody rate="90%">slower intro</prosody> |
| <say-as> | Control how characters, numbers, or dates are read | <say-as interpret-as="characters">KTTS</say-as> |
Tip: Keep SSML balanced. Overuse can make Kokoro TTS feel mechanical.
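Before pasting a long SSML block into the generator, it can help to check that it is well-formed and uses only the tags covered in this guide. The following is a rough sketch using Python's standard XML parser. The allowlist is an assumption drawn from the tags mentioned in this article, not a verified list of what Kokoro TTS accepts, and `check_ssml` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Tags named in this guide; the actual supported subset may differ (assumption).
ALLOWED_TAGS = {"speak", "break", "prosody", "say-as", "emphasis"}


def check_ssml(ssml: str) -> list[str]:
    """Return a list of problems; an empty list means the markup passed."""
    try:
        root = ET.fromstring(ssml)
    except ET.ParseError as err:
        return [f"not well-formed: {err}"]
    problems = []
    if root.tag != "speak":
        problems.append(f"root element is <{root.tag}>, expected <speak>")
    # iter() walks the root and every nested element.
    for element in root.iter():
        if element.tag not in ALLOWED_TAGS:
            problems.append(f"unsupported tag <{element.tag}>")
    return problems


print(check_ssml('<speak>Hi<break time="300ms"/> there.</speak>'))
print(check_ssml('<speak><voice name="x">Hi</voice></speak>'))
```

An unclosed tag is reported as "not well-formed" with the parser's position hint, which is usually enough to find the typo.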
6. Rapid Iteration & Testing
Since Kokoro TTS runs locally, iterate quickly. Generate short clips, adjust wording, and export when satisfied.
- Start with 4–6 sentence blocks to evaluate flow.
- Use the built-in audio player to check transitions.
- Save the script versions that worked in version control or a note-taking tool for reuse.
7. Post-Process the Audio
Kokoro TTS outputs clean WAV files. For polished projects, run a quick mastering pass.
Recommended steps
- Normalize loudness to -16 LUFS for podcasts or -14 LUFS for video.
- Apply gentle EQ (1–2 dB) to brighten if desired.
- Add subtle room reverb to blend with existing audio tracks.
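The normalization step is simple arithmetic once you have a loudness reading. As a sketch, assuming your editor or meter reports the clip's integrated loudness in LUFS, the gain to apply is the target minus the measurement. `gain_to_target` is a hypothetical helper; it does not measure loudness itself.

```python
def gain_to_target(measured_lufs: float, target_lufs: float) -> tuple[float, float]:
    """Return (gain in dB, linear amplitude factor) needed to reach the target."""
    gain_db = target_lufs - measured_lufs
    # Convert decibels to a linear multiplier for sample amplitudes.
    factor = 10 ** (gain_db / 20)
    return gain_db, factor


# Example: a clip measured at -23 LUFS, destined for a podcast at -16 LUFS.
gain_db, factor = gain_to_target(measured_lufs=-23.0, target_lufs=-16.0)
print(f"Apply {gain_db:+.1f} dB (multiply samples by {factor:.2f})")
```

A large positive gain can push peaks into clipping, so check the peak level afterwards or let a limiter catch the overshoot.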
Tools to try
- Audacity (free) for normalization.
- iZotope RX Elements for batch cleanup.
- DaVinci Resolve Fairlight for video workflows.
8. Build a Repeatable Kokoro TTS Workflow
Once you land on a formula, turn it into a reusable checklist. Consistency is the fastest way to scale Kokoro TTS content.
- Template the script: Create a document with placeholders for emphasis, pauses, and speaker notes.
- Batch voices: Render the same script in different voices to compare which one resonates with the target audience.
- Archive exports: Keep versioned WAV files and metadata (speed, voice choice, SSML tags) for future updates.
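One lightweight way to archive exports is a JSON sidecar file next to each WAV. The sketch below assumes you record the settings yourself after each render; `write_sidecar` and the field names are hypothetical, and nothing here is produced by Kokoro Web automatically.

```python
import hashlib
import json
from pathlib import Path


def write_sidecar(wav_path: Path, voice: str, speed: float, ssml_tags: list[str]) -> Path:
    """Write <name>.json next to the WAV, recording how it was rendered."""
    metadata = {
        "file": wav_path.name,
        # The hash ties the notes to one exact render of the audio.
        "sha256": hashlib.sha256(wav_path.read_bytes()).hexdigest(),
        "voice": voice,
        "speed": speed,
        "ssml_tags": sorted(set(ssml_tags)),
    }
    sidecar = wav_path.with_suffix(".json")
    sidecar.write_text(json.dumps(metadata, indent=2), encoding="utf-8")
    return sidecar
```

Calling `write_sidecar(Path("intro_v1.wav"), "American Female", 1.0, ["break", "emphasis"])` would create `intro_v1.json` beside the audio, so a later update can reproduce the same voice and speed.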