In-browser Kokoro text-to-speech

Workflow • Last updated: 2025‑01‑15

From Local Captioning to Voiceover: Whisper‑Style Transcription + Kokoro TTS

This tutorial outlines a private pipeline: transcribe content locally, edit and segment the script, generate voiceovers with Kokoro, and assemble the final audio, all without sending media to a server.

1) Transcribe Locally

  • Run a local Whisper‑style transcription tool to produce text with timestamps (e.g., an .srt file).
  • Export the raw text for editing; keep the timestamped file for captions.
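
As a rough illustration of the export step, the sketch below parses SubRip (.srt) text into timed cues and joins the cue text into one editable script. It assumes a well-formed .srt file; the names `Cue`, `parse_srt`, and `raw_text` are hypothetical helpers, not part of any transcription tool.

```python
import re
from dataclasses import dataclass


@dataclass
class Cue:
    index: int
    start: float  # seconds from the start of the media
    end: float    # seconds from the start of the media
    text: str


# SubRip timestamps look like 00:01:02,345 (some tools write a dot instead).
_TIME = re.compile(r"(\d+):(\d{2}):(\d{2})[,.](\d{3})")


def _to_seconds(stamp: str) -> float:
    hours, minutes, seconds, millis = map(int, _TIME.fullmatch(stamp).groups())
    return hours * 3600 + minutes * 60 + seconds + millis / 1000


def parse_srt(srt: str) -> list[Cue]:
    """Parse SubRip text into cues (assumes well-formed input)."""
    cues = []
    normalized = srt.replace("\r\n", "\n").strip()
    for block in re.split(r"\n\s*\n", normalized):
        lines = block.split("\n")
        if len(lines) < 3 or "-->" not in lines[1]:
            continue  # skip blocks that are not index / timing / text
        start, end = lines[1].split("-->")
        cues.append(
            Cue(
                index=int(lines[0].strip()),
                start=_to_seconds(start.split()[0]),
                end=_to_seconds(end.split()[0]),  # ignore trailing position data
                text=" ".join(line.strip() for line in lines[2:]),
            )
        )
    return cues


def raw_text(cues: list[Cue]) -> str:
    """Join cue text into a single script for editing."""
    return " ".join(cue.text for cue in cues)
```

Keeping the parsed cues next to the edited script means the original timestamps stay available for captions even after the wording changes.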

2) Edit and Segment

Remove filler words, fix names, and split the script into sense units of at most 20 seconds of speech each. Keep punctuation and stage directions, since they guide prosody.
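
One possible way to approximate the 20-second limit before any audio exists is to estimate duration from word count. The sketch below packs whole sentences into segments under that estimate. The 150 words-per-minute rate is an assumption, not a Kokoro setting, and `estimate_seconds` and `segment_script` are hypothetical names.

```python
import re

WORDS_PER_MINUTE = 150  # assumed narration rate; tune to the voice and speed in use
MAX_SECONDS = 20.0      # segment limit from the step above


def estimate_seconds(text: str, wpm: float = WORDS_PER_MINUTE) -> float:
    """Estimate spoken duration from the word count alone."""
    return len(text.split()) * 60.0 / wpm


def segment_script(script: str, max_seconds: float = MAX_SECONDS,
                   wpm: float = WORDS_PER_MINUTE) -> list[str]:
    """Greedily pack whole sentences into segments under max_seconds.

    A single sentence that is longer than the limit is kept whole,
    so it should be split by hand during editing.
    """
    # Split after sentence-ending punctuation, keeping the punctuation.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", script.strip()) if s]
    segments = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and estimate_seconds(candidate, wpm) > max_seconds:
            segments.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        segments.append(current)
    return segments
```

The estimate is only a first pass; previewing each segment remains the reliable check on its real length.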

3) Generate with Kokoro

  • Preview each segment, adjust pacing and emphasis, then export it as WAV.
  • Name files sequentially to simplify assembly (001.wav, 002.wav, ...).
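
The naming scheme above can be planned before exporting anything. This minimal sketch pairs each script segment with a zero-padded file name; `plan_exports` is a hypothetical helper and makes no Kokoro calls.

```python
def plan_exports(segments: list[str], width: int = 3) -> list[tuple[str, str]]:
    """Pair each segment with a zero-padded WAV name (001.wav, 002.wav, ...)."""
    if len(segments) >= 10 ** width:
        raise ValueError("too many segments for the chosen padding width")
    return [
        (f"{number:0{width}d}.wav", text)
        for number, text in enumerate(segments, start=1)
    ]


if __name__ == "__main__":
    for name, text in plan_exports(["Welcome to the show.", "Today we cover captions."]):
        print(name, "<-", text)
```

Zero padding keeps alphabetical order identical to playback order, so a plain sorted file listing is enough for assembly.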

4) Assemble and Master

  • Concatenate segments with a DAW or ffmpeg; normalize to −16 LUFS integrated loudness.
  • Deliver MP3 at 192–256 kbps for the web; keep the WAV masters.
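
For the ffmpeg route, the sketch below builds a concat list and three command lines: join, normalize, and encode. It uses ffmpeg's concat demuxer, its `loudnorm` filter, and the `libmp3lame` encoder. The file names, the true-peak and loudness-range values (`TP=-1.5`, `LRA=11`), and the 48 kHz output rate are assumptions to adjust; single-pass `loudnorm` only approximates the −16 LUFS target.

```python
def concat_list(wav_names: list[str]) -> str:
    """Build the text of an ffmpeg concat-demuxer list file.

    Assumes the names contain no single quotes and that all segments
    share one sample rate and channel layout.
    """
    return "".join(f"file '{name}'\n" for name in sorted(wav_names))


def build_commands(list_file: str = "segments.txt",
                   joined: str = "joined.wav",
                   master: str = "master.wav",
                   mp3: str = "voiceover.mp3",
                   bitrate: str = "192k") -> list[list[str]]:
    """Return ffmpeg argument lists for the join, normalize, and encode steps."""
    join = ["ffmpeg", "-f", "concat", "-safe", "0", "-i", list_file,
            "-c", "copy", joined]
    # I = integrated loudness target; TP and LRA are assumed values.
    normalize = ["ffmpeg", "-i", joined,
                 "-af", "loudnorm=I=-16:TP=-1.5:LRA=11",
                 "-ar", "48000", master]
    encode = ["ffmpeg", "-i", master, "-c:a", "libmp3lame", "-b:a", bitrate, mp3]
    return [join, normalize, encode]


if __name__ == "__main__":
    print(concat_list(["002.wav", "001.wav"]), end="")
    for command in build_commands():
        print(" ".join(command))
```

Each argument list can be passed to `subprocess.run(command, check=True)` once the list file has been written next to the WAV segments.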

Benefits

  • Privacy by default; no uploads.
  • Consistent narration quality and repeatable edits.
  • Faster iteration on phrasing and timing.

Author: Kokoro Web Team