In-browser Kokoro text-to-speech

Audio Engineering • Last updated: 2025‑01‑15

Audio Quality Engineering for TTS: Sample Rates, Loudness, De‑Essing, and Delivery

Great scripts and voices still need engineering discipline. This guide shows how to set realistic targets for loudness and peak control, when to use 44.1 vs 48 kHz, how to apply subtle EQ and de‑essing, and which delivery formats preserve quality without bloating file sizes.

Sample Rate and Bit Depth

  • Master at 48 kHz when the audio will sit on a 48 kHz video timeline; use 44.1 kHz for audio‑only releases. Matching the destination rate avoids an extra resampling pass.
  • Export WAV 24‑bit for editing headroom; convert to delivery formats later.
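
As a rough illustration of the 24‑bit export step, the sketch below packs floating‑point samples into 24‑bit PCM and writes a mono WAV using only Python's standard library. The helper names (`float_to_pcm24`, `write_wav24`) and the mono, 48 kHz defaults are assumptions made for this example, not part of any Kokoro API.

```python
import struct
import wave

FULL_SCALE_24 = 2 ** 23 - 1  # largest positive signed 24-bit value


def float_to_pcm24(samples):
    """Pack floats in [-1.0, 1.0] as little-endian signed 24-bit PCM bytes."""
    out = bytearray()
    for x in samples:
        x = max(-1.0, min(1.0, x))           # clamp so the conversion cannot wrap
        n = int(round(x * FULL_SCALE_24))
        out += struct.pack("<i", n)[:3]      # keep the low 3 bytes of a 32-bit int
    return bytes(out)


def write_wav24(path, samples, sample_rate=48000):
    """Write mono 24-bit PCM WAV (hypothetical helper for this guide)."""
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(3)                   # 3 bytes per sample = 24-bit
        wf.setframerate(sample_rate)
        wf.writeframes(float_to_pcm24(samples))
```

Calling `write_wav24("narration_master.wav", samples)` would produce an editing master; lossy delivery files can then be encoded from that file rather than from each other.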

Loudness Normalization

Target an integrated loudness of −16 LUFS for stereo files (−19 LUFS for mono). Place a transparent limiter at the end of the chain and aim for no more than 1–2 dB of gain reduction on peaks. Avoid over‑compression; narration should breathe.
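
The arithmetic behind normalization is simple enough to sketch. Assuming you already have an integrated loudness reading and a peak reading from a meter, the gain to apply is the difference between target and measurement, and the limiter has to absorb whatever pushes the peak past the ceiling. The function names below are illustrative, and the peak value is whatever your meter reports; a sample‑peak reading can understate true peak.

```python
def gain_to_target(measured_lufs, target_lufs=-16.0):
    """Return (gain in dB, linear multiplier) needed to reach the target loudness."""
    gain_db = target_lufs - measured_lufs
    return gain_db, 10 ** (gain_db / 20)


def limiter_reduction_needed(peak_db, gain_db, ceiling_db=-1.0):
    """Return how many dB the limiter must remove after the gain is applied."""
    return max(0.0, peak_db + gain_db - ceiling_db)


# Example: narration measured at -22 LUFS with peaks at -4 dB.
gain_db, gain_linear = gain_to_target(-22.0)          # 6 dB of make-up gain
reduction = limiter_reduction_needed(-4.0, gain_db)   # 3 dB over the ceiling
```

In this example the limiter would need about 3 dB of reduction, more than the 1–2 dB suggested above, which is a hint to tame the loudest peaks earlier in the chain rather than lean harder on the limiter.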

Subtle EQ and De‑Esser

  • High‑shelf boost of +1–2 dB above 6–8 kHz for air, if needed.
  • De‑esser centered at 5–8 kHz with a low ratio; engage it only on obvious “s” spikes.
  • High‑pass at 60–80 Hz to remove sub‑bass rumble, especially when narration is mixed with other sources.
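
To make the high‑pass step concrete, here is a minimal sketch of a second‑order (biquad) high‑pass filter in pure Python, using the widely published Audio EQ Cookbook coefficient formulas. The function names and the Q of 0.7071 (a Butterworth response) are assumptions for this example; a DAW or DSP library would normally do this work.

```python
import math


def highpass_coeffs(cutoff_hz, sample_rate, q=0.7071):
    """Return normalized biquad coefficients (b, a) for a high-pass filter."""
    w0 = 2 * math.pi * cutoff_hz / sample_rate
    cos_w0 = math.cos(w0)
    alpha = math.sin(w0) / (2 * q)
    a0 = 1 + alpha
    b = [(1 + cos_w0) / 2 / a0, -(1 + cos_w0) / a0, (1 + cos_w0) / 2 / a0]
    a = [1.0, -2 * cos_w0 / a0, (1 - alpha) / a0]
    return b, a


def biquad(samples, b, a):
    """Apply the filter sample by sample (direct form I)."""
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x in samples:
        y = b[0] * x + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        x2, x1 = x1, x
        y2, y1 = y1, y
        out.append(y)
    return out


# Example: a 70 Hz high-pass at 48 kHz removes a constant (DC) offset.
b, a = highpass_coeffs(70.0, 48000)
filtered = biquad([1.0] * 48000, b, a)
```

A constant offset decays toward zero after the filter, while content in the speech range passes essentially unchanged.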

Delivery Formats

  • Web: MP3 192–256 kbps CBR or AAC 160–192 kbps.
  • Archival/Editing: WAV 24‑bit; keep masters for future revisions.
  • Podcasts: Normalize to platform guidelines and confirm that true peak stays below −1 dBTP.
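
To see why masters stay in WAV while delivery uses compressed formats, it helps to estimate file sizes. The sketch below is back‑of‑the‑envelope arithmetic that ignores container headers and metadata; the function names are illustrative.

```python
def compressed_size_bytes(bitrate_kbps, seconds):
    """Approximate size of a constant-bitrate file (MP3/AAC), payload only."""
    return bitrate_kbps * 1000 * seconds / 8


def pcm_size_bytes(sample_rate, bit_depth, channels, seconds):
    """Approximate size of uncompressed PCM audio (WAV), payload only."""
    return sample_rate * (bit_depth // 8) * channels * seconds


# Ten minutes of narration:
mp3_192 = compressed_size_bytes(192, 600)        # 14,400,000 bytes (~14.4 MB)
wav_master = pcm_size_bytes(48000, 24, 1, 600)   # 86,400,000 bytes (~86.4 MB)
```

For this ten‑minute mono example the 24‑bit master is six times the size of the 192 kbps MP3, which is why the WAV is kept for revisions and the compressed file is what ships.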

Reference Chain

// Simple mastering chain for narration
High‑pass (70 Hz) → Gentle EQ → De‑esser → Limiter (ceiling −1 dBTP, 1–2 dB gain reduction) → LUFS meter

Quality Checklist

  • No clipping; peaks stay under −1 dBTP.
  • Integrated loudness near −16 LUFS stereo (−19 LUFS mono).
  • Articulation remains clear after light de‑essing.
  • Export settings match destination platform.
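
The measurable items on this checklist can be automated once a meter has reported the numbers. The sketch below is one possible check, assuming you supply the integrated loudness and true‑peak readings yourself; the function name and the ±1 LU tolerance used to interpret "near" are assumptions for this example.

```python
def check_master(integrated_lufs, true_peak_dbtp,
                 target_lufs=-16.0, tolerance_lu=1.0, ceiling_dbtp=-1.0):
    """Return a list of problems; an empty list means the checks passed."""
    problems = []
    if true_peak_dbtp > ceiling_dbtp:
        problems.append(
            f"true peak {true_peak_dbtp} dBTP exceeds ceiling {ceiling_dbtp} dBTP"
        )
    if abs(integrated_lufs - target_lufs) > tolerance_lu:
        problems.append(
            f"loudness {integrated_lufs} LUFS is more than {tolerance_lu} LU "
            f"from target {target_lufs} LUFS"
        )
    return problems


# Example: a stereo master that passes, and a mono master checked against -19 LUFS.
stereo_ok = check_master(-16.3, -1.2)
mono_ok = check_master(-19.2, -1.5, target_lufs=-19.0)
```

Articulation after de‑essing and the match between export settings and destination still need a listening pass and a manual look; only the numeric targets are covered here.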

Author: Kokoro Web Team • Last updated 2025‑01‑15