I’m thrilled to share a major update to my Chatterbox TTS Extended repo. The best part? It saves a ton of time, because you no longer need to generate multiple versions of each chunk to reduce artifacts.
So, what’s the secret? I’ve found that the pyrnnoise denoising module gets rid of 95%-100% of artifacts, especially when combined with the Auto-Editor feature. This lets me generate audiobooks much faster than before.
But that’s not all. I’ve also fixed the bug where setting a specific seed didn’t reproduce the same results, so a fixed seed now gives repeatable output.
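For anyone curious what reproducible seeding usually involves, here is a minimal sketch. It is not the code from the repo: `seed_everything` is a hypothetical helper name, and the sketch assumes the usual Python sources of randomness (the standard library, plus NumPy and PyTorch when they are installed).

```python
import random


def seed_everything(seed: int) -> None:
    """Seed every RNG available in this environment (hypothetical helper)."""
    random.seed(seed)
    try:
        import numpy as np  # optional dependency
        np.random.seed(seed)
    except ImportError:
        pass
    try:
        import torch  # optional dependency
        torch.manual_seed(seed)
        if torch.cuda.is_available():
            torch.cuda.manual_seed_all(seed)
    except ImportError:
        pass


# Re-seeding with the same value should replay the same random draws.
seed_everything(1234)
first = [random.random() for _ in range(3)]
seed_everything(1234)
second = [random.random() for _ in range(3)]
print(first == second)  # True
```

The key point is to re-seed before every generation you want to repeat. Seeding once at startup is not enough if anything draws random numbers in between.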
If you’re interested in exploring Chatterbox TTS Extended, you can find the repo on GitHub. Installation is easy, and I’ve listed the current features below:
* Text input (box + multi-file upload)
* Reference audio (conditioning)
* Separate/merge file output
* Emotion, CFG, temperature, seed
* Batch/smart-append/split (sentences)
* Sound word remove/replace
* Inline reference number removal
* Dot-letter correction
* Lowercase & whitespace normalization
* Auto-Editor post-processing
* pyrnnoise denoising (RNNoise)
* FFmpeg normalization (EBU/peak)
* WAV/MP3/FLAC export
* Candidates per chunk, retries, fallback
* Parallelism (workers)
* Whisper/faster-whisper backend
* Persistent settings (JSON/CSV per output)
* Settings load/save in UI
* Audio preview & download
* Help/Instructions
* Voice Conversion (VC tab)
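To give a feel for the text-cleanup features in the list, here is a rough sketch of inline reference number removal, dot-letter correction, lowercase and whitespace normalization, and sentence splitting. The function names and regexes are my illustrative assumptions, not the repo's actual implementation, and the exact rules (for example, what counts as a reference marker) may differ.

```python
import re


def remove_inline_references(text: str) -> str:
    # Drop bracketed citation markers such as [3] or [12, 14].
    return re.sub(r"\s*\[\d+(?:\s*[,-]\s*\d+)*\]", "", text)


def fix_dot_letters(text: str) -> str:
    # Turn dotted initials like "J.R.R." into "J R R" so they are read letter by letter.
    def spell_out(match: re.Match) -> str:
        return " ".join(match.group(0).replace(".", " ").split())

    return re.sub(r"\b(?:[A-Za-z]\.){2,}", spell_out, text)


def clean_text(text: str, lowercase: bool = True) -> str:
    text = remove_inline_references(text)
    text = fix_dot_letters(text)
    text = " ".join(text.split())  # collapse runs of whitespace
    return text.lower() if lowercase else text


def split_sentences(text: str) -> list[str]:
    # Naive split on whitespace that follows ., ! or ?
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


print(clean_text("J.R.R. Tolkien  wrote this [12]."))
print(split_sentences("Hello there. How are you? Fine!"))
```

Splitting only after the cleanup pass means the dots in initials are already gone, so they are not mistaken for sentence endings.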
I’ve seen some impressive forks of Chatterbox TTS in the Stable Diffusion subreddit, and it’s amazing what people have been doing with this technology. My version is focused on creating audiobooks for my kids, but I’m excited to see how others will use it.
—
Further reading: Chatterbox TTS Extended GitHub repo