What is VoiceCraft?

VoiceCraft is an advanced tool designed for zero-shot speech editing and text-to-speech (TTS) tasks, particularly adept at handling diverse and uncontrolled data sources like audiobooks, internet videos, and podcasts.

Built on a token-infilling neural codec language model, VoiceCraft achieves state-of-the-art performance in both speech editing and zero-shot TTS. It can clone or edit an unseen voice from only a few seconds of reference audio.

Key features include model weights available on HuggingFace, training guidance, and inference demos for speech editing and TTS. The tool offers multiple ways to run TTS inference, including with and without Docker.

It provides comprehensive environment setup instructions and supports training and fine-tuning of models. Users can train VoiceCraft models using the provided datasets and manifest files, after preparing utterances, transcripts, and phoneme sequences.

The codebase is licensed under CC BY-NC-SA 4.0, while model weights are under Coqui Public Model License 1.0.0. Acknowledgments are given to related projects and individuals, and a citation for VoiceCraft's paper is provided.

A disclaimer emphasizes the ethical use of the technology, prohibiting unauthorized speech generation or editing. Overall, VoiceCraft offers a sophisticated solution for speech editing and TTS tasks.

VoiceCraft use cases

Edit speech seamlessly in diverse contexts like audiobooks and podcasts.
Generate natural-sounding speech from text inputs, useful for audiobook creation.
Train and fine-tune models to personalize and optimize speech generation tasks.