How to use ResembleAI/chatterbox with Chatterbox:
    # pip install chatterbox-tts
    import torchaudio as ta
    from chatterbox.tts import ChatterboxTTS

    model = ChatterboxTTS.from_pretrained(device="cuda")

    text = "Ezreal and Jinx teamed up with Ahri, Yasuo, and Teemo to take down the enemy's Nexus in an epic late-game pentakill."
    wav = model.generate(text)
    ta.save("test-1.wav", wav, model.sr)

    # If you want to synthesize with a different voice, specify the audio prompt
    AUDIO_PROMPT_PATH = "YOUR_FILE.wav"
    wav = model.generate(text, audio_prompt_path=AUDIO_PROMPT_PATH)
    ta.save("test-2.wav", wav, model.sr)
Is a research paper available for this model?
I checked GitHub, Hugging Face, and some other platforms but can't find any link to Chatterbox's research paper. Is one available? If not, are there any plans to publish one?
Hi @FahadMF679, thanks for the interest! There's no research paper, mainly because the research team is small (3 people) and we're working full steam ahead on another model (a diffusion-based conversational model, like NotebookLM) that we want to open source soon, as well as on adding new languages to Chatterbox.
Why is t3_cfg.safetensors twice the size of t3_cfg.pt? To my knowledge, the safetensors format conversion doesn't inflate the checkpoint size this much. Are we looking at two completely different checkpoints?
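One way to narrow this down is to check which dtypes the safetensors file actually stores, since a checkpoint saved in 32-bit floats takes twice the space of the same weights in 16-bit. The sketch below is only a diagnostic aid, not something from the Chatterbox repo: `summarize_safetensors` is a hypothetical helper name, and it relies only on the documented safetensors layout (an 8-byte little-endian header length followed by a JSON header listing each tensor's `dtype`, `shape`, and `data_offsets`).

```python
import json
import struct
from collections import Counter


def summarize_safetensors(path):
    """Return total tensor bytes per dtype, read from the file header only."""
    with open(path, "rb") as f:
        # First 8 bytes: unsigned little-endian length of the JSON header.
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len))

    bytes_per_dtype = Counter()
    for name, info in header.items():
        if name == "__metadata__":  # optional free-form metadata, not a tensor
            continue
        start, end = info["data_offsets"]
        bytes_per_dtype[info["dtype"]] += end - start
    return dict(bytes_per_dtype)


# Hypothetical usage, assuming the file has been downloaded locally:
# print(summarize_safetensors("t3_cfg.safetensors"))
```

If nearly all bytes are reported as `F32`, comparing against the dtypes in `t3_cfg.pt` (for example by loading it with `torch.load` and checking each tensor's `.dtype`) would show whether the size gap comes from precision or from genuinely different weights.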