| --- |
| license: mit |
| tags: |
| - audio |
| - deep-learning |
| - pytorch |
| - generative-adversarial-network |
| - codec |
| - gans |
| - compression-algorithm |
| - audio-compression |
| - RVQ |
| --- |
| |
|
|
| # Descript Audio Codec |
|
|
| With Descript Audio Codec, you can compress **44.1 kHz audio** into discrete codes at a **low 8 kbps bitrate**. <br> |
| That's approximately **90x compression** while maintaining exceptional fidelity and minimizing artifacts. <br> |
| Our universal model works on all domains (speech, environment, music, etc.), making it widely applicable to generative modeling of all audio. <br> |
| It can be used as a drop-in replacement for EnCodec in all audio language modeling applications (such as AudioLMs, MusicLMs, MusicGen, etc.). <br> |
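
The "approximately 90x" figure can be sanity-checked with a little arithmetic. The sketch below assumes the uncompressed reference is 16-bit mono PCM; that assumption is not stated in the text above, and only the 44.1 kHz sample rate and the 8 kbps bitrate are taken from it.

```python
# Values taken from the text above.
SAMPLE_RATE_HZ = 44_100      # 44.1 kHz audio
CODEC_BITRATE_BPS = 8_000    # 8 kbps codec bitrate

# Assumptions (not stated in the text): uncompressed 16-bit mono PCM.
BIT_DEPTH = 16
CHANNELS = 1


def pcm_bitrate_bps(sample_rate_hz: int, bit_depth: int, channels: int) -> int:
    """Bitrate of uncompressed PCM audio in bits per second."""
    return sample_rate_hz * bit_depth * channels


def compression_ratio(raw_bps: int, coded_bps: int) -> float:
    """How many times smaller the coded stream is than the raw stream."""
    return raw_bps / coded_bps


raw_bps = pcm_bitrate_bps(SAMPLE_RATE_HZ, BIT_DEPTH, CHANNELS)
ratio = compression_ratio(raw_bps, CODEC_BITRATE_BPS)

print(f"Raw PCM bitrate:   {raw_bps / 1000:.1f} kbps")
print(f"Compression ratio: {ratio:.1f}x")
```

Under these assumptions the ratio is about 88x, which is in line with the rounded 90x figure. Stereo or higher bit-depth sources would give a proportionally larger ratio.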
|
|
| ## Model Details |
|
|
| ### Model Description |
|
|
| - **License:** MIT |
|
|
| ### Model Sources |
|
|
| - **Repository:** [GitHub Repo](https://github.com/descriptinc/descript-audio-codec) |
| - **Paper:** [arXiv Paper: High-Fidelity Audio Compression with Improved RVQGAN](http://arxiv.org/abs/2306.06546) |
| - **Demo:** [Demo Site](https://descript.notion.site/Descript-Audio-Codec-11389fce0ce2419891d6591a68f814d5) |
|
|
| ## Uses |
|
|
| The model is intended for compressing audio files containing speech, music and environmental sounds. |
|
|
| ### Out-of-Scope Use |
|
|
| It is not intended to be used for compressing other file formats such as text, images, etc. |
|
|
| ## Bias, Risks, and Limitations |
| Our model has difficulty reconstructing some challenging audio. It |
| performs best on speech and has more issues with environmental sounds. It |
| also does not perfectly model some musical instruments, such as the glockenspiel, or synthesizer sounds. |
|
|
|
|
| ## How to Get Started with the Model |
| This model is meant to be used with our official repo linked above. We release the weights here for redundancy. |
| Our code pulls the weights from their |
| [original location on GitHub](https://github.com/descriptinc/descript-audio-codec/releases/download/0.0.1/weights.pth). |
| Please refer to the official [README](https://github.com/descriptinc/descript-audio-codec#readme) for usage instructions. |
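
If you want to fetch the weights file by hand, for example to cache it ahead of time, the following is a minimal sketch using only the Python standard library. The helper names `local_name` and `fetch_weights` are hypothetical and are not part of the official repo; only the URL comes from the text above. Loading the weights into the model is covered by the official README and is not shown here.

```python
import shutil
import urllib.request
from pathlib import Path
from urllib.parse import urlparse

# URL quoted from the model card text above.
WEIGHTS_URL = (
    "https://github.com/descriptinc/descript-audio-codec"
    "/releases/download/0.0.1/weights.pth"
)


def local_name(url: str) -> str:
    """Return the file name at the end of a URL path (hypothetical helper)."""
    return Path(urlparse(url).path).name


def fetch_weights(url: str = WEIGHTS_URL, dest_dir: str = ".") -> Path:
    """Download the weights once and reuse the local copy on later calls.

    Hypothetical helper: it writes to a temporary ".part" file first so that
    an interrupted download is not mistaken for a complete one.
    """
    dest = Path(dest_dir) / local_name(url)
    if dest.exists():
        return dest

    dest.parent.mkdir(parents=True, exist_ok=True)
    partial = dest.with_name(dest.name + ".part")
    with urllib.request.urlopen(url) as response, open(partial, "wb") as out:
        shutil.copyfileobj(response, out)
    partial.replace(dest)
    return dest
```

Calling `fetch_weights(dest_dir="checkpoints")` would save the file as `checkpoints/weights.pth` and return that path. Nothing is downloaded until the function is called.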
|
|
| ## Citation |
|
|
| **BibTeX:** |
|
|
| ``` |
| @misc{kumar2023highfidelity, |
| title={High-Fidelity Audio Compression with Improved RVQGAN}, |
| author={Rithesh Kumar and Prem Seetharaman and Alejandro Luebs and Ishaan Kumar and Kundan Kumar}, |
| year={2023}, |
| eprint={2306.06546}, |
| archivePrefix={arXiv}, |
| primaryClass={cs.SD} |
| } |
| ``` |