Any publication?
#5
by sappho192 - opened
Hi, thank you for releasing this model to the public.
I'd like to study what changes were made in this 2.0 version compared to the previous model, but I couldn't find any papers related to this.
Is there any way I can find out in detail what has changed?
Thanks in advance.
The biggest difference is in the training dataset, plus slightly different augmentations. The training data for version 2.0 includes non-speech audio samples (such as coughing, laughter, and breathing) to help the model distinguish between speech and non-speech sounds.
For the architecture, you can refer to the MarbleNet paper: https://arxiv.org/pdf/2010.13886
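As a rough illustration of the idea (not the actual NeMo training pipeline), the sketch below shows how non-speech clips could serve as negative examples for a frame-level VAD. Everything in it is an assumption made for illustration: the `Clip` class, the `frame_labels` helper, the clip names, and the 20 ms frame length are hypothetical and do not come from the released training code.

```python
from dataclasses import dataclass, field

# Assumed frame length for illustration only (20 ms per frame).
FRAME_SEC = 0.02


@dataclass
class Clip:
    """Hypothetical training clip with optional speech annotations."""
    name: str
    duration_sec: float
    # (start, end) times in seconds; left empty for non-speech clips
    speech_segments: list = field(default_factory=list)


def frame_labels(clip, frame_sec=FRAME_SEC):
    """Return one label per frame: 1 for speech, 0 for non-speech."""
    n_frames = int(round(clip.duration_sec / frame_sec))
    labels = [0] * n_frames
    for start, end in clip.speech_segments:
        first = int(round(start / frame_sec))
        last = min(n_frames, int(round(end / frame_sec)))
        for i in range(first, last):
            labels[i] = 1
    return labels


clips = [
    # A clip with annotated speech between 0.04 s and 0.12 s.
    Clip("speech_example", 0.2, [(0.04, 0.12)]),
    # A non-speech clip (e.g. a cough): every frame is a negative example.
    Clip("cough_example", 0.1),
]

for clip in clips:
    print(clip.name, frame_labels(clip))
```

A clip without speech segments yields all-zero labels, so sounds like coughing or laughter end up as explicit negatives rather than being absent from training.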
sappho192 changed discussion status to closed