Training languages in the model card

by fyvo - opened Jul 4, 2022

base: refs/heads/main ← from: refs/pr/9

+222

-198

fyvo

BigScience Workshop org Jul 4, 2022

edited Jul 4, 2022

The model card does not show the proportion of Arabic in the training data. The distribution of languages from the Niger-Congo family lists 'Kuganda', which is probably a misspelling of 'Luganda', a language spoken in Uganda. It is difficult to be certain, because the corpora for the Niger-Congo languages are not documented individually.

fyvo changed pull request status to open Jul 4, 2022

ybelkada

BigScience Workshop org Jul 4, 2022

Thanks for pointing this out!
I think it is worth opening a PR on the main bloom repo as well, since the model cards were copied from there.
Also cc-ing @cakiki in case I missed anything.

Ready to merge

This branch is ready to get merged automatically.