Instructions to use KoalaAI/Text-Moderation with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use KoalaAI/Text-Moderation with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-classification", model="KoalaAI/Text-Moderation")

# Load model directly
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("KoalaAI/Text-Moderation")
model = AutoModelForSequenceClassification.from_pretrained("KoalaAI/Text-Moderation")
```
- Inference
- Notebooks
- Google Colab
- Kaggle
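For each input, the `text-classification` pipeline returns a list of `{"label", "score"}` dicts (pass `top_k=None` to get a score for every category rather than only the top one). Below is a small sketch of picking the top category from that output; the `top_label` helper and the example scores are illustrative stand-ins, not an actual model run:

```python
# Pipeline output shape: one list of {"label", "score"} dicts per input.
# The list below is a hand-written stand-in for a real call such as:
#   pipe = pipeline("text-classification", model="KoalaAI/Text-Moderation", top_k=None)
#   scores = pipe("some text")[0]
def top_label(scores):
    """Return the (label, score) pair with the highest score."""
    best = max(scores, key=lambda s: s["score"])
    return best["label"], best["score"]

# Illustrative scores (label names follow the model card's category codes)
scores = [
    {"label": "OK", "score": 0.97},
    {"label": "H",  "score": 0.02},
    {"label": "S",  "score": 0.01},
]
print(top_label(scores))  # → ('OK', 0.97)
```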
What split was used to report results on the OpenAI Moderation Dataset?
It seems the model was trained/fine-tuned on the OpenAI Moderation Dataset evaluation set: https://huggingface.co/datasets/mmathys/openai-moderation-api-evaluation
Some validation metric scores are mentioned here: https://huggingface.co/KoalaAI/Text-Moderation#validation-metrics
Could you provide the split you used? I am not able to replicate your scores on the entire https://huggingface.co/datasets/mmathys/openai-moderation-api-evaluation dataset.
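For context, a replication attempt would compute per-category metrics roughly like this. The sketch below keeps the model and dataset loading as comments (the `run_model_over` helper is hypothetical) and implements the metric helpers in plain Python on a tiny synthetic example:

```python
# Sketch of the metric computation for a replication attempt.
# Model/dataset loading is summarized in comments:
#   from datasets import load_dataset
#   ds = load_dataset("mmathys/openai-moderation-api-evaluation", split="train")
#   preds, golds = run_model_over(ds)   # hypothetical helper

def f1_score(preds, golds):
    """Binary F1 over parallel lists of 0/1 predictions and gold labels."""
    tp = sum(1 for p, g in zip(preds, golds) if p == 1 and g == 1)
    fp = sum(1 for p, g in zip(preds, golds) if p == 1 and g == 0)
    fn = sum(1 for p, g in zip(preds, golds) if p == 0 and g == 1)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def accuracy(preds, golds):
    """Fraction of predictions that match the gold labels."""
    return sum(1 for p, g in zip(preds, golds) if p == g) / len(golds)

# Tiny worked example (synthetic labels, not dataset values)
preds = [1, 0, 1, 1, 0]
golds = [1, 0, 0, 1, 1]
print(accuracy(preds, golds))  # → 0.6
```

Without knowing the split, metrics computed this way over the full dataset won't be directly comparable to the model card's validation numbers.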
@KoalaAI could you please comment on it?
Hi! Sorry for the delayed response; I don't get notifications from this org.
This dataset was used as a base; it was modified to fit the requirements of AutoTrain, which has since been discontinued, so I'm not sure I still have the modified training data split.
I still have the script used to modify the training data, but the split was made randomly by AutoTrain during training.
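For anyone trying to approximate that setup, a seeded random split along the lines of what AutoTrain would have produced might look like this. It is only a sketch: the 80/20 ratio and the seed are assumptions, since the actual split parameters are lost:

```python
import random

def random_split(examples, val_fraction=0.2, seed=42):
    """Shuffle a list of examples and split it into (train, val).

    The ratio and seed here are illustrative; AutoTrain's actual
    split parameters are not known.
    """
    rng = random.Random(seed)
    shuffled = list(examples)
    rng.shuffle(shuffled)
    n_val = int(len(shuffled) * val_fraction)
    return shuffled[n_val:], shuffled[:n_val]

train, val = random_split(list(range(100)))
print(len(train), len(val))  # → 80 20
```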