---
license: mit
datasets:
- Silly-Machine/TuPyE-Dataset
language:
- pt
pipeline_tag: text-classification
base_model: neuralmind/bert-large-portuguese-cased
widget:
- text: 'Bom dia, flor do dia!!'
model-index:
- name: TuPy-Bert-Large-Multilabel
  results:
  - task:
      type: text-classification
    dataset:
      name: TuPyE-Dataset
      type: Silly-Machine/TuPyE-Dataset
    metrics:
    - type: f1
      value: 0.85
      name: F1-score
      verified: true
    - type: precision
      value: 0.85
      name: Precision
      verified: true
    - type: recall
      value: 0.85
      name: Recall
      verified: true
---
## Introduction
TuPy-Bert-Large-Multilabel is a BERT model fine-tuned for multilabel classification of hate speech in Portuguese.
It is derived from [BERTimbau large](https://huggingface.co/neuralmind/bert-large-portuguese-cased)
and covers the following hate speech categories: ageism, aporophobia, body shame, capacitism (ableism), LGBTphobia, political,
racism, religious intolerance, misogyny, and xenophobia.
For more details on the base model, please refer to the [BERTimbau repository](https://github.com/neuralmind-ai/portuguese-bert/).

The performance of language models can vary considerably when the domain of the test data differs from that of the training data.
To build a Portuguese language model specialized for hate speech classification,
the original BERTimbau model was fine-tuned on
the [TuPy Hate Speech DataSet](https://huggingface.co/datasets/Silly-Machine/TuPyE-Dataset), sourced from diverse social networks.
## Available models
| Model                                             | Arch.      | #Layers | #Params |
| ------------------------------------------------- | ---------- | ------- | ------- |
| `Silly-Machine/TuPy-Bert-Base-Binary-Classifier`  | BERT-Base  | 12      | 109M    |
| `Silly-Machine/TuPy-Bert-Large-Binary-Classifier` | BERT-Large | 24      | 334M    |
| `Silly-Machine/TuPy-Bert-Base-Multilabel`         | BERT-Base  | 12      | 109M    |
| `Silly-Machine/TuPy-Bert-Large-Multilabel`        | BERT-Large | 24      | 334M    |
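
When switching between the checkpoints listed above, a small lookup can keep the model identifiers in one place. The sketch below is only an illustration: the identifiers are copied from the table, while the `TUPY_MODELS` dictionary, the `(size, task)` key convention, and the `pick_tupy_model` helper are hypothetical names that are not part of the released models.

```python
# Model identifiers copied from the table of available models.
# The (size, task) keys are a hypothetical convention used only in this sketch.
TUPY_MODELS = {
    ("base", "binary"): "Silly-Machine/TuPy-Bert-Base-Binary-Classifier",
    ("large", "binary"): "Silly-Machine/TuPy-Bert-Large-Binary-Classifier",
    ("base", "multilabel"): "Silly-Machine/TuPy-Bert-Base-Multilabel",
    ("large", "multilabel"): "Silly-Machine/TuPy-Bert-Large-Multilabel",
}


def pick_tupy_model(size: str, task: str) -> str:
    """Return the Hub identifier for a given size ('base'/'large') and task ('binary'/'multilabel')."""
    key = (size.lower(), task.lower())
    if key not in TUPY_MODELS:
        raise ValueError(f"Unknown combination: size={size!r}, task={task!r}")
    return TUPY_MODELS[key]


print(pick_tupy_model("large", "multilabel"))
```

The returned string can be passed as `model_name` to `from_pretrained`.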
## Example usage
```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer, AutoConfig
import torch
import numpy as np
from scipy.special import softmax


def classify_hate_speech(model_name, text):
    # Load the fine-tuned model, its tokenizer, and its config (for label names)
    model = AutoModelForSequenceClassification.from_pretrained(model_name)
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    config = AutoConfig.from_pretrained(model_name)

    # Tokenize input text and prepare model input
    model_input = tokenizer(text, padding=True, return_tensors="pt")

    # Get model output scores
    with torch.no_grad():
        output = model(**model_input)
        scores = softmax(output.logits.numpy(), axis=1)
        ranking = np.argsort(scores[0])[::-1]

    # Print the results, highest score first
    for i, rank in enumerate(ranking):
        label = config.id2label[int(rank)]
        score = scores[0, rank]
        print(f"{i + 1}) Label: {label} Score: {score:.4f}")


# Example usage
model_name = "Silly-Machine/TuPy-Bert-Large-Multilabel"
text = "Bom dia, flor do dia!!"
classify_hate_speech(model_name, text)
```
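
The example above ranks labels with a softmax, which forces the scores to sum to 1 across categories. For multilabel output, a common alternative is to apply a sigmoid to each logit independently and keep every label above a threshold. The sketch below illustrates that idea on plain NumPy arrays. It is an assumption that this post-processing suits this model, and the `multilabel_predictions` helper, the 0.5 threshold, and the example logits and label map are hypothetical values chosen for illustration.

```python
import numpy as np


def multilabel_predictions(logits, id2label, threshold=0.5):
    """Return (label, probability) pairs whose sigmoid probability meets the threshold."""
    logits = np.asarray(logits, dtype=np.float64)
    probs = 1.0 / (1.0 + np.exp(-logits))  # element-wise sigmoid, one probability per label
    picked = [(id2label[i], float(p)) for i, p in enumerate(probs) if p >= threshold]
    # Sort the selected labels from most to least probable
    return sorted(picked, key=lambda pair: pair[1], reverse=True)


# Hypothetical logits and label map, for illustration only
example_id2label = {0: "ageism", 1: "racism", 2: "xenophobia"}
example_logits = [-2.0, 1.5, 0.0]
print(multilabel_predictions(example_logits, example_id2label))
```

With the real model, `output.logits[0].numpy()` and `config.id2label` would take the place of the hypothetical inputs. The threshold is a tunable choice and may need adjusting per category.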