---
license: mit
datasets:
- Silly-Machine/TuPyE-Dataset
language:
- pt

pipeline_tag: text-classification
base_model: neuralmind/bert-large-portuguese-cased
widget:
- text: 'Bom dia, flor do dia!!'

model-index:
  - name: TuPy-Bert-Large-Multilabel
    results:
      - task:
          type: text-classification
        dataset:
          name: TuPyE-Dataset
          type: Silly-Machine/TuPyE-Dataset
        metrics:
          - type: f1
            value: 0.85
            name: F1-score
            verified: true
          - type: precision
            value: 0.85
            name: Precision
            verified: true
          - type: recall
            value: 0.85
            name: Recall
            verified: true 
---

## Introduction


TuPy-Bert-Large-Multilabel is a fine-tuned BERT model for multilabel classification of hate speech in Portuguese. 
Derived from [BERTimbau large](https://huggingface.co/neuralmind/bert-large-portuguese-cased), 
it assigns a text to one or more hate speech categories (ageism, aporophobia, body shame, capacitism (ableism), LGBTphobia, political, 
racism, religious intolerance, misogyny, and xenophobia).
For more details on the base model, please refer to the [BERTimbau repository](https://github.com/neuralmind-ai/portuguese-bert/).

The performance of language models can vary notably when the domain shifts between training and test data. 
To create a Portuguese language model specialized for hate speech classification,
the original BERTimbau model was fine-tuned on 
the [TuPy Hate Speech DataSet](https://huggingface.co/datasets/Silly-Machine/TuPyE-Dataset), sourced from diverse social networks.

## Available models

| Model                                    | Arch.      | #Layers | #Params |
| ---------------------------------------- | ---------- | ------- | ------- |
| `Silly-Machine/TuPy-Bert-Base-Binary-Classifier`  | BERT-Base  | 12      | 109M    |
| `Silly-Machine/TuPy-Bert-Large-Binary-Classifier` | BERT-Large | 24      | 334M    |
| `Silly-Machine/TuPy-Bert-Base-Multilabel` | BERT-Base | 12      | 109M    |
| `Silly-Machine/TuPy-Bert-Large-Multilabel` | BERT-Large | 24      | 334M    |

## Example usage

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer, AutoConfig
import torch
import numpy as np
from scipy.special import softmax

def classify_hate_speech(model_name, text):
    model = AutoModelForSequenceClassification.from_pretrained(model_name)
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    config = AutoConfig.from_pretrained(model_name)

    # Tokenize input text and prepare model input
    model_input = tokenizer(text, padding=True, return_tensors="pt")

    # Get model output scores
    with torch.no_grad():
        output = model(**model_input)
        scores = softmax(output.logits.numpy(), axis=1)
        ranking = np.argsort(scores[0])[::-1]

    # Print the results
    for i, rank in enumerate(ranking):
        label = config.id2label[rank]
        score = scores[0, rank]
        print(f"{i + 1}) Label: {label} Score: {score:.4f}")

# Example usage
model_name = "Silly-Machine/TuPy-Bert-Large-Multilabel"
text = "Bom dia, flor do dia!!"
classify_hate_speech(model_name, text)
```
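
The example above ranks labels with a softmax, which forces the scores to sum to 1. For a multilabel model, a common alternative is to score each label independently with a sigmoid and keep every label above a threshold. The sketch below illustrates that idea on plain NumPy values. It assumes the checkpoint was trained with independent per-label outputs, which this card does not state. The `select_labels` helper, the 0.5 threshold, and the example logits and label names are hypothetical and used only for illustration.

```python
import numpy as np

def sigmoid(x):
    """Map logits to independent per-label probabilities in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

def select_labels(logits, id2label, threshold=0.5):
    """Return (label, probability) pairs whose sigmoid score reaches the threshold.

    Hypothetical helper, not part of transformers or of this model's code.
    """
    probs = sigmoid(np.asarray(logits, dtype=np.float64))
    picked = [(id2label[i], float(p)) for i, p in enumerate(probs) if p >= threshold]
    # Highest-scoring labels first
    return sorted(picked, key=lambda pair: pair[1], reverse=True)

# Made-up values for illustration only. With the real model, the logits would
# come from output.logits[0] and the label names from config.id2label.
example_id2label = {0: "ageism", 1: "misogyny", 2: "racism"}
example_logits = [-3.0, 2.0, 0.5]

selected = select_labels(example_logits, example_id2label)
for label, prob in selected:
    print(f"Label: {label} Score: {prob:.4f}")
```

Unlike the softmax ranking, this can return several labels or none at all for a given text. The threshold is a tunable choice and would normally be selected on validation data.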