OpenCensor-H1-Mini

OpenCensor-H1-Mini is a lightweight, efficient version of OpenCensor-H1, designed to detect profanity, toxicity, and offensive content in Hebrew text. It is fine-tuned from the onlplab/alephbert-base model.

Model Details

Model Name: OpenCensor-H1-Mini
Base Model: onlplab/alephbert-base
Task: Binary Classification (0 = Clean, 1 = Toxic/Profane)
Language: Hebrew
Max Sequence Length: 256 tokens (optimized for efficiency)

Performance

Metric	Score
Accuracy	0.9826
F1-Score	0.9823
Precision	0.9812
Recall	0.9835
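
As a quick sanity check, the reported F1-Score can be recomputed from the reported Precision and Recall using the standard harmonic-mean formula. The helper name `f1_from` is hypothetical and is not part of the model's code.

```python
def f1_from(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (the standard F1 definition)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values taken from the table above.
print(round(f1_from(0.9812, 0.9835), 4))  # 0.9823
```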

Note: The best decision threshold is 0.17. A text is classified as toxic when its score is at or above this value.
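
The sketch below shows how a raw logit maps to a label under this threshold, and how the result can differ from a conventional 0.5 cutoff. It assumes the model emits a single logit that is passed through a sigmoid, as in the usage example; the function names `sigmoid` and `to_label` are illustrative only.

```python
import math


def sigmoid(logit: float) -> float:
    """Map a raw logit to a score in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-logit))


def to_label(logit: float, threshold: float = 0.17) -> int:
    """Return 1 (Toxic/Profane) if the score reaches the threshold, else 0 (Clean)."""
    return 1 if sigmoid(logit) >= threshold else 0


# A logit of -1.5 gives a score of roughly 0.18: toxic at 0.17, clean at 0.5.
print(to_label(-1.5), to_label(-1.5, threshold=0.5))
```

A threshold this far below 0.5 flags more borderline texts as toxic, so it tends to favor recall at the possible cost of precision.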

Training Graphs

[Figures: Validation F1; Threshold Analysis]

How to Use

You can use this model directly with the Hugging Face transformers library.

import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Load the model
model_id = "LikoKIko/OpenCensor-H1-Mini"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id).eval()

def predict(text):
    # Tokenize input
    inputs = tokenizer(
        text, 
        return_tensors="pt", 
        truncation=True, 
        padding=True, 
        max_length=256
    )
    
    # Predict
    with torch.no_grad():
        logits = model(**inputs).logits
        score = torch.sigmoid(logits).item()
        
    return {
        "text": text,
        "score": round(score, 4),
        "is_toxic": score >= 0.17  # Threshold
    }

# Example usage
text = "אני אוהב את כולם" # "I love everyone"
print(predict(text))
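
To score many texts at once, the same steps can be applied in batches. This is a minimal sketch, not part of the published model code: `chunked` and `predict_batch` are hypothetical helpers, the batch size of 32 is an arbitrary choice, and the code assumes the model returns one logit per text, as the single-text example implies.

```python
from typing import Iterator, List


def chunked(items: List[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `items` with at most `batch_size` elements."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def predict_batch(texts, tokenizer, model, threshold=0.17, batch_size=32):
    """Score a list of texts; assumes the model outputs one logit per text."""
    import torch  # imported lazily so chunked() is usable without torch

    results = []
    for batch in chunked(list(texts), batch_size):
        # padding=True pads each batch to its own longest sequence
        inputs = tokenizer(
            batch,
            return_tensors="pt",
            truncation=True,
            padding=True,
            max_length=256,
        )
        with torch.no_grad():
            logits = model(**inputs).logits
        # Flatten (batch, 1) logits into a plain list of scores
        scores = torch.sigmoid(logits).reshape(-1).tolist()
        for text, score in zip(batch, scores):
            results.append({
                "text": text,
                "score": round(score, 4),
                "is_toxic": score >= threshold,
            })
    return results
```

With the `tokenizer` and `model` loaded as above, `predict_batch(["אני אוהב את כולם"], tokenizer, model)` returns one dictionary per input text, in input order.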

Training Info

The model was trained using an optimized pipeline featuring:

Gradient Accumulation: Ensures stable training with larger effective batch sizes.
Smart Text Cleaning: Removes noise while preserving Hebrew, English, and important symbols (@#$%*).
Dynamic Padding: Uses efficient token lengths based on data distribution.
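
The exact cleaning rules are not published here, so the following is only an illustrative sketch of the text-cleaning step described above. It assumes that Hebrew characters, English letters, the symbols @#$%*, and whitespace are kept, and additionally assumes that digits are kept; everything else is replaced by a space. The function name `clean_text` is hypothetical.

```python
import re

# Anything that is NOT Hebrew (U+0590-U+05FF), an ASCII letter, a digit
# (an assumption), one of @#$%*, or whitespace is treated as noise.
_DISALLOWED = re.compile(r"[^\u0590-\u05FFA-Za-z0-9@#$%*\s]")
_WHITESPACE = re.compile(r"\s+")


def clean_text(text: str) -> str:
    """Replace disallowed characters with spaces, then collapse whitespace."""
    text = _DISALLOWED.sub(" ", text)
    return _WHITESPACE.sub(" ", text).strip()


print(clean_text("שלום!!!   world :) @user #tag"))
```

Keeping symbols such as `*` and `$` matters for profanity detection because they are often used to mask letters in offensive words.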

License

CC-BY-SA-4.0

Citation

@misc{opencensor-h1-mini,
  title = {OpenCensor-H1-Mini: Hebrew Profanity Detection Model},
  author = {LikoKIko},
  year = {2025},
  url = {https://huggingface.co/LikoKIko/OpenCensor-H1-Mini}
}

Model size: 0.1B params
Tensor type: F32 (Safetensors)
