OpenCensor-H1-Mini
OpenCensor-H1-Mini is a lightweight, efficient version of OpenCensor-H1, designed to detect profanity, toxicity, and offensive content in Hebrew text. It is fine-tuned from onlplab/alephbert-base.
Model Details
- Model Name: OpenCensor-H1-Mini
- Base Model: onlplab/alephbert-base
- Task: Binary Classification (0 = Clean, 1 = Toxic/Profane)
- Language: Hebrew
- Max Sequence Length: 256 tokens (optimized for efficiency)
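The `predict` function in How to Use truncates inputs at 256 tokens (`truncation=True`), so anything past the limit is never scored. The card does not say how longer texts should be handled. One option, sketched here as an assumption rather than the authors' method, is to split the token IDs into overlapping windows that each fit the limit, score every window, and flag the text if any window crosses the threshold. The helpers `split_into_windows` and `aggregate_scores` and the `stride` value are hypothetical, and the sketch assumes the tokenizer adds two special tokens ([CLS] and [SEP]) per sequence, as BERT-style tokenizers do.

```python
from typing import List


def split_into_windows(
    token_ids: List[int], max_length: int = 256, stride: int = 64
) -> List[List[int]]:
    """Split content token IDs into overlapping windows that fit max_length.

    Hypothetical helper: reserves 2 positions per window for the special
    tokens ([CLS], [SEP]) a BERT-style tokenizer adds, and overlaps
    consecutive windows by `stride` tokens so a phrase of at most `stride`
    tokens cut at one window boundary appears whole in the next window.
    """
    budget = max_length - 2  # room left for content tokens
    if stride >= budget:
        raise ValueError("stride must be smaller than max_length - 2")
    if len(token_ids) <= budget:
        return [token_ids]
    windows = []
    step = budget - stride
    start = 0
    while True:
        windows.append(token_ids[start:start + budget])
        if start + budget >= len(token_ids):
            break  # this window already reaches the end of the text
        start += step
    return windows


def aggregate_scores(scores: List[float], threshold: float = 0.17) -> dict:
    """Flag the whole text if any window's toxicity score reaches the threshold."""
    worst = max(scores)
    return {"score": worst, "is_toxic": worst >= threshold}


# Usage with 600 dummy token IDs (no model needed)
ids = list(range(600))
windows = split_into_windows(ids)
print([len(w) for w in windows])
print(aggregate_scores([0.02, 0.41, 0.05]))
```

With the real tokenizer, the content IDs could come from `tokenizer(text, add_special_tokens=False)["input_ids"]`, and each window could be wrapped with `tokenizer.build_inputs_with_special_tokens(window)` before scoring. Taking the maximum window score is a conservative choice: one offensive passage is enough to flag the whole text.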
Performance
| Metric | Score |
|---|---|
| Accuracy | 0.9826 |
| F1-Score | 0.9823 |
| Precision | 0.9812 |
| Recall | 0.9835 |
Note: The best decision threshold is 0.17; a text is labeled Toxic/Profane when its toxicity score is at or above this value.
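The card reports 0.17 as the best threshold but not how it was chosen. A common approach, shown here only as an illustration and not as the authors' procedure, is to compute precision, recall, and F1 on held-out data for each candidate threshold and keep the one with the highest F1. The helpers `metrics_at` and `best_threshold` are hypothetical, and the labels and scores in the usage example are made up.

```python
from typing import List, Tuple


def metrics_at(labels: List[int], scores: List[float], threshold: float) -> dict:
    """Accuracy, precision, recall, and F1 when score >= threshold means Toxic (1)."""
    preds = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    tn = len(labels) - tp - fp - fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(labels),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


def best_threshold(
    labels: List[int], scores: List[float], candidates: List[float]
) -> Tuple[float, float]:
    """Return (threshold, F1) for the candidate with the highest F1.

    Ties resolve to the lowest candidate, because max() keeps the first maximum.
    """
    best = max(candidates, key=lambda t: metrics_at(labels, scores, t)["f1"])
    return best, metrics_at(labels, scores, best)["f1"]


# Made-up validation labels (1 = Toxic/Profane) and model scores
labels = [1, 1, 1, 0, 0, 0]
scores = [0.90, 0.35, 0.20, 0.15, 0.05, 0.60]
candidates = [i / 100 for i in range(1, 100)]
print(best_threshold(labels, scores, candidates))
print(metrics_at(labels, scores, 0.17))
```

A low threshold such as 0.17 trades some precision for recall: more borderline texts are flagged, which is often the preferred failure mode for a content filter.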
How to Use
You can use this model directly with the Hugging Face transformers library.
```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Load the tokenizer and model (eval() disables dropout for inference)
model_id = "LikoKIko/OpenCensor-H1-Mini"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id).eval()


def predict(text):
    # Tokenize input, truncating anything past 256 tokens
    inputs = tokenizer(
        text,
        return_tensors="pt",
        truncation=True,
        padding=True,
        max_length=256,
    )
    # Predict without tracking gradients
    with torch.no_grad():
        logits = model(**inputs).logits
    # Sigmoid maps the logit to a 0-1 toxicity score; .item() expects a single logit
    score = torch.sigmoid(logits).item()
    return {
        "text": text,
        "score": round(score, 4),
        "is_toxic": score >= 0.17,  # Best threshold (see Performance)
    }


# Example usage
text = "אני אוהב את כולם"  # "I love everyone"
print(predict(text))
```
Training Info
The model was trained using an optimized pipeline featuring:
- Gradient Accumulation: Ensures stable training with larger effective batch sizes.
- Smart Text Cleaning: Removes noise while preserving Hebrew, English, and important symbols (@#$%*).
- Dynamic Padding: Uses efficient token lengths based on the data's length distribution.
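The exact cleaning rules are not published. As a rough illustration of the idea, the hypothetical `clean_text` below keeps characters from the Unicode Hebrew block (U+0590 to U+05FF), Latin letters, digits, whitespace, and the symbols @#$%*, replaces everything else with a space, and collapses repeated whitespace. Keeping digits is an assumption: the card mentions only Hebrew, English, and those symbols.

```python
import re

# Hypothetical rule: anything that is NOT a Hebrew-block character, a Latin
# letter, a digit (assumption), whitespace, or one of @#$%* counts as noise.
_NOISE = re.compile(r"[^\u0590-\u05FFA-Za-z0-9@#$%*\s]")
_SPACES = re.compile(r"\s+")


def clean_text(text: str) -> str:
    """Replace noise characters with spaces, then collapse runs of whitespace."""
    text = _NOISE.sub(" ", text)
    return _SPACES.sub(" ", text).strip()


print(clean_text("שלום!!!   hello :) $100 @user ~~"))
```

Symbols such as * and @ are worth keeping for this task because users often mask profanity (for example, replacing letters with *) or aim insults at a handle; stripping them could hide exactly the signal the classifier needs.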
License
CC-BY-SA-4.0
Citation
```bibtex
@misc{opencensor-h1-mini,
  title = {OpenCensor-H1-Mini: Hebrew Profanity Detection Model},
  author = {LikoKIko},
  year = {2025},
  url = {https://huggingface.co/LikoKIko/OpenCensor-H1-Mini}
}
```