Aunali

Cossale

https://auna.li?q=hf

AI & ML interests

Text2Image and Text2Text generation.

Recent Activity

liked a dataset 3 days ago

angrygiraffe/claude-opus-4.6-4.7-reasoning-8.7k

Viewer • Updated May 1 • 38.5k • 8.46k • 330

liked 2 models 4 days ago

amaai-lab/merit

Feature Extraction • Updated 2 days ago • 3

openbmb/MiniCPM-o-4_5

Any-to-Any • 9B • Updated 19 days ago • 199k • 1.39k

upvoted a collection 24 days ago

Toto-2.0

Collection

5 items • Updated 27 days ago • 35

liked a dataset about 1 month ago

PleIAs/CommonLingua-Train

Viewer • Updated Apr 28 • 2.76M • 144 • 15

liked 2 models about 1 month ago

SicariusSicariiStuff/Assistant_Pepe_32B

33B • Updated May 7 • 521 • 47

amaai-lab/apex

Feature Extraction • Updated May 7 • 43 • 4

liked a dataset about 2 months ago

fvdfs41/Discord-Unveiled

Updated Jun 9, 2025 • 34 • 6

reacted to qgallouedec's post with 🔥 about 2 months ago

Post

TRL v1.2 introduces the SSDTrainer 🚀

Simple Self-Distillation (SSD) from Apple's paper "Embarrassingly Simple Self-Distillation Improves Code Generation" is now available as an experimental trainer in TRL.

The recipe is as minimal as the name suggests: sample completions from the model itself at a training-time temperature, then fine-tune on those raw, unverified samples with plain cross-entropy. No reward model. No verifier. No teacher model. No reinforcement learning. Just prompts and the model.

from trl.experimental.ssd import SSDConfig, SSDTrainer

# `dataset` is not defined in the post; it is assumed to be a prompt dataset loaded beforehand.
trainer = SSDTrainer(
    model="Qwen/Qwen3-4B-Instruct",
    args=SSDConfig(temperature=0.6, top_k=20, top_p=0.95),
    train_dataset=dataset,
)
trainer.train()
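
To make the recipe concrete, here is a toy sketch in plain Python of the two ideas described above: sampling from the model itself under temperature, top_k, and top_p, then taking one cross-entropy step on those raw samples. It is not TRL's implementation. The "model" is assumed to be a single vector of logits over a tiny vocabulary, and the helper names (softmax, filtered_probs, ssd_step) as well as num_samples, lr, and seed are hypothetical, chosen only for illustration. Library samplers may order or tie-break the filters differently.

```python
import math
import random


def softmax(logits, temperature=1.0):
    """Softmax over temperature-scaled logits (max-subtracted for stability)."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def filtered_probs(logits, temperature=0.6, top_k=20, top_p=0.95):
    """Sampling distribution after temperature, top-k, then top-p filtering."""
    probs = softmax(logits, temperature)
    # Top-k: keep the k most likely token indices.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]
    top_mass = sum(probs[i] for i in order)
    # Top-p: keep the smallest prefix whose renormalized mass reaches top_p.
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i] / top_mass
        if cumulative >= top_p:
            break
    # Renormalize over the surviving tokens; everything else gets zero.
    kept_mass = sum(probs[i] for i in kept)
    out = [0.0] * len(probs)
    for i in kept:
        out[i] = probs[i] / kept_mass
    return out


def ssd_step(logits, num_samples=256, lr=0.5, seed=0, **sampling):
    """One toy self-distillation step: sample from the model, fit the samples."""
    rng = random.Random(seed)
    vocab = range(len(logits))
    # 1) Sample "completions" from the model itself (no verifier, no teacher).
    weights = filtered_probs(logits, **sampling)
    samples = rng.choices(vocab, weights=weights, k=num_samples)
    # 2) Use the raw samples as targets via their empirical frequencies.
    freq = [samples.count(t) / num_samples for t in vocab]
    # 3) Gradient step on cross-entropy: d(CE)/d(logit_t) = p_t - freq_t.
    probs = softmax(logits)
    return [x - lr * (p - f) for x, p, f in zip(logits, probs, freq)]


if __name__ == "__main__":
    start = [2.0, 1.0, 0.0]
    print(filtered_probs(start))
    print(ssd_step(start))
```

In this toy setting, tokens removed by the filters are never sampled, so the cross-entropy step can only lower their logits. That is one way to picture how training on the model's own filtered samples shifts probability mass toward its more likely outputs.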

v1.2 also ships expanded tool-calling support (LLaMA 3.1 / 3.2, DeepSeek-V3), another round of KTO ↔ DPO alignment getting us closer to promoting KTO to stable, a big GRPO simplification for overlong tool results, deprecation of use_transformers_paged, and key fixes for VLM response parsing.

Full release notes: https://github.com/huggingface/trl/releases/tag/v1.2.0