Post-Trained MoE Can Skip Half Experts via Self-Distillation Paper • 2605.18643 • Published 8 days ago • 30
Achieving Gold-Medal-Level Olympiad Reasoning via Simple and Unified Scaling Paper • 2605.13301 • Published 13 days ago • 157
MinT: Managed Infrastructure for Training and Serving Millions of LLMs Paper • 2605.13779 • Published 13 days ago • 217
Rethinking OPD Collection This collection includes the models used in the paper "Rethinking On-Policy Distillation of Large Language Models: Phenomenology, Mechanism, and Recipe" • 4 items • Updated 14 days ago • 2
LLMs Improving LLMs: Agentic Discovery for Test-Time Scaling Paper • 2605.08083 • Published 18 days ago • 68
Beyond SFT-to-RL: Pre-alignment via Black-Box On-Policy Distillation for Multimodal RL Paper • 2604.28123 • Published 25 days ago • 48
MiniCPM-o 4.5: Towards Real-Time Full-Duplex Omni-Modal Interaction Paper • 2604.27393 • Published 26 days ago • 75
Skill1: Unified Evolution of Skill-Augmented Agents via Reinforcement Learning Paper • 2605.06130 • Published 19 days ago • 111
MiA-Signature: Approximating Global Activation for Long-Context Understanding Paper • 2605.06416 • Published 19 days ago • 55
MAIC-UI: Making Interactive Courseware with Generative UI Paper • 2604.25806 • Published 28 days ago • 8
Rethinking On-Policy Distillation of Large Language Models: Phenomenology, Mechanism, and Recipe Paper • 2604.13016 • Published Apr 14 • 107
Chaining the Evidence: Robust Reinforcement Learning for Deep Search Agents with Citation-Aware Rubric Rewards Paper • 2601.06021 • Published Jan 9 • 48
Learning beyond Teacher: Generalized On-Policy Distillation with Reward Extrapolation Paper • 2602.12125 • Published Feb 12 • 67
Golden Goose: A Simple Trick to Synthesize Unlimited RLVR Tasks from Unverifiable Internet Text Paper • 2601.22975 • Published Jan 30 • 113