liyaxuan

lllyx

AI & ML interests

None yet

Organizations

None yet

Recent Activity

upvoted a paper 7 days ago

Post-Trained MoE Can Skip Half Experts via Self-Distillation

Paper • 2605.18643 • Published 8 days ago • 30

upvoted a paper 8 days ago

Self-Distilled Agentic Reinforcement Learning

Paper • 2605.15155 • Published 12 days ago • 110

upvoted a paper 11 days ago

Achieving Gold-Medal-Level Olympiad Reasoning via Simple and Unified Scaling

Paper • 2605.13301 • Published 13 days ago • 157

upvoted a paper 12 days ago

MinT: Managed Infrastructure for Training and Serving Millions of LLMs

Paper • 2605.13779 • Published 13 days ago • 217

upvoted a collection 15 days ago

Rethinking OPD

Collection

This collection includes the models used in the paper "Rethinking On-Policy Distillation of Large Language Models: Phenomenology, Mechanism, and Recip • 4 items • Updated 14 days ago • 2

upvoted a paper 15 days ago

LLMs Improving LLMs: Agentic Discovery for Test-Time Scaling

Paper • 2605.08083 • Published 18 days ago • 68

upvoted 4 papers 16 days ago

Beyond SFT-to-RL: Pre-alignment via Black-Box On-Policy Distillation for Multimodal RL

Paper • 2604.28123 • Published 25 days ago • 48

upvoted 2 papers 23 days ago

MAIC-UI: Making Interactive Courseware with Generative UI

Paper • 2604.25806 • Published 28 days ago • 8

Co-Evolving Policy Distillation

Paper • 2604.27083 • Published 27 days ago • 66

upvoted 2 papers about 1 month ago

Near-Future Policy Optimization

Paper • 2604.20733 • Published Apr 22 • 77

Rethinking On-Policy Distillation of Large Language Models: Phenomenology, Mechanism, and Recipe

Paper • 2604.13016 • Published Apr 14 • 107

upvoted 2 papers about 2 months ago

Chaining the Evidence: Robust Reinforcement Learning for Deep Search Agents with Citation-Aware Rubric Rewards

Paper • 2601.06021 • Published Jan 9 • 48

Self-Distilled RLVR

Paper • 2604.03128 • Published Apr 3 • 176

upvoted a paper 3 months ago

Learning beyond Teacher: Generalized On-Policy Distillation with Reward Extrapolation

Paper • 2602.12125 • Published Feb 12 • 67

upvoted a collection 3 months ago

UltraData

Collection

Ultra Scale, Ultra Quality, Ultra Coverage • 10 items • Updated 2 days ago • 83

upvoted 2 papers 4 months ago

Golden Goose: A Simple Trick to Synthesize Unlimited RLVR Tasks from Unverifiable Internet Text

Paper • 2601.22975 • Published Jan 30 • 113

Your Group-Relative Advantage Is Biased

Paper • 2601.08521 • Published Jan 13 • 158
