Distilling Long-CoT Reasoning through Collaborative Step-wise Multi-Teacher Decoding Paper • 2605.02290 • Published May 4 • 42
AstraFlow: Dataflow-Oriented Reinforcement Learning for Agentic LLMs Paper • 2605.15565 • Published May 15 • 17
The MiniMax-M2 Series: Mini Activations Unleashing Max Real-World Intelligence Paper • 2605.26494 • Published 26 days ago • 41
Nemotron 3 Ultra: Open, Efficient Mixture-of-Experts Hybrid Mamba-Transformer Model for Agentic Reasoning Paper • 2606.15007 • Published 9 days ago • 15
VibeThinker-3B: Exploring the Frontier of Verifiable Reasoning in Small Language Models Paper • 2606.16140 • Published 6 days ago • 100
Ling and Ring 2.6 Technical Report: Efficient and Instant Agentic Intelligence at Trillion-Parameter Scale Paper • 2606.15079 • Published 8 days ago • 75
Domain-Specific Data Synthesis for LLMs via Minimal Sufficient Representation Learning Paper • 2605.30039 • Published 23 days ago • 20 • 2
Domain-Specific Data Synthesis for LLMs via Minimal Sufficient Representation Learning Paper • 2605.30039 • Published 23 days ago • 20
MIRA: Mid-training Rubric Anchoring for Source-Aware Data Selection Paper • 2605.30288 • Published 23 days ago • 23
STRIDE: Training Data Attribution via Sparse Recovery from Subset Perturbations Paper • 2606.05165 • Published 18 days ago • 4
LLM Explainability with Counterfactual Chains and Causal Graphs Paper • 2606.05972 • Published 17 days ago • 17
PaperFlow: Profiling, Recommending, and Adapting Across Daily Paper Streams Paper • 2606.07454 • Published 16 days ago • 13
Flow-DPPO: Divergence Proximal Policy Optimization for Flow Matching Models Paper • 2606.11025 • Published 12 days ago • 41
MaxProof: Scaling Mathematical Proof with Generative-Verifier RL and Population-Level Test-Time Scaling Paper • 2606.13473 • Published 10 days ago • 89
Tracing the Roots: A Multi-Agent Framework for Uncovering Data Lineage in Post-Training LLMs Paper • 2604.10480 • Published Apr 12 • 20
Nemotron-Post-Training-v3 Collection Collection of datasets used in the post-training phase of Nemotron Nano, Super, and Ultra v3. • 50 items • Updated 9 days ago • 158
Code2Math: Can Your Code Agent Effectively Evolve Math Problems Through Exploration? Paper • 2603.03202 • Published Mar 3 • 18