MMDeepResearch-Bench: A Benchmark for Multimodal Deep Research Agents Paper • 2601.12346 • Published Jan 18 • 52
From Verifiable Dot to Reward Chain: Harnessing Verifiable Reference-based Rewards for Reinforcement Learning of Open-ended Generation Paper • 2601.18533 • Published Jan 26
GRPO-VPS: Enhancing Group Relative Policy Optimization with Verifiable Process Supervision for Effective Reasoning Paper • 2604.20659 • Published Apr 22
What Makes Interaction Trajectories Effective for Training Terminal Agents? Paper • 2606.03461 • Published 7 days ago
Attention Sink in Transformers: A Survey on Utilization, Interpretation, and Mitigation Paper • 2604.10098 • Published Apr 11 • 82
Numina-Lean-Agent: An Open and General Agentic Reasoning System for Formal Mathematics Paper • 2601.14027 • Published Jan 20 • 14
MMDeepResearch-Bench: A Benchmark for Multimodal Deep Research Agents Paper • 2601.12346 • Published Jan 18 • 52
Locate, Steer, and Improve: A Practical Survey of Actionable Mechanistic Interpretability in Large Language Models Paper • 2601.14004 • Published Jan 20 • 49
SWE-Lego: Pushing the Limits of Supervised Fine-tuning for Software Issue Resolving Paper • 2601.01426 • Published Jan 4 • 24