Logit-Contribution Scoring Identifies Non-Literal Retrieval Heads • Paper • 2607.01002 • Published 4 days ago • 14
Breaking Failure Cascades: Step-Aware Reinforcement Learning for Medical Multimodal Reasoning • Paper • 2606.31825 • Published 5 days ago • 19
SkillCoach: Self-Evolving Rubrics for Evaluating and Enhancing Agentic Skill-Use • Paper • 2607.01874 • Published 3 days ago • 14
WorldDirector: Building Controllable World Simulators with Persistent Dynamic Memory • Paper • 2607.02517 • Published 3 days ago • 21
AgenticDataBench: A Comprehensive Benchmark for Data Agents • Paper • 2607.01647 • Published 3 days ago • 26
EvoPolicyGym: Evaluating Autonomous Policy Evolution in Interactive Environments • Paper • 2607.02440 • Published 3 days ago • 43
AgenticSTS: A Bounded-Memory Testbed for Long-Horizon LLM Agents • Paper • 2607.02255 • Published 3 days ago • 45
Program-as-Weights: A Programming Paradigm for Fuzzy Functions • Paper • 2607.02512 • Published 3 days ago • 74
Seed2.0 Model Card: Towards Intelligence Frontier for Real-World Complexity • Paper • 2607.00248 • Published 5 days ago • 23
CausalMix: Data Mixture as Causal Inference for Language Model Training • Paper • 2607.01104 • Published 4 days ago • 17
ELDR: Expert-Locality-Aware Decode Routing for PD-Disaggregated MoE Serving • Paper • 2607.00466 • Published 4 days ago • 24
MemSyco-Bench: Benchmarking Sycophancy in Agent Memory • Paper • 2607.01071 • Published 4 days ago • 22
Dockerless: Environment-Free Program Verifier for Coding Agents • Paper • 2606.28436 • Published 9 days ago • 103
LISA: Likelihood Score Alignment for Visual-condition Controllable Generation • Paper • 2606.27192 • Published 10 days ago • 13
Running the Gauntlet: Re-evaluating the Capabilities of Agents Beyond Familiar Environments • Paper • 2606.14397 • Published 10 days ago • 18
Why Multi-Step Tool-Use Reinforcement Learning Collapses and How Supervisory Signals Fix It • Paper • 2606.26027 • Published 11 days ago • 18
GUI vs. CLI: Execution Bottlenecks in Screen-Only and Skill-Mediated Computer-Use Agents • Paper • 2606.24551 • Published 13 days ago • 28