The Fragility of Chain-of-Thought Monitoring Across Typologically Diverse Languages • Paper • 2605.27901 • Published 11 days ago • 13
OR-Space: A Full-Lifecycle Workspace Benchmark for Industrial Optimization Agents • Paper • 2605.28158 • Published 11 days ago • 6
Same Architecture, Different Capacity: Optimizer-Induced Spectral Scaling Laws • Paper • 2605.21803 • Published 18 days ago • 4
EnvFactory: Scaling Tool-Use Agents via Executable Environments Synthesis and Robust RL • Paper • 2605.18703 • Published 20 days ago • 50
From Context to Skills: Can Language Models Learn from Context Skillfully? • Paper • 2604.27660 • Published May 3 • 166
DiagramBank: A Large-scale Dataset of Diagram Design Exemplars with Paper Metadata for Retrieval-Augmented Generation • Paper • 2604.20857 • Published Feb 28 • 3
Squeez: Task-Conditioned Tool-Output Pruning for Coding Agents • Paper • 2604.04979 • Published Apr 4 • 10