Agent Explorative Policy Optimization for Multimodal Agentic Reasoning Paper • 2605.28774 • Published 6 days ago • 84
GE-Sim 2.0: A Roadmap Towards Comprehensive Closed-loop Video World Simulators for Robotic Manipulation Paper • 2605.27491 • Published 7 days ago • 16
OpenComputer: Verifiable Software Worlds for Computer-Use Agents Paper • 2605.19769 • Published 14 days ago • 81
DelTA: Discriminative Token Credit Assignment for Reinforcement Learning from Verifiable Rewards Paper • 2605.21467 • Published 13 days ago • 204
Decoupling the Benefits of Subword Tokenization for Language Model Training via Byte-level Simulation Paper • 2604.27263 • Published 19 days ago • 11
Known By Their Actions: Fingerprinting LLM Browser Agents via UI Traces Paper • 2605.14786 • Published 19 days ago • 2
Multi-Objective and Mixed-Reward Reinforcement Learning via Reward-Decorrelated Policy Optimization Paper • 2605.13641 • Published 20 days ago • 50
RemoteZero: Geospatial Reasoning with Zero Human Annotations Paper • 2605.04451 • Published 27 days ago • 8
ComboStoc: Combinatorial Stochasticity for Diffusion Generative Models Paper • 2405.13729 • Published Apr 29 • 13
The Cognitive Penalty: Ablating System 1 and System 2 Reasoning in Edge-Native SLMs for Decentralized Consensus Paper • 2604.16913 • Published Apr 18 • 1
Action Images: End-to-End Policy Learning via Multiview Video Generation Paper • 2604.06168 • Published Apr 7 • 14