Zone of Proximal Policy Optimization: Teacher in Prompts, Not Gradients Paper • 2606.18216 • Published 17 days ago • 63
Learn from Weaknesses: Automated Domain Specialization for Small Computer-Use Agents Paper • 2605.28775 • Published May 27 • 38
OmniRetrieval: Unified Retrieval across Heterogeneous Knowledge Sources Paper • 2605.29250 • Published May 28 • 79
Agent Explorative Policy Optimization for Multimodal Agentic Reasoning Paper • 2605.28774 • Published May 27 • 93
Learn from Weaknesses: Automated Domain Specialization for Small Computer-Use Agents Paper • 2605.28775 • Published May 27 • 38