MemoBench: Benchmarking World Modeling in Dynamically Changing Environments Paper • 2606.27537 • Published 11 days ago • 6
AutoLab: Can Frontier Models Solve Long-Horizon Auto Research and Engineering Tasks? Paper • 2606.05080 • Published Jun 3 • 30
Agent Skills Should Go Beyond Text: The Case for Visual Skills Paper • 2606.01414 • Published May 31 • 10
MementoGUI: Learning Agentic Multimodal Memory Control for Long-Horizon GUI Agents Paper • 2605.18652 • Published May 18 • 8
Visual Aesthetic Benchmark: Can Frontier Models Judge Beauty? Paper • 2605.12684 • Published May 12 • 11
ChartNet: A Million-Scale, High-Quality Multimodal Dataset for Robust Chart Understanding Paper • 2603.27064 • Published Mar 28 • 29
SPARC: Separating Perception And Reasoning Circuits for Test-time Scaling of VLMs Paper • 2602.06566 • Published Feb 6 • 3
MIRA: Multimodal Iterative Reasoning Agent for Image Editing Paper • 2511.21087 • Published Nov 26, 2025 • 10
Video-R4: Reinforcing Text-Rich Video Reasoning with Visual Rumination Paper • 2511.17490 • Published Nov 21, 2025 • 22
Building a Foundational Guardrail for General Agentic Systems via Synthetic Data Paper • 2510.09781 • Published Oct 10, 2025 • 27
VideoCutLER: Surprisingly Simple Unsupervised Video Instance Segmentation Paper • 2308.14710 • Published Aug 28, 2023