LocateAnything: Fast and High-Quality Vision-Language Grounding with Parallel Box Decoding Paper • 2605.27365 • Published 5 days ago • 127
ClinSeekAgent: Automating Multimodal Evidence Seeking for Agentic Clinical Reasoning Paper • 2605.20176 • Published 12 days ago • 12
MobileEgo Anywhere: Open Infrastructure for long horizon egocentric data on commodity hardware Paper • 2605.05945 • Published 24 days ago • 10
Learning to Build the Environment: Self-Evolving Reasoning RL via Verifiable Environment Synthesis Paper • 2605.14392 • Published 17 days ago • 8
CreativityBench: Evaluating Agent Creative Reasoning via Affordance-Based Tool Repurposing Paper • 2605.02910 • Published 25 days ago • 22
HERMES++: Toward a Unified Driving World Model for 3D Scene Understanding and Generation Paper • 2604.28196 • Published Apr 30 • 72
LLaDA2.0-Uni: Unifying Multimodal Understanding and Generation with Diffusion Large Language Model Paper • 2604.20796 • Published Apr 22 • 242
Pseudo-Unification: Entropy Probing Reveals Divergent Information Patterns in Unified Multimodal Models Paper • 2604.10949 • Published Apr 13 • 40
ViVa: A Video-Generative Value Model for Robot Reinforcement Learning Paper • 2604.08168 • Published Apr 9 • 18
Rethinking Generalization in Reasoning SFT: A Conditional Analysis on Optimization, Data, and Model Capability Paper • 2604.06628 • Published Apr 8 • 326
Adam's Law: Textual Frequency Law on Large Language Models Paper • 2604.02176 • Published Apr 2 • 503
Learning to Learn-at-Test-Time: Language Agents with Learnable Adaptation Policies Paper • 2604.00830 • Published Apr 2 • 15
When Numbers Speak: Aligning Textual Numerals and Visual Instances in Text-to-Video Diffusion Models Paper • 2604.08546 • Published Apr 9 • 115
SQuTR: A Robustness Benchmark for Spoken Query to Text Retrieval under Acoustic Noise Paper • 2602.12783 • Published Feb 13 • 246
Ghost-FWL: A Large-Scale Full-Waveform LiDAR Dataset for Ghost Detection and Removal Paper • 2603.28224 • Published Mar 30 • 5