Stress-Testing the Reasoning Competence of LLMs With Proofs Under Minimal Formalism Paper • 2605.12524 • Published Apr 7 • 4
TransitLM: A Large-Scale Dataset and Benchmark for Map-Free Transit Route Generation Paper • 2605.22355 • Published 4 days ago • 169
Evaluating Cognitive Age Alignment in Interactive AI Agents Paper • 2605.17894 • Published 7 days ago • 5
CiteVQA: Benchmarking Evidence Attribution for Trustworthy Document Intelligence Paper • 2605.12882 • Published 12 days ago • 264
HY-Embodied-0.5: Embodied Foundation Models for Real-World Agents Paper • 2604.07430 • Published Apr 8 • 188
GrandCode: Achieving Grandmaster Level in Competitive Programming via Agentic Reinforcement Learning Paper • 2604.02721 • Published Apr 3 • 629
Demystifying When Pruning Works via Representation Hierarchies Paper • 2603.24652 • Published Apr 6 • 20
DataFlex: A Unified Framework for Data-Centric Dynamic Training of Large Language Models Paper • 2603.26164 • Published Mar 27 • 364
EgoSim: Egocentric World Simulator for Embodied Interaction Generation Paper • 2604.01001 • Published Apr 1 • 38
FIPO: Eliciting Deep Reasoning with Future-KL Influenced Policy Optimization Paper • 2603.19835 • Published Mar 20 • 351
InCoder-32B: Code Foundation Model for Industrial Scenarios Paper • 2603.16790 • Published Mar 17 • 311
HSImul3R: Physics-in-the-Loop Reconstruction of Simulation-Ready Human-Scene Interactions Paper • 2603.15612 • Published Mar 16 • 153
Bootstrapping Exploration with Group-Level Natural Language Feedback in Reinforcement Learning Paper • 2603.04597 • Published Mar 4 • 211
DARE: Aligning LLM Agents with the R Statistical Ecosystem via Distribution-Aware Retrieval Paper • 2603.04743 • Published Mar 5 • 53
From Blind Spots to Gains: Diagnostic-Driven Iterative Training for Large Multimodal Models Paper • 2602.22859 • Published Feb 26 • 150