Stress-Testing the Reasoning Competence of LLMs With Proofs Under Minimal Formalism Paper • 2605.12524 • Published Apr 7 • 4
TransitLM: A Large-Scale Dataset and Benchmark for Map-Free Transit Route Generation Paper • 2605.22355 • Published 3 days ago • 167
nodogoro/cell2_20260521_hossam_coffee_shop_setting20260521_200330 Viewer • Updated 2 days ago • 3.54k • 26 • 1
Evaluating Cognitive Age Alignment in Interactive AI Agents Paper • 2605.17894 • Published 6 days ago • 5
CiteVQA: Benchmarking Evidence Attribution for Trustworthy Document Intelligence Paper • 2605.12882 • Published 11 days ago • 262
latency-sensitive-bench/deadly_corridor_jitter_latency_uniform_min_1_max_3 Viewer • Updated 9 days ago • 2.59k • 100 • 1
HY-Embodied-0.5: Embodied Foundation Models for Real-World Agents Paper • 2604.07430 • Published Apr 8 • 187
GrandCode: Achieving Grandmaster Level in Competitive Programming via Agentic Reinforcement Learning Paper • 2604.02721 • Published Apr 3 • 629
Demystifying When Pruning Works via Representation Hierarchies Paper • 2603.24652 • Published Apr 6 • 20
DataFlex: A Unified Framework for Data-Centric Dynamic Training of Large Language Models Paper • 2603.26164 • Published Mar 27 • 364