EvalVerse: Pipeline-Aware and Expert-Calibrated Benchmarking for Professional Cinematic Video Generation Paper • 2605.23271 • Published 13 days ago • 79
AutoResearchClaw: Self-Reinforcing Autonomous Research with Human-AI Collaboration Paper • 2605.20025 • Published 16 days ago • 185
More Context, Larger Models, or Moral Knowledge? A Systematic Study of Schwartz Value Detection in Political Texts Paper • 2605.22641 • Published 14 days ago • 4
DelTA: Discriminative Token Credit Assignment for Reinforcement Learning from Verifiable Rewards Paper • 2605.21467 • Published 15 days ago • 204
CiteVQA: Benchmarking Evidence Attribution for Trustworthy Document Intelligence Paper • 2605.12882 • Published 22 days ago • 270
maxqualia/pi05-simdata22-887b3a25-loop-pi05-simdata22-50ep-3cc44d46 Robotics • 4B • Updated 16 days ago • 41 • 1
IAM: Identity-Aware Human Motion and Shape Joint Generation Paper • 2604.25164 • Published Apr 28 • 2
ACES: Who Tests the Tests? Leave-One-Out AUC Consistency for Code Generation Paper • 2604.03922 • Published Apr 5 • 53
PokeGym: A Visually-Driven Long-Horizon Benchmark for Vision-Language Models Paper • 2604.08340 • Published Apr 9 • 8