Quantifying the Carbon Emissions of Machine Learning • Paper • 1910.09700 • Published Oct 21, 2019 • 55
LN3Diff: Scalable Latent Neural Fields Diffusion for Speedy 3D Generation • Paper • 2403.12019 • Published Mar 18, 2024 • 11
PRISM: A Multi-Dimensional Benchmark for Evaluating LLM Peer Reviewers • Paper • 2605.26730 • Published May 27 • 17
VideoSeeker: Incentivizing Instance-level Video Understanding via Native Agentic Tool Invocation • Paper • 2605.16079 • Published May 15 • 29
DF3DV-1K: A Large-Scale Dataset and Benchmark for Distractor-Free Novel View Synthesis • Paper • 2604.13416 • Published 16 days ago • 33
Beyond the Current Observation: Evaluating Multimodal Large Language Models in Controllable Non-Markov Games • Paper • 2606.19338 • Published 17 days ago • 49
Covering Human Action Space for Computer Use: Data Synthesis and Benchmark • Paper • 2605.12501 • Published May 12 • 16
ClawArena: Benchmarking AI Agents in Evolving Information Environments • Paper • 2604.04202 • Published Apr 5 • 37
Strategic Navigation or Stochastic Search? How Agents and Humans Reason Over Document Collections • Paper • 2603.12180 • Published Mar 12 • 65
Can LLM Agents Be CFOs? A Benchmark for Resource Allocation in Dynamic Enterprise Environments • Paper • 2603.23638 • Published Mar 24 • 11
TERMINATOR: Learning Optimal Exit Points for Early Stopping in Chain-of-Thought Reasoning • Paper • 2603.12529 • Published Mar 13 • 19