RefactorBench: Evaluating Stateful Reasoning in Language Agents Through Code • Paper • 2503.07832 • Published Mar 10, 2025