OProver: A Unified Framework for Agentic Formal Theorem Proving Paper • 2605.17283 • Published 7 days ago • 30
SCOPE: Signal-Calibrated On-Policy Distillation Enhancement with Dual-Path Adaptive Weighting Paper • 2604.10688 • Published Apr 12 • 26
PaTaRM Collection PaTaRM is a Generative Reward Model (GRM) for RLHF alignment. • 4 items • Updated Apr 2 • 2
Skywork-SWE: Unveiling Data Scaling Laws for Software Engineering in LLMs Paper • 2506.19290 • Published Jun 24, 2025 • 53
CSVQA: A Chinese Multimodal Benchmark for Evaluating STEM Reasoning Capabilities of VLMs Paper • 2505.24120 • Published May 30, 2025 • 50