In a Training Loop 🔄

Peng Wang

stillarrow · https://peter-peng-w.github.io/

AI & ML interests

None yet

Recent Activity

liked a dataset 5 days ago

agentica-org/DeepCoder-Preview-Dataset

liked a model 8 days ago

deepseek-ai/DeepSeek-R1-Distill-Qwen-32B

updated a model 11 days ago

stillarrow/qwen2.5-coder-1.5b-instruct__scpo_no_std_code_hidden_only_shortcut_guard

Organizations

None yet

upvoted a paper 12 days ago

GDPO: Group reward-Decoupled Normalization Policy Optimization for Multi-reward RL Optimization

Paper • 2601.05242 • Published Jan 8 • 231

upvoted a paper 14 days ago

Agent World Model: Infinity Synthetic Environments for Agentic Reinforcement Learning

Paper • 2602.10090 • Published Feb 10 • 53

upvoted 3 papers about 1 month ago

HY-World 2.0: A Multi-Modal World Model for Reconstructing, Generating, and Simulating 3D Worlds

Paper • 2604.14268 • Published Apr 15 • 120

Heterogeneous Agent Collaborative Reinforcement Learning

Paper • 2603.02604 • Published Mar 3 • 195

Self-Distilled RLVR

Paper • 2604.03128 • Published Apr 3 • 175

upvoted a collection about 2 months ago

Qwen2.5-Coder

Code-specific model series based on Qwen2.5 • 38 items • Updated Mar 2 • 367

upvoted a paper about 2 months ago

PaperBanana: Automating Academic Illustration for AI Scientists

Paper • 2601.23265 • Published Jan 30 • 227

upvoted a paper 2 months ago

MiroThinker-1.7 & H1: Towards Heavy-Duty Research Agents via Verification

Paper • 2603.15726 • Published Mar 16 • 186

upvoted a collection 2 months ago

NeMo Gym

Collection of RL verifiable data for NeMo Gym • 22 items • Updated about 20 hours ago • 59

upvoted a collection 3 months ago

BFS-Prover

LLM Step-Provers in Lean4 • 5 items • Updated Oct 7, 2025 • 8

upvoted 3 papers 3 months ago

Reflective Planning: Vision-Language Models for Multi-Stage Long-Horizon Robotic Manipulation

Paper • 2502.16707 • Published Feb 23, 2025 • 14

Learning to Repair Lean Proofs from Compiler Feedback

Paper • 2602.02990 • Published Feb 3 • 29

Experiential Reinforcement Learning

Paper • 2602.13949 • Published Feb 15 • 74

upvoted a paper 4 months ago

Scaling Embeddings Outperforms Scaling Experts in Language Models

Paper • 2601.21204 • Published Jan 29 • 103

upvoted an article 4 months ago

Open Responses: What you need to know

Article • evalstate, burtenshaw, merve, pcuenq +2 • Jan 15 • 111

upvoted 3 papers 4 months ago

SuperGPQA: Scaling LLM Evaluation across 285 Graduate Disciplines

Paper • 2502.14739 • Published Feb 20, 2025 • 110

Your Group-Relative Advantage Is Biased

Paper • 2601.08521 • Published Jan 13 • 158

PVPO: Pre-Estimated Value-Based Policy Optimization for Agentic Reasoning

Paper • 2508.21104 • Published Aug 28, 2025 • 37

upvoted a collection 4 months ago

🧠 Reasoning datasets

Datasets with reasoning traces for math and code released by the community • 24 items • Updated May 19, 2025 • 190

upvoted an article 5 months ago

From GRPO to DAPO and GSPO: What, Why, and How

Article • NormalUhr • Aug 9, 2025 • 119