- Low-probability Tokens Sustain Exploration in Reinforcement Learning with Verifiable Reward
  Paper • 2510.03222 • Published • 76
- In-the-Flow Agentic System Optimization for Effective Planning and Tool Use
  Paper • 2510.05592 • Published • 110
- Less is More: Recursive Reasoning with Tiny Networks
  Paper • 2510.04871 • Published • 513
- Multi-Agent Tool-Integrated Policy Optimization
  Paper • 2510.04678 • Published • 31
Jianhong Wang
hsvgbkhgbv
AI & ML interests
multi-agent reinforcement learning,
ad hoc teamwork,
robust reinforcement learning
Recent Activity
upvoted a paper about 14 hours ago
Towards Long-horizon Agentic Multimodal Search
upvoted a paper about 14 hours ago
From P(y|x) to P(y): Investigating Reinforcement Learning in Pre-train Space
updated a collection 8 days ago
LLM papers
Organizations
None yet