SkillHone: A Harness for Continual Agent Skill Evolution Through Persistent Decision History
Abstract
SkillHone enables continual evolution of agent skills by maintaining persistent decision histories and incorporating practice feedback, improving performance on deep-research and tool-mediated analysis tasks.
Agent skills extend language-model agents with task-specific procedures, scripts, and references, but the tasks and environments they target continually change. Existing methods improve skills in bounded runs and retain only the final artifact, discarding the decision history that later agents need to interpret prior revisions, evaluations, and rejected alternatives. We introduce SkillHone, a harness for continual agent skill evolution grounded in persistent decision history. SkillHone pairs skill revisions with evaluation-side evidence that supplies practice feedback, recording structured histories of diagnoses, revisions, evidence, and outcomes. Role-separated subagents run candidate skills on practice probes with redacted reporting and propose revisions informed by prior decisions, enabling cross-session refinement without rediscovering past rationale. On deep-research benchmarks, SkillHone runs without a pre-integrated search stack and outperforms a commercially backed deep-research agent by 15.8 points on GAIA and 3.2 points on WebWalkerQA-EN, while also exceeding prior skill-evolution methods. We further deploy SkillHone on internal tool-mediated analysis scenarios, where it improves accuracy by an average of 18.8 points across seven settings.
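The abstract says each decision-history entry captures a diagnosis, a revision, its evidence, and an outcome, alongside rejected alternatives. The following is a minimal sketch of such a record, assuming an append-only JSON Lines file as the persistent store. `DecisionRecord`, `append_record`, `load_history`, `already_tried`, and every field name are hypothetical illustrations, not SkillHone's actual schema or API.

```python
import json
from dataclasses import asdict, dataclass, field
from pathlib import Path


@dataclass
class DecisionRecord:
    """One entry in a skill's decision history (hypothetical schema)."""

    skill: str                # name of the skill being revised
    diagnosis: str            # why the current version was judged deficient
    revision: str             # what change was proposed
    evidence: list[str]       # evaluation-side observations behind the decision
    outcome: str              # "accepted" or "rejected"
    rejected_alternatives: list[str] = field(default_factory=list)


def append_record(path: Path, record: DecisionRecord) -> None:
    """Append one record as a JSON line so the history persists across sessions."""
    with path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(asdict(record)) + "\n")


def load_history(path: Path, skill: str) -> list[DecisionRecord]:
    """Reload every record for one skill, in the order it was written."""
    if not path.exists():
        return []
    with path.open(encoding="utf-8") as f:
        rows = [json.loads(line) for line in f if line.strip()]
    return [DecisionRecord(**row) for row in rows if row["skill"] == skill]


def already_tried(history: list[DecisionRecord]) -> set[str]:
    """Changes a later session can skip: rejected revisions plus recorded alternatives."""
    tried: set[str] = set()
    for rec in history:
        if rec.outcome == "rejected":
            tried.add(rec.revision)
        tried.update(rec.rejected_alternatives)
    return tried
```

In this sketch, a new optimization session would call `load_history` before proposing a revision and consult `already_tried`, so it does not re-propose changes that earlier sessions already evaluated and discarded.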
Community
🚀 Excited to share SkillHone, a harness for continual agent skill evolution through persistent decision history.
The core idea is simple: agent skills should not only keep the final optimized artifact, but also preserve the decision history behind each revision — diagnoses, rejected alternatives, evaluation evidence, and outcomes. This allows later agents to continue improving a skill across sessions instead of rediscovering the same failures. 🧠
In our implementation, SkillHone uses role-separated optimization/evaluation agents and redacted practice feedback to evolve portable skills. On deep-research benchmarks, SkillHone improves over prior skill-evolution methods and performs strongly in raw open-web settings without relying on a pre-integrated search stack. 🔁
Links:
📄 arXiv: https://arxiv.org/abs/2606.08671
🌐 Project page: https://zwlijay.github.io/SkillHone-Project
🛠️ Skills: https://github.com/Tencent/SkillHone
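The role-separated optimization/evaluation agents and redacted practice feedback described above can be sketched as follows, assuming the evaluator role holds reference answers for the practice probes and the optimizer role only ever receives a redacted report. `Probe`, `ProbeResult`, `evaluate`, `redact`, the failure-category labels, and the exact-match grading are all hypothetical; the post does not specify what SkillHone's redaction removes, so here it simply withholds expected answers and raw outputs while passing back scores, failing probe IDs, and failure categories.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Probe:
    question: str
    expected: str  # reference answer: visible to the evaluator role only


@dataclass
class ProbeResult:
    probe_id: int
    passed: bool
    answer: str
    failure_category: Optional[str]  # hypothetical labels: "wrong_answer", "crash"


def evaluate(skill: Callable[[str], str], probes: list[Probe]) -> list[ProbeResult]:
    """Evaluator role: run the candidate skill on each probe and grade it (exact match, assumed)."""
    results = []
    for i, probe in enumerate(probes):
        try:
            answer = skill(probe.question)
        except Exception:  # a failing skill run is graded as a crash, not a harness error
            results.append(ProbeResult(i, False, "", "crash"))
            continue
        passed = answer.strip().lower() == probe.expected.strip().lower()
        results.append(ProbeResult(i, passed, answer, None if passed else "wrong_answer"))
    return results


def redact(results: list[ProbeResult]) -> dict:
    """Report handed to the optimizer role: scores and failure types, no answers or references."""
    failed = [r for r in results if not r.passed]
    return {
        "passed": len(results) - len(failed),
        "total": len(results),
        "failed_probe_ids": [r.probe_id for r in failed],
        "failure_categories": dict(Counter(r.failure_category for r in failed)),
    }
```

Keeping `evaluate` and `redact` on the evaluator side means, in this sketch, that the optimizer learns which probes failed and how, but cannot tailor a revision to the reference answers themselves.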
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- SkillRevise: Improving LLM-Authored Agent Skills via Trace-Conditioned Skill Revision (2026)
- SkillOpt: Executive Strategy for Self-Evolving Agent Skills (2026)
- Bayesian-Agent: Posterior-Guided Skill Evolution for LLM Agent Harnesses (2026)
- SkillAdaptor: Self-Adapting Skills for LLM Agents from Trajectories (2026)
- SkillsVote: Lifecycle Governance of Agent Skills from Collection, Recommendation to Evolution (2026)
- CODESKILL: Learning Self-Evolving Skills for Coding Agents (2026)
- SkillEvolBench: Benchmarking the Evolution from Episodic Experience to Procedural Skills (2026)