arxiv:2606.08671

SkillHone: A Harness for Continual Agent Skill Evolution Through Persistent Decision History

Published on Jun 23

Submitted by Zhiwei Li on Jul 1

Tencent

Authors: Zhiwei Li

Abstract

SkillHone enables continuous evolution of agent skills by maintaining persistent decision histories and incorporating practice feedback for improved performance across research and tool-mediated analysis tasks.

Generated by Qwen/Qwen2.5-Coder-32B-Instruct

Agent skills extend language-model agents with task-specific procedures, scripts, and references, but the tasks and environments they target continually change. Existing methods improve skills in bounded runs and retain only the final artifact, discarding the decision history that later agents need to interpret prior revisions, evaluations, and rejected alternatives. We introduce SkillHone, a harness for continual agent skill evolution grounded in persistent decision history. SkillHone pairs skill revisions with evaluation-side evidence that supplies practice feedback, recording structured histories of diagnoses, revisions, evidence, and outcomes. Role-separated subagents run candidate skills on practice probes with redacted reporting and propose revisions informed by prior decisions, enabling cross-session refinement without rediscovering past rationale. On deep-research benchmarks, SkillHone runs without a pre-integrated search stack and outperforms the commercially backed deep-research agent by 15.8 points on GAIA and 3.2 points on WebWalkerQA-EN, while also exceeding prior skill-evolution methods. We further deploy SkillHone on internal tool-mediated analysis scenarios, where it improves accuracy by an average of 18.8 points across seven settings.
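To make the idea of a persistent decision history concrete, here is a minimal Python sketch of what one structured record and an append-only store might look like. The schema, the JSON Lines file, and every name in it (`DecisionRecord`, `append_record`, `load_history`, `rejected_changes`) are assumptions made for illustration; the abstract only says that histories record diagnoses, revisions, evidence, and outcomes, and SkillHone's actual format may differ.

```python
import json
import tempfile
from dataclasses import asdict, dataclass, field
from pathlib import Path


@dataclass
class DecisionRecord:
    """One entry in a skill's decision history (hypothetical schema)."""
    skill: str
    revision: int
    diagnosis: str   # what was judged to be wrong with the skill
    change: str      # the revision that was tried
    evidence: dict   # evaluation-side evidence, e.g. probe pass counts
    outcome: str     # "accepted" or "rejected"
    rejected_alternatives: list = field(default_factory=list)


def append_record(path: Path, record: DecisionRecord) -> None:
    """Append one record as a JSON line; earlier entries are never rewritten."""
    with path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(asdict(record)) + "\n")


def load_history(path: Path, skill: str) -> list:
    """Read back all records for one skill, oldest first."""
    if not path.exists():
        return []
    records = []
    with path.open(encoding="utf-8") as fh:
        for line in fh:
            if line.strip():
                record = DecisionRecord(**json.loads(line))
                if record.skill == skill:
                    records.append(record)
    return records


def rejected_changes(history: list) -> set:
    """Changes a later session should not propose again without new evidence."""
    return {r.change for r in history if r.outcome == "rejected"}


# Usage: one session writes two decisions, a later session reads them back.
with tempfile.TemporaryDirectory() as tmp:
    log = Path(tmp) / "history.jsonl"
    append_record(log, DecisionRecord(
        skill="web-research", revision=1,
        diagnosis="stops after the first search result",
        change="require two independent sources",
        evidence={"passed": 7, "total": 10}, outcome="accepted"))
    append_record(log, DecisionRecord(
        skill="web-research", revision=2,
        diagnosis="answers are too long",
        change="cap answers at one sentence",
        evidence={"passed": 5, "total": 10}, outcome="rejected"))
    history = load_history(log, "web-research")
    blocked = rejected_changes(history)
```

An append-only log is chosen here because it keeps rejected revisions alongside accepted ones, which is the information the abstract says is lost when only the final artifact is retained.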

Community

AndeyTait (paper author, paper submitter), 1 day ago

🚀 Excited to share SkillHone, a harness for continual agent skill evolution through persistent decision history.

The core idea is simple: a skill should keep not only its final optimized artifact but also the decision history behind each revision, including diagnoses, rejected alternatives, evaluation evidence, and outcomes. This lets later agents continue improving a skill across sessions instead of rediscovering the same failures. 🧠

In our implementation, SkillHone uses role-separated optimization/evaluation agents and redacted practice feedback to evolve portable skills. On deep-research benchmarks, SkillHone improves over prior skill-evolution methods and performs strongly in raw open-web settings without relying on a pre-integrated search stack. 🔁
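The separation between optimization and evaluation roles can be illustrated with a small Python sketch. It assumes one possible reading of "redacted practice feedback": the evaluation side holds the reference answers and returns only pass counts and failure notes, so the optimizing side never sees the answers. All names here (`Probe`, `RedactedReport`, `evaluate`, `accept_revision`) and the exact-match scoring are hypothetical and are not taken from the SkillHone code.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Probe:
    question: str
    answer: str  # reference answer, visible to the evaluator only


@dataclass(frozen=True)
class RedactedReport:
    passed: int
    total: int
    failure_notes: tuple  # which probes failed, without the reference answers


def evaluate(skill: Callable[[str], str], probes: list) -> RedactedReport:
    """Evaluator role: run the skill on each probe and report without answers."""
    passed = 0
    notes = []
    for index, probe in enumerate(probes):
        produced = skill(probe.question).strip().lower()
        if produced == probe.answer.strip().lower():
            passed += 1
        else:
            notes.append(f"probe {index}: answer did not match")
    return RedactedReport(passed, len(probes), tuple(notes))


def accept_revision(current: RedactedReport, candidate: RedactedReport) -> bool:
    """Optimizer role: decide from redacted reports alone."""
    return candidate.passed > current.passed


# Usage: compare a baseline skill with a candidate revision.
probes = [Probe("2+2", "4"), Probe("capital of France", "Paris")]


def baseline_skill(question: str) -> str:
    return "4"  # answers every question the same way


def candidate_skill(question: str) -> str:
    return {"2+2": "4", "capital of France": "Paris"}.get(question, "")


baseline_report = evaluate(baseline_skill, probes)
candidate_report = evaluate(candidate_skill, probes)
keep_candidate = accept_revision(baseline_report, candidate_report)
```

Keeping the reference answers out of the report is one simple way to stop a revision from overfitting to the practice probes; the paper may redact differently or more selectively.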

Links:
📄 arXiv: https://arxiv.org/abs/2606.08671
🌐 Project page: https://zwlijay.github.io/SkillHone-Project
🛠️ Skills: https://github.com/Tencent/SkillHone