KV Packet: Recomputation-Free Context-Independent KV Caching for LLMs Paper • 2604.13226 • Published 7 days ago • 9
How to Fine-Tune a Reasoning Model? A Teacher-Student Cooperation Framework to Synthesize Student-Consistent SFT Data Paper • 2604.14164 • Published 29 days ago • 34
TREX: Automating LLM Fine-tuning via Agent-Driven Tree-based Exploration Paper • 2604.14116 • Published 6 days ago • 13
Exploration and Exploitation Errors Are Measurable for Language Model Agents Paper • 2604.13151 • Published 7 days ago • 24
From P(y|x) to P(y): Investigating Reinforcement Learning in Pre-train Space Paper • 2604.14142 • Published 6 days ago • 27
WTF GENIUS PAPERS Collection Papers that made me appreciate my major and my life a little more. obs=Observation, innov=Innovation. Most papers are abt improving tiny models. • 108 items • Updated 6 days ago • 14
Self-Distilled Reasoner: On-Policy Self-Distillation for Large Language Models Paper • 2601.18734 • Published Jan 26 • 5
Towards Active Synthetic Data Generation for Finetuning Language Models Paper • 2512.00884 • Published Nov 30, 2025 • 1
Improving Influence-based Instruction Tuning Data Selection for Balanced Learning of Diverse Capabilities Paper • 2501.12147 • Published Jan 21, 2025 • 1
Clustering and Ranking: Diversity-preserved Instruction Selection through Expert-aligned Quality Estimation Paper • 2402.18191 • Published Feb 28, 2024 • 1
SCAR: Efficient Instruction-Tuning for Large Language Models via Style Consistency-Aware Response Ranking Paper • 2406.10882 • Published Jun 16, 2024 • 2