CLI-Universe: Towards Verifiable Task Synthesis Engine for Terminal Agents Paper • 2606.22883 • Published 3 days ago • 31
Deeper is Not Always Better: Mitigating the Alignment Tax via Confident Layer Decoding Paper • 2606.21906 • Published 5 days ago • 20
EvoEmbedding: Evolvable Representations for Long-Context Retrieval and Agentic Memory Paper • 2606.21649 • Published 6 days ago • 27
OpenRath: Session-Centered Runtime State for Agent Systems Paper • 2606.19409 • Published 8 days ago • 71
PlanBench-XL: Evaluating Long-Horizon Planning of LLM Tool-Use Agents in Large-Scale Tool Ecosystems Paper • 2606.22388 • Published 4 days ago • 85
DataClaw0: Agentic Tailoring Multimodal Data from Raw Streams Paper • 2606.21337 • Published 6 days ago • 68
Grouped Query Experts: Mixture-of-Experts on GQA Self-Attention Paper • 2606.20945 • Published 7 days ago • 59
EnterpriseClawBench: Benchmarking Agents from Real Workplace Sessions Paper • 2606.23654 • Published 3 days ago • 71
KaLM-Reranker-V1: Fast but Not Late Interaction for Compressed Document Reranking Paper • 2606.22807 • Published 3 days ago • 42
Tmax Collection Data and models associated with "Tmax: A simple recipe for terminal agents". paper: https://arxiv.org/abs/2606.23321 • 23 items • Updated 2 days ago • 10
Beyond the Current Observation: Evaluating Multimodal Large Language Models in Controllable Non-Markov Games Paper • 2606.19338 • Published 8 days ago • 46
EfficientRollout: System-Aware Self-Speculative Decoding for RL Rollouts Paper • 2606.18967 • Published 8 days ago • 24
The Reward Was in Your Data All Along: Correcting Flow Matching with Discriminator-Guided RL Paper • 2606.19162 • Published 8 days ago • 20
Learning from the Self-future: On-policy Self-distillation for dLLMs Paper • 2606.18195 • Published 9 days ago • 74
TRIAGE: Dialectical Reasoning for Explainable Risk Prediction on Irregularly Sampled Medical Time Series with LLMs Paper • 2606.09030 • Published 17 days ago • 30
OPD-Evolver: Cultivating Holistic Agent Evolver via On-Policy Distillation Paper • 2606.17628 • Published 9 days ago • 27
GameCraft-Bench: Can Agents Build Playable Games End-to-End in a Real Game Engine? Paper • 2606.17861 • Published 9 days ago • 55
LoopCoder-v2: Only Loop Once for Efficient Test-Time Computation Scaling Paper • 2606.18023 • Published 9 days ago • 203
Zone of Proximal Policy Optimization: Teacher in Prompts, Not Gradients Paper • 2606.18216 • Published 9 days ago • 60