RankJudge: A Multi-Turn LLM-as-a-Judge Synthetic Benchmark Generator Paper • 2605.21748 • Published 11 days ago • 14
MLAIRE: Multilingual Language-Aware Information Retrieval Evaluation Protocol Paper • 2605.07249 • Published 23 days ago • 3
Learning to Foresee: Unveiling the Unlocking Efficiency of On-Policy Distillation Paper • 2605.11739 • Published 18 days ago • 59
Beyond the Last Layer: Multi-Layer Representation Fusion for Visual Tokenization Paper • 2605.10780 • Published 19 days ago • 33
Skill1: Unified Evolution of Skill-Augmented Agents via Reinforcement Learning Paper • 2605.06130 • Published 24 days ago • 111
OpenSearch-VL: An Open Recipe for Frontier Multimodal Search Agents Paper • 2605.05185 • Published 25 days ago • 101
From Context to Skills: Can Language Models Learn from Context Skillfully? Paper • 2604.27660 • Published 28 days ago • 166
Revisiting a Pain in the Neck: A Semantic Reasoning Benchmark for Language Models Paper • 2604.16593 • Published Apr 17 • 6
Rethinking Generalization in Reasoning SFT: A Conditional Analysis on Optimization, Data, and Model Capability Paper • 2604.06628 • Published Apr 8 • 326
AVControl: Efficient Framework for Training Audio-Visual Controls Paper • 2603.24793 • Published Mar 25 • 28
Understanding the Challenges in Iterative Generative Optimization with LLMs Paper • 2603.23994 • Published Mar 25 • 29
SocialOmni: Benchmarking Audio-Visual Social Interactivity in Omni Models Paper • 2603.16859 • Published Mar 17 • 248