The Tool Decathlon: Benchmarking Language Agents for Diverse, Realistic, and Long-Horizon Task Execution Paper • 2510.25726 • Published Oct 29, 2025 • 47
SWE-rebench-V2 Collection SWE-rebench-V2 is a curated dataset of software-engineering tasks derived from real GitHub issues and pull requests. • 3 items • Updated Mar 3 • 12
talkie-13b Collection talkie-1930-13b is a vintage language model trained on pre-1931 English-language text. See https://github.com/talkie-lm/talkie to run talkie. • 3 items • Updated Apr 21 • 53
Ling 2.6 Collection Ling-2.6 series is designed for real-world agents that require fast responses, strong execution, and high token efficiency, with SKUs available in several sizes. • 4 items • Updated 12 days ago • 13
GameWorld: Towards Standardized and Verifiable Evaluation of Multimodal Game Agents Paper • 2604.07429 • Published Apr 8 • 121
Granite 4.1 Language Models Collection Efficient language models for multilingual generation, coding, RAG, and AI assistant workflows. • 6 items • Updated 27 days ago • 53
Gemma 4 Collection Gemma 4 is Google's new model family, including E2B, E4B, 26B-A4B, and 31B. • 28 items • Updated Apr 22 • 193
Qwopus3.5-v3.5/v3 Collection 🌟Qwopus3.5-v3.5 is the latest model in the Claude series. • 14 items • Updated 3 days ago • 104
APEX Quants (GGUF) Collection MoE models quantized with the APEX Quantization technique ( https://github.com/mudler/apex-quant ) • 34 items • Updated 9 days ago • 99
Article Arabic TTS Arena: Ranking Voice Models the Way Chess Ranks Grandmasters Navid-AI • Mar 12 • 17