GrandCode: Achieving Grandmaster Level in Competitive Programming via Agentic Reinforcement Learning Paper • 2604.02721 • Published Apr 3 • 629
LongCat-Flash-Prover: Advancing Native Formal Reasoning via Agentic Tool-Integrated Reinforcement Learning Paper • 2603.21065 • Published Mar 22 • 77
Scaling Embeddings Outperforms Scaling Experts in Language Models Paper • 2601.21204 • Published Jan 29 • 103
Llama 4 Collection Meta's new Llama 4 multimodal models, Scout & Maverick. Includes Dynamic GGUFs, 16-bit & Dynamic 4-bit uploads. Run & fine-tune them with Unsloth! • 15 items • Updated 28 days ago • 57
SEED-GRPO: Semantic Entropy Enhanced GRPO for Uncertainty-Aware Policy Optimization Paper • 2505.12346 • Published May 18, 2025 • 19
Hydra-SGG: Hybrid Relation Assignment for One-stage Scene Graph Generation Paper • 2409.10262 • Published Sep 16, 2024 • 1
Mixture of Experts Explained Article • osanseviero, lewtun, philschmid, smangrul, ybelkada, pcuenq • Dec 11, 2023 • 1.13k
Trace & Evaluate your Agent with Arize Phoenix Article • schavalii, jgilhuly16, m-ric • Feb 28, 2025 • 41
Mini-R1: Reproduce Deepseek R1 „aha moment“ a RL tutorial Article • open-r1 • Jan 31, 2025 • 51
Groma: Localized Visual Tokenization for Grounding Multimodal Large Language Models Paper • 2404.13013 • Published Apr 19, 2024 • 31
A failed experiment: Infini-Attention, and why we should keep trying? Article • neuralink, lvwerra, thomwolf • Aug 14, 2024 • 76
TGI Multi-LoRA: Deploy Once, Serve 30 Models Article • derek-thomas, dmaniloff, drbh • Jul 18, 2024 • 63
Preference Optimization for Vision Language Models Article • qgallouedec, vwxyzjn, merve, kashif • Jul 10, 2024 • 93