Article: Unlocking asynchronicity in continuous batching • ror, pcuenq, ariG23498 • about 1 month ago • 59
Article: Mixture of Experts (MoEs) in Transformers • ariG23498, pcuenq, merve, IlyasMoutawwakil, ArthurZ, sergiopaniego, Molbap • Feb 26 • 168
Space: The Smol Training Playbook 📚 • The secrets to building world-class LLMs • Featured • 3.2k
Space: Unlocking On-Policy Distillation for Any Model Family 📝 • Explore on-policy distillation visualization for any model • 113
Article: Efficient MultiModal Data Pipeline • ariG23498, lusxvr, andito, sergiopaniego, pcuenq • Jul 8, 2025 • 72