pplx-embed Collection Diffusion-Pretrained Dense and Contextual Embeddings • 7 items • Updated Feb 26 • 96
view article Article Introducing Falcon-H1-Arabic: Pushing the Boundaries of Arabic Language AI with Hybrid Architecture tiiuae • Jan 5 • 40
view article Article We Got Claude to Fine-Tune an Open Source LLM burtenshaw, evalstate • Dec 4, 2025 • 624
VibeVoice Collection Frontier Text-to-Speech Models https://microsoft.github.io/VibeVoice/ • 8 items • Updated Mar 2 • 244
FineWeb2: One Pipeline to Scale Them All -- Adapting Pre-Training Data Processing to Every Language Paper • 2506.20920 • Published Jun 26, 2025 • 78
SYNTHETIC-1 Collection A collection of tasks & verifiers for reasoning datasets • 9 items • Updated Oct 7, 2025 • 67
💧 LFM2 Collection LFM2 is a new generation of hybrid models, designed for on-device deployment. • 28 items • Updated Apr 8 • 154
Kimi-K2 Collection Moonshot's MoE LLMs with 1 trillion parameters, exceptional on agentic intellegence • 5 items • Updated Jan 27 • 173
view article Article Open R1: How to use OlympicCoder locally for coding +3 burtenshaw, reach-vb, lewtun, edbeeching, yagilb • Mar 20, 2025 • 63
view article Article Open-source DeepResearch – Freeing our search agents +3 m-ric, albertvillanova, merve, thomwolf, clefourrier • Feb 4, 2025 • 1.32k
The RefinedWeb Dataset for Falcon LLM: Outperforming Curated Corpora with Web Data, and Web Data Only Paper • 2306.01116 • Published Jun 1, 2023 • 45