Encoders vs Decoders: the Ettin Suite Collection A collection of SOTA, open-data, paired encoder-only and decoder-only models ranging from 17M params to 1B. See the paper at https://arxiv.org/abs/250 • 30 items • Updated Mar 2 • 30
Article Nano-BEIR: A Multilingual Information Retrieval Benchmark with Quality-Enhanced Queries sionic-ai • Dec 22, 2025 • 11
Article Training and Finetuning Multimodal Embedding & Reranker Models with Sentence Transformers tomaarsen • Apr 16 • 71
Article Multimodal Embedding & Reranker Models with Sentence Transformers tomaarsen • Apr 9 • 60
Article Introducing Storage Buckets on the Hugging Face Hub Wauplin, coyotte508, XciD, victor, julien-c, lhoestq, pierric, Sylvestre, hlarcher, rajatarya, seanses, assafvayner • Mar 10 • 195
ColBERT-Zero 🐶 Collection First large-scale fully pre-trained ColBERT model using only public data, outperforming GTE-ModernColBERT and GTE-ModernBERT • 10 items • Updated 5 days ago • 22
Bharat-NanoBEIR: Indian Language Retrieval Benchmarks Collection NanoBEIR retrieval benchmarks translated into 22 Indian languages across 13 datasets. • 22 items • Updated Dec 13, 2025 • 5
CoRNStack Collection State-of-the-art code retrieval and re-ranking models and datasets • 9 items • Updated Mar 26, 2025 • 21
Article ModernVBERT: Towards Smaller Visual Document Retrievers paultltc • Oct 3, 2025 • 46
NanoBEIR datasets Collection These datasets are compatible with the (Sparse)NanoBEIREvaluator with Sentence Transformers v5.2+. Also CrossEncoderNanoBEIREvaluator if bm25 column • 16 items • Updated Mar 2 • 17
Embedding Model Datasets Collection A curated subset of the datasets that work out of the box with Sentence Transformers: https://huggingface.co/datasets?other=sentence-transformers • 70 items • Updated Dec 10, 2025 • 172
Article Streaming datasets: 100x More Efficient andito, lhoestq, burtenshaw, pcuenq, merve • Oct 27, 2025 • 86
Article Provence: efficient and robust context pruning for retrieval-augmented generation nadiinchi • Jan 28, 2025 • 26
Article huggingface_hub v1.0: Five Years of Building the Foundation of Open Machine Learning Wauplin, celinah, lysandre, julien-c • Oct 27, 2025 • 75