Suchir Salhan

suchirsalhan
https://www.suchirsalhan.com/
  • ssalhan

AI & ML interests

Multilinguality and Cognitively-Inspired AI. Tokenization, Pretraining, Interpretability & Alignment.

Recent Activity

updated a model 2 days ago
Beetle-FineWeb-100M/beetle-bilingual-l2-50-sequential-33-67-b3-fineweb-100m-isl-eng-1xa100
published a model 2 days ago
Beetle-FineWeb-100M/beetle-bilingual-l2-50-sequential-33-67-b3-fineweb-100m-isl-eng-1xa100
updated a model 2 days ago
Beetle-FineWeb-100M/beetle-bilingual-l2-50-simultaneous-b2-fineweb-100m-isl-eng-1xa100

Organizations

SomosNLP, CLIMB, ALTA, CLIMB-MAO, Pico Language Model, ADA-LM, Looking to Learn, Bayesian-KD, Cambridge-KAIST2, BabyLM Challenge, ByteSpan Tokenisers, BabyLM Sequence Length, ContingentChat, LangMAP, Beetles, RA at ALTA, BrainAlign, Beetle-Data, Beetle-HumanScale, Beetle-FineWeb, Beetle-FineWeb2-24B, Beetle-FineWeb-2B, Beetle-FineWeb-100M, Beetle Architectural Variants, Beetle-FineWeb3-24B, xBLiMPs, Cognitive-Tokenization-Evaluation

suchirsalhan's datasets (9)

suchirsalhan/kidalign-llama-filterable

Viewer • Updated Apr 14 • 97.6k • 15

suchirsalhan/kidalign-llama-3.1-8B-Instruct

Updated Apr 14 • 136

suchirsalhan/babylm-detox

Viewer • Updated Apr 8 • 11.6M • 32

suchirsalhan/gptbert-tokenised

Updated Jul 24, 2025 • 2

suchirsalhan/Phonemized-UD

Viewer • Updated May 30, 2025 • 1.19M • 342

suchirsalhan/BabyLM-Pretokenised

Viewer • Updated Jan 31, 2025 • 1.64M • 15

suchirsalhan/MAO-CHILDES

Viewer • Updated Apr 11, 2024 • 3.81M • 28

suchirsalhan/CLiMP

Preview • Updated Apr 2, 2024 • 36 • 1

suchirsalhan/SLING

Viewer • Updated Apr 2, 2024 • 40k • 97