
Batuhan S

Ba2han

AI & ML interests

None yet

Recent Activity

updated a model about 8 hours ago

Ba2han/qwen3_from_scratch

updated a dataset 2 days ago

Ba2han/english_corpus

published a dataset 2 days ago

Ba2han/english_corpus


Organizations

None yet

updated a model about 8 hours ago

Ba2han/qwen3_from_scratch

Text Generation • 0.5B • Updated about 2 hours ago • 365

updated a dataset 2 days ago

Ba2han/english_corpus

Updated 1 day ago • 51

published a dataset 2 days ago

Ba2han/english_corpus

Updated 1 day ago • 51

liked a model 3 days ago

bytedance-research/Lance

Any-to-Any • Updated 2 days ago • 1.68k • 780

reacted to danielhanchen's post with 🔥🚀 3 days ago

Post

1937

Qwen3.6 MTP is here! Run locally on 20GB RAM. ⚡️

MTP enables Qwen3.6 to generate ~1.4–2.2× faster with no accuracy change.

Qwen3.6-27B: unsloth/Qwen3.6-27B-MTP-GGUF
Qwen3.6-35B-A3B: unsloth/Qwen3.6-35B-A3B-MTP-GGUF
Guide: https://unsloth.ai/docs/models/qwen3.6#mtp-guide

reacted to Crownelius's post with 🔥 4 days ago

Post

4500

Howdy,
CompactAI-O is launching a tiny Model Golf, and the winner walks away with $50 in RunPod credits. Monthly. Every month. Show up, build, somebody wins.

What it is

Build the best language model you can under 100 million parameters, with at least a 1028-token context window. That's it. Any architecture, any tokenizer, any training scheme you can dream up at 3am. The only catch is it's gotta be open source (MIT, GPL, Apache, AGPL), take your pick.
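For a quick sanity check against the limits above, here is a rough sketch of a parameter-budget estimate. It assumes a plain decoder-only transformer and ignores biases and normalization weights; the function names and the example config are hypothetical and not part of the contest rules.

```python
PARAM_LIMIT = 100_000_000  # "under 100 million parameters" (from the post)
MIN_CONTEXT = 1028         # minimum context window as stated in the post


def estimate_params(vocab_size, d_model, n_layers, d_ff, tie_embeddings=True):
    """Rough parameter count for a plain decoder-only transformer.

    Assumption: biases and norm weights are ignored, so this slightly
    undercounts a real model.
    """
    embedding = vocab_size * d_model
    attention = 4 * d_model * d_model   # Q, K, V and output projections
    feed_forward = 2 * d_model * d_ff   # up and down projections
    output_head = 0 if tie_embeddings else vocab_size * d_model
    return embedding + n_layers * (attention + feed_forward) + output_head


def fits_budget(n_params, context_len):
    """True if a model satisfies both limits quoted in the post."""
    return n_params < PARAM_LIMIT and context_len >= MIN_CONTEXT


# Hypothetical config, chosen only for illustration.
n_params = estimate_params(vocab_size=32_000, d_model=512, n_layers=12, d_ff=2048)
print(n_params, fits_budget(n_params, context_len=2048))
```

With tied embeddings this example config lands at roughly 54M parameters, which leaves room to grow width or depth before hitting the cap.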

It scratches the same itch as a Kaggle comp without the dataset/leaderboard nonsense. No fixed benchmark to game. No llama.cpp compatibility hoops. If you wanna train a 50M-param MoE with five experts and a tokenizer built on cookbooks, you can do that. Nothing stopping you.

The rules are listed in the discord and on the organization page if you're interested.

Why $50????

It's symbolic. It ain't gonna make anyone rich. But it's enough to cover a weekend of GPU time, enough to keep enthusiasts coming back, and not so much that it pulls in people who are just there for the money. Enthusiasts build interesting things. Interesting things move the field forward. A little incentive. I'd do it for $50 lol.

How to join

First round opens soon. Landing page is here:

→ CompactAI-O/Tiny-model-golf

For questions or to swap ideas, the Discord's open:

→ https://discord.gg/y2jTct6Cxv

Excited to see what y'all come up with. ♥

— Shane

8 replies

published a model 5 days ago

Ba2han/qwen3_from_scratch

Text Generation • 0.5B • Updated about 2 hours ago • 365

updated a model 7 days ago

Ba2han/experimental_auto

Text Generation • 0.6B • Updated 7 days ago • 352

liked a model 8 days ago

RunDiffusion/Juggernaut-Z-Image

Text-to-Image • 6B • Updated 12 days ago • 24.7k • 96

reacted to SeaWolf-AI's post with ❤️ 9 days ago

Post

5384

🧬 Darwin Family: Zero Gradient Steps, GPQA Diamond 88.89%

How far can we push LLM reasoning *without* training?

Our team at VIDRAFT submitted this paper to Daily Papers yesterday, and it's
currently #3. Huge thanks to everyone who upvoted — sharing the core ideas below.

🔗 Paper: Darwin Family: MRI-Trust-Weighted Evolutionary Merging for Training-Free Scaling of Language-Model Reasoning (2605.14386)
🔗 arXiv: https://arxiv.org/abs/2605.14386
🔗 Model: FINAL-Bench/Darwin-28B-REASON
🔗 Model: FINAL-Bench/Darwin-28B-Opus

---

TL;DR

Darwin Family is a training-free evolutionary merging framework.
By recombining the weight spaces of existing LLM checkpoints — with zero
gradient-based training — it reaches frontier-level reasoning.

- 🏆 Darwin-28B-Opus: GPQA Diamond 88.89%
- 💸 Zero gradient steps — not a single B200 or H200 hour needed
- 🧬 Consistent gains across 4B → 35B scale
- 🔀 Cross-architecture breeding between Transformer and Mamba families
- 🔁 Stable recursive multi-generation evolution

Three Core Mechanisms

① 14-dim Adaptive Merge Genome — fine-grained recombination at both
component level (Attention / FFN / MLP / LayerNorm / Embedding) and block
level, expanding the prior evolutionary-merge search space.

② MRI-Trust Fusion — we diagnose each layer's reasoning contribution
via an **MRI (Model Reasoning Importance)** signal and fuse it with
evolutionary search through a **learnable trust parameter**. Trust the
diagnostic too much and search collapses; ignore it and search becomes
inefficient — Darwin learns the balance from data.

③ Architecture Mapper — weight-space breeding across heterogeneous
families. Attention × SSM crossover actually works.
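To make the idea of component-level recombination concrete, here is a minimal sketch of merging two checkpoints with one mixing ratio per component. This is not the Darwin method itself: it uses plain linear interpolation, leaves out the evolutionary search, the MRI signal, and the trust parameter, and assumes both parents share the same parameter names and shapes. The `component_of` naming scheme and the toy weights are hypothetical.

```python
def component_of(name):
    """Map a parameter name to a coarse component label (hypothetical scheme)."""
    for key in ("attn", "ffn", "norm", "embed"):
        if key in name:
            return key
    return "other"


def merge(parent_a, parent_b, genome):
    """Linearly interpolate two checkpoints, one ratio per component.

    genome[c] = 0.0 keeps parent_a's weights for component c,
    genome[c] = 1.0 keeps parent_b's. Unlisted components default to 0.5.
    No gradient steps are involved: the child is computed from the parents.
    """
    child = {}
    for name, weights_a in parent_a.items():
        weights_b = parent_b[name]
        t = genome.get(component_of(name), 0.5)
        child[name] = [(1 - t) * a + t * b for a, b in zip(weights_a, weights_b)]
    return child


# Toy checkpoints: flat lists stand in for weight tensors.
parent_a = {"layer0.attn.q": [1.0, 2.0], "layer0.ffn.up": [0.0, 4.0]}
parent_b = {"layer0.attn.q": [3.0, 6.0], "layer0.ffn.up": [2.0, 0.0]}
genome = {"attn": 0.5, "ffn": 0.0}  # average attention, keep parent A's FFN

child = merge(parent_a, parent_b, genome)
print(child)
```

In a search setting, the `genome` dictionary would be the thing being mutated and scored, while the parent weights stay fixed.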

Why It Matters
> Diagnose latent capabilities already encoded in open checkpoints,
> and recombine them — no gradients required.

Replies and critiques welcome 🙌