Gemma 4 26B A4B — Opus 4.6 Reasoning Distillation

Fine-tuned version of google/gemma-4-26B-A4B-it on high-quality reasoning traces distilled from Claude Opus 4.6.

Training Details

  • Base model: google/gemma-4-26B-A4B-it (MoE, 3.8B active params)
  • Dataset: Crownelius/Opus-4.6-Reasoning-3300x
  • Method: LoRA SFT (r=16, alpha=16)
  • Training: 1 epoch, 2160 examples, 540 steps
  • Hardware: NVIDIA H100 80GB
  • Framework: Unsloth + TRL
  • Final loss: 0.751
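The numbers above are internally consistent: 2160 examples covered in 540 optimizer steps implies an effective batch size of 4. A quick sanity check (assuming one epoch and no dropped or packed batches):

```python
# Sanity check on the training arithmetic above (assumes 1 epoch,
# no dropped or packed batches): examples / steps = effective batch size.
examples = 2160
steps = 540
effective_batch = examples // steps
print(effective_batch)  # 4
```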

Quantization

File                                 Size     Description
gemma-4-26B-A4B-it.Q3_K_S.gguf       12.2 GB  Q3_K_S — fits in 16GB VRAM
gemma-4-26B-A4B-it.BF16-mmproj.gguf  ~1.2 GB  BF16 vision encoder (required for multimodal)

Usage with llama-server

llama-server \
  --model gemma-4-26B-A4B-it.Q3_K_S.gguf \
  --mmproj gemma-4-26B-A4B-it.BF16-mmproj.gguf \
  --port 8080 \
  --n-gpu-layers 99 \
  --ctx-size 32768 \
  --flash-attn on \
  --cache-type-k q4_0 \
  --cache-type-v q4_0
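Once the server is running, it exposes llama.cpp's OpenAI-compatible /v1/chat/completions endpoint on the port given above. A minimal sketch of a client request using only the standard library (the prompt and sampling parameters are illustrative, and the model field is a placeholder — llama-server serves whatever model it was launched with):

```python
import json
import urllib.request

# Build a request body for llama-server's OpenAI-compatible chat endpoint.
# Port 8080 matches the --port flag in the launch command above.
payload = {
    "model": "gemma-4-26B-A4B-it",  # ignored by llama-server, kept for compatibility
    "messages": [
        {"role": "user", "content": "Solve step by step: 17 * 24 = ?"},
    ],
    "temperature": 0.7,
    "max_tokens": 1024,
}
body = json.dumps(payload).encode()
print(len(payload["messages"]))  # 1

# To actually send the request (requires the server to be running):
# req = urllib.request.Request(
#     "http://localhost:8080/v1/chat/completions",
#     data=body,
#     headers={"Content-Type": "application/json"},
# )
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```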

What improved

  • Structured step-by-step reasoning (thinking before answering)
  • More precise responses on complex multi-step tasks
  • Better mathematical and algorithmic problem solving
  • Consistent <think> / answer formatting
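Because the model wraps its reasoning in <think> tags as described above, a client can cleanly separate the reasoning trace from the final answer. A minimal sketch (assumes at most one <think>...</think> block per response):

```python
import re

def split_reasoning(text: str) -> tuple[str, str]:
    """Split a model response into (reasoning, answer).

    Assumes the reasoning is wrapped in a single <think>...</think>
    block; returns an empty reasoning string if no block is present.
    """
    match = re.search(r"<think>(.*?)</think>", text, flags=re.DOTALL)
    if not match:
        return "", text.strip()
    reasoning = match.group(1).strip()
    answer = text[match.end():].strip()
    return reasoning, answer

demo = "<think>17 * 24 = 17 * 20 + 17 * 4 = 340 + 68</think>408"
print(split_reasoning(demo))
# ('17 * 24 = 17 * 20 + 17 * 4 = 340 + 68', '408')
```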

Hardware Requirements

  • Minimum: 14GB VRAM (e.g. RTX 3090, RTX 4080, RTX 5060 Ti 16GB)
  • Recommended context: 32K tokens with KV cache q4_0
  • Total VRAM usage: ~13GB at 32K context
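The q4_0 KV cache is what keeps 32K context within budget: GGUF q4_0 stores 32 elements in 18 bytes (0.5625 bytes/element) versus 2 bytes/element for f16. A rough estimate of the saving — the layer count, KV-head count, and head dimension below are illustrative placeholders, not the real Gemma 4 architecture numbers:

```python
# Rough KV-cache size estimate. Architecture numbers (layers, kv_heads,
# head_dim) are hypothetical placeholders for illustration only.
def kv_cache_bytes(ctx, layers, kv_heads, head_dim, bytes_per_elem):
    # K and V each store ctx * kv_heads * head_dim values per layer.
    return 2 * layers * ctx * kv_heads * head_dim * bytes_per_elem

F16 = 2.0        # 16-bit cache: 2 bytes per element
Q4_0 = 18 / 32   # GGUF q4_0 block: 18 bytes per 32 elements

ctx = 32768
f16_gib = kv_cache_bytes(ctx, 32, 8, 128, F16) / 2**30
q4_gib = kv_cache_bytes(ctx, 32, 8, 128, Q4_0) / 2**30
print(f16_gib, q4_gib)  # 4.0 1.125
```

With these placeholder dimensions, quantizing the cache cuts KV memory by roughly 3.6x, which is why the full 32K context fits alongside the 12.2 GB weights.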

License

Apache 2.0 — same as base model.
