Gemma 4 26B A4B — Opus 4.6 Reasoning Distillation

Fine-tuned version of google/gemma-4-26B-A4B-it on high-quality reasoning traces distilled from Claude Opus 4.6.

Training Details

  • Base model: google/gemma-4-26B-A4B-it (MoE, 3.8B active params)
  • Dataset: Crownelius/Opus-4.6-Reasoning-3300x
  • Method: LoRA SFT (r=16, alpha=16)
  • Training: 1 epoch, 2160 examples, 540 steps
  • Hardware: NVIDIA H100 80GB
  • Framework: Unsloth + TRL
  • Final loss: 0.751
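The numbers above are internally consistent: 2160 examples covered in 540 optimizer steps implies an effective batch size of 4. A quick sanity check (assuming one epoch and no dropped or packed batches):

```python
# Sanity check on the training arithmetic above (assumes 1 epoch,
# no dropped or packed batches): examples / steps = effective batch size.
examples = 2160
steps = 540
effective_batch = examples // steps
print(effective_batch)  # 4
```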

Quantization

File                                 Size     Description
gemma-4-26B-A4B-it.Q3_K_S.gguf       12.2 GB  Q3_K_S — fits in 16GB VRAM
gemma-4-26B-A4B-it.BF16-mmproj.gguf  ~1.2 GB  BF16 vision encoder (required for multimodal)

Usage with llama-server

llama-server \
  --model gemma-4-26B-A4B-it.Q3_K_S.gguf \
  --mmproj gemma-4-26B-A4B-it.BF16-mmproj.gguf \
  --port 8080 \
  --n-gpu-layers 99 \
  --ctx-size 32768 \
  --flash-attn on \
  --cache-type-k q4_0 \
  --cache-type-v q4_0
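Once the server is running, it exposes llama.cpp's OpenAI-compatible /v1/chat/completions endpoint on the port given above. A minimal sketch of a client request using only the standard library (the prompt and sampling parameters are illustrative, and the model field is a placeholder — llama-server serves whatever model it was launched with):

```python
import json
import urllib.request

# Build a request body for llama-server's OpenAI-compatible chat endpoint.
# Port 8080 matches the --port flag in the launch command above.
payload = {
    "model": "gemma-4-26B-A4B-it",  # ignored by llama-server, kept for compatibility
    "messages": [
        {"role": "user", "content": "Solve step by step: 17 * 24 = ?"},
    ],
    "temperature": 0.7,
    "max_tokens": 1024,
}
body = json.dumps(payload).encode()
print(len(payload["messages"]))  # 1

# To actually send the request (requires the server to be running):
# req = urllib.request.Request(
#     "http://localhost:8080/v1/chat/completions",
#     data=body,
#     headers={"Content-Type": "application/json"},
# )
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```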

What improved

  • Structured step-by-step reasoning (thinking before answering)
  • More precise responses on complex multi-step tasks
  • Better mathematical and algorithmic problem solving
  • Consistent <think> / answer formatting
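Because the model wraps its reasoning in <think> tags as described above, a client can cleanly separate the reasoning trace from the final answer. A minimal sketch (assumes at most one <think>...</think> block per response):

```python
import re

def split_reasoning(text: str) -> tuple[str, str]:
    """Split a model response into (reasoning, answer).

    Assumes the reasoning is wrapped in a single <think>...</think>
    block; returns an empty reasoning string if no block is present.
    """
    match = re.search(r"<think>(.*?)</think>", text, flags=re.DOTALL)
    if not match:
        return "", text.strip()
    reasoning = match.group(1).strip()
    answer = text[match.end():].strip()
    return reasoning, answer

demo = "<think>17 * 24 = 17 * 20 + 17 * 4 = 340 + 68</think>408"
print(split_reasoning(demo))
# ('17 * 24 = 17 * 20 + 17 * 4 = 340 + 68', '408')
```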

Hardware Requirements

  • Minimum: 14GB VRAM (e.g. RTX 3090, RTX 4080, RTX 5060 Ti 16GB)
  • Recommended context: 32K tokens with KV cache q4_0
  • Total VRAM usage: ~13GB at 32K context
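The q4_0 KV cache is what keeps 32K context within budget: GGUF q4_0 stores 32 elements in 18 bytes (0.5625 bytes/element) versus 2 bytes/element for f16. A rough estimate of the saving — the layer count, KV-head count, and head dimension below are illustrative placeholders, not the real Gemma 4 architecture numbers:

```python
# Rough KV-cache size estimate. Architecture numbers (layers, kv_heads,
# head_dim) are hypothetical placeholders for illustration only.
def kv_cache_bytes(ctx, layers, kv_heads, head_dim, bytes_per_elem):
    # K and V each store ctx * kv_heads * head_dim values per layer.
    return 2 * layers * ctx * kv_heads * head_dim * bytes_per_elem

F16 = 2.0        # 16-bit cache: 2 bytes per element
Q4_0 = 18 / 32   # GGUF q4_0 block: 18 bytes per 32 elements

ctx = 32768
f16_gib = kv_cache_bytes(ctx, 32, 8, 128, F16) / 2**30
q4_gib = kv_cache_bytes(ctx, 32, 8, 128, Q4_0) / 2**30
print(f16_gib, q4_gib)  # 4.0 1.125
```

With these placeholder dimensions, quantizing the cache cuts KV memory by roughly 3.6x, which is why the full 32K context fits alongside the 12.2 GB weights.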

License

Apache 2.0 — same as base model.
