# Gemma 4 26B A4B — Opus 4.6 Reasoning Distillation

Fine-tuned version of `google/gemma-4-26B-A4B-it` on high-quality reasoning traces distilled from Claude Opus 4.6.
## Training Details
- Base model: google/gemma-4-26B-A4B-it (MoE, 3.8B active params)
- Dataset: Crownelius/Opus-4.6-Reasoning-3300x
- Method: LoRA SFT (r=16, alpha=16)
- Training: 1 epoch, 2160 examples, 540 steps
- Hardware: NVIDIA H100 80GB
- Framework: Unsloth + TRL
- Final loss: 0.751
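The training figures above are internally consistent; a quick sanity check of the effective batch size they imply (assuming one optimizer step per batch over a single epoch):

```python
# Derive the effective batch size from the stats above.
# Assumption: steps = (examples * epochs) / effective_batch_size.
examples = 2160
steps = 540
epochs = 1

effective_batch_size = (examples * epochs) / steps
print(effective_batch_size)  # 4.0
```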
## Quantization
| File | Size | Description |
|---|---|---|
| gemma-4-26B-A4B-it.Q3_K_S.gguf | 12.2 GB | Q3_K_S — fits in 16GB VRAM |
| gemma-4-26B-A4B-it.BF16-mmproj.gguf | ~1.2 GB | Vision encoder (required for multimodal) |
## Usage with llama-server

```bash
llama-server \
  --model gemma-4-26B-A4B-it.Q3_K_S.gguf \
  --mmproj gemma-4-26B-A4B-it.BF16-mmproj.gguf \
  --port 8080 \
  --n-gpu-layers 99 \
  --ctx-size 32768 \
  --flash-attn on \
  --cache-type-k q4_0 \
  --cache-type-v q4_0
```
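Once running, llama-server exposes an OpenAI-compatible `/v1/chat/completions` endpoint on the port given above. A minimal client sketch using only the standard library (the prompt is illustrative; the request is built but only sent if you uncomment the last lines against a live server):

```python
import json
import urllib.request

# Build an OpenAI-style chat request for the local llama-server instance.
payload = {
    "model": "gemma-4-26B-A4B-it",
    "messages": [
        {"role": "user", "content": "Solve 17 * 23 step by step."}
    ],
    "temperature": 0.7,
}

req = urllib.request.Request(
    "http://localhost:8080/v1/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
print(req.full_url)

# Uncomment to send against a running server:
# with urllib.request.urlopen(req) as resp:
#     reply = json.loads(resp.read())
#     print(reply["choices"][0]["message"]["content"])
```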
## What improved
- Structured step-by-step reasoning (thinking before answering)
- More precise responses on complex multi-step tasks
- Better mathematical and algorithmic problem solving
- Consistent `<think>` / answer formatting
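Because outputs follow a consistent `<think>` / answer layout, the reasoning block can be split off client-side; a minimal sketch (the sample string is illustrative, not a real model response):

```python
import re

def split_reasoning(text: str) -> tuple[str, str]:
    """Separate the <think>...</think> block from the final answer."""
    match = re.search(r"<think>(.*?)</think>", text, flags=re.DOTALL)
    if not match:
        # No reasoning block: treat the whole output as the answer.
        return "", text.strip()
    thinking = match.group(1).strip()
    answer = text[match.end():].strip()
    return thinking, answer

sample = "<think>17 * 23 = 17 * 20 + 17 * 3 = 340 + 51</think>The answer is 391."
thinking, answer = split_reasoning(sample)
print(answer)  # The answer is 391.
```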
## Hardware Requirements
- Minimum: 14GB VRAM (RTX 3090, RTX 4080, RTX 5060 Ti 16GB)
- Recommended context: 32K tokens with KV cache q4_0
- Total VRAM usage: ~13GB at 32K context
## License
Apache 2.0 — same as base model.