# Qwen3.6 35B A3B — Opus 4.6 Reasoning Distillation (GGUF)

A fine-tune of Qwen/Qwen3.6-35B-A3B — one of the strongest open-weight agentic coding models — trained on high-quality reasoning traces distilled from Claude Opus 4.6 at maximum reasoning effort.
Qwen3.6-35B-A3B scores 73.4% on SWE-bench Verified and 51.5% on Terminal-Bench 2.0, outperforming Gemma4-31B and Qwen3.5-35B on agentic coding tasks. This fine-tune adds structured Claude-style reasoning on top of an already exceptional base.
## Why this model?
- Base model is SOTA for agentic coding — Qwen3.6 beats Gemma4-31B on SWE-bench by 21 points
- Reasoning distillation from Claude Opus 4.6 — learns explicit structured thinking before answering
- MoE efficiency — only 3B parameters active at inference, fast on consumer hardware
- Multiple quantizations — from 11GB (IQ2_M) to 25GB (Q5_K_M), imatrix-optimized
- No degradation — fine-tuning preserves the base model's mathematical capabilities
## Benchmark Results
⚠️ These are informal benchmarks run by the author, not official evaluations. Results on 30 samples may not be representative of full dataset performance.
### GSM8K (Mathematical Reasoning, 30 samples, IQ4_NL quantization)
| Model | Correct | Accuracy | Notes |
|---|---|---|---|
| Qwen3.6-35B-A3B Base | 29/30 | 96.7% | Limited by 1024 token budget |
| Qwen3.6-35B-A3B Opus (this model) | 29/30 | 96.7% | Limited by 1024 token budget |
Both models were limited by the 1024 token budget due to Qwen3.6's extended thinking mode.
The fine-tune is expected to improve on reasoning structure, multi-step planning and agentic coding tasks — areas not covered by GSM8K.
## Available Quantizations
| File | Size | VRAM | Type | Quality |
|---|---|---|---|---|
| Qwen3.6-35B-A3B-Opus-IQ2_M.gguf | 11.66 GB | 16GB ✅ | imatrix | ★★★☆☆ |
| Qwen3.6-35B-A3B-Opus-IQ3_XXS.gguf | 13.62 GB | 16GB ✅ | imatrix | ★★★★☆ |
| Qwen3.6-35B-A3B-Opus-IQ3_M.gguf | 15.44 GB | 16GB ✅ | imatrix | ★★★★☆ |
| Qwen3.6-35B-A3B-Opus-IQ4_XS.gguf | 18.73 GB | 24GB | imatrix | ★★★★★ |
| Qwen3.6-35B-A3B-Opus-IQ4_NL.gguf | 19.78 GB | 24GB | imatrix | ★★★★★ |
| Qwen3.6-35B-A3B-Opus-Q4_K_S.gguf | 19.89 GB | 24GB | standard | ★★★★☆ |
| Qwen3.6-35B-A3B-Opus-Q5_K_M.gguf | 24.73 GB | 32GB | standard | ★★★★★ |
IQ quantizations use importance matrix calibration (groups_merged.txt) for superior quality at smaller file sizes compared to standard K-quants.
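For reference, imatrix quants follow the standard llama.cpp workflow: compute an importance matrix over the calibration file, then pass it to the quantizer. A sketch using the current llama.cpp tool names (file paths are placeholders, and this is the general recipe rather than the author's exact invocation):

```shell
# Compute an importance matrix from the calibration text (groups_merged.txt).
./llama-imatrix -m Qwen3.6-35B-A3B-Opus-f16.gguf \
    -f groups_merged.txt -o imatrix.dat

# Quantize with the importance matrix, e.g. to IQ4_NL.
./llama-quantize --imatrix imatrix.dat \
    Qwen3.6-35B-A3B-Opus-f16.gguf Qwen3.6-35B-A3B-Opus-IQ4_NL.gguf IQ4_NL
```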
### Which quantization should I use?

- 16GB VRAM → `IQ3_M` (best quality that fits) or `IQ3_XXS` (more context headroom)
- 24GB VRAM → `IQ4_NL` (near-lossless, recommended) or `IQ4_XS` (slightly smaller)
- 32GB VRAM → `Q5_K_M` (highest quality)
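The "context headroom" tradeoff above is driven by the KV cache, which grows linearly with context length. A rough estimator — the layer and head counts below are placeholder assumptions for illustration, not the published Qwen3.6-35B-A3B config:

```python
def kv_cache_gb(ctx: int, n_layers: int = 48, n_kv_heads: int = 8,
                head_dim: int = 128, bytes_per_elem: float = 0.5625) -> float:
    """Approximate KV-cache size: K and V tensors for every layer and token.
    Layer/head counts are illustrative assumptions, not the real model config;
    a q4_0 cache stores roughly 4.5 bits (0.5625 bytes) per element."""
    return 2 * n_layers * ctx * n_kv_heads * head_dim * bytes_per_elem / 1e9

# Under these assumptions, a 32k-token q4_0 cache costs on the order of 2 GB,
# which is why a smaller quant buys room for longer contexts.
print(f"{kv_cache_gb(32768):.2f} GB at 32k context")
```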
## Training Details
- Base model: Qwen/Qwen3.6-35B-A3B (35B total, 3B active, MoE, 256 experts)
- Method: QLoRA (r=16, alpha=16, nf4 4-bit quantization)
- Datasets:
- Crownelius/Opus-4.6-Reasoning-3300x — ~2160 examples
- TeichAI/Claude-Opus-4.6-Reasoning-887x — ~886 examples
- Total examples: ~3046 reasoning traces from Claude Opus 4.6 at maximum reasoning effort
- Epochs: 1
- Final loss: ~0.64
- Hardware: NVIDIA H100 NVL 94GB VRAM
- Framework: HuggingFace TRL + PEFT
- Quantization: llama.cpp with imatrix calibration (groups_merged.txt)
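QLoRA with r=16 trains only small low-rank adapters on top of the frozen 4-bit base: for one weight matrix of shape (d_out, d_in), LoRA adds r·(d_in + d_out) trainable parameters. A quick illustration of why the trainable fraction is tiny (the 4096 hidden size is an assumed example, not the real model dimensions):

```python
def lora_params(d_in: int, d_out: int, r: int = 16) -> int:
    """Trainable parameters LoRA adds for one matrix: A (r x d_in) + B (d_out x r)."""
    return r * (d_in + d_out)

# Illustrative: a square 4096x4096 projection with r=16.
full = 4096 * 4096               # 16,777,216 frozen base weights
added = lora_params(4096, 4096)  # 131,072 trainable adapter weights
print(f"trainable fraction: {added / full:.2%}")  # well under 1% per matrix
```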
## Usage with llama-server

```shell
llama-server \
  --model Qwen3.6-35B-A3B-Opus-IQ3_M.gguf \
  --port 8080 \
  --n-gpu-layers 99 \
  --ctx-size 32768 \
  --flash-attn on \
  --cache-type-k q4_0 \
  --cache-type-v q4_0 \
  --jinja
```
### Recommended Sampling Parameters

```
temperature: 1.0
top_p: 0.95
top_k: 20
```
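llama-server exposes an OpenAI-compatible chat-completions endpoint, so the sampling parameters above go straight into the request body. A stdlib-only sketch (the URL and port match the server command above; error handling is omitted for brevity):

```python
import json
import urllib.request

def build_request(prompt: str, base_url: str = "http://localhost:8080"):
    """Build a chat-completions request with the recommended sampling settings."""
    payload = {
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 1.0,
        "top_p": 0.95,
        "top_k": 20,  # accepted by llama-server as an extension to the OpenAI schema
    }
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )

# To actually send (requires a running llama-server):
#   with urllib.request.urlopen(build_request("Hello")) as resp:
#       print(json.load(resp)["choices"][0]["message"]["content"])
```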
## What this fine-tune adds

- Structured reasoning — explicit `<think>` blocks before answering, distilled from Claude Opus 4.6's style
- Multi-step problem decomposition — breaks complex tasks into clear reasoning steps
- Improved mathematical reasoning — more consistent step-by-step working
- Better response consistency — more predictable output format
- No capability degradation — base model performance preserved on standard benchmarks
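Because the fine-tune emits explicit `<think>` blocks, downstream code usually wants to separate the reasoning from the final answer. A minimal stdlib parser — it assumes a single well-formed `<think>...</think>` block, which is this sketch's simplification:

```python
import re

THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)

def split_reasoning(text: str) -> tuple[str, str]:
    """Return (reasoning, answer); reasoning is empty if no <think> block exists."""
    m = THINK_RE.search(text)
    if not m:
        return "", text.strip()
    reasoning = m.group(1).strip()
    answer = (text[:m.start()] + text[m.end():]).strip()
    return reasoning, answer
```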
## Full precision model
The merged BF16 safetensors (~65GB) are available at rico03/qwen36-35B-opus-reasoning-merged for those who want to run their own quantizations, use with vLLM/Transformers, or perform further fine-tuning.
## License

Apache 2.0 — same as the base model, Qwen/Qwen3.6-35B-A3B.