Qwen3.6 35B A3B — Opus 4.6 Reasoning Distillation (GGUF)

A fine-tune of Qwen/Qwen3.6-35B-A3B — one of the strongest open-weight agentic coding models — trained on high-quality reasoning traces distilled from Claude Opus 4.6 at maximum reasoning effort.

Qwen3.6-35B-A3B scores 73.4% on SWE-bench Verified and 51.5% on Terminal-Bench 2.0, outperforming Gemma4-31B and Qwen3.5-35B on agentic coding tasks. This fine-tune adds structured Claude-style reasoning on top of an already exceptional base.

Why this model?

  • Base model is SOTA for agentic coding — Qwen3.6 beats Gemma4-31B on SWE-bench by 21 points
  • Reasoning distillation from Claude Opus 4.6 — learns explicit structured thinking before answering
  • MoE efficiency — only 3B parameters active at inference, fast on consumer hardware
  • Multiple quantizations — from 11GB (IQ2_M) to 25GB (Q5_K_M), imatrix-optimized
  • No degradation — fine-tuning preserves the base model's mathematical capabilities

Benchmark Results

⚠️ These are informal benchmarks run by the author, not official evaluations. Results on 30 samples may not be representative of full dataset performance.

GSM8K (Mathematical Reasoning, 30 samples, IQ4_NL quantization)

| Model | Correct | Accuracy | Notes |
|---|---|---|---|
| Qwen3.6-35B-A3B Base | 29/30 | 96.7% | Limited by 1024 token budget |
| Qwen3.6-35B-A3B Opus (this model) | 29/30 | 96.7% | Limited by 1024 token budget |

Both models were limited by the 1024 token budget due to Qwen3.6's extended thinking mode.

The fine-tune is expected to improve reasoning structure, multi-step planning, and agentic coding — areas not covered by GSM8K.

Available Quantizations

| File | Size | VRAM | Type | Quality |
|---|---|---|---|---|
| Qwen3.6-35B-A3B-Opus-IQ2_M.gguf | 11.66 GB | 16GB | ✅ imatrix | ★★★☆☆ |
| Qwen3.6-35B-A3B-Opus-IQ3_XXS.gguf | 13.62 GB | 16GB | ✅ imatrix | ★★★★☆ |
| Qwen3.6-35B-A3B-Opus-IQ3_M.gguf | 15.44 GB | 16GB | ✅ imatrix | ★★★★☆ |
| Qwen3.6-35B-A3B-Opus-IQ4_XS.gguf | 18.73 GB | 24GB | imatrix | ★★★★★ |
| Qwen3.6-35B-A3B-Opus-IQ4_NL.gguf | 19.78 GB | 24GB | imatrix | ★★★★★ |
| Qwen3.6-35B-A3B-Opus-Q4_K_S.gguf | 19.89 GB | 24GB | standard | ★★★★☆ |
| Qwen3.6-35B-A3B-Opus-Q5_K_M.gguf | 24.73 GB | 32GB | standard | ★★★★★ |

IQ quantizations use importance matrix calibration (groups_merged.txt) for superior quality at smaller file sizes compared to standard K-quants.

Which quantization should I use?

  • 16GB VRAM: IQ3_M (best quality that fits) or IQ3_XXS (more context headroom)
  • 24GB VRAM: IQ4_NL (near-lossless, recommended) or IQ4_XS (slightly smaller)
  • 32GB VRAM: Q5_K_M (highest quality)
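The recommendations above amount to a small lookup by VRAM budget. `pick_quant` below is a hypothetical convenience helper built from the table in this card, not part of any library:

```python
# Hypothetical helper reflecting the recommendations above.
RECOMMENDED = {
    16: "Qwen3.6-35B-A3B-Opus-IQ3_M.gguf",   # best quality that fits
    24: "Qwen3.6-35B-A3B-Opus-IQ4_NL.gguf",  # near-lossless, recommended
    32: "Qwen3.6-35B-A3B-Opus-Q5_K_M.gguf",  # highest quality
}

def pick_quant(vram_gb: int) -> str:
    """Return the recommended GGUF file for a given VRAM budget."""
    tiers = [t for t in sorted(RECOMMENDED) if t <= vram_gb]
    if not tiers:
        raise ValueError("Need at least 16GB VRAM for these quantizations")
    return RECOMMENDED[tiers[-1]]

print(pick_quant(24))  # Qwen3.6-35B-A3B-Opus-IQ4_NL.gguf
```

Remember to leave headroom beyond the file size for KV cache and context; the tiers above already account for that.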

Training Details

  • Base model: Qwen/Qwen3.6-35B-A3B (35B total, 3B active, MoE, 256 experts)
  • Method: QLoRA (r=16, alpha=16, nf4 4-bit quantization)
  • Dataset: ~3046 reasoning traces distilled from Claude Opus 4.6 at maximum reasoning effort
  • Epochs: 1
  • Final loss: ~0.64
  • Hardware: NVIDIA H100 NVL 94GB VRAM
  • Framework: HuggingFace TRL + PEFT
  • Quantization: llama.cpp with imatrix calibration (groups_merged.txt)

Usage with llama-server

```bash
llama-server \
  --model Qwen3.6-35B-A3B-Opus-IQ3_M.gguf \
  --port 8080 \
  --n-gpu-layers 99 \
  --ctx-size 32768 \
  --flash-attn on \
  --cache-type-k q4_0 \
  --cache-type-v q4_0 \
  --jinja
```

Recommended Sampling Parameters

```
temperature: 1.0
top_p: 0.95
top_k: 20
```
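With the server running as above, requests go through llama.cpp's OpenAI-compatible API. A sketch of a request body carrying these sampling parameters — the port and endpoint path follow the `llama-server` invocation shown in this card, and the prompt is a placeholder:

```python
import json
import urllib.request

# Recommended sampling parameters from this card.
payload = {
    "messages": [{"role": "user", "content": "Explain this stack trace..."}],
    "temperature": 1.0,
    "top_p": 0.95,
    "top_k": 20,        # accepted by llama-server as an extension field
    "max_tokens": 2048,
}

req = urllib.request.Request(
    "http://localhost:8080/v1/chat/completions",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
# Uncomment with a running server:
# resp = urllib.request.urlopen(req)
# print(json.load(resp)["choices"][0]["message"]["content"])
```

The `--jinja` flag in the server command makes llama-server apply the model's chat template, so plain `messages` arrays like the one above are formatted correctly.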

What this fine-tune adds

  • Structured reasoning — explicit <think> blocks before answering, distilled from Claude Opus 4.6 style
  • Multi-step problem decomposition — breaks complex tasks into clear reasoning steps
  • Improved mathematical reasoning — more consistent step-by-step working
  • Better response consistency — more predictable output format
  • No capability degradation — base model performance preserved on standard benchmarks
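Because the fine-tune emits an explicit `<think>` block before the answer, downstream code typically wants to separate the reasoning from the final response. A minimal sketch — the tag name comes from this card, the helper itself is illustrative:

```python
import re

def split_reasoning(output: str) -> tuple[str, str]:
    """Split a completion into (reasoning, answer).

    Assumes at most one <think>...</think> block, as produced by this
    fine-tune; returns empty reasoning if the block is absent.
    """
    match = re.search(r"<think>(.*?)</think>", output, flags=re.DOTALL)
    if not match:
        return "", output.strip()
    answer = output[match.end():].strip()
    return match.group(1).strip(), answer

reasoning, answer = split_reasoning(
    "<think>2 + 2 is 4.</think>\nThe answer is 4."
)
print(answer)  # The answer is 4.
```

Note that some OpenAI-compatible servers can already strip or separate reasoning content server-side; this client-side split is the fallback when they do not.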

Full precision model

The merged BF16 safetensors (~65GB) are available at rico03/qwen36-35B-opus-reasoning-merged for those who want to run their own quantizations, use with vLLM/Transformers, or perform further fine-tuning.

License

Apache 2.0 — same as the base model Qwen/Qwen3.6-35B-A3B.
