Qwen3.6 35B A3B — Opus 4.6 Reasoning Distillation (GGUF)

A fine-tune of Qwen/Qwen3.6-35B-A3B — one of the strongest open-weight agentic coding models — trained on high-quality reasoning traces distilled from Claude Opus 4.6 at maximum reasoning effort.

Qwen3.6-35B-A3B scores 73.4% on SWE-bench Verified and 51.5% on Terminal-Bench 2.0, outperforming Gemma4-31B and Qwen3.5-35B on agentic coding tasks. This fine-tune adds structured Claude-style reasoning on top of an already exceptional base.

Why this model?

  • Base model is SOTA for agentic coding — Qwen3.6 beats Gemma4-31B on SWE-bench by 21 points
  • Reasoning distillation from Claude Opus 4.6 — learns explicit structured thinking before answering
  • MoE efficiency — only 3B parameters active at inference, fast on consumer hardware
  • Multiple quantizations — from 11GB (IQ2_M) to 25GB (Q5_K_M), imatrix-optimized
  • No degradation — fine-tuning preserves the base model's mathematical capabilities

Benchmark Results

⚠️ These are informal benchmarks run by the author, not official evaluations. Results on 30 samples may not be representative of full dataset performance.

GSM8K (Mathematical Reasoning, 30 samples, IQ4_NL quantization)

| Model | Correct | Accuracy | Notes |
|---|---|---|---|
| Qwen3.6-35B-A3B Base | 29/30 | 96.7% | Limited by 1024 token budget |
| Qwen3.6-35B-A3B Opus (this model) | 29/30 | 96.7% | Limited by 1024 token budget |

Both models were limited by the 1024 token budget due to Qwen3.6's extended thinking mode.

The fine-tune is expected to improve reasoning structure, multi-step planning, and agentic coding — areas not covered by GSM8K.

Available Quantizations

| File | Size | VRAM | Type | Quality |
|---|---|---|---|---|
| Qwen3.6-35B-A3B-Opus-IQ2_M.gguf | 11.66 GB | 16GB | ✅ imatrix | ★★★☆☆ |
| Qwen3.6-35B-A3B-Opus-IQ3_XXS.gguf | 13.62 GB | 16GB | ✅ imatrix | ★★★★☆ |
| Qwen3.6-35B-A3B-Opus-IQ3_M.gguf | 15.44 GB | 16GB | ✅ imatrix | ★★★★☆ |
| Qwen3.6-35B-A3B-Opus-IQ4_XS.gguf | 18.73 GB | 24GB | imatrix | ★★★★★ |
| Qwen3.6-35B-A3B-Opus-IQ4_NL.gguf | 19.78 GB | 24GB | imatrix | ★★★★★ |
| Qwen3.6-35B-A3B-Opus-Q4_K_S.gguf | 19.89 GB | 24GB | standard | ★★★★☆ |
| Qwen3.6-35B-A3B-Opus-Q5_K_M.gguf | 24.73 GB | 32GB | standard | ★★★★★ |

IQ quantizations use importance matrix calibration (groups_merged.txt) for superior quality at smaller file sizes compared to standard K-quants.

Which quantization should I use?

  • 16GB VRAM: IQ3_M (best quality that fits) or IQ3_XXS (more context headroom)
  • 24GB VRAM: IQ4_NL (near-lossless, recommended) or IQ4_XS (slightly smaller)
  • 32GB VRAM: Q5_K_M (highest quality)
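The recommendations above amount to a small lookup by VRAM budget. `pick_quant` below is a hypothetical convenience helper built from the table in this card, not part of any library:

```python
# Hypothetical helper reflecting the recommendations above.
RECOMMENDED = {
    16: "Qwen3.6-35B-A3B-Opus-IQ3_M.gguf",   # best quality that fits
    24: "Qwen3.6-35B-A3B-Opus-IQ4_NL.gguf",  # near-lossless, recommended
    32: "Qwen3.6-35B-A3B-Opus-Q5_K_M.gguf",  # highest quality
}

def pick_quant(vram_gb: int) -> str:
    """Return the recommended GGUF file for a given VRAM budget."""
    tiers = [t for t in sorted(RECOMMENDED) if t <= vram_gb]
    if not tiers:
        raise ValueError("Need at least 16GB VRAM for these quantizations")
    return RECOMMENDED[tiers[-1]]

print(pick_quant(24))  # Qwen3.6-35B-A3B-Opus-IQ4_NL.gguf
```

Remember to leave headroom beyond the file size for KV cache and context; the tiers above already account for that.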

Training Details

  • Base model: Qwen/Qwen3.6-35B-A3B (35B total, 3B active, MoE, 256 experts)
  • Method: QLoRA (r=16, alpha=16, nf4 4-bit quantization)
  • Dataset: ~3046 reasoning traces distilled from Claude Opus 4.6 at maximum reasoning effort
  • Epochs: 1
  • Final loss: ~0.64
  • Hardware: NVIDIA H100 NVL 94GB VRAM
  • Framework: HuggingFace TRL + PEFT
  • Quantization: llama.cpp with imatrix calibration (groups_merged.txt)

Usage with llama-server

```bash
llama-server \
  --model Qwen3.6-35B-A3B-Opus-IQ3_M.gguf \
  --port 8080 \
  --n-gpu-layers 99 \
  --ctx-size 32768 \
  --flash-attn on \
  --cache-type-k q4_0 \
  --cache-type-v q4_0 \
  --jinja
```

Recommended Sampling Parameters

```
temperature: 1.0
top_p: 0.95
top_k: 20
```
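With the server running as above, requests go through llama.cpp's OpenAI-compatible API. A sketch of a request body carrying these sampling parameters — the port and endpoint path follow the `llama-server` invocation shown in this card, and the prompt is a placeholder:

```python
import json
import urllib.request

# Recommended sampling parameters from this card.
payload = {
    "messages": [{"role": "user", "content": "Explain this stack trace..."}],
    "temperature": 1.0,
    "top_p": 0.95,
    "top_k": 20,        # accepted by llama-server as an extension field
    "max_tokens": 2048,
}

req = urllib.request.Request(
    "http://localhost:8080/v1/chat/completions",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
# Uncomment with a running server:
# resp = urllib.request.urlopen(req)
# print(json.load(resp)["choices"][0]["message"]["content"])
```

The `--jinja` flag in the server command makes llama-server apply the model's chat template, so plain `messages` arrays like the one above are formatted correctly.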

What this fine-tune adds

  • Structured reasoning — explicit <think> blocks before answering, distilled from Claude Opus 4.6 style
  • Multi-step problem decomposition — breaks complex tasks into clear reasoning steps
  • Improved mathematical reasoning — more consistent step-by-step working
  • Better response consistency — more predictable output format
  • No capability degradation — base model performance preserved on standard benchmarks
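Because the fine-tune emits an explicit `<think>` block before the answer, downstream code typically wants to separate the reasoning from the final response. A minimal sketch — the tag name comes from this card, the helper itself is illustrative:

```python
import re

def split_reasoning(output: str) -> tuple[str, str]:
    """Split a completion into (reasoning, answer).

    Assumes at most one <think>...</think> block, as produced by this
    fine-tune; returns empty reasoning if the block is absent.
    """
    match = re.search(r"<think>(.*?)</think>", output, flags=re.DOTALL)
    if not match:
        return "", output.strip()
    answer = output[match.end():].strip()
    return match.group(1).strip(), answer

reasoning, answer = split_reasoning(
    "<think>2 + 2 is 4.</think>\nThe answer is 4."
)
print(answer)  # The answer is 4.
```

Note that some OpenAI-compatible servers can already strip or separate reasoning content server-side; this client-side split is the fallback when they do not.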

Full precision model

The merged BF16 safetensors (~65GB) are available at rico03/qwen36-35B-opus-reasoning-merged for those who want to run their own quantizations, use with vLLM/Transformers, or perform further fine-tuning.

License

Apache 2.0 — same as the base model Qwen/Qwen3.6-35B-A3B.
