# Qwen3.5-27B Claude 4.6 Opus Reasoning Distilled v2 – GGUF

Quantized by SolidRusT Networks

IQ4_XS quantization of Jackrong/Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled-v2 using mradermacher's imatrix calibration data.

## What This Is

A 27B-parameter model reasoning-distilled from Claude 4.6 Opus, quantized to IQ4_XS with an importance matrix for an optimal quality/size tradeoff. The v2 training improves tool-calling accuracy on quantized models by 31.6% over v1.

## Files

| File | Size | Description |
|------|------|-------------|
| Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled-v2.IQ4_XS.gguf | 14.7 GB | IQ4_XS imatrix quantization |

## Performance

Tested on dual AMD Radeon RX 7900 XTX (2× 24 GB VRAM):

- ~30 tok/sec generation
- 131K context window
- Tool calling confirmed working

## Usage

### llama.cpp

```bash
llama-server \
  -m Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled-v2.IQ4_XS.gguf \
  --host 0.0.0.0 --port 8080 \
  -c 131072 -ngl 99 \
  --think
```
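Once running, `llama-server` exposes an OpenAI-compatible chat endpoint. A minimal client sketch for exercising the model's tool calling, assuming the default `/v1/chat/completions` path on the port above (the `get_weather` tool is a hypothetical example, not part of this model):

```python
import json
import urllib.request

# Hypothetical tool definition in standard JSON-schema form.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

payload = {
    "model": "Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled-v2",
    "messages": [{"role": "user", "content": "What's the weather in Berlin?"}],
    "tools": tools,
}

req = urllib.request.Request(
    "http://localhost:8080/v1/chat/completions",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
# With the server running, send the request and inspect tool calls:
# resp = json.load(urllib.request.urlopen(req))
# print(resp["choices"][0]["message"])
```

If the model decides to call the tool, the response message carries a `tool_calls` array instead of plain `content`, per the OpenAI chat-completions convention.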

### vLLM

Not recommended – use the FP8 variant for vLLM.

## Quantization Details

IQ4_XS quantization with an importance matrix (imatrix), built from mradermacher's calibration data.

## Credits

- Base model: Jackrong/Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled-v2
- imatrix calibration data: mradermacher
- Quantization: SolidRusT Networks
