# Qwen3.6-35B-A3B GGUF (AutoRound Quantized)

This repository contains GGUF quantized versions of Qwen/Qwen3.6-35B-A3B created using Intel's AutoRound quantization method.

## Quantization Details

The models were quantized with several of the schemes supported by the `auto-round` tool. For broad compatibility, the unified multimodal projector (mmproj) is provided separately in F16, BF16, and F32 formats, so you can trade file size against precision.

## Files and Sizes

| File Name | Quant Type | Size | Description |
|---|---|---|---|
| Qwen3.6-35B-A3B-Q2_K_S.gguf | Q2_K_S | 11 GB | Extremely high compression, significant quality loss. |
| Qwen3.6-35B-A3B-Q2_K_MIXED.gguf | Q2_K_MIXED | 12 GB | Recommended high-compression option. Good quality. |
| Qwen3.6-35B-A3B-Q3_K_S.gguf | Q3_K_S | 15 GB | Very high compression, notable quality loss. |
| Qwen3.6-35B-A3B-Q3_K_M.gguf | Q3_K_M | 15 GB | Balanced 3-bit quantization. |
| Qwen3.6-35B-A3B-Q3_K_L.gguf | Q3_K_L | 15 GB | High-quality 3-bit quantization. |
| Qwen3.6-35B-A3B-Q4_0.gguf | Q4_0 | 19 GB | Standard 4-bit quantization, good balance. |
| Qwen3.6-35B-A3B-Q4_1.gguf | Q4_1 | 21 GB | Higher-quality 4-bit quantization than Q4_0. |
| Qwen3.6-35B-A3B-Q4_K_S.gguf | Q4_K_S | 19 GB | Small 4-bit K-quant, good efficiency. |
| Qwen3.6-35B-A3B-Q4_K_M.gguf | Q4_K_M | 19 GB | Recommended 4-bit K-quant, excellent balance. |
| Qwen3.6-35B-A3B-Q5_0.gguf | Q5_0 | 23 GB | Standard 5-bit quantization, very high quality. |
| Qwen3.6-35B-A3B-Q5_1.gguf | Q5_1 | 25 GB | Higher-quality 5-bit quantization than Q5_0. |
| Qwen3.6-35B-A3B-Q5_K_S.gguf | Q5_K_S | 23 GB | Small 5-bit K-quant, very high quality. |
| Qwen3.6-35B-A3B-Q5_K_M.gguf | Q5_K_M | 23 GB | Recommended 5-bit K-quant, near-lossless. |
| Qwen3.6-35B-A3B-Q6_K.gguf | Q6_K | 27 GB | 6-bit K-quant, virtually indistinguishable from F16. |
| Qwen3.6-35B-A3B-Q8_0.gguf | Q8_0 | 35 GB | 8-bit quantization, near-lossless. |
| mmproj-model-f16.gguf | F16 | 858 MB | Unified projector in Float16 format. |
| mmproj-model-bf16.gguf | BF16 | 861 MB | Unified projector in BFloat16 format. |
| mmproj-model-f32.gguf | F32 | 1.7 GB | Unified projector in Float32 format. |
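
As a rule of thumb, pick the largest quant that fits your memory budget with headroom left for the KV cache and runtime overhead. A minimal sketch using the sizes from the table above (the 2 GB headroom figure and the helper name `pick_quant` are illustrative assumptions, not part of this repo):

```shell
# Pick the largest quant from the table that fits a memory budget (in GB),
# reserving ~2 GB for KV cache and runtime overhead (assumed headroom).
pick_quant() {
  budget_gb=$1
  usable=$((budget_gb - 2))
  # name:size_gb pairs, largest first (sizes taken from the table above)
  for entry in Q8_0:35 Q6_K:27 Q5_K_M:23 Q4_K_M:19 Q3_K_M:15 Q2_K_MIXED:12 Q2_K_S:11; do
    name=${entry%%:*}; size=${entry##*:}
    if [ "$size" -le "$usable" ]; then
      echo "$name"
      return 0
    fi
  done
  echo "none"
  return 1
}

pick_quant 24   # a 24 GB GPU comfortably fits Q4_K_M
```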

## Generating the Models

The models were generated with Intel's AutoRound using the following command (`--iters 0` skips the iterative tuning step and falls back to round-to-nearest):

```shell
auto-round --model Qwen/Qwen3.6-35B-A3B --output_dir ./quantized/ --scheme <SCHEME> --iters 0
```
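
To reproduce the full set in one pass, the command can be looped over every scheme in the table. A dry-run sketch that only prints the invocations (pipe the output to `sh` to execute; whether each table entry is a valid `--scheme` value should be confirmed against `auto-round --help` for your version):

```shell
# Print one auto-round invocation per quantization scheme in this repo.
schemes="Q2_K_S Q2_K_MIXED Q3_K_S Q3_K_M Q3_K_L Q4_0 Q4_1 Q4_K_S Q4_K_M Q5_0 Q5_1 Q5_K_S Q5_K_M Q6_K Q8_0"
for scheme in $schemes; do
  echo "auto-round --model Qwen/Qwen3.6-35B-A3B --output_dir ./quantized/ --scheme $scheme --iters 0"
done
```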

## Usage with llama.cpp

These models work with llama.cpp. For multimodal inference, use the multimodal CLI (`llama-mtmd-cli` in current llama.cpp builds) and pass the projector file via `--mmproj`:

```shell
./llama-mtmd-cli -m Qwen3.6-35B-A3B-Q4_K_M.gguf --mmproj mmproj-model-f16.gguf --image your_image.jpg -p "Describe this image."
```
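
Beyond one-shot CLI runs, the same files can be served over llama.cpp's OpenAI-compatible HTTP server. A sketch, assuming a recent llama.cpp build with multimodal server support (the context size and port below are arbitrary choices, not requirements):

```shell
# Serve the model over an OpenAI-compatible HTTP API.
# --mmproj requires a llama.cpp build with multimodal server support.
./llama-server -m Qwen3.6-35B-A3B-Q4_K_M.gguf \
  --mmproj mmproj-model-f16.gguf \
  -c 8192 --port 8080
```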

## About AutoRound

AutoRound is a weight-only quantization method from Intel that minimizes accuracy loss by tuning the weight rounding values with signed gradient descent, rather than naive round-to-nearest.

## Model Details

- Format: GGUF
- Model size: 35B params
- Architecture: qwen35moe