# Qwen3.6-35B-A3B GGUF (AutoRound Quantized)

This repository contains GGUF quantized versions of Qwen/Qwen3.6-35B-A3B created using Intel's AutoRound quantization method.

## Quantization Details

The models were quantized with several of the schemes supported by the `auto-round` tool. For broad compatibility, the unified multimodal projector (mmproj) is provided separately in F16, BF16, and F32 formats, so you can trade file size against precision.

## Files and Sizes

| File Name | Quant Type | Size | Description |
|---|---|---|---|
| Qwen3.6-35B-A3B-Q2_K_S.gguf | Q2_K_S | 11 GB | Extremely high compression, significant quality loss. |
| Qwen3.6-35B-A3B-Q2_K_MIXED.gguf | Q2_K_MIXED | 12 GB | Recommended high-compression option. Good quality. |
| Qwen3.6-35B-A3B-Q3_K_S.gguf | Q3_K_S | 15 GB | Very high compression, notable quality loss. |
| Qwen3.6-35B-A3B-Q3_K_M.gguf | Q3_K_M | 15 GB | Balanced 3-bit quantization. |
| Qwen3.6-35B-A3B-Q3_K_L.gguf | Q3_K_L | 15 GB | High-quality 3-bit quantization. |
| Qwen3.6-35B-A3B-Q4_0.gguf | Q4_0 | 19 GB | Standard 4-bit quantization, good balance. |
| Qwen3.6-35B-A3B-Q4_1.gguf | Q4_1 | 21 GB | Higher-quality 4-bit quantization than Q4_0. |
| Qwen3.6-35B-A3B-Q4_K_S.gguf | Q4_K_S | 19 GB | Small 4-bit K-quant, good efficiency. |
| Qwen3.6-35B-A3B-Q4_K_M.gguf | Q4_K_M | 19 GB | Recommended 4-bit K-quant, excellent balance. |
| Qwen3.6-35B-A3B-Q5_0.gguf | Q5_0 | 23 GB | Standard 5-bit quantization, very high quality. |
| Qwen3.6-35B-A3B-Q5_1.gguf | Q5_1 | 25 GB | Higher-quality 5-bit quantization than Q5_0. |
| Qwen3.6-35B-A3B-Q5_K_S.gguf | Q5_K_S | 23 GB | Small 5-bit K-quant, very high quality. |
| Qwen3.6-35B-A3B-Q5_K_M.gguf | Q5_K_M | 23 GB | Recommended 5-bit K-quant, near-lossless. |
| Qwen3.6-35B-A3B-Q6_K.gguf | Q6_K | 27 GB | 6-bit K-quant, virtually indistinguishable from F16. |
| Qwen3.6-35B-A3B-Q8_0.gguf | Q8_0 | 35 GB | 8-bit quantization, near-lossless. |
| mmproj-model-f16.gguf | F16 | 858 MB | Unified projector in Float16 format. |
| mmproj-model-bf16.gguf | BF16 | 861 MB | Unified projector in BFloat16 format. |
| mmproj-model-f32.gguf | F32 | 1.7 GB | Unified projector in Float32 format. |
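
As a rule of thumb, pick the largest quant that fits your memory budget with headroom left for the KV cache and runtime overhead. A minimal sketch using the sizes from the table above (the 2 GB headroom figure and the helper name `pick_quant` are illustrative assumptions, not part of this repo):

```shell
# Pick the largest quant from the table that fits a memory budget (in GB),
# reserving ~2 GB for KV cache and runtime overhead (assumed headroom).
pick_quant() {
  budget_gb=$1
  usable=$((budget_gb - 2))
  # name:size_gb pairs, largest first (sizes taken from the table above)
  for entry in Q8_0:35 Q6_K:27 Q5_K_M:23 Q4_K_M:19 Q3_K_M:15 Q2_K_MIXED:12 Q2_K_S:11; do
    name=${entry%%:*}; size=${entry##*:}
    if [ "$size" -le "$usable" ]; then
      echo "$name"
      return 0
    fi
  done
  echo "none"
  return 1
}

pick_quant 24   # a 24 GB GPU comfortably fits Q4_K_M
```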

## Generating the Models

The models were generated with Intel's AutoRound using the following command (`--iters 0` skips the iterative tuning step and falls back to round-to-nearest):

```shell
auto-round --model Qwen/Qwen3.6-35B-A3B --output_dir ./quantized/ --scheme <SCHEME> --iters 0
```
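
To reproduce the full set in one pass, the command can be looped over every scheme in the table. A dry-run sketch that only prints the invocations (pipe the output to `sh` to execute; whether each table entry is a valid `--scheme` value should be confirmed against `auto-round --help` for your version):

```shell
# Print one auto-round invocation per quantization scheme in this repo.
schemes="Q2_K_S Q2_K_MIXED Q3_K_S Q3_K_M Q3_K_L Q4_0 Q4_1 Q4_K_S Q4_K_M Q5_0 Q5_1 Q5_K_S Q5_K_M Q6_K Q8_0"
for scheme in $schemes; do
  echo "auto-round --model Qwen/Qwen3.6-35B-A3B --output_dir ./quantized/ --scheme $scheme --iters 0"
done
```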

## Usage with llama.cpp

These models work with llama.cpp. For multimodal inference, use the multimodal CLI (`llama-mtmd-cli` in current llama.cpp builds) and pass the projector file via `--mmproj`:

```shell
./llama-mtmd-cli -m Qwen3.6-35B-A3B-Q4_K_M.gguf --mmproj mmproj-model-f16.gguf --image your_image.jpg -p "Describe this image."
```
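
Beyond one-shot CLI runs, the same files can be served over llama.cpp's OpenAI-compatible HTTP server. A sketch, assuming a recent llama.cpp build with multimodal server support (the context size and port below are arbitrary choices, not requirements):

```shell
# Serve the model over an OpenAI-compatible HTTP API.
# --mmproj requires a llama.cpp build with multimodal server support.
./llama-server -m Qwen3.6-35B-A3B-Q4_K_M.gguf \
  --mmproj mmproj-model-f16.gguf \
  -c 8192 --port 8080
```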

## About AutoRound

AutoRound is a weight-only quantization method from Intel that minimizes accuracy loss by tuning the weight rounding values with signed gradient descent, rather than naive round-to-nearest.

## Model Details

- Format: GGUF
- Model size: 35B params
- Architecture: qwen35moe