# Qwen3.6-35B-A3B GGUF (AutoRound Quantized)
This repository contains GGUF quantized versions of Qwen/Qwen3.6-35B-A3B created using Intel's AutoRound quantization method.
## Quantization Details

The models were quantized with the schemes listed below using the auto-round tool. For broader compatibility, and to let you trade size against precision, the unified multimodal projector (mmproj) is provided in F16, BF16, and F32 formats.
## Files and Sizes
| File Name | Quant Type | Size | Description |
|---|---|---|---|
| Qwen3.6-35B-A3B-Q2_K_S.gguf | Q2_K_S | 11 GB | Extremely high compression, significant quality loss. |
| Qwen3.6-35B-A3B-Q2_K_MIXED.gguf | Q2_K_MIXED | 12 GB | Recommended high-compression option. Good quality. |
| Qwen3.6-35B-A3B-Q3_K_S.gguf | Q3_K_S | 15 GB | Very high compression, notable quality loss. |
| Qwen3.6-35B-A3B-Q3_K_M.gguf | Q3_K_M | 15 GB | Balanced 3-bit quantization. |
| Qwen3.6-35B-A3B-Q3_K_L.gguf | Q3_K_L | 15 GB | High-quality 3-bit quantization. |
| Qwen3.6-35B-A3B-Q4_0.gguf | Q4_0 | 19 GB | Standard 4-bit quantization, good balance. |
| Qwen3.6-35B-A3B-Q4_1.gguf | Q4_1 | 21 GB | Higher-quality 4-bit quantization than Q4_0. |
| Qwen3.6-35B-A3B-Q4_K_S.gguf | Q4_K_S | 19 GB | Small 4-bit K-quant, good efficiency. |
| Qwen3.6-35B-A3B-Q4_K_M.gguf | Q4_K_M | 19 GB | Recommended 4-bit K-quant, excellent balance. |
| Qwen3.6-35B-A3B-Q5_0.gguf | Q5_0 | 23 GB | Standard 5-bit quantization, very high quality. |
| Qwen3.6-35B-A3B-Q5_1.gguf | Q5_1 | 25 GB | Higher-quality 5-bit quantization than Q5_0. |
| Qwen3.6-35B-A3B-Q5_K_S.gguf | Q5_K_S | 23 GB | Small 5-bit K-quant, very high quality. |
| Qwen3.6-35B-A3B-Q5_K_M.gguf | Q5_K_M | 23 GB | Recommended 5-bit K-quant, near-lossless. |
| Qwen3.6-35B-A3B-Q6_K.gguf | Q6_K | 27 GB | 6-bit K-quant, virtually indistinguishable from F16. |
| Qwen3.6-35B-A3B-Q8_0.gguf | Q8_0 | 35 GB | 8-bit quantization, near-lossless. |
| mmproj-model-f16.gguf | F16 | 858 MB | Unified projector in Float16 format. |
| mmproj-model-bf16.gguf | BF16 | 861 MB | Unified projector in BFloat16 format. |
| mmproj-model-f32.gguf | F32 | 1.7 GB | Unified projector in Float32 format. |
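Since this is a ~35B-parameter model, the file sizes above translate directly into an effective bits-per-weight figure. A minimal sketch (it assumes 35e9 total parameters and decimal gigabytes, and the table sizes are rounded, so treat the results as estimates):

```python
# Rough effective bits-per-weight from a GGUF file size.
# Assumptions: ~35e9 total parameters (inferred from the model name) and
# GB = 10^9 bytes; table sizes are rounded, so results are approximate.
def bits_per_weight(size_gb: float, n_params: float = 35e9) -> float:
    return size_gb * 1e9 * 8 / n_params

print(round(bits_per_weight(19), 2))  # Q4_K_M at 19 GB -> ~4.34 bits/weight
print(round(bits_per_weight(35), 2))  # Q8_0 at 35 GB -> 8.0 bits/weight
```

This is a quick sanity check when comparing quants, not an exact accounting: GGUF files also carry metadata and non-quantized tensors.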
## Generating the Models

The models were generated with Intel's AutoRound using the following command, where `<SCHEME>` is one of the quantization schemes listed above:

```shell
auto-round --model Qwen/Qwen3.6-35B-A3B --output_dir ./quantized/ --scheme <SCHEME> --iters 0
```
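To reproduce several of the files above, the command can be looped over schemes. A dry-run sketch that only prints each invocation (the scheme names shown are illustrative; check `auto-round --help` for the exact scheme strings your installed version accepts):

```shell
# Print one auto-round invocation per scheme (dry run: echo only).
# Scheme names are illustrative; verify them against `auto-round --help`.
for scheme in Q2_K_S Q4_K_M Q8_0; do
  echo "auto-round --model Qwen/Qwen3.6-35B-A3B --output_dir ./quantized/ --scheme ${scheme} --iters 0"
done
```

Remove the `echo` (keeping the quoted command as a real invocation) to actually run the quantizations.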
## Usage with llama.cpp

These models can be used with llama.cpp. For multimodal usage, you must also specify the projector file:

```shell
./llama-cli -m Qwen3.6-35B-A3B-Q4_K_M.gguf --mmproj mmproj-model-f16.gguf --image your_image.jpg -p "Describe this image."
```
## About AutoRound

AutoRound is a quantization technique from Intel that minimizes accuracy loss by tuning weight rounding and clipping ranges via sign-gradient optimization.