These are MXFP4 quantizations of the model Qwen3.6-35B-A3B.

Quick Start

  1. Download the latest release of llama.cpp.
  2. Download your preferred model variant from the table below (a minimal loading sketch follows these steps).
  3. For the mmproj file, use the F32 version for the best visual-processing results; quality order is F32 > BF16 > F16.
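
To sanity-check a downloaded variant programmatically, here is a minimal loading sketch using the llama-cpp-python bindings (an assumption on my part, not part of this repo; the file name is a placeholder for whichever variant you downloaded, and the mmproj file is only consumed by llama.cpp's multimodal tooling, not by this snippet):

```python
# Minimal loading sketch with llama-cpp-python; the file name is a placeholder.
from llama_cpp import Llama

llm = Llama(
    model_path="Qwen3.6-35B-A3B-MXFP4_MOE.gguf",  # hypothetical local path
    n_gpu_layers=-1,  # offload all layers to the GPU if memory allows
    n_ctx=8192,       # context window; adjust to your hardware
)

out = llm("Q: What does MXFP4 quantization do? A:", max_tokens=64)
print(out["choices"][0]["text"])
```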

Which version should I choose?

All variants use MXFP4 for the MoE (Mixture of Experts) weights to keep the model efficient. The difference lies in how the remaining tensors are handled:

| Variant | Quality | Performance | Size | Recommendation |
|---------|---------|-------------|------|----------------|
| BF16 | ⭐⭐⭐ | Variable* | 20.55 GiB | Best for maximum accuracy; original unquantized weights. |
| F16 | ⭐⭐ | Fast | 20.55 GiB | Great alternative if BF16 is slow on your hardware. |
| Q8 | ⭐ | Fastest | 18.88 GiB | Balanced performance and memory usage. |

**\*Note:** On some older architectures, BF16 may be slower than F16. Check whether your GPU supports native BF16.
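
Since the table's performance ratings hinge on native BF16 support, one quick way to check is via PyTorch, assuming you have a CUDA build installed (this check is illustrative and entirely separate from llama.cpp):

```python
# Illustrative BF16 capability check; assumes PyTorch with CUDA support.
import torch

if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))
    print("Native BF16 supported:", torch.cuda.is_bf16_supported())
else:
    print("No CUDA GPU detected.")
```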

Read the Unsloth guide to set up the model's recommended settings:
Qwen3.6 - How to Run Locally Guide
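
The guide above carries the actual recommended sampler values. As a sketch of where such settings go when using the llama-cpp-python bindings (again an assumption, and every number below is a placeholder rather than a recommendation):

```python
# Sketch of passing sampler settings via llama-cpp-python.
# All numeric values are placeholders; take the real ones from the guide.
from llama_cpp import Llama

llm = Llama(model_path="Qwen3.6-35B-A3B-MXFP4_MOE.gguf", n_gpu_layers=-1)

resp = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Explain MXFP4 in one sentence."}],
    temperature=0.7,  # placeholder value
    top_p=0.9,        # placeholder value
    top_k=40,         # placeholder value
    min_p=0.05,       # placeholder value
)
print(resp["choices"][0]["message"]["content"])
```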
