These are MXFP4 quantizations of the model Qwen3.6-35B-A3B.
Quick Start
- Download the latest release of llama.cpp.
- Download your preferred model variant from below.
- For the `mmproj` file, it is recommended to use the F32 version for the best visual processing results (F32 > BF16 > F16); see the launch sketch after this list.
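As a minimal sketch of a launch (the GGUF filenames here are placeholders; substitute the variant and mmproj file you actually downloaded), recent llama.cpp builds can serve the model together with the vision projector:

```bash
# Serve the model with llama.cpp's OpenAI-compatible HTTP server.
# Filenames are placeholders; use the files you downloaded.
# -ngl 99 offloads all layers to the GPU; -c sets the context length.
./llama-server \
  -m Qwen3.6-35B-A3B-MXFP4_MOE.gguf \
  --mmproj mmproj-F32.gguf \
  -ngl 99 \
  -c 8192
```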
Which version should I choose?
All variants use MXFP4 for the MoE (Mixture of Experts) weights to keep the model efficient. The difference lies in how the remaining tensors are handled:
| Variant | Quality | Performance | Size | Recommendation |
|---|---|---|---|---|
| BF16 | ⭐⭐⭐ | Variable* | 20.55GiB | Best for maximum accuracy; remaining tensors kept as the original unquantized BF16 weights. |
| F16 | ⭐⭐ | Fast | 20.55GiB | Great alternative if BF16 is slow on your hardware. |
| Q8 | ⭐ | Fastest | 18.88GiB | Balanced performance and memory usage. |
**Note:** On some older architectures, BF16 may be slower than F16; check that your GPU supports native BF16.
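If you want to pull a single variant from the command line, one option is huggingface-cli; the exact `.gguf` filename inside the repo is an assumption here, so check the repository's file list first:

```bash
# Download one variant into ./models (the .gguf filename is an assumption;
# copy the exact name from the repository's file listing).
huggingface-cli download noctrex/Qwen3.6-35B-A3B-MXFP4_MOE-GGUF \
  Qwen3.6-35B-A3B-MXFP4_MOE-Q8.gguf \
  --local-dir ./models
```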
Read the Unsloth guide to set up the model's recommended settings:
Qwen3.6 - How to Run Locally Guide
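For reference, sampling options can be passed straight to llama.cpp at launch. The flags below are standard llama.cpp options, but the values are placeholders; take the actual recommended numbers from the guide above:

```bash
# Illustrative launch with explicit sampling settings.
# The values are placeholders; use the recommendations from the guide.
./llama-server \
  -m Qwen3.6-35B-A3B-MXFP4_MOE.gguf \
  --temp 0.7 \
  --top-p 0.95 \
  --top-k 20 \
  --min-p 0.0
```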
Base model: Qwen/Qwen3.6-35B-A3B