ManniX PRO
ManniX-ITA/gemma-4-A4B-98e-v6-coder-it (NVFP4) is already in the Quantization queue (status: Failed). You do not have permission to re-submit failed models; please contact an administrator.
If you manage to fix it, please remove my model from the list of failed quants; I can't resend it.
Working on it. Gemma 4 had a regression issue, and it has just been fixed.
Thanks! Please remove/re-issue my model from the queue once fixed, I can't do it.
The quantization of my model, ManniX-ITA/gemma-4-A4B-98e-v6-coder-it, failed at INT4 and NVFP4, and the MXFP4 run has been going since yesterday. I guess it's stuck.
I wonder, is there a specific problem with the Gemma 4 A4B MoE architecture, or should I just re-queue them?
I sent my model to quantize and eval this morning, but it seems the first two quants have been in the running state ever since. Are they really running? 8 hours seems a bit long.
Pure RTN mode powered by AutoRound. If you find it useful, please consider starring the AutoRound project on [GitHub](https://github.com/intel/auto-round)!
Are there any plans for the qwen3.6-35B model?
It not only keeps the lead on Python but also closes the gap to 1-2pp in the other coding languages.
It's actually reasoning better, fixing the under-thinking and over-thinking failures of the full experts router.
All this comes at a cost, with only 20B and on top of being very specific to coding: about 3x the thinking tokens in LiveCodeBench. But it's good thinking, which brings home not only more correct answers but also, in general, more precise and concise output.
SCORES (Q6_K, llama.cpp, greedy, EVAL_PROTOCOL v3)
HumanEval 98.78 | HumanEval+ 93.29 | LCB-medium-55 v4 96.36
LCB-medium-100 96.00 | MultiPL-E macro 88.00 (Rust/Java/JS)
MATH-500 91.00 | GPQA-D 67.17 | AIME 63.33 | IFEval 92.00
vs v5-coder: +10.91 LCB-medium / +7.0 MultiPL-E / +10 AIME, HE+ tie
LCB targeting closed the -9.10pp hole and pushed +1.81pp past the unpruned 128e. Top of the 14-22B coder band: +9.2pp HE over Qwen2.5-Coder-14B-Instruct (89.6 → 98.78).
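The deltas quoted above can be cross-checked with a few lines of arithmetic. This is only a sketch: it assumes the +10.91 LCB-medium gain is measured against the 85.45 LCB-medium-55 v4 score reported in the v5-coder post (a different quant format and engine), and the helper name `delta_pp` is made up for illustration.

```python
# Scores in percent, copied from the v6-coder and v5-coder posts.
V6_LCB_MEDIUM_55 = 96.36  # v6-coder, Q6_K, llama.cpp
V5_LCB_MEDIUM_55 = 85.45  # v5-coder, NVFP4A16, vLLM (assumed baseline)
V6_HE = 98.78
V5_HE = 98.17
QWEN_HE = 89.6            # Qwen2.5-Coder-14B-Instruct HumanEval, as quoted


def delta_pp(new: float, old: float) -> float:
    """Difference in percentage points, rounded to two decimals."""
    return round(new - old, 2)


# Gain on LCB-medium between the two releases.
lcb_gain = delta_pp(V6_LCB_MEDIUM_55, V5_LCB_MEDIUM_55)

# A 9.10pp deficit plus that gain gives the margin over the unpruned 128e.
margin_over_128e = round(-9.10 + lcb_gain, 2)

# HumanEval lead over Qwen2.5-Coder-14B-Instruct for each release.
v6_he_lead = delta_pp(V6_HE, QWEN_HE)
v5_he_lead = delta_pp(V5_HE, QWEN_HE)

print(lcb_gain, margin_over_128e, v6_he_lead, v5_he_lead)
```

Under that assumption the numbers line up with the post: the LCB gain and the margin over 128e reproduce exactly, and the HumanEval leads round to the quoted +9.2pp and +8.6pp.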
GGUF SWEEP (all imatrix, except Q4_K_M, which is plain because imatrix hurt it)
Q6_K | 17.81 GB | 93.29% (cohort top)
Q3_K_M | 10.51 GB | 92.68% | value leader (imatrix lifted the 3-bit tiers hard)
IQ4_XS | 11.01 GB | 92.07% | safe 4-bit
IQ3_XS | 9.22 GB | 92.07% | smallest on the plateau
IQ2_S | 7.83 GB | 89.02% | sub-8 GB code-grade
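To make the sweep easier to act on, here is a minimal sketch that picks the highest-scoring tier that fits a given disk budget. The sizes and scores are copied from the list above; the score is presumably HumanEval+, since the Q6_K figure matches the HE+ score, but that is an assumption. The helper `best_tier` is hypothetical, and file size is only a rough proxy for the memory a tier needs at run time (context and KV cache come on top).

```python
from typing import Optional

# (tier, file size in GB, score in %) copied from the sweep above.
SWEEP = [
    ("Q6_K", 17.81, 93.29),
    ("Q3_K_M", 10.51, 92.68),
    ("IQ4_XS", 11.01, 92.07),
    ("IQ3_XS", 9.22, 92.07),
    ("IQ2_S", 7.83, 89.02),
]


def best_tier(budget_gb: float) -> Optional[str]:
    """Return the best-scoring tier whose file fits the budget, or None."""
    fitting = [t for t in SWEEP if t[1] <= budget_gb]
    if not fitting:
        return None
    # Highest score wins; on a tie, prefer the smaller file.
    name, _, _ = max(fitting, key=lambda t: (t[2], -t[1]))
    return name


for budget in (8.0, 10.0, 12.0, 24.0):
    print(f"{budget:5.1f} GB -> {best_tier(budget)}")
```

With these numbers a 12 GB budget selects Q3_K_M rather than IQ4_XS, which is the "value leader" point made in the list.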
SAME-RIG vs Qwen2.5-Coder-14B (RTX 3090, greedy)
Iso-disk 10.5 GB: Q3_K_M 92.68 vs Qwen Q5_K_M 83.54 → +9.14pp at the same file size
LCB-medium-55 v4, identical split: 96.36 vs 18.18
bf16:
ManniX-ITA/gemma-4-A4B-98e-v6-coder-it
GGUF:
ManniX-ITA/gemma-4-A4B-98e-v6-coder-it-GGUF
Ollama:
https://ollama.com/mannix/gemma4-98e-v6-coder
Can I use MTP?
SCORES (NVFP4A16, vLLM 0.20.2, greedy, EVAL_PROTOCOL v3)
HumanEval 98.17 | HumanEval+ 92.68 | LCB-medium-55 v4 85.45
MATH-500 92.00 | GPQA-D 68.69 | IFEval 94.00
vs v4: +1.22 HE / +1.22 HE+ / +7.27 LCB-medium
Top of the 14-22B coder band: +8.6pp HE over Qwen2.5-Coder-14B-Instruct (89.6 → 98.17). HE+ sanity-audited: no memorization, no silent-empty.
EXTENSIVE GGUF SWEEP (16 plain + IQ tiers + 5 CD recipes, all imatrix-calibrated)
Q8_0 | 21.16 GB | 93.90% (cohort top)
Q4_K_S | 12.21 GB | 93.29% | plain sweet spot
IQ4_XS | 11.01 GB | 93.29% | sub-12 GB top
TWO EXCELLENT SUB-10 GB CONTRIBDYNAMIC CD PICKS (per-layer + IQ-codebook overrides)
CD-IQ4_K_M (Canary W) | 10.29 GB | 92.07% | recommended sub-11 GB
CD-IQ3_XS_L | 9.27 GB | 90.24% | smallest viable code-grade
SAME-RIG vs Qwen2.5-Coder-14B-Instruct (RTX 3090, greedy HE+)
11 GB band: v5-coder IQ4_XS wins +9.75pp at -1.49 bpw
12 GB band: Q4_K_S wins +8.53pp
8 GB band: IQ2_S wins +0.61pp at lower bpw
bf16:
ManniX-ITA/gemma-4-A4B-98e-v5-coder-it
GGUF:
ManniX-ITA/gemma-4-A4B-98e-v5-coder-it-GGUF
NVFP4A16:
ManniX-ITA/gemma-4-A4B-98e-v5-coder-NVFP4A16
Ollama:
https://ollama.com/mannix/gemma4-98e-v5-coder
---
BONUS: Qwen3.6-27B-Omnimerge-v4-MTP-GGUF
Same v4 weights with the native MTP head retained for llama.cpp speculative decoding (PR #22673, --spec-type draft-mtp). 7 imatrix tiers, Q8_0 → IQ2_M.
HumanEval: 2.0x decode tok/s
MBPP: 2.33x decode tok/s
Both at +1-2pp pass@1 vs the non-MTP build. GPQA Diamond comparison in flight.
MTP-GGUF:
ManniX-ITA/Qwen3.6-27B-Omnimerge-v4-MTP-GGUF