
ManniX PRO

ManniX-ITA

https://github.com/mann1x

mann1x

AI & ML interests

None yet

Recent Activity

updated a model 1 day ago

ManniX-ITA/gemma-4-A4B-98e-v6-coder-it-GGUF

replied to wenhuach's post 5 days ago

🚀 We provide **free** hardware to quantize models at the [Intel Low Bit Open LLM Leaderboard](https://huggingface.co/spaces/Intel/low_bit_open_llm_leaderboard), currently supporting `Pure RTN mode` powered by AutoRound ⭐ If you find it useful, please consider starring the AutoRound project on [GitHub](https://github.com/intel/auto-round)!

replied to wenhuach's post 6 days ago


Organizations

None yet

updated a model 1 day ago

ManniX-ITA/gemma-4-A4B-98e-v6-coder-it-GGUF

20B • Updated 1 day ago • 12.8k • 1

replied to wenhuach's post 5 days ago

Working on it. Gemma 4 had a regression issue, and it has just been fixed.

ManniX-ITA/gemma-4-A4B-98e-v6-coder-it (NVFP4) is already in the Quantization queue (status: Failed). You do not have permission to re-submit failed models — please contact an administrator.

If you manage to fix it, please remove my model from the list of failed quants; I can't resubmit it myself.

replied to wenhuach's post 6 days ago

Working on it. Gemma 4 had a regression issue, and it has just been fixed.

Thanks! Please remove/re-issue my model from the queue once fixed, I can't do it 🙏

updated a model 6 days ago

ManniX-ITA/gemma-4-A4B-98e-v6-coder-it

20B • Updated 6 days ago • 323

replied to wenhuach's post 7 days ago

The quantization of my model, ManniX-ITA/gemma-4-A4B-98e-v6-coder-it, failed at INT4 and NVFP4, and the MXFP4 job has been running since yesterday. I guess it's stuck.
Is there a specific problem with the Gemma 4 A4B MoE architecture, or should I just re-queue them?

replied to wenhuach's post 8 days ago

I sent my model for quantization and eval this morning, but the first two quants still show as running. Are they really running? 8 hours seems a bit long.

reacted to wenhuach's post with ❤️ 8 days ago

Post

4475

🚀 We provide **free** hardware to quantize models at the [Intel Low Bit Open LLM Leaderboard](https://huggingface.co/spaces/Intel/low_bit_open_llm_leaderboard), currently supporting `Pure RTN mode` powered by AutoRound

⭐ If you find it useful, please consider starring the AutoRound project on [GitHub](https://github.com/intel/auto-round)!

7 replies

liked a Space 8 days ago

Low-bit LLM Leaderboard

🏆

209

Track, rank and evaluate open LLMs and chatbots

New activity in ManniX-ITA/gemma-4-A4B-98e-v6-coder-it-GGUF 8 days ago

Are there any plans for the qwen3.6-35B model?

#1 opened 8 days ago by

jian2023

posted an update 8 days ago

Post

203

🚀 Gemma-4-A4B 98e v6-coder (C6v3lcb) — LCB-targeted code prune of Gemma 4 26B-A4B, 20.8B MoE (4B-active). Same C6 recipe as v5-coder, re-steered specifically at LiveCodeBench-medium — the one code bench pruning hurt most.

It not only keeps the lead on Python but also closes the gap to 1-2pp in the other coding languages.

It actually reasons better, fixing the under-thinking and over-thinking failures of the full experts router.

All this comes at a cost for a model of only 20B, on top of being very specific to coding: about 3x the thinking tokens in LiveCodeBench. But it's good thinking that brings home not only more correct answers but also, in general, more precise and concise output.

📊 SCORES (Q6_K, llama.cpp, greedy, EVAL_PROTOCOL v3)

HumanEval 98.78 — HumanEval+ 93.29 — LCB-medium-55 v4 96.36
LCB-medium-100 96.00 — MultiPL-E macro 88.00 (Rust/Java/JS)
MATH-500 91.00 — GPQA-D 67.17 — AIME 63.33 — IFEval 92.00
vs v5-coder: +10.91 LCB-medium / +7.0 MultiPL-E / +10 AIME, HE+ tie

LCB targeting closed the −9.10pp hole and pushed +1.81pp past the unpruned 128e. Top of the 14–22B coder band: +9.2pp HE over Qwen2.5-Coder-14B-Instruct (89.6 → 98.78).
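
The deltas quoted above can be re-derived from the posted scores. A minimal sketch, assuming the v5-coder figures are the ones from the earlier v5-coder post on this page (measured on NVFP4A16 with vLLM, so the comparison mixes quant formats); the dictionary and function names are illustrative only:

```python
# Scores in percent, copied from the v6-coder and v5-coder posts.
V6 = {"HumanEval": 98.78, "LCB-medium-55 v4": 96.36}
V5 = {"HumanEval": 98.17, "LCB-medium-55 v4": 85.45}
QWEN25_CODER_14B_HE = 89.6  # HumanEval figure quoted in the post


def delta_pp(new: float, old: float) -> float:
    """Difference in percentage points, rounded to two decimals."""
    return round(new - old, 2)


# Gain on LCB-medium-55 v4 from v5-coder to v6-coder.
lcb_gain = delta_pp(V6["LCB-medium-55 v4"], V5["LCB-medium-55 v4"])

# HumanEval lead over Qwen2.5-Coder-14B-Instruct.
he_lead = delta_pp(V6["HumanEval"], QWEN25_CODER_14B_HE)

print(f"LCB-medium gain vs v5-coder: +{lcb_gain}pp")
print(f"HumanEval lead vs Qwen2.5-Coder-14B-Instruct: +{he_lead}pp")
```

The HumanEval lead comes out at 9.18pp, which the post rounds to +9.2pp.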

📦 GGUF SWEEP (all imatrix; Q4_K_M plain — imatrix hurt it)

Q6_K — 17.81 GB — 93.29% (cohort top)
Q3_K_M — 10.51 GB — 92.68% ⭐ value leader (imatrix lifted the 3-bit tiers hard)
IQ4_XS — 11.01 GB — 92.07% ⭐ safe 4-bit
IQ3_XS — 9.22 GB — 92.07% — smallest on the plateau
IQ2_S — 7.83 GB — 89.02% — sub-8 GB code-grade
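
One way to read the sweep is to pick the smallest file that stays within a chosen drop from the cohort top. A minimal sketch using only the five tiers listed above; the `Tier` type, the `smallest_within` helper, and the tolerance values are illustrative assumptions, not part of the published evaluation:

```python
from typing import NamedTuple


class Tier(NamedTuple):
    name: str
    size_gb: float
    score: float  # percentage reported in the sweep


# Values copied from the GGUF sweep list.
SWEEP = [
    Tier("Q6_K", 17.81, 93.29),
    Tier("Q3_K_M", 10.51, 92.68),
    Tier("IQ4_XS", 11.01, 92.07),
    Tier("IQ3_XS", 9.22, 92.07),
    Tier("IQ2_S", 7.83, 89.02),
]


def smallest_within(tiers: list[Tier], max_drop_pp: float) -> Tier:
    """Return the smallest tier scoring within max_drop_pp of the best tier."""
    top = max(t.score for t in tiers)
    eligible = [t for t in tiers if t.score >= top - max_drop_pp]
    return min(eligible, key=lambda t: t.size_gb)


for drop in (1.0, 1.5, 5.0):
    pick = smallest_within(SWEEP, drop)
    print(f"max drop {drop}pp -> {pick.name} ({pick.size_gb} GB, {pick.score}%)")
```

With a 1pp budget this selects Q3_K_M, matching the "value leader" label; loosening the budget to 1.5pp moves the pick to IQ3_XS.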

⚔️ SAME-RIG vs Qwen2.5-Coder-14B (RTX 3090, greedy)

Iso-disk 10.5 GB: Q3_K_M 92.68 vs Qwen Q5_K_M 83.54 → +9.14pp at the same file size
LCB-medium-55 v4, identical split: 96.36 vs 18.18

bf16:
ManniX-ITA/gemma-4-A4B-98e-v6-coder-it

GGUF:
ManniX-ITA/gemma-4-A4B-98e-v6-coder-it-GGUF

Ollama:
https://ollama.com/mannix/gemma4-98e-v6-coder

updated 2 models 9 days ago

ManniX-ITA/Qwen3.5-4B-MicroCoder-GGUF

4B • Updated 9 days ago • 401 • 1

ManniX-ITA/Qwen3.5-4B-MicroCoder

Image-Text-to-Text • 5B • Updated 9 days ago • 68

published 2 models 9 days ago

ManniX-ITA/gemma-4-A4B-98e-v6-coder-it-GGUF

20B • Updated 1 day ago • 12.8k • 1

ManniX-ITA/gemma-4-A4B-98e-v6-coder-it

20B • Updated 6 days ago • 323

updated a model 10 days ago

ManniX-ITA/gemma-4-A4B-98e-v5-coder-it

20B • Updated 10 days ago • 2.88k • 2

New activity in ManniX-ITA/gemma-4-A4B-98e-v5-coder-it-GGUF 11 days ago

Can I use MTP?

#1 opened 11 days ago by

jian2023

updated 2 models 12 days ago

ManniX-ITA/Qwen3.6-27B-Omnimerge-v4

Image-Text-to-Text • 28B • Updated 12 days ago • 940 • 12

ManniX-ITA/Qwen3.6-27B-Omnimerge-v4-GGUF

Image-Text-to-Text • 27B • Updated 12 days ago • 37.5k • 32

posted an update 12 days ago

Post

263

🚀 Gemma-4-A4B 98e v5-coder — code-leaning 20.8B MoE (4B-active), C6 layer-relevance-weighted prune of Gemma 4 26B-A4B. Best 20B-class coder I've shipped.

📊 SCORES (NVFP4A16, vLLM 0.20.2, greedy, EVAL_PROTOCOL v3)

HumanEval 98.17 — HumanEval+ 92.68 — LCB-medium-55 v4 85.45
MATH-500 92.00 — GPQA-D 68.69 — IFEval 94.00
vs v4: +1.22 HE / +1.22 HE+ / +7.27 LCB-medium

Top of the 14–22B coder band: +8.6pp HE over Qwen2.5-Coder-14B-Instruct (89.6 → 98.17). HE+ sanity-audited — no memorization, no silent-empty.

📦 EXTENSIVE GGUF SWEEP (16 plain + IQ tiers + 5 CD recipes, all imatrix-calibrated)

Q8_0 — 21.16 GB — 93.90% (cohort top)
Q4_K_S — 12.21 GB — 93.29% ⭐ plain sweet spot
IQ4_XS — 11.01 GB — 93.29% ⭐ sub-12 GB top

⭐ TWO EXCELLENT SUB-10 GB CONTRIBDYNAMIC CD PICKS (per-layer + IQ-codebook overrides)

CD-IQ4_K_M (Canary W) — 10.29 GB — 92.07% — recommended sub-11 GB
CD-IQ3_XS_L — 9.27 GB — 90.24% — smallest viable code-grade

⚔️ SAME-RIG vs Qwen2.5-Coder-14B-Instruct (RTX 3090, greedy HE+)

11 GB band: v5-coder IQ4_XS wins +9.75pp at -1.49 bpw
12 GB band: Q4_K_S wins +8.53pp
8 GB band: IQ2_S wins +0.61pp at lower bpw
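
The bpw (bits per weight) figures in these bands relate file size to parameter count. A rough sketch of that conversion, assuming GB means 10^9 bytes and using the 20.8B total parameter count stated in the post; the result is a file-level average that also counts any non-weight data in the GGUF, so it may differ slightly from the numbers the author computed:

```python
def bits_per_weight(file_gb: float, params_billion: float) -> float:
    """Average bits per parameter for a model file.

    Assumes 1 GB = 10**9 bytes, so the 10**9 factors cancel out.
    """
    if params_billion <= 0:
        raise ValueError("params_billion must be positive")
    return file_gb * 8 / params_billion


# File sizes from the v5-coder sweep, 20.8B total parameters from the post.
for name, size_gb in (("IQ4_XS", 11.01), ("Q4_K_S", 12.21)):
    print(f"{name}: {bits_per_weight(size_gb, 20.8):.2f} bpw")
```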

bf16:
ManniX-ITA/gemma-4-A4B-98e-v5-coder-it

GGUF:
ManniX-ITA/gemma-4-A4B-98e-v5-coder-it-GGUF

NVFP4A16:
ManniX-ITA/gemma-4-A4B-98e-v5-coder-NVFP4A16

Ollama:
https://ollama.com/mannix/gemma4-98e-v5-coder

———

🆕 BONUS — Qwen3.6-27B-Omnimerge-v4-MTP-GGUF

Same v4 weights with the native MTP head retained for llama.cpp speculative decoding (PR #22673, --spec-type draft-mtp). 7 imatrix tiers Q8_0 → IQ2_M.

HumanEval: 2.0x decode tok/s
MBPP: 2.33x decode tok/s
Both at +1-2pp pass@1 vs the non-MTP build. GPQA Diamond comparison in flight.
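
To put the decode speedups in wall-clock terms, here is a small sketch. The baseline of 50 tok/s and the 1000-token response are hypothetical numbers chosen for illustration; only the 2.0x and 2.33x factors come from the post, and the estimate covers the decode phase only, not prompt processing:

```python
def decode_seconds(tokens: int, base_tok_per_s: float, speedup: float = 1.0) -> float:
    """Estimated decode time for `tokens` generated tokens at a given speedup."""
    if base_tok_per_s <= 0 or speedup <= 0:
        raise ValueError("rates and speedup must be positive")
    return tokens / (base_tok_per_s * speedup)


BASE_TOK_PER_S = 50.0  # hypothetical non-MTP decode rate
TOKENS = 1000          # hypothetical response length

for bench, speedup in (("HumanEval", 2.0), ("MBPP", 2.33)):
    plain = decode_seconds(TOKENS, BASE_TOK_PER_S)
    mtp = decode_seconds(TOKENS, BASE_TOK_PER_S, speedup)
    print(f"{bench}: {plain:.1f}s without MTP, {mtp:.1f}s at {speedup}x")
```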

MTP-GGUF:
ManniX-ITA/Qwen3.6-27B-Omnimerge-v4-MTP-GGUF

New activity in ManniX-ITA/Qwen3.6-27B-Omnimerge-v4-GGUF 12 days ago

MTP version

#1 opened 13 days ago by

DzmitryTheOtherOne
