SuperGemma4 E4B Abliterated

supergemma4-e4b-abliterated is a private evaluation release whose original upstream base is google/gemma-4-E4B-it.

This SuperGemma release is an abliterated and tuned derivative of that Google E4B base, with additional work for higher release quality, stronger formatting discipline, better code output, and faster time to first token.

This branch is aimed at users who want:

strong code and bug-fix behavior
clean JSON and tool-call formatting
fast first-token responsiveness
release-ready serving behavior on Transformers and OpenAI-compatible stacks

Why This Build Exists

The original Google checkpoint provides the core Gemma 4 E4B capability base. This project line uses an abliterated release path to reduce refusal-heavy behavior, but that kind of modification can regress on exact formatting, tool-call reliability, and service stability if it is not carefully hardened.

This release focuses on recovering, and where possible exceeding, baseline quality in the areas that matter for real usage:

exact structured outputs
code correctness
bug-fix reliability
server-facing stability
low-friction deployment on Transformers and OpenAI-compatible serving stacks

Highlights

Release-quality score: 92.34
Exact-eval score: 98.50
Broad-eval score: 83.10
JSON exact-match: 100%
Tool-call accuracy: 90%
Exact code score: 100%
Exact bug-fix score: 100%
Long-context sanity: 100%
Time to first token (TTFT): 2291 ms
Prefill throughput: 2479.70 tok/s
Decode throughput: 42.04 tok/s
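
This card does not describe the evaluation harness behind these numbers. As a rough illustration of what a JSON exact-match metric could look like, the sketch below parses each model output and compares it to the expected value. The function name `json_exact_match` and the sample data are hypothetical, and the actual scoring may be stricter (for example, a byte-level comparison).

```python
import json


def json_exact_match(outputs, expected):
    """Return the percentage of outputs that parse to exactly the expected JSON value.

    Hypothetical scoring rule for illustration; not the project's actual harness.
    """
    if len(outputs) != len(expected):
        raise ValueError("outputs and expected must have the same length")
    if not outputs:
        return 0.0
    hits = 0
    for raw, want in zip(outputs, expected):
        try:
            got = json.loads(raw)
        except (json.JSONDecodeError, TypeError):
            continue  # unparseable output counts as a miss
        if got == want:
            hits += 1
    return 100.0 * hits / len(outputs)


# Made-up sample data: one valid match, one trailing-comma error,
# one wrong value, one valid list match.
samples = ['{"a": 1}', '{"a": 1,}', '{"a": 2}', '[1, 2]']
targets = [{"a": 1}, {"a": 1}, {"a": 1}, [1, 2]]
score = json_exact_match(samples, targets)
print(f"JSON exact-match: {score:.1f}%")
```

Counting unparseable output as a miss, rather than raising, keeps a single malformed response from aborting a whole evaluation run.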

Lineage

Original upstream base: google/gemma-4-E4B-it
Abliterated and tuned release: Jiunsong/supergemma4-e4b-abliterated

Comparison Snapshot

Both models were measured with the same evaluation harness. The baseline is:

google/gemma-4-E4B-it

Model	Release Quality	Exact Overall	JSON (%)	Tool (%)	Code (%)	Bugfix (%)	TTFT (ms)	Prefill (tok/s)	Decode (tok/s)
Google base	77.46	83.50	50.0	90.0	62.5	100.0	4827.31	2456.69	42.04
SuperGemma4 E4B Abliterated	92.34	98.50	100.0	90.0	100.0	100.0	2291.23	2479.70	42.04
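
To read the table as deltas, the short sketch below subtracts the Google base row from the SuperGemma row and computes the TTFT ratio. The numbers are copied from the table; the dictionary keys are illustrative names, not identifiers from the harness.

```python
# Values copied from the comparison table above (columns that differ between rows).
base = {
    "release_quality": 77.46,
    "exact_overall": 83.50,
    "json": 50.0,
    "code": 62.5,
    "ttft_ms": 4827.31,
    "prefill_tok_s": 2456.69,
}
tuned = {
    "release_quality": 92.34,
    "exact_overall": 98.50,
    "json": 100.0,
    "code": 100.0,
    "ttft_ms": 2291.23,
    "prefill_tok_s": 2479.70,
}

# Absolute change per metric (tuned minus base), rounded to two decimals.
deltas = {name: round(tuned[name] - base[name], 2) for name in base}

# Ratio of base TTFT to tuned TTFT; values above 1 mean the tuned model responds sooner.
ttft_speedup = base["ttft_ms"] / tuned["ttft_ms"]

for name, delta in deltas.items():
    print(f"{name}: {delta:+.2f}")
print(f"TTFT ratio (base / tuned): {ttft_speedup:.2f}x")
```

Tool-call accuracy, bug-fix score, and decode throughput are identical in both rows, so they are left out of the delta calculation.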

Stability Notes

This candidate was release-hardened against the failure modes that matter in real serving:

batched OpenAI-compatible serving restored
simple OpenAI-compatible serving restored
unicode output verified
tool-calling output verified
empty-response false-green cases blocked by stricter tests
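
The "empty-response false-green" case in the list above refers to a request that succeeds at the HTTP level but returns no usable text. A minimal sketch of a stricter check follows, assuming the standard OpenAI chat-completions response shape. The helper name `assert_nonempty_completion` is hypothetical and is not taken from the project's test suite.

```python
def assert_nonempty_completion(response: dict) -> str:
    """Return the assistant text from an OpenAI-style chat response, or raise.

    Hypothetical helper: fails the test when the call "succeeded" but the
    model produced no text and no tool call.
    """
    choices = response.get("choices") or []
    if not choices:
        raise AssertionError("response has no choices")
    message = choices[0].get("message") or {}
    content = message.get("content")
    if message.get("tool_calls"):
        # Tool-call turns may legitimately carry empty content.
        return content or ""
    if not isinstance(content, str) or not content.strip():
        raise AssertionError("empty completion: request succeeded but no text returned")
    return content


good = {"choices": [{"message": {"role": "assistant", "content": "ok"}}]}
print(assert_nonempty_completion(good))
```

A check that only looks at the status code would pass a whitespace-only completion; inspecting the message body closes that gap.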

Validation highlights:

direct reliability audit: 14/14
repeat reliability probe: 90/90
batched soak test: 12/12
simple soak test: 6/6

Recommended Use Cases

coding assistant
bug-fix assistant
strict JSON and schema outputs
agent backends that depend on tool-call formatting
standard BF16 deployment on Hugging Face / Transformers stacks

Quick Start

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "Jiunsong/supergemma4-e4b-abliterated"
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Load in BF16 and let Accelerate place the weights on the available device(s).
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [
    {"role": "user", "content": "Write a compact Python function that groups words by length."}
]

# Request a dict so that input_ids and attention_mask are both passed to generate().
inputs = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=256)

# Decode only the newly generated tokens, skipping the prompt.
prompt_len = inputs["input_ids"].shape[-1]
print(tokenizer.decode(outputs[0][prompt_len:], skip_special_tokens=True))

Serving

This checkpoint is designed to work well with:

Transformers
vLLM-style OpenAI-compatible stacks
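
For OpenAI-compatible stacks, a request can be sketched with the standard library alone. The base URL below is an assumption (a local server on port 8000), not something this card specifies. The helper names `build_chat_request` and `chat` are illustrative, and the server is assumed to expose the usual `/v1/chat/completions` route.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000/v1"  # assumption: local OpenAI-compatible server
MODEL_ID = "Jiunsong/supergemma4-e4b-abliterated"


def build_chat_request(prompt: str, max_tokens: int = 256) -> urllib.request.Request:
    """Build (but do not send) a chat-completions POST request."""
    payload = {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(prompt: str) -> str:
    """Send the request and return the assistant text (needs a running server)."""
    with urllib.request.urlopen(build_chat_request(prompt), timeout=60) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

With a server running, `chat("Write a compact Python function that groups words by length.")` would return the generated text. Keeping request construction separate from sending makes the payload easy to inspect or test offline.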

Release Positioning

Within the current project line, this private release is the strongest all-around E4B candidate for users who want abliterated behavior without giving up output quality, formatting discipline, or serving readiness.


Safetensors

Model size: 8B params
Tensor type: BF16
