SuperGemma4 E4B Abliterated

supergemma4-e4b-abliterated is a private evaluation release whose original upstream base is google/gemma-4-E4B-it.

This SuperGemma release is an abliterated and tuned derivative of that Google E4B base, with additional work on release quality, formatting discipline, code output, and time to first token (TTFT).

This branch is aimed at users who want:

  • strong code and bug-fix behavior
  • clean JSON and tool-call formatting
  • fast first-token responsiveness
  • release-ready serving behavior on Transformers and OpenAI-compatible stacks

Why This Build Exists

The original Google checkpoint provides the core Gemma 4 E4B capability base. This project line uses an abliterated release path to reduce refusal-heavy behavior, but that kind of modification can regress on exact formatting, tool-call reliability, and service stability if it is not carefully hardened.

This release focuses on recovering and then surpassing baseline quality where it matters for real usage:

  • exact structured outputs
  • code correctness
  • bug-fix reliability
  • server-facing stability
  • low-friction deployment on Transformers and OpenAI-compatible serving stacks

Highlights

  • Release-quality score: 92.34
  • Exact-eval score: 98.50
  • Broad-eval score: 83.10
  • JSON exact-match: 100%
  • Tool-call accuracy: 90%
  • Exact code score: 100%
  • Exact bug-fix score: 100%
  • Long-context sanity: 100%
  • TTFT (time to first token): 2291 ms
  • Prefill throughput: 2479.70 tok/s
  • Decode throughput: 42.04 tok/s
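
The card does not say how the three latency and throughput figures were measured. As a rough sketch of one common convention (an assumption, not the author's harness): TTFT is the delay until the first generated token arrives, prefill throughput is prompt tokens divided by TTFT, and decode throughput is the number of tokens after the first divided by the time they took. The names `StreamTiming` and `measure_stream` are hypothetical.

```python
import time
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class StreamTiming:
    ttft_ms: float        # delay until the first token arrived
    prefill_tok_s: float  # prompt tokens / TTFT
    decode_tok_s: float   # tokens after the first / time after the first


def measure_stream(tokens: Iterable[str], prompt_tokens: int) -> StreamTiming:
    """Time any token iterator, e.g. a streaming generation call."""
    start = time.perf_counter()
    first: Optional[float] = None
    last = start
    count = 0
    for _ in tokens:
        last = time.perf_counter()
        if first is None:
            first = last
        count += 1
    if first is None:
        # An empty stream has no TTFT; fail loudly instead of reporting zero.
        raise ValueError("stream produced no tokens")
    ttft = first - start
    decode_time = last - first
    prefill = prompt_tokens / ttft if ttft > 0 else float("inf")
    decode = (count - 1) / decode_time if count > 1 and decode_time > 0 else 0.0
    return StreamTiming(ttft * 1000.0, prefill, decode)
```

Any iterator that yields tokens as they are produced can be passed in, so the same helper works for a local streamer or a streamed HTTP response.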

Lineage

  1. Original upstream base: google/gemma-4-E4B-it
  2. Abliterated and tuned release: Jiunsong/supergemma4-e4b-abliterated

Comparison Snapshot

Both models were measured with the same evaluation harness.

| Model | Release Quality | Exact Overall | JSON | Tool | Code | Bugfix | TTFT (ms) | PREFILL (tok/s) | DECODE (tok/s) |
|---|---|---|---|---|---|---|---|---|---|
| Google base | 77.46 | 83.50 | 50.0 | 90.0 | 62.5 | 100.0 | 4827.31 | 2456.69 | 42.04 |
| SuperGemma4 E4B Abliterated | 92.34 | 98.50 | 100.0 | 90.0 | 100.0 | 100.0 | 2291.23 | 2479.70 | 42.04 |
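
To make the comparison easier to read at a glance, the snippet below recomputes the differences between the two rows. The numbers are copied from the table; the dictionary layout and the `ttft_speedup` name are illustrative only.

```python
# Values copied from the comparison table above.
base = {"release_quality": 77.46, "exact_overall": 83.50, "json": 50.0,
        "tool": 90.0, "code": 62.5, "bugfix": 100.0,
        "ttft_ms": 4827.31, "prefill_tok_s": 2456.69, "decode_tok_s": 42.04}
tuned = {"release_quality": 92.34, "exact_overall": 98.50, "json": 100.0,
         "tool": 90.0, "code": 100.0, "bugfix": 100.0,
         "ttft_ms": 2291.23, "prefill_tok_s": 2479.70, "decode_tok_s": 42.04}

# Absolute change per metric (tuned minus base).
delta = {name: round(tuned[name] - base[name], 2) for name in base}

# TTFT is lower-is-better, so express it as a speedup ratio instead.
ttft_speedup = base["ttft_ms"] / tuned["ttft_ms"]

for name, change in delta.items():
    print(f"{name:>16}: {change:+.2f}")
print(f"TTFT speedup: {ttft_speedup:.2f}x")
```

On these figures TTFT drops by a factor of about 2.1, while tool-call accuracy, bug-fix score, and decode throughput are unchanged.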

Stability Notes

This candidate was release-hardened against the failure modes that matter in real serving:

  • batched OpenAI-compatible serving restored
  • simple OpenAI-compatible serving restored
  • Unicode output verified
  • tool-calling output verified
  • empty-response false-green cases blocked by stricter tests
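
The last item refers to tests that could pass on empty output. The card does not show those tests; as a minimal sketch of the idea, a strict check can reject empty or whitespace-only responses before attempting an exact JSON comparison. `strict_json_match` is a hypothetical helper, not part of the release.

```python
import json


def strict_json_match(response: str, expected: object) -> bool:
    """Return True only if `response` is non-empty JSON equal to `expected`."""
    # An empty or whitespace-only response must fail, never pass by default.
    if not response or not response.strip():
        return False
    try:
        parsed = json.loads(response)
    except json.JSONDecodeError:
        # Malformed output counts as a failure, not as a skipped case.
        return False
    return parsed == expected
```

For example, `strict_json_match('{"a": 1}', {"a": 1})` passes, while an empty string fails regardless of the expected value.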

Validation highlights:

  • direct reliability audit: 14/14
  • repeat reliability probe: 90/90
  • batched soak test: 12/12
  • simple soak test: 6/6

Recommended Use Cases

  • coding assistant
  • bug-fix assistant
  • strict JSON and schema outputs
  • agent backends that depend on tool-call formatting
  • standard BF16 deployment on Hugging Face / Transformers stacks

Quick Start

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "Jiunsong/supergemma4-e4b-abliterated"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [
    {"role": "user", "content": "Write a compact Python function that groups words by length."}
]

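# Render the chat template and tokenize; return_dict=True also returns the attention mask.
inputs = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=256)

# Decode only the newly generated tokens, skipping the prompt.
prompt_len = inputs["input_ids"].shape[-1]
print(tokenizer.decode(outputs[0][prompt_len:], skip_special_tokens=True))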

Serving

This checkpoint is designed to work well with:

  • Transformers
  • vLLM-style OpenAI-compatible stacks
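
As an illustration of the OpenAI-compatible path, the sketch below builds a chat-completions request with the standard library only. It assumes a server is already running and exposes the usual `/v1/chat/completions` route; the base URL `http://localhost:8000/v1` and the helper names are assumptions, not something the card specifies.

```python
import json
import urllib.request

MODEL_ID = "Jiunsong/supergemma4-e4b-abliterated"
BASE_URL = "http://localhost:8000/v1"  # assumed local server address


def build_chat_request(prompt: str, max_tokens: int = 256) -> urllib.request.Request:
    """Build (but do not send) a chat-completions POST request."""
    payload = {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(prompt: str) -> str:
    """Send the request and return the assistant's reply text."""
    with urllib.request.urlopen(build_chat_request(prompt), timeout=120) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

Once a server is up, `print(chat("Return a JSON object with keys a and b."))` exercises the endpoint end to end.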

Release Positioning

This private release is the strongest all-around E4B candidate in the current project line for users who want the abliterated base behavior without giving up quality recovery, formatting discipline, or serving readiness.

Model size: 8B params (Safetensors, tensor type BF16)