🌸 PyBlissa-Coder-40M

!! 14.6% SCORE ON HumanEval PASS@1 !!

PyBlissa-Coder-40M is the second model in the PyBlissa-Coder family and mainly targets Python code generation. Despite its small footprint (40M parameters, trained on 272M tokens), it scores 14.6% pass@1 on HumanEval and 4.4% pass@1 on MBPP, both standard code-generation benchmarks. Its imperfections are also worth keeping in mind: the model can sometimes generate wrong, inefficient, or broken code, though how often this happens depends largely on the sampling temperature.

Training curve

Benchmarks

Benchmark          Score            Protocol                                         Temp
HumanEval pass@1   14.6% (24/164)   zero-shot, fenced-code extraction                0.25
MBPP pass@1        4.4% (22/500)    official tests-in-prompt (Austin et al. 2021)    0.05

How PyBlissa compares on HumanEval

Model              Params   HumanEval pass@1
GPT-Neo            125M     0.75%
CodeParrot-small   110M     3.80%
PyCodeGPT          110M     8.33%
PyBlissa-Coder     40M      14.6%

PyBlissa is ~2.75× smaller than CodeParrot-small yet scores roughly 4× higher on HumanEval pass@1, and it was trained on a single consumer GPU.
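The headline numbers can be re-derived from the counts and scores reported in the tables. The sketch below assumes pass@1 is simply the fraction of problems solved with one sample per problem; the card does not state the number of samples, so treat that as an assumption.

```python
# Raw counts as reported in the Benchmarks table.
HUMANEVAL_SOLVED, HUMANEVAL_TOTAL = 24, 164
MBPP_SOLVED, MBPP_TOTAL = 22, 500


def pass_at_1(solved, total):
    """pass@1 as a percentage, assuming one sample per problem."""
    return 100.0 * solved / total


# Comparison against CodeParrot-small (110M parameters, 3.80% pass@1).
size_ratio = 110 / 40        # how many times smaller PyBlissa is
score_ratio = 14.6 / 3.80    # how many times higher it scores

print(f"HumanEval pass@1: {pass_at_1(HUMANEVAL_SOLVED, HUMANEVAL_TOTAL):.1f}%")  # 14.6%
print(f"MBPP pass@1:      {pass_at_1(MBPP_SOLVED, MBPP_TOTAL):.1f}%")            # 4.4%
print(f"size ratio:  {size_ratio:.2f}x")   # 2.75x
print(f"score ratio: {score_ratio:.2f}x")  # 3.84x
```

The score ratio comes out at about 3.84, which is where the "roughly 4×" figure comes from.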


Model details

Architecture          Decoder-only transformer (GPT-2 style, nanoGPT lineage)
Parameters            39.9M
Layers                10
Model dim (d_model)   512
Heads                 8 (head_dim 64)
FFN dim (d_ff)        2048
Context length        512 tokens
Vocab size            16,000 (custom ByteLevel BPE)
Tied embeddings       Yes
Precision             trained in bf16, released as F32 GGUF
Best val loss         0.3615
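As a sanity check, the 39.9M figure can be reproduced from the dimensions above. This is a minimal sketch that assumes learned positional embeddings and bias-free linear and LayerNorm layers (as with nanoGPT's bias=False option); the card does not state either detail, so both are assumptions. The function name and its arguments are illustrative.

```python
def gpt2_param_count(n_layer, d_model, d_ff, vocab_size, context_len,
                     tied_embeddings=True, bias=False):
    """Count the parameters of a GPT-2 style decoder-only transformer."""
    tok_emb = vocab_size * d_model            # token embedding table
    pos_emb = context_len * d_model           # learned positions (assumption)
    attn = 3 * d_model * d_model + d_model * d_model  # QKV + output projection
    mlp = 2 * d_model * d_ff                  # up- and down-projection
    norms = 2 * d_model                       # two LayerNorm weight vectors
    per_layer = attn + mlp + norms
    if bias:
        # QKV bias, attention-out bias, FFN up/down biases, two LayerNorm biases.
        per_layer += 3 * d_model + d_model + d_ff + d_model + 2 * d_model
    final_norm = d_model + (d_model if bias else 0)
    # With tied embeddings the output head reuses the token embedding table.
    lm_head = 0 if tied_embeddings else vocab_size * d_model
    return tok_emb + pos_emb + n_layer * per_layer + final_norm + lm_head


total = gpt2_param_count(n_layer=10, d_model=512, d_ff=2048,
                         vocab_size=16_000, context_len=512)
print(f"{total:,} parameters (~{total / 1e6:.1f}M)")  # 39,922,176 parameters (~39.9M)
```

Under these assumptions the token embedding table accounts for about 8.2M of the total, which is why tying the input and output embeddings matters at this scale.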

Training

Hardware          1 × NVIDIA RTX 5080 (16 GB)
Training tokens   272M (train split)
Epochs            5
Optimizer         AdamW (β 0.9/0.95, wd 0.1)
LR schedule       cosine, 4e-4 → 4e-5, ~2% warmup
Batch size        48
Total steps       55,405
Wall-clock time   ~116 min
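The learning-rate schedule above can be sketched as follows. The linear warmup shape and the exact warmup length (2% of total steps, rounded) are assumptions, since the card only says "~2% warmup"; the constant and function names are illustrative, not taken from the training code.

```python
import math

MAX_LR = 4e-4
MIN_LR = 4e-5
TOTAL_STEPS = 55_405
WARMUP_STEPS = round(0.02 * TOTAL_STEPS)  # assumption: exactly 2% -> 1108 steps


def lr_at(step):
    """Learning rate at a given step: linear warmup, then cosine decay."""
    if step < WARMUP_STEPS:
        # Linear warmup (assumed shape), reaching MAX_LR on the last warmup step.
        return MAX_LR * (step + 1) / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / max(1, TOTAL_STEPS - WARMUP_STEPS)
    progress = min(progress, 1.0)
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))  # goes from 1 to 0
    return MIN_LR + cosine * (MAX_LR - MIN_LR)


for step in (0, WARMUP_STEPS, TOTAL_STEPS // 2, TOTAL_STEPS):
    print(f"step {step:>6}: lr = {lr_at(step):.2e}")
```

The rate peaks at 4e-4 once warmup ends and decays to the 4e-5 floor by the final step.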

Usage

Ollama

ollama run hf.co/Rohanify/PyBlissa-Coder-40M:F32

The repo ships template and params files, so Ollama applies the correct PROMPT:/CODE: format and sampling defaults automatically; no Modelfile is needed for remote runs.

To run a local GGUF instead:

ollama create pyblissa-40m -f Modelfile
ollama run pyblissa-40m "write a function that checks if a number is prime"

Prompt format

The model was trained on a plain-text wrapper. At inference, the prompt is wrapped as:

PROMPT: {your instruction}
CODE:

The model then emits a fenced ```python code block. (When using Ollama, the template file does this wrapping for you, so you can just type a plain instruction.)
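If you run the GGUF through a runtime that does not apply the template, the wrapping and the fenced-code extraction have to be done by hand. The sketch below is one way to do that; the helper names are hypothetical, and the trailing newline after CODE: is an assumption about the exact training format.

```python
import re

FENCE = "`" * 3  # a Markdown code fence, built indirectly to keep this listing clean
OPEN_RE = re.compile(re.escape(FENCE) + r"(?:python|py)?[ \t]*\n")


def build_prompt(instruction):
    """Wrap a plain instruction in the PROMPT:/CODE: format."""
    # Assumption: a single newline follows "CODE:".
    return f"PROMPT: {instruction.strip()}\nCODE:\n"


def extract_code(completion):
    """Return the body of the first fenced block in the model output.

    Falls back to the raw text if no fence is found, and tolerates a fence
    that was never closed (for example when generation hit the context limit).
    """
    opening = OPEN_RE.search(completion)
    if opening is None:
        return completion.strip()
    body = completion[opening.end():]
    closing = body.find(FENCE)
    if closing != -1:
        body = body[:closing]
    return body.strip()


sample = f"Here is the code:\n{FENCE}python\ndef add(a, b):\n    return a + b\n{FENCE}\n"
print(build_prompt("add two numbers"))
print(extract_code(sample))
```

Searching for the first fence rather than requiring the output to start with one also skips any short text the model emits before the code.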

Recommended sampling

Parameter        Value
temperature      0.25 – 0.3
top_k            10
repeat_penalty   1.25
num_ctx          512
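These values can also be passed explicitly when calling a local Ollama server over its REST API. The sketch below assumes Ollama is running at its default address (http://localhost:11434) and that the model has already been pulled under the name used in the Usage section; the helper names are hypothetical.

```python
import json
import urllib.request

MODEL = "hf.co/Rohanify/PyBlissa-Coder-40M:F32"

# Recommended sampling values from the table above (temperature at the low end).
SAMPLING = {
    "temperature": 0.25,
    "top_k": 10,
    "repeat_penalty": 1.25,
    "num_ctx": 512,
}


def build_request(instruction, host="http://localhost:11434"):
    """Build a non-streaming POST request for Ollama's /api/generate endpoint."""
    payload = {
        "model": MODEL,
        "prompt": instruction,  # plain instruction; the template adds the wrapper
        "stream": False,
        "options": dict(SAMPLING),
    }
    return urllib.request.Request(
        f"{host}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(instruction):
    """Send the request and return the generated text (needs a running server)."""
    with urllib.request.urlopen(build_request(instruction)) as resp:
        return json.loads(resp.read())["response"]
```

With the server running, generate("write a function that checks if a number is prime") returns the raw completion, including the code fence.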

Limitations

PyBlissa is a 40M-parameter model trained primarily for prompt → Python generation. Known limitations:

  • It is a small model: it solves short, self-contained functions well but struggles with multi-step or library-heavy tasks.
  • It sometimes omits import statements for stdlib modules it uses (math, re, hashlib, etc.).
  • It can occasionally emit a short natural-language preamble before the code block on harder prompts.
  • Code explanation and non-Python tasks are out of distribution: it may attempt them, but that is not what it was trained for.
  • As with any code model, review and test generated code before running it.

Training data & attribution

This model was trained on the following datasets. Per their licenses, attribution is provided here:

No OpenAI-derived data was used in training.


License

The model weights are released under Apache-2.0. Note that the training data carries its own licenses (CC-BY-4.0 and MIT, see above), which require attribution as provided.


@misc{pyblissa2026,
  title  = {PyBlissa-Coder-40M: A from-scratch Python code model},
  author = {Rohan},
  year   = {2026},
  howpublished = {\url{https://huggingface.co/Rohanify/PyBlissa-Coder-40M}}
}