PyBlissa-Coder-40M
!! 14.6% SCORE ON HumanEval PASS@1 !!
PyBlissa-Coder-40M is the second model in the PyBlissa-Coder family, which mainly targets Python coding. Despite its small footprint (40M parameters, trained on 272M tokens), it scores 14.6% pass@1 on HumanEval and 4.4% pass@1 on MBPP, two standard code-generation benchmarks. Its imperfections are also worth keeping in mind: the model can sometimes generate wrong, inefficient, or broken code, though this depends largely on the sampling temperature.
Benchmarks
| Benchmark | Score | Protocol | Temp |
|---|---|---|---|
| HumanEval pass@1 | 14.6% (24/164) | zero-shot, fenced-code extraction | 0.25 |
| MBPP pass@1 | 4.4% (22/500) | official tests-in-prompt (Austin et al. 2021) | 0.05 |
How PyBlissa compares on HumanEval
| Model | Params | HumanEval pass@1 |
|---|---|---|
| GPT-Neo | 125M | 0.75% |
| CodeParrot-small | 110M | 3.80% |
| PyCodeGPT | 110M | 8.33% |
| PyBlissa-Coder | 40M | 14.6% |
PyBlissa is ~2.75× smaller than CodeParrot-small yet scores roughly 4× higher on HumanEval pass@1, and it was trained on a single consumer GPU.
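The headline figures follow from simple ratios of the counts and scores in the tables above. A minimal check, with function and variable names chosen here purely for illustration:

```python
# Reproduce the pass@1 percentages and comparison ratios from the tables.
def pass_at_1(solved: int, total: int) -> float:
    """Return pass@1 as a percentage (one sample per problem)."""
    return 100.0 * solved / total

humaneval = pass_at_1(24, 164)   # HumanEval: 24 of 164 problems solved
mbpp = pass_at_1(22, 500)        # MBPP: 22 of 500 problems solved

size_ratio = 110 / 40            # CodeParrot-small params / PyBlissa params
score_ratio = 14.6 / 3.80        # PyBlissa score / CodeParrot-small score

print(f"HumanEval pass@1: {humaneval:.1f}%")
print(f"MBPP pass@1:      {mbpp:.1f}%")
print(f"size ratio: {size_ratio:.2f}x, score ratio: {score_ratio:.2f}x")
```

The score ratio works out to about 3.84, which is where "roughly 4×" comes from.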
Model details
| Property | Value |
|---|---|
| Architecture | Decoder-only transformer (GPT-2 style, nanoGPT lineage) |
| Parameters | 39.9M |
| Layers | 10 |
| Model dim (d_model) | 512 |
| Heads | 8 (head_dim 64) |
| FFN dim (d_ff) | 2048 |
| Context length | 512 tokens |
| Vocab size | 16,000 (custom ByteLevel BPE) |
| Tied embeddings | Yes |
| Precision | trained in bf16, released as F32 GGUF |
| Best val loss | 0.3615 |
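The 39.9M figure can be roughly reconstructed from the dimensions in the table. The sketch below assumes learned positional embeddings, no bias terms, and LayerNorm with a weight vector only; none of these are stated in the card, so treat the result as a plausibility check rather than the exact layout.

```python
# Rough parameter count for a GPT-2 style decoder with tied embeddings.
# Assumptions (not stated in the model card): learned positional embeddings,
# no bias terms, and LayerNorm with a weight vector only.
VOCAB, D_MODEL, D_FF, LAYERS, CONTEXT = 16_000, 512, 2048, 10, 512

token_emb = VOCAB * D_MODEL      # shared with the output head (tied)
pos_emb = CONTEXT * D_MODEL      # one learned vector per position
attn = 4 * D_MODEL * D_MODEL     # Q, K, V and output projections
ffn = 2 * D_MODEL * D_FF         # up- and down-projection
norms = 2 * D_MODEL              # two LayerNorms per block

per_layer = attn + ffn + norms
total = token_emb + pos_emb + LAYERS * per_layer + D_MODEL  # + final LayerNorm

print(f"per layer: {per_layer:,}")
print(f"total:     {total:,} (~{total / 1e6:.1f}M)")
```

Under these assumptions the total lands at about 39.9M, in line with the reported figure; adding bias terms would raise it slightly.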
Training
| Setting | Value |
|---|---|
| Hardware | 1 × NVIDIA RTX 5080 (16 GB) |
| Training tokens | 272M (train split) |
| Epochs | 5 |
| Optimizer | AdamW (β 0.9/0.95, wd 0.1) |
| LR schedule | cosine, 4e-4 → 4e-5, ~2% warmup |
| Batch size | 48 |
| Total steps | 55,405 |
| Wall-clock time | ~116 min |
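The schedule and token counts in this table can be sketched in a few lines. This is a hedged reconstruction, not the actual training code: it assumes linear warmup (the card only says "~2% warmup") and that "batch size 48" means 48 sequences of 512 tokens per step.

```python
import math

TOTAL_STEPS = 55_405
MAX_LR, MIN_LR = 4e-4, 4e-5
WARMUP_STEPS = int(0.02 * TOTAL_STEPS)   # ~2% warmup -> 1108 steps

def lr_at(step: int) -> float:
    """Learning rate at a step: linear warmup (assumed), then cosine decay."""
    if step < WARMUP_STEPS:
        return MAX_LR * (step + 1) / WARMUP_STEPS
    progress = min(1.0, (step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS))
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return MIN_LR + cosine * (MAX_LR - MIN_LR)

# Token accounting, assuming each step processes 48 sequences of 512 tokens.
tokens_per_step = 48 * 512
total_tokens = TOTAL_STEPS * tokens_per_step
tokens_per_epoch = total_tokens / 5

print(f"warmup steps: {WARMUP_STEPS}")
print(f"tokens seen:  {total_tokens:,} (~{tokens_per_epoch / 1e6:.0f}M per epoch)")
```

With these assumptions, 55,405 steps cover about 1.36B tokens, or roughly 272M per epoch, which matches the stated training-set size over 5 epochs.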
Usage
Ollama
ollama run hf.co/Rohanify/PyBlissa-Coder-40M:F32
The repo ships template and params files, so Ollama applies the correct
PROMPT:/CODE: format and sampling defaults automatically; no Modelfile is
needed for remote runs.
To run a local GGUF instead:
ollama create pyblissa-40m -f Modelfile
ollama run pyblissa-40m "write a function that checks if a number is prime"
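Once the model is running under Ollama, it can also be called from a script through Ollama's local REST API. The sketch below assumes a default Ollama install listening on port 11434 and uses the remote model tag from the command above; the helper names `build_request` and `ask` are illustrative.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"   # default local endpoint (assumed)
MODEL = "hf.co/Rohanify/PyBlissa-Coder-40M:F32"

def build_request(instruction: str) -> urllib.request.Request:
    """Build a non-streaming generate request carrying a plain instruction."""
    payload = {"model": MODEL, "prompt": instruction, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )

def ask(instruction: str, timeout: float = 60.0) -> str:
    """Send the request to a running Ollama server and return the generated text."""
    with urllib.request.urlopen(build_request(instruction), timeout=timeout) as resp:
        return json.loads(resp.read())["response"]

# Example (requires a running Ollama server with the model pulled):
# print(ask("write a function that checks if a number is prime"))
```

Because the template file handles the prompt wrapping, the instruction is sent as plain text here.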
Prompt format
The model was trained on a plain-text wrapper. At inference, the prompt is wrapped as:
PROMPT: {your instruction}
CODE:
The model then emits a fenced ```python code block. (When using Ollama, the
template file does this wrapping for you; just type a plain instruction.)
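When calling the model without Ollama's template, the wrapping and the extraction of the fenced block have to be done by hand. A minimal sketch follows; the helper names are hypothetical, and the exact whitespace after `CODE:` is an assumption, since the card only shows the two-line layout.

```python
import re

def build_prompt(instruction: str) -> str:
    """Wrap a plain instruction in the PROMPT:/CODE: format.

    The trailing newline after "CODE:" is an assumption.
    """
    return f"PROMPT: {instruction.strip()}\nCODE:\n"

# Matches the first fenced block, with or without a "python" language tag.
_FENCE = re.compile(r"```(?:python)?[ \t]*\n(.*?)```", re.DOTALL)

def extract_code(completion: str) -> str:
    """Return the contents of the first fenced block, or the raw text if none."""
    match = _FENCE.search(completion)
    return match.group(1).strip() if match else completion.strip()

prompt = build_prompt("write a function that checks if a number is prime")
print(prompt)
```

Falling back to the raw text when no fence is found keeps the helper usable if the model skips the fence.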
Recommended sampling
| Parameter | Value |
|---|---|
| temperature | 0.25–0.3 |
| top_k | 10 |
| repeat_penalty | 1.25 |
| num_ctx | 512 |
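Outside Ollama, these settings have to be passed explicitly. The following sketch shows one way to do that with llama-cpp-python. The GGUF filename is the one listed for this repository and may change, `max_tokens=256` is an arbitrary choice that fits inside the 512-token context, and the `PROMPT:`/`CODE:` wrapping with a trailing newline is an assumption about exact whitespace.

```python
# Recommended sampling values from the table above (temperature at the low end).
SAMPLING = {"temperature": 0.25, "top_k": 10, "repeat_penalty": 1.25}

def load_model():
    """Download the GGUF from the Hub and load it with a 512-token context."""
    from llama_cpp import Llama  # pip install llama-cpp-python
    return Llama.from_pretrained(
        repo_id="Rohanify/PyBlissa-Coder-40M",
        filename="PyBlissa-V2-F32.gguf",  # filename listed for this repo; may change
        n_ctx=512,
        verbose=False,
    )

def generate(llm, instruction: str, max_tokens: int = 256) -> str:
    """Wrap the instruction, run one completion, and return the raw text."""
    prompt = f"PROMPT: {instruction}\nCODE:\n"
    output = llm(prompt, max_tokens=max_tokens, **SAMPLING)
    return output["choices"][0]["text"]

# Example (downloads the model on first use):
# llm = load_model()
# print(generate(llm, "write a function that checks if a number is prime"))
```

Keeping the import inside `load_model` means `generate` can be exercised with any callable that mimics the completion interface.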
Limitations
PyBlissa is a 40M-parameter model trained primarily for prompt → Python generation. Known limitations:
- It is a small model: it solves short, self-contained functions well but struggles with multi-step or library-heavy tasks.
- It sometimes omits `import` statements for stdlib modules it uses (`math`, `re`, `hashlib`, etc.).
- It can occasionally emit a short natural-language preamble before the code block on harder prompts.
- Code explanation and non-Python tasks are out of distribution; it may attempt them, but that is not what it was trained for.
- As with any code model, review and test generated code before running it.
Training data & attribution
This model was trained on the following datasets. Per their licenses, attribution is provided here:
- nvidia/OpenCodeInstruct (CC-BY-4.0): https://huggingface.co/datasets/nvidia/OpenCodeInstruct
- flytech/python-codes-25k (MIT): https://huggingface.co/datasets/flytech/python-codes-25k
No OpenAI-derived data was used in training.
License
The model weights are released under Apache-2.0. Note that the training data carries its own licenses (CC-BY-4.0 and MIT, see above), which require attribution as provided.
@misc{pyblissa2026,
title = {PyBlissa-Coder-40M: A from-scratch Python code model},
author = {Rohan},
year = {2026},
howpublished = {\url{https://huggingface.co/Rohanify/PyBlissa-Coder-40M}}
}