How to use HuggingFaceBio/Carbon-3B with Transformers:

    # Load model directly
    from transformers import AutoModel
    model = AutoModel.from_pretrained("HuggingFaceBio/Carbon-3B", dtype="auto")

Fix tokenizer: EOS bug + decode skip_special_tokens=True empty string
Tokenizer bug fixes
Bug 1: EOS appended when add_special_tokens=True
encode(add_special_tokens=True) was appending an EOS token, which breaks lighteval's tok_encode_pair invariant. Qwen3 doesn't add BOS/EOS either, so the EOS append is removed.
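
To illustrate why a trailing EOS is a problem, here is a minimal sketch of the kind of prefix invariant that pair encoding relies on. It assumes the harness encodes context + continuation as one string and slices off the continuation at len(encode(context)). The toy character-level encoder, EOS_ID, and the helper names are hypothetical and are not the Carbon tokenizer or lighteval's actual code.

```python
EOS_ID = 0  # hypothetical EOS id, chosen only for this sketch


def toy_encode(text, append_eos):
    """Hypothetical character-level encoder; stands in for a real tokenizer."""
    ids = [ord(ch) for ch in text]
    return ids + [EOS_ID] if append_eos else ids


def split_pair(context, continuation, append_eos):
    """Encode the pair jointly, then slice the continuation off by context length."""
    whole = toy_encode(context + continuation, append_eos)
    ctx = toy_encode(context, append_eos)
    return ctx, whole[len(ctx):]


def prefix_invariant_holds(context, continuation, append_eos):
    """True when encode(context) is a prefix of encode(context + continuation)."""
    whole = toy_encode(context + continuation, append_eos)
    ctx = toy_encode(context, append_eos)
    return whole[:len(ctx)] == ctx


# Without EOS the continuation ids are recovered intact.
print(split_pair("AC", "GT", append_eos=False))
# With EOS the context is one id too long, so the slice drops the first
# continuation id and the prefix check fails.
print(split_pair("AC", "GT", append_eos=True))
print(prefix_invariant_holds("AC", "GT", append_eos=True))
```

In this toy setup the EOS on the context shifts the slice point by one, which is the off-by-one the fix avoids by not appending EOS at all.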
Bug 2: decode(skip_special_tokens=True) returns empty string for pure-DNA generations
In the common generation scenario, <dna> is in the prompt and only k-mer tokens plus </dna> appear in the generated portion being decoded. The elif tid in dna_id_to_token branch was treating all DNA-vocab tokens (including k-mer content) as special tokens and dropping them when skip_special_tokens=True, so decode returned an empty string instead of the DNA sequence.
Fix: only skip actual DNA special tokens (<dna>, </dna>, <oov>); always decode k-mer content tokens.
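
The difference between the old and new behaviour can be sketched as follows. This is a simplified illustration, not the actual tokenizer code: the ids, the 3-mer entries, and the function names decode_buggy and decode_fixed are made up, and the toy functions handle only ids found in dna_id_to_token.

```python
# Tokens that count as DNA special tokens, per the fix description.
DNA_SPECIAL_TOKENS = {"<dna>", "</dna>", "<oov>"}

# Hypothetical DNA vocabulary; ids and k-mers are invented for this sketch.
dna_id_to_token = {
    100: "<dna>",
    101: "</dna>",
    102: "<oov>",
    103: "ACG",
    104: "TTA",
}


def decode_buggy(ids, skip_special_tokens=False):
    """Old behaviour: every DNA-vocab id is treated as special."""
    out = []
    for tid in ids:
        if tid in dna_id_to_token:
            if skip_special_tokens:
                continue  # drops k-mer content along with the tags
            out.append(dna_id_to_token[tid])
    return "".join(out)


def decode_fixed(ids, skip_special_tokens=False):
    """New behaviour: skip only the tag tokens, always keep k-mer content."""
    out = []
    for tid in ids:
        if tid in dna_id_to_token:
            token = dna_id_to_token[tid]
            if skip_special_tokens and token in DNA_SPECIAL_TOKENS:
                continue
            out.append(token)
    return "".join(out)


# Generated portion only: two k-mers followed by the closing tag.
generated = [103, 104, 101]
print(repr(decode_buggy(generated, skip_special_tokens=True)))
print(repr(decode_fixed(generated, skip_special_tokens=True)))
```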
Also: auto_dna_tags parameter added (default False)
Allows raw DNA strings to be automatically wrapped in <dna>...</dna> for k-mer tokenization. Default is False to preserve existing behaviour (metadata BPE tokens must not be auto-wrapped).
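
A rough sketch of what an opt-in wrapper like this could look like is below. How the tokenizer actually detects a raw DNA string is not stated here, so the ACGTN regex and the helper name maybe_wrap_dna are assumptions made purely for illustration.

```python
import re

# Assumption: "raw DNA" means a string made up only of A/C/G/T/N characters.
_RAW_DNA = re.compile(r"[ACGTN]+", re.IGNORECASE)


def maybe_wrap_dna(text, auto_dna_tags=False):
    """Wrap text in <dna>...</dna> only when the caller opts in and it looks like raw DNA."""
    if auto_dna_tags and _RAW_DNA.fullmatch(text):
        return f"<dna>{text}</dna>"
    return text


print(maybe_wrap_dna("ACGT"))                          # default: left untouched
print(maybe_wrap_dna("ACGT", auto_dna_tags=True))      # opt-in: wrapped
print(maybe_wrap_dna("<dna>ACGT</dna>", auto_dna_tags=True))  # already tagged: unchanged
```

Under this assumed detection rule, a short metadata string such as "cat" would also match and be wrapped when the flag is on, which is one way to see why keeping the default at False is the safer choice.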
LGTM!
Merging tokenizer fixes: EOS append bug, decode skip_special_tokens=True empty string, auto_dna_tags support.