Uchen vs Umê Classifier (DINOv3 ViT-S)

Binary Tibetan script classifier: Uchen (དབུ་ཅན།, headed/printed script) vs Umê (དབུ་མེད།, headless/cursive script). Fine-tuned from DINOv3 ViT-S using 10,961 manuscript page scans (train, validation, and test combined) from the Buddhist Digital Resource Center (BDRC).

Dataset: openpecha/uchen-ume-classification-dataset

Which checkpoint to use

Pick the variant that matches how you preprocess at inference:

Your pipeline	Weights	Inference preprocess
Center-crop whole page (resize short edge → 224, center crop)	`center_crop_all/final_model.pt`	`--preprocess center_crop_whole_page`
Raw full manuscript page (no PIL crop before DINO)	`without_preprocess/final_model.pt`	`--preprocess none`

Do not use with_preprocess/. It was trained and validated on center-cropped pages but evaluated on full-page test images. That train/test mismatch is why validation accuracy looked like ~99% while the test accuracy in the results JSON was ~56%.
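
The center-crop preprocessing named in the table above (resize the short edge to 224, then center crop) can be sketched as plain geometry. This is a minimal illustration, not the repository's preprocessing code: the function name `center_crop_geometry` is hypothetical, and the rounding used by the actual `center_crop_whole_page` option is an assumption.

```python
def center_crop_geometry(width, height, size=224):
    """Return (resized_size, crop_box) for 'resize short edge, then center crop'.

    crop_box is (left, top, right, bottom) in the resized image's coordinates.
    The rounding here is an assumption; the real pipeline may round differently.
    """
    short = min(width, height)
    # Scale so the short edge becomes exactly `size`, keeping the aspect ratio.
    new_w = round(width * size / short)
    new_h = round(height * size / short)
    # Take the central size x size window of the resized page.
    left = (new_w - size) // 2
    top = (new_h - size) // 2
    return (new_w, new_h), (left, top, left + size, top + size)


# A wide 2000x400 page is resized to 1120x224, and only columns 448..672 are kept.
resized, box = center_crop_geometry(2000, 400)
print(resized, box)
```

For a page much wider than it is tall, the crop keeps only the central portion, so a model trained on crops sees very different inputs from one trained on full pages.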

Best results

Hub split: 9,110 train / 1,000 val / 851 test (work-stratified).

Variant	Train preprocess	Val preprocess	Test preprocess	Test acc	Test macro-F1	Val macro-F1 (best)
`center_crop_all/`	center crop	center crop	center crop	99.3%	0.983	0.996
`without_preprocess/`	none	none	none (full page)	80.7%	0.708	0.771

Test confusion matrices (851 pages; columns read true→predicted)

Variant	uchen→uchen	uchen→ume	ume→uchen	ume→ume
`center_crop_all/`	94	3	3	751
`without_preprocess/`	97	2	165	603

See confusion_matrix.json and confusion_matrix.png in each variant folder on the Hub.
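
As a cross-check, accuracy and macro-F1 can be recomputed from the confusion matrix rows above. The sketch below uses only the numbers printed in the table; the function name `metrics_from_confusion` is hypothetical. Note that the `without_preprocess/` row as printed sums to 867 pages rather than 851, and the reported 80.7% and 0.708 correspond to those counts.

```python
def metrics_from_confusion(uu, um, mu, mm):
    """Total, accuracy, and macro-F1 from a 2x2 confusion matrix.

    Arguments follow the table's column order (true→predicted):
    uchen→uchen, uchen→ume, ume→uchen, ume→ume.
    """
    total = uu + um + mu + mm
    accuracy = (uu + mm) / total
    # F1 = 2*TP / (2*TP + FP + FN), computed separately for each class.
    f1_uchen = 2 * uu / (2 * uu + mu + um)
    f1_ume = 2 * mm / (2 * mm + um + mu)
    macro_f1 = (f1_uchen + f1_ume) / 2
    return total, accuracy, macro_f1


center_crop = metrics_from_confusion(94, 3, 3, 751)
full_page = metrics_from_confusion(97, 2, 165, 603)

for name, (total, acc, f1) in [("center_crop_all", center_crop),
                               ("without_preprocess", full_page)]:
    print(f"{name}: n={total} acc={acc:.3f} macro-F1={f1:.3f}")
```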

Training data

Class	Train	Validation	Test	Total
Uchen	~3,124	~340	~290	~3,754
Ume	~5,986	~660	~561	~7,207
Total pages	9,110	1,000	851	10,961

Splits are partitioned at the work level — all pages from the same manuscript stay in one split only.
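
One way to get this work-level property is to assign each work, rather than each page, to a split. The sketch below is an assumption for illustration and not the procedure used to build the Hub split: the work IDs, the 10%/10% fractions, and the hash-based assignment are all hypothetical, and it does not stratify by class.

```python
import hashlib


def split_for_work(work_id, val_frac=0.1, test_frac=0.1):
    """Map a work ID to 'train', 'val', or 'test' deterministically."""
    digest = hashlib.sha256(work_id.encode("utf-8")).digest()
    # Interpret the first 8 bytes as a fraction in [0, 1).
    u = int.from_bytes(digest[:8], "big") / 2**64
    if u < test_frac:
        return "test"
    if u < test_frac + val_frac:
        return "val"
    return "train"


def split_pages(pages):
    """pages: iterable of (work_id, page_path). Returns {split: [page_path, ...]}."""
    splits = {"train": [], "val": [], "test": []}
    for work_id, page_path in pages:
        # Every page inherits the split of its work, so no work is divided.
        splits[split_for_work(work_id)].append(page_path)
    return splits


# Hypothetical corpus: 50 works with 5 pages each.
pages = [(f"W{w:04d}", f"W{w:04d}/page_{p:03d}.jpg")
         for w in range(50) for p in range(5)]
splits = split_pages(pages)
```

Because the assignment depends only on the work ID, near-duplicate pages from one manuscript cannot leak from train into val or test.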

Architecture

Backbone: DINOv3 ViT-S/16 (21M params)
Head: LayerNorm → Dropout(0.1) → Linear(384, 128) → GELU → Dropout(0.1) → Linear(128, 2)
Stages: A (head) → B (last 2 blocks) → C (last 4 blocks)
Balancing: WeightedRandomSampler + class-weighted cross-entropy
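
The staged unfreezing and class balancing listed above can be sketched in plain Python. Everything here is illustrative: the function names are hypothetical, `depth=12` assumes the standard 12-block ViT-S, and the inverse-frequency formula is a common heuristic rather than the weighting confirmed for this model.

```python
def trainable_blocks(stage, depth=12):
    """Indices of backbone blocks unfrozen in each stage (the head is trained in all stages)."""
    last_n = {"A": 0, "B": 2, "C": 4}[stage]
    return list(range(depth - last_n, depth))


def balanced_class_weights(counts):
    """Inverse-frequency weights: total / (num_classes * class_count)."""
    total = sum(counts.values())
    k = len(counts)
    return {label: total / (k * n) for label, n in counts.items()}


# Approximate train counts from the Training data table.
train_counts = {"uchen": 3124, "ume": 5986}
weights = balanced_class_weights(train_counts)

for stage in "ABC":
    print(stage, trainable_blocks(stage))
print(weights)
```

In a PyTorch training loop, weights like these would typically be passed to `torch.nn.CrossEntropyLoss(weight=...)`, and per-sample versions of them to `torch.utils.data.WeightedRandomSampler`.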

Quick start

Center-crop pipeline (recommended if you crop pages)

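from huggingface_hub import hf_hub_download
import torch

# Download the center-crop checkpoint from the Hub (cached locally after the first call).
path = hf_hub_download(
    "openpecha/uchen-ume-classifier",
    "center_crop_all/final_model.pt",
    repo_type="model",
)
# weights_only=False unpickles arbitrary Python objects; only load checkpoints you trust.
ckpt = torch.load(path, map_location="cpu", weights_only=False)

Command-line inference:

python inference_uchen_ume.py \
  --image page.jpg \
  --weights center_crop_all/final_model.pt \
  --preprocess center_crop_whole_page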

Full-page pipeline

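from huggingface_hub import hf_hub_download

# Download the full-page checkpoint from the Hub.
path = hf_hub_download(
    "openpecha/uchen-ume-classifier",
    "without_preprocess/final_model.pt",
    repo_type="model",
)

Command-line inference:

python inference_uchen_ume.py \
  --image page.jpg \
  --weights without_preprocess/final_model.pt \
  --preprocess none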

Repo layout

center_crop_all/             ← center_crop_whole_page at inference (~99% test)
  final_model.pt
  model_card.json
  results.json               ← includes confusion_matrix
  confusion_matrix.json
  confusion_matrix.png
without_preprocess/          ← full pages (~81% test)
  final_model.pt
  model_card.json
  results.json
  confusion_matrix.json
  confusion_matrix.png

Limitations

Inference preprocessing must match training. A model trained on center crops reaches only ≈ 56% accuracy on full pages, and the full-page model expects uncropped input.
Trained on BDRC digitised manuscripts; may underperform on photos or non-BDRC scans.
Access requirement: DINOv3 is gated. Accept the terms for facebook/dinov3-vits16-pretrain-lvd1689m on the Hub, then run huggingface-cli login.
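
The first limitation above can be enforced with a small guard before inference. This is a sketch under stated assumptions: `EXPECTED_PREPROCESS` and `check_pairing` are hypothetical names, and the mapping simply restates the checkpoint table from this card.

```python
from pathlib import PurePosixPath

# Variant folder -> the --preprocess value it was trained with (from the table in this card).
EXPECTED_PREPROCESS = {
    "center_crop_all": "center_crop_whole_page",
    "without_preprocess": "none",
}


def check_pairing(weights_path, preprocess):
    """Return the expected preprocess mode, or raise ValueError on a mismatch."""
    variant = PurePosixPath(weights_path).parent.name
    expected = EXPECTED_PREPROCESS.get(variant)
    if expected is None:
        # Covers with_preprocess/ and any other folder not listed above.
        raise ValueError(f"unknown or unsupported variant folder: {variant!r}")
    if preprocess != expected:
        raise ValueError(f"{variant} expects --preprocess {expected}, got {preprocess}")
    return expected


mode = check_pairing("center_crop_all/final_model.pt", "center_crop_whole_page")
```

Calling `check_pairing` with a mismatched pair, or with a `with_preprocess/` path, raises `ValueError` instead of silently producing degraded predictions.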

Citation

@misc{karma2026uchenume,
    title        = {Uchen-Ume Classifier: Binary Tibetan Script Classification with DINOv3},
    author       = {Karma Tashi and Elie Roux},
    year         = {2026},
    publisher    = {HuggingFace},
    url          = {https://huggingface.co/openpecha/uchen-ume-classifier},
    note         = {Funded by Khyentse Foundation. Images sourced from the Buddhist Digital Resource Center (BDRC).}
}

Acknowledgements

Developed by Dharmaduta for the Buddhist Digital Resource Center (BDRC) Etext Corpus project, with funding from the Khyentse Foundation. Annotation guidelines by Pentsok Rtsang.
