SemanticVLA · SimplerEnv (WidowX)
🎉 Accepted to CVPR 2026.
✍️ Fei Ni¹, Zhuo Chen², Yifu Yuan³, Zibin Dong³, Xianze Yao³, Shan Luo², Jianye Hao³, Jiankang Deng¹†, Stefanos Zafeiriou¹†
🏫 ¹Imperial College London ²King's College London ³Tianjin University
✉️ Primary contact: f.ni@imperial.ac.uk
SemanticVLA policy trained on BridgeData V2 (Open X-Embodiment bridge_orig) for 100K steps, intended for SimplerEnv WidowX evaluation. The unified OXE LAM is used as the latent-action tokenizer, and the trace + latent-action auxiliary heads are supervised in the VLM's language stream.
Headline result (SimplerEnv WidowX)
| Task | Success rate |
|---|---|
| Put Eggplant in Basket | 0.958 |
| Spoon on Towel | 1.000 |
| Carrot on Plate | 0.792 |
| Stack Cube | 0.458 |
| Mean | 0.802 |
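The Mean row can be checked against the per-task numbers. The snippet below is a minimal sketch that assumes the mean is the unweighted average over the four tasks; the dictionary and the `mean_success` helper are illustrative names, not part of the SemanticVLA codebase.

```python
# Per-task success rates copied from the table above.
success_rates = {
    "Put Eggplant in Basket": 0.958,
    "Spoon on Towel": 1.000,
    "Carrot on Plate": 0.792,
    "Stack Cube": 0.458,
}


def mean_success(rates):
    """Unweighted arithmetic mean over tasks (assumed averaging scheme)."""
    return sum(rates.values()) / len(rates)


mean = mean_success(success_rates)
print(f"Mean success rate: {mean:.3f}")  # prints "Mean success rate: 0.802"
```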
Architecture
| Component | Choice |
|---|---|
| VLM backbone | Qwen3-VL-4B-Instruct |
| Action head | DiT-B (flow matching) |
| LAM tokenizer | SemanticVLA-LAM (unified OXE LAM) |
| Semantic supervision | Trace + latent action tokens predicted in the VLM's language stream; action decoder unmodified |
| Latent vocabulary size | 32 |
| Latent tokens per sample | 4 |
| Action horizon | 16 |
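To give a sense of scale for the latent-action settings in the table, the sketch below computes how many distinct latent codes a single sample could take. It assumes each of the 4 tokens is drawn independently from the same 32-entry vocabulary, which the table does not state explicitly; the constant names are illustrative.

```python
import math

LATENT_VOCAB_SIZE = 32        # "Latent vocabulary size" from the table
LATENT_TOKENS_PER_SAMPLE = 4  # "Latent tokens per sample" from the table

# Assumption: every token position shares one vocabulary and is chosen independently.
num_latent_codes = LATENT_VOCAB_SIZE ** LATENT_TOKENS_PER_SAMPLE
bits_per_sample = LATENT_TOKENS_PER_SAMPLE * math.log2(LATENT_VOCAB_SIZE)

print(f"Distinct latent codes per sample: {num_latent_codes}")
print(f"Equivalent information per sample: {bits_per_sample:.0f} bits")
```

Under that assumption the latent plan is a compact discrete bottleneck: 32^4 = 1,048,576 possible codes, or 20 bits per sample.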
Training data
This checkpoint is trained on BridgeData V2 (Open X-Embodiment bridge_orig) for 100K steps. It is intended specifically for SimplerEnv WidowX evaluation and is not meant as a general-purpose policy for unrelated robot embodiments.
Files
SemanticVLA-SimplerEnv/
├── README.md
├── config.yaml               # loadable model config
├── dataset_statistics.json   # action normalization stats
└── final_model/
    └── pytorch_model.pt      # policy state_dict
How to load
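from semanticvla.model.framework.base_framework import baseframework

# Point at the checkpoint inside the released layout so that config.yaml and
# dataset_statistics.json can be found two directory levels up.
policy = baseframework.from_pretrained("SemanticVLA-SimplerEnv/final_model/pytorch_model.pt")
policy.eval()  # evaluation mode for inference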
baseframework.from_pretrained() walks two directory levels up from the checkpoint file to locate config.yaml and dataset_statistics.json. The released layout follows this convention.
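The lookup convention described above can be illustrated with `pathlib`. This is a sketch of the described behaviour, not the actual `from_pretrained()` implementation: `resolve_sidecar_files` is a hypothetical helper, and it assumes "two directory levels up" means the parent of the directory that contains the checkpoint file.

```python
from pathlib import Path


def resolve_sidecar_files(checkpoint_path):
    """Hypothetical helper mirroring the 'two levels up' convention."""
    ckpt = Path(checkpoint_path)
    # One level up: the directory holding the checkpoint (final_model/).
    # Two levels up: the repo root holding config.yaml and dataset_statistics.json.
    root = ckpt.parent.parent
    return {
        "config": root / "config.yaml",
        "dataset_statistics": root / "dataset_statistics.json",
    }


paths = resolve_sidecar_files("SemanticVLA-SimplerEnv/final_model/pytorch_model.pt")
for name, path in paths.items():
    # exists() reports whether the file is actually present in the local download.
    print(f"{name}: {path.as_posix()} (exists: {path.exists()})")
```

Running a check like this before loading makes it easier to spot a checkpoint that was moved out of `final_model/` and can no longer find its config and normalization statistics.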
To run the SimplerEnv WidowX suite, see examples/SimplerEnv/ in the code repo.
Sibling SemanticVLA checkpoint repos
| Repo | Purpose |
|---|---|
| 🤗 SemanticVLA-LAM | Unified OXE LAM consumed by this policy |
| 🤗 SemanticVLA-LIBERO | LIBERO policy |
Related resources
- Code: https://github.com/Fei-Ni/SemanticVLA_Offcial
- Dataset (BridgeData V2 in LeRobot v3 with dense traces): 🤗 SemanticVLA-TraceX-240K-Bridge
- Datasets collection: https://hf.co/collections/spikefly/semanticvla-datasets
- Model Zoo collection: https://hf.co/collections/spikefly/semanticvla-model-zoo
Citation
@inproceedings{ni2026semanticvla,
title = {SemanticVLA: Towards Semantic Reasoning over Action Memorization via Synergistic Explicit Trace and Latent Action Planning},
author = {Ni, Fei and Chen, Zhuo and Yuan, Yifu and Dong, Zibin and Yao, Xianze and Luo, Shan and Hao, Jianye and Deng, Jiankang and Zafeiriou, Stefanos},
booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
year = {2026}
}
License
Released under the MIT License, subject to the upstream BridgeData V2 license.