SemanticVLA · SimplerEnv (WidowX)

🎉 Accepted to CVPR 2026.
✍️ Fei Ni¹, Zhuo Chen², Yifu Yuan³, Zibin Dong³, Xianze Yao³, Shan Luo², Jianye Hao³, Jiankang Deng¹†, Stefanos Zafeiriou¹†
🏫 ¹Imperial College London ²King's College London ³Tianjin University
✉️ Primary contact: f.ni@imperial.ac.uk

This repo holds a SemanticVLA policy trained on BridgeData V2 (Open X-Embodiment bridge_orig) for 100K steps and intended for SimplerEnv WidowX evaluation. It uses the unified OXE LAM as its latent-action tokenizer, and the trace and latent-action auxiliary heads are supervised in the VLM's language stream.

Headline result (SimplerEnv WidowX)

Task	Success rate
Put Eggplant in Basket	0.958
Spoon on Towel	1.000
Carrot on Plate	0.792
Stack Cube	0.458
Mean	0.802
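
The Mean row is consistent with an unweighted average over the four tasks. A quick check, with the per-task numbers copied from the table (the variable names are only illustrative):

```python
# Per-task success rates copied from the table above.
success = {
    "Put Eggplant in Basket": 0.958,
    "Spoon on Towel": 1.000,
    "Carrot on Plate": 0.792,
    "Stack Cube": 0.458,
}

# Unweighted mean over tasks.
mean_success = sum(success.values()) / len(success)
print(f"{mean_success:.3f}")  # 0.802
```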

Architecture

Component	Choice
VLM backbone	Qwen3-VL-4B-Instruct
Action head	DiT-B (flow matching)
LAM tokenizer	`SemanticVLA-LAM` (unified OXE LAM)
Semantic supervision	Trace + latent action tokens predicted in the VLM's language stream; action decoder unmodified
Latent vocabulary size	32
Latent tokens per sample	4
Action horizon	16
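
For quick reference in scripts, the table can be mirrored in a small container. This is a sketch only: `SemanticVLAArch` and its field names are hypothetical and are not part of the released code or of config.yaml. The helper computes an upper bound on the number of distinct latent-token sequences, assuming each of the 4 tokens can take any of the 32 vocabulary entries.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SemanticVLAArch:
    """Hypothetical container mirroring the architecture table."""

    vlm_backbone: str = "Qwen3-VL-4B-Instruct"
    action_head: str = "DiT-B (flow matching)"
    latent_vocab_size: int = 32
    latent_tokens_per_sample: int = 4
    action_horizon: int = 16

    def num_latent_sequences(self) -> int:
        # Upper bound: every token position may take any vocabulary entry.
        return self.latent_vocab_size ** self.latent_tokens_per_sample


arch = SemanticVLAArch()
print(arch.num_latent_sequences())  # 32**4 = 1048576
```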

Training data

This checkpoint is trained on BridgeData V2 (Open X-Embodiment bridge_orig) for 100K steps. It is intended specifically for SimplerEnv WidowX evaluation and is not meant as a general-purpose policy for unrelated robot embodiments.

Files

SemanticVLA-SimplerEnv/
├── README.md
├── config.yaml              # loadable model config
├── dataset_statistics.json  # action normalization stats
└── final_model/
    └── pytorch_model.pt     # policy state_dict

How to load

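from semanticvla.model.framework.base_framework import baseframework

# Point at the checkpoint inside the downloaded repo layout so that
# config.yaml and dataset_statistics.json can be located relative to it.
policy = baseframework.from_pretrained("SemanticVLA-SimplerEnv/final_model/pytorch_model.pt")
policy.eval()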

baseframework.from_pretrained() walks two directory levels up from the checkpoint file to locate config.yaml and dataset_statistics.json. In the released layout that is the repo root (the parent of final_model/), so keep pytorch_model.pt inside this layout when loading.
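
Before loading, it can help to confirm that a local copy still follows this layout. The sketch below is not part of the SemanticVLA code; `find_release_root` and `missing_sidecar_files` are hypothetical helpers, and they assume "two levels up" means the parent of the directory that contains the checkpoint, as in the file tree above.

```python
from pathlib import Path


def find_release_root(checkpoint: str) -> Path:
    """Return the directory two levels above the checkpoint file."""
    # parents[0] is final_model/, parents[1] is the release root.
    return Path(checkpoint).resolve().parents[1]


def missing_sidecar_files(checkpoint: str) -> list[str]:
    """List the expected sidecar files that are absent from the release root."""
    root = find_release_root(checkpoint)
    expected = ["config.yaml", "dataset_statistics.json"]
    return [name for name in expected if not (root / name).is_file()]
```

For an intact download, `missing_sidecar_files("SemanticVLA-SimplerEnv/final_model/pytorch_model.pt")` should return an empty list; any names it does return are the files to restore before calling `from_pretrained()`.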

To run the SimplerEnv WidowX suite, see examples/SimplerEnv/ in the code repo.

Sibling SemanticVLA checkpoint repos

Repo	Purpose
🤗 `SemanticVLA-LAM`	Unified OXE LAM consumed by this policy
🤗 `SemanticVLA-LIBERO`	LIBERO policy

Related resources

Code: https://github.com/Fei-Ni/SemanticVLA_Offcial
Dataset (BridgeData V2 in LeRobot v3 with dense traces): 🤗 SemanticVLA-TraceX-240K-Bridge
Datasets collection: https://hf.co/collections/spikefly/semanticvla-datasets
Model Zoo collection: https://hf.co/collections/spikefly/semanticvla-model-zoo

Citation

@inproceedings{ni2026semanticvla,
  title     = {SemanticVLA: Towards Semantic Reasoning over Action Memorization via Synergistic Explicit Trace and Latent Action Planning},
  author    = {Ni, Fei and Chen, Zhuo and Yuan, Yifu and Dong, Zibin and Yao, Xianze and Luo, Shan and Hao, Jianye and Deng, Jiankang and Zafeiriou, Stefanos},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year      = {2026}
}

License

Released under the MIT License, subject to the upstream BridgeData V2 license.
