How to use HorizonRobotics/RoboTransfer with Diffusers:
pip install -U diffusers transformers accelerate

import torch
from diffusers import DiffusionPipeline
from diffusers.utils import load_image, export_to_video

# device_map already places the pipeline on the GPU, so no separate pipe.to("cuda") is needed
# switch to "mps" for apple devices
pipe = DiffusionPipeline.from_pretrained(
    "HorizonRobotics/RoboTransfer", dtype=torch.bfloat16, device_map="cuda"
)

prompt = "A man with short gray hair plays a red electric guitar."
image = load_image(
    "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/guitar-man.png"
)

# Generate the frames and write them out as a video file
output = pipe(image=image, prompt=prompt).frames[0]
export_to_video(output, "output.mp4")
metadata
library_name: diffusers
license: apache-2.0
pipeline_tag: image-to-video
RoboTransfer: Geometry-Consistent Video Diffusion for Robotic Visual Policy Transfer

Abstract
RoboTransfer is a novel diffusion-based video generation framework tailored for robotic visual policy transfer. Unlike conventional approaches, RoboTransfer introduces geometry-aware synthesis by injecting depth and normal priors, ensuring multi-view consistency across dynamic robotic scenes. The method further supports explicit control over scene components, such as background editing, object identity swapping, and motion specification, offering a fine-grained video generation pipeline that benefits embodied learning.
Key Features
- Geometry-Consistent Diffusion: Injects global 3D cues (depth, normal) and cross-view interactions for multi-view realism.
- Scene Component Control: Enables manipulation of object attributes (pose, identity) and background features.
- Cross-View Conditioning: Learns representations from multiple camera views with spatial correspondence.
- Robotic Policy Transfer: Facilitates domain adaptation by generating synthetic training data in target domains.
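To make the depth and normal cues mentioned above more concrete, here is a minimal sketch of how a surface-normal map can be approximated from a depth map using finite differences. This is an illustration only, not RoboTransfer's actual preprocessing: the function name `normals_from_depth` is hypothetical, and the code assumes an orthographic, pixel-space approximation that ignores camera intrinsics.

```python
import math


def normals_from_depth(depth):
    """Approximate unit surface normals from a depth map.

    depth: list of rows (H x W) of floats.
    Returns an H x W list of (nx, ny, nz) tuples.
    Assumption: orthographic pixel-space approximation, no camera intrinsics.
    """
    h, w = len(depth), len(depth[0])
    normals = []
    for y in range(h):
        row = []
        for x in range(w):
            # Neighbor indices, clamped at the image border
            x0, x1 = max(x - 1, 0), min(x + 1, w - 1)
            y0, y1 = max(y - 1, 0), min(y + 1, h - 1)
            # Finite-difference depth gradients (central inside, one-sided at borders)
            dzdx = (depth[y][x1] - depth[y][x0]) / max(x1 - x0, 1)
            dzdy = (depth[y1][x] - depth[y0][x]) / max(y1 - y0, 1)
            # The normal of the surface z = f(x, y) is proportional to (-dz/dx, -dz/dy, 1)
            n = (-dzdx, -dzdy, 1.0)
            length = math.sqrt(sum(c * c for c in n))
            row.append(tuple(c / length for c in n))
        normals.append(row)
    return normals
```

For a flat depth map every normal points straight at the camera, while a depth ramp along x tilts the normals against the x axis. In a real pipeline such maps would more likely come from a sensor or a dedicated estimator.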
BibTeX
@article{liu2025robotransfer,
title={RoboTransfer: Geometry-Consistent Video Diffusion for Robotic Visual Policy Transfer},
author={Liu, Liu and Wang, Xiaofeng and Zhao, Guosheng and Li, Keyu and Qin, Wenkang and Qiu, Jiaxiong and Zhu, Zheng and Huang, Guan and Su, Zhizhong},
journal={arXiv preprint arXiv:2505.23171},
year={2025}
}
