SRPO: Self-Referential Policy Optimization for Vision-Language-Action Models
Paper
• 2511.15605
• Published • 25
VLA-RFT: Vision-Language-Action Reinforcement Fine-tuning with Verified Rewards in World Simulators
Paper
• 2510.00406
• Published • 68
Paper
• 2605.03269
• Published • 95
MolmoAct2: Action Reasoning Models for Real-world Deployment
Paper
• 2605.02881
• Published • 261
VLA-RAIL: A Real-Time Asynchronous Inference Linker for VLA Models and Robots
Paper
• 2512.24673
• Published
Green-VLA: Staged Vision-Language-Action Model for Generalist Robots
Paper
• 2602.00919
• Published • 324
Towards Long-Lived Robots: Continual Learning VLA Models via Reinforcement Fine-Tuning
Paper
• 2602.10503
• Published
MMaDA-VLA: Large Diffusion Vision-Language-Action Model with Unified Multi-Modal Instruction and Generation
Paper
• 2603.25406
• Published • 5
MobileVLA-R1: Reinforcing Vision-Language-Action for Mobile Robots
Paper
• 2511.17889
• Published • 5
How Fast Can I Run My VLA? Demystifying VLA Inference Performance with VLA-Perf
Paper
• 2602.18397
• Published
Spatial-Aware VLA Pretraining through Visual-Physical Alignment from Human Videos
Paper
• 2512.13080
• Published • 17
Long-Term Memory for VLA-based Agents in Open-World Task Execution
Paper
• 2604.15671
• Published
Interleave-VLA: Enhancing Robot Manipulation with Interleaved Image-Text Instructions
Paper
• 2505.02152
• Published
TIC-VLA: A Think-in-Control Vision-Language-Action Model for Robot Navigation in Dynamic Environments
Paper
• 2602.02459
• Published • 4
When Robots Obey the Patch: Universal Transferable Patch Attacks on Vision-Language-Action Models
Paper
• 2511.21192
• Published
Towards Accessible Physical AI: LoRA-Based Fine-Tuning of VLA Models for Real-World Robot Control
Paper
• 2512.11921
• Published
A Dual Process VLA: Efficient Robotic Manipulation Leveraging VLM
Paper
• 2410.15549
• Published
End-to-End Dexterous Arm-Hand VLA Policies via Shared Autonomy: VR Teleoperation Augmented by Autonomous Hand VLA Policy for Efficient Data Collection
Paper
• 2511.00139
• Published • 1
LLaDA-VLA: Vision Language Diffusion Action Models
Paper
• 2509.06932
• Published
HiMoE-VLA: Hierarchical Mixture-of-Experts for Generalist Vision-Language-Action Policies
Paper
• 2512.05693
• Published • 1
On-the-Fly VLA Adaptation via Test-Time Reinforcement Learning
Paper
• 2601.06748
• Published
TrackVLA++: Unleashing Reasoning and Memory Capabilities in VLA Models for Embodied Visual Tracking
Paper
• 2510.07134
• Published
World-Gymnast: Training Robots with Reinforcement Learning in a World Model
Paper
• 2602.02454
• Published
Butter-Bench: Evaluating LLM Controlled Robots for Practical Intelligence
Paper
• 2510.21860
• Published
RoboGene: Boosting VLA Pre-training via Diversity-Driven Agentic Framework for Real-World Task Generation
Paper
• 2602.16444
• Published
WoVR: World Models as Reliable Simulators for Post-Training VLA Policies with RL
Paper
• 2602.13977
• Published • 2
Efficient Vision-Language-Action Models for Embodied Manipulation: A Systematic Survey
Paper
• 2510.17111
• Published
Pure Vision Language Action (VLA) Models: A Comprehensive Survey
Paper
• 2509.19012
• Published
VLA-0: Building State-of-the-Art VLAs with Zero Modification
Paper
• 2510.13054
• Published • 16
10 Open Challenges Steering the Future of Vision-Language-Action Models
Paper
• 2511.05936
• Published • 6
Evo-0: Vision-Language-Action Model with Implicit Spatial Understanding
Paper
• 2507.00416
• Published
An Anatomy of Vision-Language-Action Models: From Modules to Milestones and Challenges
Paper
• 2512.11362
• Published • 23
NanoVLA: Routing Decoupled Vision-Language Understanding for Nano-sized Generalist Robotic Policies
Paper
• 2510.25122
• Published
VLA-JEPA: Enhancing Vision-Language-Action Model with Latent World Model
Paper
• 2602.10098
• Published • 19
JEPA-VLA: Video Predictive Embedding is Needed for VLA Models
Paper
• 2602.11832
• Published