HyLaR: Hybrid Latent Reasoning with Decoupled Policy Optimization
We introduce HyLaR, a training framework that enables multimodal large language models (MLLMs) to perform hybrid latent reasoning, which combines textual chain-of-thought with continuous visual latent representations. HyLaR uses a Canvas-in-Latents mechanism during supervised fine-tuning and a Decoupled Hybrid PPO algorithm during reinforcement learning, allowing the model to interleave discrete text reasoning with continuous latent visual thinking.