SenseNova-U1: Unifying Multimodal Understanding and Generation with NEO-unify Architecture Paper • 2605.12500 • Published 18 days ago • 191
From Pixels to Words -- Towards Native One-Vision Models at Scale Paper • 2605.28820 • Published 3 days ago • 65
The Prism Hypothesis: Harmonizing Semantic and Pixel Representations via Unified Autoencoding Paper • 2512.19693 • Published Dec 22, 2025 • 68
Vchitect-2.0: Parallel Transformer for Scaling Up Video Diffusion Models Paper • 2501.08453 • Published Jan 14, 2025 • 1
CFG-Zero*: Improved Classifier-Free Guidance for Flow Matching Models Paper • 2503.18886 • Published Mar 24, 2025 • 24
RepVideo: Rethinking Cross-Layer Representation for Video Generation Paper • 2501.08994 • Published Jan 15, 2025 • 15