InsightTok: Improving Text and Face Fidelity in Discrete Tokenization for Autoregressive Image Generation Paper • 2605.14333 • Published 19 days ago • 34
Steering Visual Generation in Unified Multimodal Models with Understanding Supervision Paper • 2605.05781 • Published 26 days ago • 5
Meta-CoT: Enhancing Granularity and Generalization in Image Editing Paper • 2604.24625 • Published Apr 27 • 26
Refinement via Regeneration: Enlarging Modification Space Boosts Image Refinement in Unified Multimodal Models Paper • 2604.25636 • Published Apr 28 • 24
Refinement via Regeneration: Enlarging Modification Space Boosts Image Refinement in Unified Multimodal Models Paper • 2604.25636 • Published Apr 28 • 24
Thinking-while-Generating: Interleaving Textual Reasoning throughout Visual Generation Paper • 2511.16671 • Published Nov 20, 2025 • 16
Thinking with Video: Video Generation as a Promising Multimodal Reasoning Paradigm Paper • 2511.04570 • Published Nov 6, 2025 • 242
Prompt-Free Diffusion: Taking "Text" out of Text-to-Image Diffusion Models Paper • 2305.16223 • Published May 25, 2023
COVE: Unleashing the Diffusion Feature Correspondence for Consistent Video Editing Paper • 2406.08850 • Published Jun 13, 2024
Everything to the Synthetic: Diffusion-driven Test-time Adaptation via Synthetic-Domain Alignment Paper • 2406.04295 • Published Jun 6, 2024