Lightning Unified Video Editing via In-Context Sparse Attention
Abstract
An in-context sparse attention framework enables efficient video editing at reduced computational cost while maintaining visual quality.
Video editing has evolved toward In-Context Learning (ICL) paradigms, yet the resulting quadratic attention cost creates a critical computational bottleneck. In this work, we propose In-context Sparse Attention (ISA), the first empirically near-lossless sparse attention framework tailored to ICL video editing. Our design is grounded in two key insights: first, context tokens exhibit significantly lower saliency than source tokens; second, Query sharpness correlates with approximation error, which we prove theoretically and validate empirically. Motivated by these findings, ISA first applies an efficient pre-selection strategy to prune redundant context tokens, then uses a dynamic query grouping mechanism that routes high-error queries to full attention and low-error queries to a computationally efficient zeroth-order Taylor sparse attention. Furthermore, we build LIVEditor, a novel lightning-fast video editing model, by combining ISA with a proposed video-editing data pipeline that curates a 1.7M-scale high-quality dataset. Extensive experiments show that LIVEditor reduces attention-module latency by approximately 60% while surpassing state-of-the-art methods on EditVerseBench, IVE-Bench, and VIE-Bench, delivering near-lossless acceleration without compromising visual fidelity.
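The three stages described in the abstract (context pre-selection, query grouping by sharpness, and zeroth-order Taylor attention for the low-error group) can be illustrated with a small NumPy sketch. This is not the authors' implementation; several choices are assumptions made only for illustration. Context saliency is scored against the mean source query; the zeroth-order Taylor step is interpreted as replacing exp(q·k_j) inside a key block by its value at the block's mean key, so each block collapses to one token weighted by its size; and query sharpness is proxied by the peak block-level attention probability. The function names and the parameters `keep_ratio`, `block`, and `full_frac` are hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def full_attention(q, k, v):
    # Exact scaled dot-product attention over every key.
    scale = 1.0 / np.sqrt(q.shape[-1])
    return softmax(q @ k.T * scale) @ v


def preselect_context(q_src, k_ctx, v_ctx, keep_ratio):
    # Hypothetical saliency proxy: score each context key against the mean
    # source query, then keep the top keep_ratio fraction (order preserved).
    saliency = k_ctx @ q_src.mean(axis=0)
    n_keep = max(1, int(round(keep_ratio * len(k_ctx))))
    idx = np.sort(np.argsort(-saliency)[:n_keep])
    return k_ctx[idx], v_ctx[idx]


def block_means(k, v, block):
    # Pool consecutive keys/values into blocks; the last block may be shorter.
    starts = range(0, len(k), block)
    k_mean = np.stack([k[s:s + block].mean(axis=0) for s in starts])
    v_mean = np.stack([v[s:s + block].mean(axis=0) for s in starts])
    counts = np.array([len(k[s:s + block]) for s in starts], dtype=float)
    return k_mean, v_mean, counts


def isa_attention(q, k_src, v_src, k_ctx, v_ctx,
                  keep_ratio=0.5, block=4, full_frac=0.25):
    # 1) Prune redundant context tokens before attention.
    k_kept, v_kept = preselect_context(q, k_ctx, v_ctx, keep_ratio)
    k = np.concatenate([k_src, k_kept])
    v = np.concatenate([v_src, v_kept])

    # 2) Zeroth-order Taylor approximation (assumed form): inside a block,
    #    exp(q.k_j) is replaced by exp(q.k_mean), so the block contributes
    #    counts * exp(q.k_mean) * v_mean; log(counts) adds that multiplicity.
    k_mean, v_mean, counts = block_means(k, v, block)
    scale = 1.0 / np.sqrt(q.shape[-1])
    p = softmax(q @ k_mean.T * scale + np.log(counts))
    out = p @ v_mean

    # 3) Hypothetical sharpness proxy: peak block probability. The sharpest
    #    queries are routed to exact full attention over the kept keys.
    sharpness = p.max(axis=-1)
    n_full = int(round(full_frac * len(q)))
    full_idx = np.argsort(-sharpness)[:n_full]
    if n_full > 0:
        out[full_idx] = full_attention(q[full_idx], k, v)
    return out, full_idx


# Usage on random tokens: 32 source queries, 32 source keys, 64 context keys.
rng = np.random.default_rng(0)
d = 16
q = rng.standard_normal((32, d))
k_src, v_src = rng.standard_normal((32, d)), rng.standard_normal((32, d))
k_ctx, v_ctx = rng.standard_normal((64, d)), rng.standard_normal((64, d))
out, full_idx = isa_attention(q, k_src, v_src, k_ctx, v_ctx)
print(out.shape, len(full_idx))  # (32, 16) 8
```

Two sanity properties follow from this construction. With `block=1` every block holds a single key, so the Taylor step is exact. With `full_frac=1.0` every query falls back to exact attention over the kept keys. Only the approximation error in between would depend on how sharp each query is, which is the quantity the abstract uses for routing.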
Community
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- LIVE: Leveraging Image Manipulation Priors for Instruction-based Video Editing (2026)
- ImVideoEdit: Image-learning Video Editing via 2D Spatial Difference Attention Blocks (2026)
- GEditBench v2: A Human-Aligned Benchmark for General Image Editing (2026)
- PackForcing: Short Video Training Suffices for Long Video Sampling and Long Context Inference (2026)
- SAMA: Factorized Semantic Anchoring and Motion Alignment for Instruction-Guided Video Editing (2026)
- AdaSpark: Adaptive Sparsity for Efficient Long-Video Understanding (2026)
- ViFeEdit: A Video-Free Tuner of Your Video Diffusion Transformer (2026)
Get this paper in your agent:
hf papers read 2605.04569
Don't have the latest CLI?
curl -LsSf https://hf.co/cli/install.sh | bash