DiTFuse: Towards Unified Semantic and Controllable Image Fusion: A Diffusion Transformer Approach (Official Weights)
This repository provides the official pretrained weights for DiTFuse. The project code is available on GitHub:
GitHub: https://github.com/Henry-Lee-real/DiTFuse
DiTFuse supports multiple fusion tasks, including infrared-visible fusion, multi-focus fusion, multi-exposure fusion, and instruction-driven controllable fusion / segmentation, all within a single unified model.
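Since this repository only hosts weights, a typical first step is to fetch them locally and then point the GitHub code at the download directory. The following is a minimal sketch, assuming the `huggingface_hub` package is installed. The helper name `download_weights` and its `downloader` parameter are hypothetical and exist only for this illustration. The file layout inside the repository is not documented here, so the sketch downloads the whole snapshot rather than guessing file names.

```python
from pathlib import Path
from typing import Callable, Optional

REPO_ID = "lijiayangCS/DiTFuse"  # weights repository named on this page


def download_weights(
    local_dir: str,
    downloader: Optional[Callable[..., str]] = None,
) -> Path:
    """Fetch a snapshot of the weights repository into `local_dir`.

    `downloader` is a hypothetical hook for this sketch; by default it is
    `huggingface_hub.snapshot_download`.
    """
    if downloader is None:
        # Imported lazily so this module loads even without huggingface_hub.
        from huggingface_hub import snapshot_download

        downloader = snapshot_download

    # Download every file in the repo; no per-file names are assumed.
    path = downloader(repo_id=REPO_ID, local_dir=local_dir)
    return Path(path)
```

Calling `download_weights("./ditfuse_weights")` would place the repository files under that directory. How the GitHub code expects to load them is described in that repository, not here.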
Available Model Versions
V1: Stronger Zero-Shot Generalization
- Designed with better zero-shot fusion capability.
- Performs robustly on unseen fusion scenarios.
- Recommended if your use case emphasizes cross-dataset generalization.
V2: Full Capability Version (Paper Model)
This is the main model used in the DiTFuse paper.
It provides the most comprehensive capabilities:
- Full instruction-following control
- Joint fusion + segmentation
- Better fidelity and controllability
- Stronger alignment with text prompts
Recommended for research reproduction, benchmarking, and controllable image fusion tasks.
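The recommendations above can be summarized as a small decision helper. This is only an illustrative sketch: the `Needs` fields and the `recommend_version` function are hypothetical names, and the priority order (V2 whenever instruction control, segmentation, or paper reproduction is required, and V2 as the default) is an assumption of this example rather than guidance from the authors.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Needs:
    """Hypothetical description of a use case, for illustration only."""

    instruction_control: bool = False
    segmentation: bool = False
    paper_reproduction: bool = False
    cross_dataset_generalization: bool = False


def recommend_version(needs: Needs) -> str:
    """Map a use case to "V1" or "V2" following the descriptions above."""
    # V2 is the paper model with instruction following and segmentation.
    if needs.instruction_control or needs.segmentation or needs.paper_reproduction:
        return "V2"
    # V1 is recommended when cross-dataset (zero-shot) generalization matters most.
    if needs.cross_dataset_generalization:
        return "V1"
    # Assumption of this sketch: fall back to the paper model.
    return "V2"
```

For example, `recommend_version(Needs(cross_dataset_generalization=True))` selects V1, while adding `segmentation=True` to the same request selects V2.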