DiTFuse: Towards Unified Semantic and Controllable Image Fusion: A Diffusion Transformer Approach (Official Weights)

This repository provides the official pretrained weights for DiTFuse. The project code is available on GitHub:

👉 GitHub: https://github.com/Henry-Lee-real/DiTFuse

DiTFuse supports multiple fusion tasks—including infrared–visible fusion, multi-focus fusion, multi-exposure fusion, and instruction-driven controllable fusion / segmentation—all within a single unified model.

📌 Available Model Versions

🔹 V1 — Stronger Zero-Shot Generalization

Designed for stronger zero-shot fusion capability.
Performs robustly on fusion scenarios not seen during training.
Recommended if your use case emphasizes cross-dataset generalization.

🔹 V2 — Full Capability Version (Paper Model)

This is the main model used in the DiTFuse paper.
It provides the most comprehensive capabilities:
- Full instruction-following control
- Joint fusion + segmentation
- Better fidelity and controllability
- Stronger alignment with text prompts
Recommended for research reproduction, benchmarking, and controllable image fusion tasks.
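
As a rough illustration of the recommendations above, the sketch below chooses between V1 and V2 and then fetches the weights. It is not part of the official DiTFuse code: `Needs`, `pick_version`, and `download_weights` are hypothetical helper names, and the `repo_id` argument is a placeholder you must fill in with this repository's actual identifier. The only library call used is `snapshot_download` from `huggingface_hub`, which downloads the whole repository because the file layout of the two versions is not described here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Needs:
    """Hypothetical summary of what a project requires from DiTFuse."""
    instruction_control: bool = False   # text-instruction-driven fusion
    segmentation: bool = False          # joint fusion + segmentation
    paper_reproduction: bool = False    # reproducing or benchmarking paper results


def pick_version(needs: Needs) -> str:
    """Return "V2" if any of the listed V2 strengths is needed, else "V1".

    This mirrors the recommendations in this model card; it is a
    convenience rule of thumb, not an official selection procedure.
    """
    if needs.instruction_control or needs.segmentation or needs.paper_reproduction:
        return "V2"
    # Default to the version recommended for cross-dataset generalization.
    return "V1"


def download_weights(repo_id: str, local_dir: str) -> str:
    """Download the full weights repository and return its local path.

    `repo_id` is an assumption: pass the identifier of this repository.
    The import is done lazily so that `pick_version` works even when
    huggingface_hub is not installed.
    """
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=repo_id, local_dir=local_dir)


if __name__ == "__main__":
    version = pick_version(Needs(segmentation=True))
    print(f"Suggested version: {version}")
    # Example call (requires network access and a real repository id):
    # path = download_weights("<this-repo-id>", "./ditfuse_weights")
```

After downloading, follow the GitHub repository's instructions for loading the chosen version's checkpoint.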
