GVCFDiffusion: A Breakthrough for Seamless Large-Content Image Generation
Guided and Variance-Corrected Fusion with One-shot Style Alignment for Large-Content Image Generation (AAAI-25) [Paper] [Code]
Shoukun Sun1, Min Xian1*, Tiankai Yao2, Fei Xu2, and Luca Capriotti2
1MIDA-Lab, Computer Science, University of Idaho; 2Idaho National Laboratory
Generating large images, such as panoramas or 360-degree scenes, remains a significant challenge in the field of text-to-image synthesis. While models like Stable Diffusion excel at producing realistic images, they struggle with large content due to computational constraints. To overcome this, researchers have turned to patch-based methods, where small patches are generated and then merged. However, these methods often result in noticeable seams and inconsistent styles.
Our Solution: Guided and Variance-Corrected Fusion with Style Alignment
Our research introduces three key innovations to address these challenges:
-
Guided Fusion (GF): We use a guidance map to weight the averaging process in overlapped regions, ensuring that patches closer to the center dominate the denoising process. This reduces perturbations and produces more seamless results.
-
Variance-Corrected Fusion (VCF): When using certain samplers like DDPM, averaging overlapped regions can lead to blurred images due to reduced variance. Our VCF technique corrects this variance, preserving fine details and preventing blurriness.
-
One-shot Style Alignment (SA): To ensure consistent style across patches, we align the initial noise through semantic interpolation. Unlike previous methods, our approach performs this alignment once, without disturbing the denoising process.
Impressive Results
Our experiments demonstrate significant improvements over existing methods. When evaluating on 512×3584 panorama images, which is 7x wider than the diffusion model output size, our approach consistently outperforms others in terms of FID, KID, GIQA-QS, and GIQA-DS scores. Visual comparisons show seamless, coherent landscapes with consistent lighting and texture.
Real-World Applications
These techniques can be applied to enhance other fusion-based methods for large image generation, benefiting applications such as:
- Panorama Generation: Creating wide-view landscapes with consistent style.
- High-Resolution Image Creation: Generating images with ultra-fine details.
- Mobile Applications: Efficient for on-device implementation.
Conclusion
Our approach offers a solution for generating high-quality, large-content images without requiring extensive model training or resources. By addressing the core challenges of patch fusion and style consistency, we enable seamless and consistent results that can be applied across various industries.