Document detail
ID

oai:arXiv.org:2408.17095

Topic
Computer Science - Computer Vision... Computer Science - Machine Learnin...
Author
Mukherjee, Avideep; Banerjee, Soumya; Rai, Piyush; Namboodiri, Vinay P.
Category

Computer Science

Year

2024

Listing date

9/11/2024

Keywords
blocks computer coherence block-wise approach model generation
Abstract

Diffusion-based models demonstrate impressive generation capabilities.

However, they also have a massive number of parameters, resulting in enormous model sizes that make them unsuitable for deployment on resource-constrained devices.

Block-wise generation can be a promising alternative for designing compact-sized (parameter-efficient) deep generative models since the model can generate one block at a time instead of generating the whole image at once.
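
To make the block-wise idea concrete, the following is a minimal sketch of splitting an image into non-overlapping blocks and reassembling it. It is an illustration only, not the authors' implementation: the function names `split_into_blocks` and `assemble_blocks`, the square non-overlapping tiling, and the row-major block order are all assumptions, and a nested list stands in for an image or latent tensor.

```python
def split_into_blocks(image, block):
    """Split a 2D grid (list of rows) into block x block tiles in row-major order.

    Assumption: height and width are exact multiples of `block`.
    """
    h, w = len(image), len(image[0])
    assert h % block == 0 and w % block == 0, "image must tile evenly"
    tiles = []
    for r in range(0, h, block):
        for c in range(0, w, block):
            tiles.append([row[c:c + block] for row in image[r:r + block]])
    return tiles


def assemble_blocks(tiles, h, w, block):
    """Inverse of split_into_blocks: place each tile back at its grid position."""
    image = [[None] * w for _ in range(h)]
    per_row = w // block
    for i, tile in enumerate(tiles):
        r0 = (i // per_row) * block
        c0 = (i % per_row) * block
        for dr, row in enumerate(tile):
            image[r0 + dr][c0:c0 + block] = row
    return image


# Toy 4x4 "image" whose pixel value encodes its position.
image = [[r * 4 + c for c in range(4)] for r in range(4)]
tiles = split_into_blocks(image, 2)
restored = assemble_blocks(tiles, 4, 4, 2)
```

In a block-wise generator, a model would produce each tile separately before the tiles are assembled; the smaller per-block output is what allows a smaller model.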

However, block-wise generation is also considerably challenging because ensuring coherence across generated blocks can be non-trivial.

To this end, we design a retrieval-augmented generation (RAG) approach and leverage the corresponding blocks of the images retrieved by the RAG module to condition the training and generation stages of a block-wise denoising diffusion model.
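
The retrieval step can be sketched as follows. The abstract does not specify the retrieval module or how the conditioning signal is injected, so everything here is an assumption made for illustration: squared Euclidean distance over toy embeddings, the helper names `retrieve_nearest` and `corresponding_block`, and the row-major block index. The sketch only shows how the block at the same position in a retrieved image could be picked out as a conditioning input.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def retrieve_nearest(query, embeddings, k=1):
    """Return indices of the k database embeddings closest to the query."""
    ranked = sorted(range(len(embeddings)),
                    key=lambda i: squared_distance(query, embeddings[i]))
    return ranked[:k]


def corresponding_block(image, index, block):
    """Return the block at row-major position `index` of a 2D grid."""
    per_row = len(image[0]) // block
    r0 = (index // per_row) * block
    c0 = (index % per_row) * block
    return [row[c0:c0 + block] for row in image[r0:r0 + block]]


# Toy retrieval database: one embedding per stored 4x4 image (hypothetical data).
embeddings = [[0.0, 1.0], [1.0, 0.0], [0.5, 0.5]]
images = [
    [[0] * 4 for _ in range(4)],
    [[r * 4 + c for c in range(4)] for r in range(4)],
    [[9] * 4 for _ in range(4)],
]

neighbours = retrieve_nearest([0.9, 0.1], embeddings, k=2)
# Conditioning input for generating block 3: the same block of the top neighbour.
condition = corresponding_block(images[neighbours[0]], 3, 2)
```

In a setup like the one the abstract describes, `condition` would be passed to the block-wise denoiser alongside the noisy block; that interface is not shown here because the abstract does not define it.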

Our conditioning schemes ensure coherence across the different blocks during training and, consequently, during generation.

While we showcase our approach using the latent diffusion model (LDM) as the base model, it can be used with other variants of denoising diffusion models.

We report substantive experiments showing that the proposed approach resolves the coherence problem while achieving a compact model size and excellent generation quality.

Mukherjee, Avideep; Banerjee, Soumya; Rai, Piyush; Namboodiri, Vinay P., 2024, RISSOLE: Parameter-efficient Diffusion Models via Block-wise Generation and Retrieval-Guidance
