Sunday 09 March 2025
The quest for better image generation has been something of a holy grail in artificial intelligence, and researchers have made significant strides in recent years. A new study pushes this pursuit further by exploring the role of inference-time scaling in diffusion models.
For those unfamiliar with the concept, diffusion models are a class of generative models that dominate continuous data domains such as images, audio, and video. They’re trained to remove noise from data, and their generation process starts from pure noise and calls the pre-trained model many times, removing a little noise at each step. The study focuses on scaling up the compute these models spend at inference time, rather than only the data or compute used to train them. Because simply adding more denoising steps quickly stops paying off, the authors spend the extra compute on searching for better starting noises instead.
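To make the iterative process concrete, here is a minimal toy sketch rather than the study’s code. It assumes the “data” is a one-dimensional Gaussian, so the ideal denoiser has a closed form and can stand in for a trained network; the function names, the Euler solver, and the noise schedule are illustrative choices. Each loop iteration is one call to the model, which is why generation cost grows with the number of steps.

```python
import random
import statistics

# Toy data distribution N(MU, S**2); an illustrative stand-in for real images.
MU, S = 2.0, 0.5


def denoiser(x: float, sigma: float) -> float:
    """Ideal denoiser E[x0 | x] for Gaussian data; plays the role of the trained network."""
    return MU + (S**2 / (S**2 + sigma**2)) * (x - MU)


def make_schedule(sigma_max: float = 80.0, sigma_min: float = 0.002, steps: int = 50) -> list[float]:
    """Geometrically spaced noise levels from sigma_max down to sigma_min, then 0."""
    ratio = sigma_min / sigma_max
    return [sigma_max * ratio ** (i / (steps - 1)) for i in range(steps)] + [0.0]


def sample(noise: float, sigmas: list[float]) -> tuple[float, int]:
    """Euler integration of the probability-flow ODE dx/dsigma = (x - D(x, sigma)) / sigma.

    Generation starts from pure noise scaled to sigma_max and returns the final
    sample plus the number of denoiser calls (function evaluations) it used.
    """
    x = sigmas[0] * noise
    nfe = 0
    for sigma, sigma_next in zip(sigmas[:-1], sigmas[1:]):
        slope = (x - denoiser(x, sigma)) / sigma  # one model evaluation per step
        nfe += 1
        x += (sigma_next - sigma) * slope
    return x, nfe


rng = random.Random(0)
schedule = make_schedule()
outputs = [sample(rng.gauss(0.0, 1.0), schedule)[0] for _ in range(2000)]
print(round(statistics.mean(outputs), 2), round(statistics.stdev(outputs), 2))
```

Because this sampler is deterministic, the same starting noise always produces the same output, which is what makes searching over starting noises meaningful.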
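The researchers framed the problem as a search for better noises to feed into the diffusion sampling process, with a verifier scoring each result. Beyond a simple random-search baseline, which draws many Gaussian noise candidates and keeps the one the verifier scores highest, they proposed two algorithms: Zero-Order Search and Search Over Paths. Zero-Order Search starts from a single noise, samples candidates in its neighborhood, scores them with the verifier, and moves to the best one, repeating the process. Search Over Paths instead explores the sampling trajectories themselves, branching partially denoised samples by injecting fresh noise and keeping the branches the verifier rates highest.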
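Zero-order search can be pictured as a hill climb in noise space that needs only verifier scores, no gradients. The sketch below is a hypothetical illustration, not the authors’ implementation: it assumes neighbors are drawn as small Gaussian perturbations of the pivot and that the pivot stays put when no neighbor scores higher. The `generate` and `verifier` arguments stand in for a full diffusion sampler and a real verifier; the toy versions used here (an identity “sampler” and a score that rewards closeness to a fixed target) are placeholders.

```python
import random
from typing import Callable

Noise = list[float]


def zero_order_search(
    generate: Callable[[Noise], Noise],
    verifier: Callable[[Noise], float],
    dim: int,
    num_candidates: int = 8,
    step: float = 0.3,
    iterations: int = 10,
    seed: int = 0,
) -> tuple[Noise, float]:
    """Hill-climb in noise space around a pivot, guided only by verifier scores."""
    rng = random.Random(seed)
    pivot = [rng.gauss(0.0, 1.0) for _ in range(dim)]  # start from pure Gaussian noise
    best_score = verifier(generate(pivot))
    for _ in range(iterations):
        # Neighborhood candidates: small Gaussian perturbations of the pivot (an assumption).
        candidates = [[p + step * rng.gauss(0.0, 1.0) for p in pivot] for _ in range(num_candidates)]
        # Each candidate costs one full sampling run plus one verifier call.
        score, candidate = max(((verifier(generate(c)), c) for c in candidates), key=lambda sc: sc[0])
        if score > best_score:  # otherwise keep the current pivot (a design choice)
            pivot, best_score = candidate, score
    return pivot, best_score


# Toy usage: hypothetical sampler and verifier.
TARGET = [1.0, 1.0, 1.0, 1.0]


def identity_sampler(noise: Noise) -> Noise:
    return noise  # stand-in for a deterministic diffusion sampler


def closeness(x: Noise) -> float:
    return -sum((xi - ti) ** 2 for xi, ti in zip(x, TARGET))  # higher is better


_, start_score = zero_order_search(identity_sampler, closeness, dim=4, iterations=0)
_, final_score = zero_order_search(identity_sampler, closeness, dim=4, iterations=20)
print(f"start={start_score:.3f}  after search={final_score:.3f}")
```

Keeping the pivot when nothing improves makes the verifier score monotone over iterations; a variant that always moves to the best neighbor would explore more but could drift downhill.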
To evaluate these search methods, the researchers used several verifiers, including classification logits from DINO and CLIP, aesthetic predictors, and self-supervised verifiers. They also varied the inference compute budget, measured in the number of function evaluations (NFEs), and found that spending more compute on search can substantially improve image quality.
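Budgets like this are usually counted in function evaluations, i.e. calls to the denoising network. A minimal sketch, assuming every full sampling run costs a fixed 50 NFEs and using hypothetical toy generator and verifier functions, shows the basic trade-off for the simplest strategy, best-of-N random search: a larger budget buys more candidate noises. Because the same random seed is reused, each larger budget sees a superset of the smaller budget’s candidates, so the best score can only stay the same or improve.

```python
import random
from typing import Callable

STEPS_PER_SAMPLE = 50  # hypothetical: each full sampling run costs 50 denoiser calls


def random_search(
    generate: Callable[[float], float],
    verifier: Callable[[float], float],
    nfe_budget: int,
    seed: int = 0,
) -> tuple[float, int]:
    """Best-of-N: draw as many noises as the budget allows, keep the best verifier score."""
    n = max(1, nfe_budget // STEPS_PER_SAMPLE)
    rng = random.Random(seed)
    noises = [rng.gauss(0.0, 1.0) for _ in range(n)]
    best = max(verifier(generate(z)) for z in noises)
    return best, n


def toy_generate(z: float) -> float:
    return 2.0 + 0.5 * z  # stand-in for a deterministic sampler


def toy_verifier(x: float) -> float:
    return -abs(x - 2.5)  # hypothetical preference for outputs near 2.5


for budget in (50, 200, 800, 3200):
    score, n = random_search(toy_generate, toy_verifier, budget)
    print(f"budget={budget:5d} NFEs  candidates={n:3d}  best score={score:.3f}")
```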
One of the most interesting aspects of this study is its exploration of the relationship between verifier hacking and degeneracy in evaluation metrics. The researchers observed that when searching against a verifier, the selected noises eventually overfit to that verifier’s biases, reducing the diversity of the sample set. This in turn can hurt performance on other evaluation metrics.
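A toy sketch of how verifier hacking erodes diversity, under stated assumptions: the generator returns standard-normal samples directly, and the hypothetical verifier simply rewards closeness to 1.0. As the search budget per output grows, the selected population collapses toward the verifier’s preferred value. This illustrates the mechanism only and reproduces none of the study’s numbers.

```python
import random
import statistics


def biased_verifier(x: float) -> float:
    """Hypothetical verifier with a bias: it simply prefers outputs close to 1.0."""
    return -(x - 1.0) ** 2


def select_population(num_outputs: int, candidates_per_output: int, seed: int = 0) -> list[float]:
    """For each output slot, run best-of-N search against the biased verifier."""
    rng = random.Random(seed)
    population = []
    for _ in range(num_outputs):
        # Toy generator: each noise maps directly to a standard-normal sample.
        candidates = [rng.gauss(0.0, 1.0) for _ in range(candidates_per_output)]
        population.append(max(candidates, key=biased_verifier))
    return population


for n in (1, 4, 16, 64):
    pop = select_population(200, n)
    print(f"N={n:2d}  mean={statistics.mean(pop):.2f}  spread={statistics.stdev(pop):.2f}")
```

Every individual sample scores better as N grows, yet the set as a whole becomes less diverse, which is exactly the kind of degeneracy that per-sample scores cannot see.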
To mitigate this issue, the researchers proposed verifiers that operate on a population basis, judging the set of selected noises as a whole rather than each sample in isolation, so that the global structure of the set is taken into account. They also found that self-supervised verifiers, which favor samples whose trajectories show little curvature in feature space, are effective in some settings but not in others.
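A minimal sketch of a self-supervised verifier, assuming it scores a candidate by the cosine similarity between a feature embedding (for instance from a model such as DINO or CLIP) of the clean image predicted at an intermediate sampling step and the embedding of the final sample. A high score means the trajectory changed little in feature space. The feature vectors below are made-up placeholders, and `self_supervised_score` is a hypothetical name, not the study’s code.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two feature vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0


def self_supervised_score(intermediate_features: list[float], final_features: list[float]) -> float:
    """Higher when the intermediate prediction already resembles the final sample."""
    return cosine_similarity(intermediate_features, final_features)


# Hypothetical candidates: (features of intermediate prediction, features of final sample).
candidates = {
    "noise_a": ([1.0, 0.0, 0.2], [0.9, 0.1, 0.3]),  # features barely change
    "noise_b": ([1.0, 0.0, 0.0], [0.0, 1.0, 0.0]),  # features swing to a new direction
}
best = max(candidates, key=lambda k: self_supervised_score(*candidates[k]))
print(best)
```

No labels or prompts are needed, which is the appeal of this kind of verifier; the catch, consistent with the mixed results above, is that stability in feature space does not always track what a given task considers a good image.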
The study’s findings have practical implications for how diffusion models are developed and applied. For instance, the proposed search methods could improve image quality in tasks such as text-to-image synthesis or image editing.
Moreover, the researchers’ exploration of verifier hacking and degeneracy highlights the need for task-specific verifiers that are designed with a deep understanding of the underlying problem domain.
Cite this article: “Scaling Up Diffusion Models for Improved Image Generation”, The Science Archive, 2025.
Diffusion Models, Generative Models, Image Generation, Inference-Time Scaling, Noise Sampling, Search Methods, Verifiers, Classification Logits, Aesthetic Predictors, Self-Supervised Verifiers