Google AI Proposes a Fundamental Framework for Inference-Time Scaling in Diffusion Models

Generative models have revolutionized fields like language, vision, and biology through their ability to learn and sample from complex data distributions. While these models benefit from scaling up during training through increased data, computational resources, and model sizes, their inference-time scaling capabilities face significant challenges. Diffusion models in particular, which excel at generating continuous data such as images, audio, and video through a denoising process, see limited gains from simply increasing the number of function evaluations (NFEs) at inference. With the traditional approach of adding more denoising steps, performance quickly plateaus, so additional computational investment yields little improvement.

Various approaches have been explored to enhance the performance of generative models during inference. Test-time compute scaling has proven effective for LLMs through improved search algorithms, verification methods, and compute allocation strategies. In diffusion models, researchers have pursued fine-tuning approaches, reinforcement learning techniques, and direct preference optimization. Moreover, sample selection and optimization methods have been developed using random search algorithms, VQA models, and human preference models. However, these methods either focus on training-time improvements or on limited test-time optimizations, leaving room for more general inference-time scaling solutions.

Researchers from NYU, MIT, and Google have proposed a fundamental framework for scaling diffusion models during inference time. Their approach moves beyond simply increasing denoising steps and introduces a novel search-based methodology for improving generation performance through better noise identification. The framework operates along two key dimensions: utilizing verifiers for feedback and implementing algorithms to discover superior noise candidates. This approach addresses the limitations of conventional scaling methods by introducing a structured way to use additional computational resources during inference. The framework’s flexibility allows component combinations to be tailored to specific application scenarios.
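To make these two dimensions concrete, the following is a minimal, hypothetical sketch of how such a search loop could be organized, with the sampler, verifier, and search algorithm as pluggable components; the function names and signatures are illustrative assumptions, not the paper's actual code.

```python
from typing import Callable, List, Tuple
import torch

def search_generate(
    sample: Callable[[torch.Tensor], torch.Tensor],   # fixed denoising sampler: noise -> image
    verifier: Callable[[torch.Tensor], float],        # feedback signal scoring a generated image
    propose_noise: Callable[[List[Tuple[torch.Tensor, float]]], torch.Tensor],  # search algorithm
    budget: int,                                       # number of search rounds (extra compute)
) -> torch.Tensor:
    """Spend extra inference-time compute searching for a better initial noise."""
    history: List[Tuple[torch.Tensor, float]] = []     # (noise, score) pairs tried so far
    best_image, best_score = None, float("-inf")
    for _ in range(budget):
        noise = propose_noise(history)                  # e.g. an independent random draw, or guided by history
        image = sample(noise)                           # full denoising run with a fixed step count
        score = verifier(image)                         # verifier feedback
        history.append((noise, score))
        if score > best_score:
            best_image, best_score = image, score
    return best_image
```

Different verifiers and different `propose_noise` strategies can be swapped in independently, which is what gives the framework its flexibility across application scenarios.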

The framework’s implementation centers on class-conditional ImageNet generation using a pre-trained SiT-XL model at 256 × 256 resolution with a second-order Heun sampler. The number of denoising steps is kept fixed at 250, while additional NFEs are devoted to search operations. The core search mechanism is a random search algorithm that applies a Best-of-N strategy to select optimal noise candidates. Two oracle verifiers are used: Inception Score (IS) and Fréchet Inception Distance (FID). IS selection picks the sample with the highest classification probability under a pre-trained InceptionV3 model, while FID selection minimizes divergence from pre-calculated ImageNet Inception feature statistics.
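As a rough illustration, a Best-of-N random search with an IS-style oracle verifier might look like the sketch below: draw N independent initial noises, run the fixed-step sampler on each, and keep the image whose top class probability under a pre-trained InceptionV3 is highest. This is one concrete instantiation of the generic loop sketched above, with `propose_noise` reduced to an independent Gaussian draw; the helper names, the preprocessing, and the stand-in `sample` function are assumptions for illustration, not the authors' implementation.

```python
import torch
import torch.nn.functional as F
from torchvision.models import inception_v3, Inception_V3_Weights

# IS-style oracle verifier: score a sample by its highest classification
# probability under a pre-trained InceptionV3 (approximating the rule above).
_weights = Inception_V3_Weights.IMAGENET1K_V1
_inception = inception_v3(weights=_weights).eval()
_preprocess = _weights.transforms()

@torch.no_grad()
def is_score(image: torch.Tensor) -> float:
    """image: (3, H, W) float tensor with values in [0, 1]."""
    logits = _inception(_preprocess(image).unsqueeze(0))
    return F.softmax(logits, dim=-1).max().item()

@torch.no_grad()
def best_of_n(sample, noise_shape, n: int) -> torch.Tensor:
    """Random search (Best-of-N): `sample` stands in for the fixed 250-step
    Heun sampler that maps an initial noise to a generated image."""
    best_img, best = None, float("-inf")
    for _ in range(n):
        noise = torch.randn(noise_shape)   # candidate initial noise
        img = sample(noise)                # full denoising with a fixed step count
        score = is_score(img)
        if score > best:
            best_img, best = img, score
    return best_img
```

An FID-based verifier would follow the same pattern but score candidates against pre-computed ImageNet Inception feature statistics rather than per-sample class probabilities.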

The framework’s effectiveness has been shown through comprehensive testing on different benchmarks. On DrawBench, which features diverse text prompts, the LLM Grader evaluation shows that searching with various verifiers consistently improves sample quality, though with different patterns across setups. ImageReward and the Verifier Ensemble perform well, improving all metrics thanks to their nuanced evaluation capabilities and alignment with human preferences. On T2I-CompBench, which focuses on text-prompt alignment rather than visual quality, the results reveal different optimal configurations: ImageReward emerges as the top performer, Aesthetic Scores show minimal or even negative impact, and CLIP provides modest improvements.

In conclusion, the researchers establish a significant advance in diffusion models by introducing a framework for inference-time scaling through strategic search mechanisms. The study shows that computational scaling via search can achieve substantial performance improvements across different model sizes and generation tasks, with different computational budgets yielding distinct scaling behaviors. The research also reveals inherent biases in different verifiers and emphasizes the importance of developing task-specific verification methods. This insight opens new avenues for future research into more targeted and efficient verification systems for vision generation tasks.


Check out the Paper and Project Page. All credit for this research goes to the researchers of this project.



Sajjad Ansari is a final-year undergraduate at IIT Kharagpur. As a tech enthusiast, he delves into practical applications of AI with a focus on understanding the impact of AI technologies and their real-world implications. He aims to articulate complex AI concepts in a clear and accessible manner.


