Recent advancements in diffusion models have significantly improved tasks like image, video, and 3D generation, with pre-trained models like Stable Diffusion being pivotal. However, adapting these models to new tasks efficiently remains a challenge. Existing fine-tuning approaches—Additive, Reparameterized, and Selective-based—have limitations, such as added latency, overfitting, or complex parameter selection. A proposed solution involves leveraging “temporarily ineffective” parameters—those with minimal current impact but the potential to learn new information—by reactivating them to enhance the model’s generative capabilities without the drawbacks of existing methods.
Researchers from Shanghai Jiao Tong University and Youtu Lab, Tencent, propose SaRA, a fine-tuning method for pre-trained diffusion models. Inspired by model pruning, SaRA reuses “temporarily ineffective” parameters with small absolute values, optimizing them through sparse matrices while preserving prior knowledge. A nuclear-norm-based low-rank training scheme and a progressive parameter adjustment strategy prevent overfitting, and a memory-efficient unstructured backpropagation scheme reduces memory costs by 40% compared to LoRA. Experiments on Stable Diffusion models show SaRA’s superior performance across various tasks, requiring only a single line of code modification for implementation.
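In rough terms, the approach amounts to selecting the lowest-magnitude weights and restricting optimization to them. The sketch below illustrates that idea with a global magnitude threshold and a gradient mask; the threshold value, function names, and per-parameter masking are illustrative assumptions rather than the paper's exact implementation (the authors report that their released code wraps this logic behind a single changed training line).

```python
import torch

def make_sparse_masks(model, threshold=5e-4):
    """Flag 'temporarily ineffective' weights (smallest absolute values) as trainable.
    A single global threshold is an illustrative simplification."""
    return {name: (param.detach().abs() < threshold)
            for name, param in model.named_parameters()}

def apply_sparse_grads(model, masks):
    """Zero gradients everywhere except the selected entries, so the optimizer
    only updates the reactivated low-magnitude parameters."""
    for name, param in model.named_parameters():
        if param.grad is not None:
            param.grad.mul_(masks[name].to(param.grad.dtype))

# Inside a standard training loop:
#   loss.backward()
#   apply_sparse_grads(unet, masks)   # keep only the sparse subset of updates
#   optimizer.step()
```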
Diffusion models, such as Stable Diffusion, excel in image generation but are difficult to fully fine-tune because of their large parameter counts. Methods like ControlNet, LoRA, and DreamBooth address this by adding external networks or fine-tuning lightweight components to enable controlled generation or adaptation to new tasks. Parameter-efficient fine-tuning approaches fall into Additive Fine-Tuning (AFT) and Reparameterized Fine-Tuning (RFT), which introduce adapters or low-rank matrices, and Selective Fine-Tuning (SFT), which modifies specific subsets of the original parameters. SaRA improves on these methods by reusing ineffective parameters, maintaining the model architecture, reducing memory costs, and enhancing fine-tuning efficiency without additional inference latency.
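To make the distinction concrete, the following sketch contrasts a reparameterized LoRA-style layer with a selective approach that simply freezes everything except a chosen subset of existing weights; the class and function names, the rank, and the `attn` keyword are illustrative assumptions, not code from the paper.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Reparameterized fine-tuning (RFT): learn a low-rank delta on top of a frozen
    weight. Unless merged back, the extra matrices add inference-time computation."""
    def __init__(self, base: nn.Linear, rank: int = 4):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)
        self.A = nn.Parameter(torch.zeros(base.out_features, rank))
        self.B = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)

    def forward(self, x):
        return self.base(x) + x @ (self.A @ self.B).T

def mark_selective(model, keyword="attn"):
    """Selective fine-tuning (SFT): keep the architecture unchanged and train only a
    chosen subset of existing parameters (here, any name containing `attn`)."""
    for name, p in model.named_parameters():
        p.requires_grad_(keyword in name)
```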
In diffusion models, “ineffective” parameters, identified by their small absolute values, show minimal impact on performance when pruned. Experiments on Stable Diffusion models (v1.4, v1.5, v2.0, v3.0) revealed that setting parameters below a certain threshold to zero sometimes even improves generation quality. This ineffectiveness stems from randomness in optimization rather than from the model structure, so fine-tuning can make these parameters effective again. SaRA leverages these temporarily ineffective parameters for fine-tuning, using low-rank constraints and progressive adjustment to prevent overfitting, and significantly reduces memory and computation costs compared to existing methods like LoRA.
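A minimal version of that pruning check might look like the following; the threshold value and helper name are assumptions made for illustration, and the actual analysis is run per pre-trained checkpoint with image-quality metrics rather than a simple pruning ratio.

```python
import torch

@torch.no_grad()
def prune_small_weights(model, threshold=1e-3):
    """Zero out weights whose absolute value falls below `threshold` and report the
    fraction removed; the paper's observation is that generation quality is largely
    unaffected, and sometimes improves, after such pruning."""
    pruned, total = 0, 0
    for param in model.parameters():
        mask = param.abs() < threshold
        param[mask] = 0.0
        pruned += int(mask.sum())
        total += param.numel()
    return pruned / total

# e.g. ratio = prune_small_weights(pipe.unet, threshold=1e-3), then sample images
# with the pruned pipeline and compare FID / CLIP score against the original model.
```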
The proposed method was evaluated on backbone fine-tuning, image customization, and video generation using FID, CLIP score, and VLHI metrics. It outperformed existing fine-tuning approaches (LoRA, AdaptFormer, LT-SFT) across datasets, showing stronger task-specific learning while better preserving prior knowledge. Generated images and videos were more consistent and showed fewer artifacts, and the method reduced memory usage and training time by over 45%. Ablation studies highlighted the importance of progressive parameter adjustment and low-rank constraints, and a correlation analysis of parameter changes indicated that it acquires new knowledge more effectively than competing methods.
SaRA is a parameter-efficient fine-tuning method that leverages the least impactful parameters in pre-trained models. Its nuclear-norm-based low-rank loss prevents overfitting, its progressive parameter adjustment enhances fine-tuning effectiveness, and its unstructured backpropagation reduces memory costs in a way that can also benefit other selective fine-tuning methods. SaRA significantly improves generative capabilities in tasks like domain transfer and image editing, outperforming methods like LoRA, and requires only a one-line code modification for easy integration, demonstrating superior performance on models such as Stable Diffusion 1.5, 2.0, and 3.0 across multiple applications.
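For readers unfamiliar with the term, the nuclear norm of a weight update is the sum of its singular values, so penalizing it pushes the learned change toward low rank. Below is a hedged sketch of such a regularizer; the weighting factor and the idea of applying it to the difference between fine-tuned and original weights are assumptions about how it could be wired in, not the paper's exact formulation.

```python
import torch

def nuclear_norm_loss(delta_weight: torch.Tensor, weight: float = 1e-4) -> torch.Tensor:
    """Penalize the nuclear norm (sum of singular values) of a weight update so the
    effective change stays low-rank; the weighting factor is an assumed hyperparameter."""
    return weight * torch.linalg.svdvals(delta_weight).sum()

# Combined objective (hypothetical wiring):
#   total_loss = diffusion_loss + nuclear_norm_loss(updated_weight - original_weight)
```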
Check out the Model. All credit for this research goes to the researchers of this project.
Sana Hassan, a consulting intern at Marktechpost and dual-degree student at IIT Madras, is passionate about applying technology and AI to address real-world challenges. With a keen interest in solving practical problems, he brings a fresh perspective to the intersection of AI and real-life solutions.