MaskLLM: A Learnable AI Method that Facilitates End-to-End Training of LLM Sparsity on Large-Scale Datasets
LLMs, with their massive parameter counts, are often inefficient to deploy because of their high memory and computational demands. One practical solution is semi-structured pruning, particularly the N:M sparsity pattern, which improves efficiency by keeping only N non-zero values in every group of M parameters. While this pattern is hardware-friendly, for instance on GPUs, applying it to LLMs is challenging because of their vast parameter space. Methods like SparseGPT and Wanda use small calibration sets and hand-crafted importance criteria to select redundant parameters, but the limited scope of these calibration sets hinders generalization and introduces errors when judging parameter importance across diverse domains.
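To make the N:M pattern concrete, here is a minimal NumPy sketch of magnitude-based 2:4 pruning, the kind of hand-crafted importance criterion the calibration-based baselines above rely on. The function name and tensor shapes are illustrative assumptions, not taken from the paper:

```python
import numpy as np

# Hypothetical weight matrix; the column count is a multiple of 4 so it
# splits cleanly into 2:4 blocks.
rng = np.random.default_rng(0)
W = rng.normal(size=(2, 8))

def nm_prune(weights, n=2, m=4):
    """Magnitude-based N:M pruning: in every group of m consecutive
    weights, keep the n largest-magnitude entries and zero the rest."""
    groups = weights.reshape(-1, m)                      # one row per group
    # Indices of the (m - n) smallest-magnitude entries in each group.
    drop = np.argsort(np.abs(groups), axis=1)[:, : m - n]
    mask = np.ones_like(groups)
    np.put_along_axis(mask, drop, 0.0, axis=1)
    return (groups * mask).reshape(weights.shape)

W_sparse = nm_prune(W)
# Every group of 4 consecutive weights now has exactly 2 non-zeros.
assert all((row != 0).sum() == 2 for row in W_sparse.reshape(-1, 4))
```

The weakness MaskLLM targets is visible here: the keep/drop decision is a fixed rule applied per block, with no way to adjust the mask based on how the pruned model actually behaves on data.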

Researchers from NVIDIA and the National University of Singapore introduced MaskLLM, a learnable pruning method that applies N:M sparsity to LLMs, reducing computational overhead during inference. Unlike traditional methods, MaskLLM uses Gumbel Softmax sampling to model sparsity as a learnable distribution, enabling efficient end-to-end training on large datasets. This approach improves mask accuracy and transferability, allowing the learned sparsity patterns to be applied across different tasks and domains. Experiments on models such as LLaMA-2 and GPT-3 show significant improvements, with MaskLLM achieving a perplexity of 6.72 compared to 10.42 for SparseGPT.
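The core idea is to treat each block's mask as a categorical choice over the C(4,2) = 6 possible 2:4 masks and relax that choice with Gumbel Softmax so it becomes differentiable. A minimal NumPy sketch of that relaxation follows; the function names and the temperature value are illustrative assumptions, not the paper's implementation:

```python
import numpy as np
from itertools import combinations

# All C(4,2) = 6 candidate 2:4 masks for one block of 4 weights.
CANDIDATES = np.array([[1.0 if i in keep else 0.0 for i in range(4)]
                       for keep in combinations(range(4), 2)])  # shape (6, 4)

def gumbel_softmax(logits, tau=1.0, rng=None):
    """Differentiable (soft) sample from a categorical distribution:
    softmax((logits + Gumbel noise) / tau)."""
    if rng is None:
        rng = np.random.default_rng()
    g = -np.log(-np.log(rng.uniform(size=logits.shape)))  # Gumbel(0,1) noise
    z = (logits + g) / tau
    z = z - z.max()                                       # numeric stability
    p = np.exp(z)
    return p / p.sum()

rng = np.random.default_rng(0)
logits = np.zeros(6)                               # learnable per-block logits
probs = gumbel_softmax(logits, tau=0.5, rng=rng)   # (6,), sums to 1
soft_mask = probs @ CANDIDATES                     # (4,), differentiable blend
```

Because `soft_mask` is a smooth function of `logits`, gradients from a downstream language-modeling loss can flow back into the mask choice, which is what makes end-to-end training of sparsity possible.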

Pruning methods are effective for compressing LLMs because they remove redundant parameters. They fall into three categories: structured, unstructured, and semi-structured. Structured pruning eliminates whole substructures such as attention heads, while unstructured pruning zeros out individual parameters, offering more flexibility but less acceleration. Semi-structured pruning, such as N:M sparsity, strikes a balance by combining structured patterns with fine-grained sparsity to gain both efficiency and flexibility. Learnable sparsity methods have recently gained attention, particularly in vision models, and this work pioneers learnable N:M masks for frozen LLMs, addressing the challenge of their large-scale parameters.

The MaskLLM framework introduces N:M sparsity to optimize LLMs by selecting binary masks for parameter blocks, pruning efficiently without significantly degrading model performance. Focusing on 2:4 sparsity, it selects masks in which two of every four values remain non-zero. The non-differentiability of mask selection is tackled with Gumbel Softmax, which enables differentiable sampling so masks can be optimized by gradient descent. MaskLLM learns masks from large-scale data and transfers them to downstream tasks. Sparse weight regularization maintains post-pruning quality, and prior masks improve the learning process, ensuring efficient and effective model compression.
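Putting the pieces together, the sketch below shows, in illustrative NumPy rather than the paper's code, how a magnitude-based prior might warm-start per-block mask logits and how a final hard mask is committed at inference by taking the argmax candidate per block; the boost constant and shapes are assumptions for the example:

```python
import numpy as np
from itertools import combinations

# All C(4,2) = 6 candidate 2:4 masks for a block of 4 weights.
CANDIDATES = np.array([[1.0 if i in keep else 0.0 for i in range(4)]
                       for keep in combinations(range(4), 2)])  # (6, 4)

rng = np.random.default_rng(1)
W = rng.normal(size=(4, 8))                      # toy weight matrix
blocks = W.reshape(-1, 4)                        # one 2:4 block per row
logits = rng.normal(size=(blocks.shape[0], 6))   # per-block mask logits

# A magnitude-aware prior can warm-start the logits: boost the candidate
# that keeps each block's two largest-magnitude weights.
scores = np.abs(blocks) @ CANDIDATES.T           # (blocks, 6) kept-magnitude sums
logits += 2.0 * (scores == scores.max(axis=1, keepdims=True))

# Inference: commit to the argmax candidate mask per block (no sampling).
hard = CANDIDATES[np.argmax(logits, axis=1)]     # (blocks, 4) binary masks
W_pruned = (blocks * hard).reshape(W.shape)
assert all((row != 0).sum() == 2 for row in W_pruned.reshape(-1, 4))
```

However the logits are initialized, the committed masks always satisfy the 2:4 constraint by construction, since every candidate in the table keeps exactly two entries per block.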

The researchers evaluated MaskLLM on multiple LLMs (LLaMA-2, Nemotron-4, and multilingual GPT-3) ranging from 843M to 15B parameters. MaskLLM learns 2:4 sparsity masks through end-to-end training, outperforming baselines like SparseGPT and Wanda in accuracy and perplexity. Mask quality improves with larger training datasets, and the method remains robust in low-resource settings. Transfer learning from pre-computed masks accelerates training, while preserving large-magnitude remaining weights enhances downstream task performance. MaskLLM's stochastic exploration ensures high-quality mask discovery, with results surpassing SparseGPT in perplexity after training on as few as 1,280 samples.

MaskLLM introduces a learnable pruning method for applying N:M sparsity in LLMs to reduce computational costs during inference. Instead of relying on a predefined importance criterion, it models N:M sparsity patterns through Gumbel Softmax sampling, enabling end-to-end training on large datasets. MaskLLM offers high-quality mask learning and transferability across domains. Tested on LLaMA-2, Nemotron-4, and GPT-3, with sizes ranging from 843M to 15B parameters, MaskLLM outperformed state-of-the-art methods in perplexity and efficiency. Its masks can also be customized for lossless downstream task performance.


Check out the Paper and GitHub. All credit for this research goes to the researchers of this project.



Sana Hassan, a consulting intern at Marktechpost and dual-degree student at IIT Madras, is passionate about applying technology and AI to address real-world challenges. With a keen interest in solving practical problems, he brings a fresh perspective to the intersection of AI and real-life solutions.




