MaskLLM: A Learnable AI Method that Facilitates End-to-End Training of LLM Sparsity on Large-Scale Datasets
LLMs, with their massive parameter counts, are often inefficient to deploy because of their high memory and computational demands. One practical solution is semi-structured pruning, particularly the N:M sparsity pattern, which improves efficiency by keeping only N non-zero values in every group of M parameters. While this pattern is hardware-friendly, for instance on GPUs, applying it to LLMs is challenging because of their vast parameter space. Methods like SparseGPT and Wanda use small calibration sets and hand-crafted importance criteria to select redundant parameters, but the limited scope of these calibration sets hinders generalization and introduces errors when judging parameter importance across diverse domains.
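To make the N:M pattern concrete, here is a minimal NumPy sketch of magnitude-based 2:4 pruning, the kind of hand-crafted importance criterion the calibration-based baselines above rely on. The function name and tensor shapes are illustrative assumptions, not taken from the paper:

```python
import numpy as np

# Hypothetical weight matrix; the column count is a multiple of 4 so it
# splits cleanly into 2:4 blocks.
rng = np.random.default_rng(0)
W = rng.normal(size=(2, 8))

def nm_prune(weights, n=2, m=4):
    """Magnitude-based N:M pruning: in every group of m consecutive
    weights, keep the n largest-magnitude entries and zero the rest."""
    groups = weights.reshape(-1, m)                      # one row per group
    # Indices of the (m - n) smallest-magnitude entries in each group.
    drop = np.argsort(np.abs(groups), axis=1)[:, : m - n]
    mask = np.ones_like(groups)
    np.put_along_axis(mask, drop, 0.0, axis=1)
    return (groups * mask).reshape(weights.shape)

W_sparse = nm_prune(W)
# Every group of 4 consecutive weights now has exactly 2 non-zeros.
assert all((row != 0).sum() == 2 for row in W_sparse.reshape(-1, 4))
```

The weakness MaskLLM targets is visible here: the keep/drop decision is a fixed rule applied per block, with no way to adjust the mask based on how the pruned model actually behaves on data.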

Researchers from NVIDIA and the National University of Singapore introduced MaskLLM, a learnable pruning method that applies N:M sparsity to LLMs, reducing computational overhead during inference. Unlike traditional methods, MaskLLM uses Gumbel Softmax sampling to model sparsity as a learnable distribution, enabling efficient end-to-end training on large datasets. This approach improves mask accuracy and transferability, allowing the learned sparsity patterns to be applied across different tasks and domains. Experiments on models such as LLaMA-2 and GPT-3 show significant improvements, with MaskLLM achieving a perplexity of 6.72 compared to 10.42 for SparseGPT.
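The core idea is to treat each block's mask as a categorical choice over the C(4,2) = 6 possible 2:4 masks and relax that choice with Gumbel Softmax so it becomes differentiable. A minimal NumPy sketch of that relaxation follows; the function names and the temperature value are illustrative assumptions, not the paper's implementation:

```python
import numpy as np
from itertools import combinations

# All C(4,2) = 6 candidate 2:4 masks for one block of 4 weights.
CANDIDATES = np.array([[1.0 if i in keep else 0.0 for i in range(4)]
                       for keep in combinations(range(4), 2)])  # shape (6, 4)

def gumbel_softmax(logits, tau=1.0, rng=None):
    """Differentiable (soft) sample from a categorical distribution:
    softmax((logits + Gumbel noise) / tau)."""
    if rng is None:
        rng = np.random.default_rng()
    g = -np.log(-np.log(rng.uniform(size=logits.shape)))  # Gumbel(0,1) noise
    z = (logits + g) / tau
    z = z - z.max()                                       # numeric stability
    p = np.exp(z)
    return p / p.sum()

rng = np.random.default_rng(0)
logits = np.zeros(6)                               # learnable per-block logits
probs = gumbel_softmax(logits, tau=0.5, rng=rng)   # (6,), sums to 1
soft_mask = probs @ CANDIDATES                     # (4,), differentiable blend
```

Because `soft_mask` is a smooth function of `logits`, gradients from a downstream language-modeling loss can flow back into the mask choice, which is what makes end-to-end training of sparsity possible.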

Pruning methods are effective for compressing LLMs because they remove redundant parameters. They fall into three categories: structured, unstructured, and semi-structured. Structured pruning eliminates whole substructures such as attention heads, while unstructured pruning zeros out individual parameters, offering more flexibility but less acceleration. Semi-structured pruning, such as N:M sparsity, strikes a balance by combining structured patterns with fine-grained sparsity to gain both efficiency and flexibility. Learnable sparsity methods have recently gained attention, particularly in vision models, and this work pioneers learnable N:M masks for frozen LLMs, addressing the challenge of their large-scale parameters.

The MaskLLM framework introduces N:M sparsity to optimize LLMs by selecting binary masks for parameter blocks, pruning efficiently without significantly degrading model performance. Focusing on 2:4 sparsity, it selects masks in which two of every four values remain non-zero. The non-differentiability of mask selection is tackled with Gumbel Softmax, which enables differentiable sampling so masks can be optimized by gradient descent. MaskLLM learns masks from large-scale data and transfers them to downstream tasks. Sparse weight regularization maintains post-pruning quality, and prior masks improve the learning process, ensuring efficient and effective model compression.
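Putting the pieces together, the sketch below shows, in illustrative NumPy rather than the paper's code, how a magnitude-based prior might warm-start per-block mask logits and how a final hard mask is committed at inference by taking the argmax candidate per block; the boost constant and shapes are assumptions for the example:

```python
import numpy as np
from itertools import combinations

# All C(4,2) = 6 candidate 2:4 masks for a block of 4 weights.
CANDIDATES = np.array([[1.0 if i in keep else 0.0 for i in range(4)]
                       for keep in combinations(range(4), 2)])  # (6, 4)

rng = np.random.default_rng(1)
W = rng.normal(size=(4, 8))                      # toy weight matrix
blocks = W.reshape(-1, 4)                        # one 2:4 block per row
logits = rng.normal(size=(blocks.shape[0], 6))   # per-block mask logits

# A magnitude-aware prior can warm-start the logits: boost the candidate
# that keeps each block's two largest-magnitude weights.
scores = np.abs(blocks) @ CANDIDATES.T           # (blocks, 6) kept-magnitude sums
logits += 2.0 * (scores == scores.max(axis=1, keepdims=True))

# Inference: commit to the argmax candidate mask per block (no sampling).
hard = CANDIDATES[np.argmax(logits, axis=1)]     # (blocks, 4) binary masks
W_pruned = (blocks * hard).reshape(W.shape)
assert all((row != 0).sum() == 2 for row in W_pruned.reshape(-1, 4))
```

However the logits are initialized, the committed masks always satisfy the 2:4 constraint by construction, since every candidate in the table keeps exactly two entries per block.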

The researchers evaluated MaskLLM on multiple LLMs (LLaMA-2, Nemotron-4, and multilingual GPT-3) ranging from 843M to 15B parameters. MaskLLM learns 2:4 sparsity masks through end-to-end training, outperforming baselines like SparseGPT and Wanda in accuracy and perplexity. Mask quality improves with larger training datasets, and the method remains robust in low-resource settings. Transfer learning from pre-computed masks accelerates training, while preserving large-magnitude remaining weights enhances downstream task performance. MaskLLM's stochastic exploration ensures high-quality mask discovery, with results surpassing SparseGPT in perplexity after training on as few as 1,280 samples.

MaskLLM introduces a learnable pruning method for applying N:M sparsity in LLMs to reduce computational costs during inference. Instead of relying on a predefined importance criterion, it models N:M sparsity patterns through Gumbel Softmax sampling, enabling end-to-end training on large datasets. MaskLLM offers high-quality mask learning and transferability across domains. Tested on LLaMA-2, Nemotron-4, and GPT-3, with sizes ranging from 843M to 15B parameters, MaskLLM outperformed state-of-the-art methods in perplexity and efficiency. Its masks can also be customized for lossless downstream task performance.


Check out the Paper and GitHub. All credit for this research goes to the researchers of this project.



Sana Hassan, a consulting intern at Marktechpost and dual-degree student at IIT Madras, is passionate about applying technology and AI to address real-world challenges. With a keen interest in solving practical problems, he brings a fresh perspective to the intersection of AI and real-life solutions.




