
MORCELA: A New AI Approach to Linking Language Model (LM) Scores with Human Acceptability Judgments


In natural language processing (NLP), a central question is how well the probabilities that language models (LMs) assign to sentences align with human linguistic behavior. This alignment is often assessed by comparing LM scores with human acceptability judgments, which rate how natural a sentence feels. Previous work has linked the two with scoring functions such as SLOR (Syntactic Log-Odds Ratio), but significant issues remain. SLOR applies the same fixed correction for sentence length and unigram frequency to every model, which can misrepresent how a particular model's scores relate to human judgments. A more flexible method is needed, one that adapts to differences between models and to the complexities of human language processing.
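To make the fixed correction concrete: SLOR is commonly computed as SLOR(s) = (log p_M(s) - log p_u(s)) / |s|, where p_M(s) is the LM's probability of sentence s, p_u(s) is its probability under a unigram model, and |s| is its length in tokens. The minimal Python sketch below assumes per-token log-probabilities have already been extracted from both models; the function name `slor` and the toy numbers are illustrative, not taken from the paper. Every model gets exactly the same treatment: the full unigram log-probability is subtracted and the result is divided by length.

```python
from typing import Sequence


def slor(token_logprobs: Sequence[float], unigram_logprobs: Sequence[float]) -> float:
    """Standard SLOR score for one sentence.

    token_logprobs:   log p_M(w_i | w_<i) for each token under the LM.
    unigram_logprobs: log p_u(w_i) for each token under a unigram model.
    """
    if len(token_logprobs) != len(unigram_logprobs):
        raise ValueError("token and unigram log-probs must be aligned")
    if not token_logprobs:
        raise ValueError("empty sentence")
    lm_logprob = sum(token_logprobs)         # log p_M(s)
    unigram_logprob = sum(unigram_logprobs)  # log p_u(s)
    # Fixed correction: subtract the whole unigram log-prob, then divide by |s|.
    return (lm_logprob - unigram_logprob) / len(token_logprobs)
```

For example, a three-token sentence with LM token log-probs [-2.0, -3.0, -1.0] and unigram log-probs [-5.0, -8.0, -4.0] gets (-6 + 17) / 3, roughly 3.67.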

MORCELA: A New Linking Theory

A team of researchers from NYU and CMU proposes MORCELA (Magnitude-Optimized Regression for Controlling Effects on Linguistic Acceptability), a new linking theory that addresses these challenges. Unlike SLOR, which applies fixed adjustments for length and unigram frequency, MORCELA estimates the optimal degree of adjustment from data through learned parameters for each effect: β for unigram frequency and γ for sentence length. Adjusting LM scores with these parameters improves their correlation with human judgments, because it better accounts for how strongly each LM's scores are affected by word rarity and sentence length compared with human judgments. The core idea behind MORCELA is that not all language models should receive the same correction, since models differ in how well they predict linguistic acceptability.
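The sketch below shows one way β and γ can enter such a score. It assumes the form MORCELA(s) = (log p_M(s) - β·log p_u(s) + γ) / |s|, which reduces to SLOR when β = 1 and γ = 0. This parameterization is an assumption made for illustration, so check the paper for the exact equation; the function name `morcela_score` is hypothetical.

```python
from typing import Sequence


def morcela_score(
    token_logprobs: Sequence[float],
    unigram_logprobs: Sequence[float],
    beta: float,
    gamma: float,
) -> float:
    """Hypothetical MORCELA-style sentence score.

    ASSUMED form: (log p_M(s) - beta * log p_u(s) + gamma) / |s|.
    beta scales the unigram-frequency correction; gamma, divided by |s|,
    adds a length-dependent offset. beta=1, gamma=0 recovers SLOR.
    """
    if len(token_logprobs) != len(unigram_logprobs):
        raise ValueError("token and unigram log-probs must be aligned")
    n = len(token_logprobs)
    if n == 0:
        raise ValueError("empty sentence")
    lm_logprob = sum(token_logprobs)         # log p_M(s)
    unigram_logprob = sum(unigram_logprobs)  # log p_u(s)
    return (lm_logprob - beta * unigram_logprob + gamma) / n
```

Because log p_u(s) is negative, subtracting β·log p_u(s) raises the score of sentences built from rare words; a β below 1 gives rare words a smaller boost, that is, a weaker frequency correction. Dividing γ by |s| makes its contribution shrink as sentences get longer, so γ tunes how strongly length is normalized.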

Technical Overview

MORCELA's parameters are estimated from human acceptability judgments. They control how much correction is applied to LM log probabilities, which makes MORCELA more adaptable than predecessors such as SLOR. Specifically, β scales the correction for unigram frequency, while γ controls the correction for sentence length. This flexibility lets MORCELA match human acceptability ratings more closely, especially for larger models. For example, larger models tend to need less adjustment for unigram frequency, because they are better at predicting less common words in context.
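How β and γ might be estimated from human ratings is sketched below. As a stand-in for the paper's fitting procedure, it uses a plain grid search that picks the (β, γ) pair whose scores correlate best (Pearson r) with the human ratings, under the assumed score form (log p_M(s) - β·log p_u(s) + γ) / |s|. The helper names `pearson`, `score`, and `fit_beta_gamma` are hypothetical, and the search strategy is an illustrative substitute, not the authors' optimizer.

```python
import itertools
import math
from typing import List, Sequence, Tuple

# One sentence = (LM token log-probs, unigram token log-probs), aligned by token.
Sentence = Tuple[Sequence[float], Sequence[float]]


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 when either input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def score(sent: Sentence, beta: float, gamma: float) -> float:
    """ASSUMED MORCELA-style form: (log p_M(s) - beta*log p_u(s) + gamma) / |s|."""
    tok, uni = sent
    return (sum(tok) - beta * sum(uni) + gamma) / len(tok)


def fit_beta_gamma(
    sentences: List[Sentence],
    human_ratings: Sequence[float],
    betas: Sequence[float],
    gammas: Sequence[float],
) -> Tuple[float, float, float]:
    """Return (beta, gamma, r) with the highest Pearson r against human ratings.

    Illustrative exhaustive grid search, not the authors' optimizer.
    """
    best_beta, best_gamma, best_r = math.nan, math.nan, -math.inf
    for beta, gamma in itertools.product(betas, gammas):
        scores = [score(s, beta, gamma) for s in sentences]
        r = pearson(scores, human_ratings)
        if r > best_r:  # strict '>' keeps the first pair on ties
            best_beta, best_gamma, best_r = beta, gamma, r
    return best_beta, best_gamma, best_r
```

Run separately for each LM, a fit like this yields model-specific β and γ values, which is what makes comparisons possible, such as how much frequency correction a larger model needs relative to a smaller one. In practice the ratings would come from a human acceptability dataset, and a finer grid or a gradient-based optimizer would replace the coarse search.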

Performance and Significance

The significance of MORCELA becomes evident when comparing its performance across LM sizes. MORCELA outperformed SLOR in predicting human acceptability judgments for two well-known model families, Pythia and OPT. As models grew larger, MORCELA's correlation with human judgments improved. The parameter values estimated by MORCELA showed that larger LMs are less affected by frequency and length, and therefore require less correction. This suggests that larger LMs make better use of linguistic context, which lets them predict rare words more accurately and reduces the role of unigram frequency as a confounding factor. MORCELA improved the correlation between LM scores and human judgments by up to 46% compared to SLOR, showing that its data-driven corrections are more precise.

This advancement matters for several reasons. First, it suggests that current LMs may reflect human language processing more closely than previously thought, provided the right corrections are applied. Second, MORCELA can be valuable in psycholinguistic studies that use LMs as proxies for human language comprehension: a more accurate linking theory means LMs are evaluated in a way that aligns more closely with human linguistic intuition. For instance, larger LMs needed less unigram frequency correction, indicating that they handle less frequent, context-specific words better. This could significantly affect how we interpret LM behavior on tasks involving rare or domain-specific language.

Conclusion

MORCELA represents an important development in aligning language models with human acceptability judgments. By using learned parameters to adjust for length and frequency, it addresses critical flaws in previous approaches such as SLOR. The results show that, with proper adjustment, LM scores can better reflect human linguistic intuition, particularly as models scale in size. Future work could explore further adjustments or new parameters that bring LMs even closer to human-like language understanding. Beyond improving how LMs are evaluated, MORCELA offers insight into how these models process language, narrowing the gap between machine-generated probabilities and human language behavior.

Check out the Paper. All credit for this research goes to the researchers of this project.


Aswin AK is a consulting intern at MarkTechPost. He is pursuing his Dual Degree at the Indian Institute of Technology, Kharagpur. He is passionate about data science and machine learning, bringing a strong academic background and hands-on experience in solving real-life cross-domain challenges.
