Are LLMs Failing to Match the Suffix in Fill-in-the-Middle (FIM) Code Completion? Horizon-Length Prediction: A New AI Training Task to Advance FIM by Teaching LLMs to Plan Ahead over Arbitrarily Long Horizons

While writing code, developers can struggle to fill gaps in incomplete programs and often make mistakes when fitting new pieces into existing code. These challenges arise from the difficulty of making new code consistent with both the preceding and following parts, especially when the broader context is not taken into consideration. In recent years, Fill-in-the-Middle (FIM) has become integral to code language models, enabling the generation of missing code given both left and right contexts. Currently, FIM works by rearranging code sequences and using next-token prediction (NTP) to fill the gaps in incomplete code. FIM also requires planning capability, and its absence can hinder the prediction of the missing code.

Current FIM methods rely on reordering training sequences and performing next-token prediction (NTP) to generate the missing part of the code, as sketched below. However, these methods do not hold up in real-world coding scenarios because their evaluation depends on strict assumptions, such as the model generating exactly the same number of lines as the original middle; model performance on FIM tasks deteriorates significantly once these unrealistic assumptions are removed. Standard NTP training does not adequately prepare models for this long-horizon planning task. Consequently, models often struggle to maintain coherence over the longer sequences required in FIM, particularly when approaching the transition to the right context. NTP alone does not teach models to plan around the distant right context that follows the missing section, which is crucial for generating accurate code in the middle.
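To make the setup concrete, the snippet below sketches the standard FIM data rearrangement (the "prefix-suffix-middle" format) that NTP training then operates on. The sentinel token names are illustrative placeholders, not those of any particular model:

```python
def make_fim_example(code: str, start: int, end: int) -> str:
    """Rearrange a source string so that a model trained with plain
    next-token prediction generates the middle span last, conditioned
    on both the left (prefix) and right (suffix) contexts."""
    prefix, middle, suffix = code[:start], code[start:end], code[end:]
    # The sentinel tokens below are hypothetical stand-ins for the
    # special tokens a real FIM tokenizer would define.
    return f"<PRE>{prefix}<SUF>{suffix}<MID>{middle}<EOM>"

# Example: hide the body of a tiny function and train on the rearranged text.
snippet = "def add(a, b):\n    return a + b\n"
print(make_fim_example(snippet, start=15, end=31))
```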

To mitigate this issue, the authors add an auxiliary training objective, Horizon-Length Prediction (HLP), to improve the planning capability of LLMs over long horizons. Specifically, given the hidden state of the current token, HLP tasks the model with predicting the number of future tokens required to complete the middle.

Concretely, researchers from the University of Illinois Urbana-Champaign and AWS AI Labs propose Horizon-Length Prediction (HLP) as an efficient solution: a novel training objective that teaches models to predict the number of remaining middle tokens (the horizon length) at each step. It is implemented as a linear layer on top of the transformer, whose input is the hidden state from the last layer. HLP improves Fill-in-the-Middle by teaching models to plan with the broader context in mind, helping them naturally learn to infill from arbitrary left and right code contexts without special rules or extra adjustments. Unlike rule-based post-processing, HLP is generalizable because it does not require any task-specific knowledge.
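As a rough illustration of this idea, the sketch below implements an HLP-style head in PyTorch: a single linear layer maps each token's final hidden state to a predicted fraction of the middle still remaining. The class and function names, the normalization by total middle length, and the squared-error loss are assumptions made here for illustration, not the authors' reference implementation:

```python
import torch
import torch.nn as nn


class HorizonLengthHead(nn.Module):
    """Auxiliary HLP head: one linear layer over the last hidden states."""

    def __init__(self, hidden_size: int):
        super().__init__()
        self.proj = nn.Linear(hidden_size, 1)

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # hidden_states: (batch, seq_len, hidden_size) from the last layer.
        return self.proj(hidden_states).squeeze(-1)  # (batch, seq_len)


def hlp_loss(pred: torch.Tensor, middle_mask: torch.Tensor) -> torch.Tensor:
    """Squared-error HLP loss (an assumed formulation).

    middle_mask is 1.0 at middle-token positions, 0.0 elsewhere. The
    target at each middle position is the number of middle tokens still
    to come, normalized by the total middle length.
    """
    mask = middle_mask.float()
    total = mask.sum(dim=-1, keepdim=True).clamp(min=1.0)
    # A reversed cumulative sum counts the middle tokens remaining at each step.
    remaining = mask.flip(-1).cumsum(-1).flip(-1)
    target = remaining / total
    err = (pred - target) ** 2
    return (err * mask).sum() / mask.sum().clamp(min=1.0)
```

In training, such an auxiliary loss would be added to the standard NTP loss with a small weighting coefficient; at inference the head is simply unused, which is consistent with the article's claim of zero inference overhead.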

The researchers' evaluation shows that HLP improves code infilling by up to 24% across diverse benchmarks without any rule-based or dataset-specific post-processing, and it also enhances performance on code reasoning. HLP is efficient as well: it incurs negligible training overhead and no inference overhead, making it practical for real-world applications.

In conclusion, this paper introduces Horizon-Length Prediction (HLP), a novel training objective designed to enhance Fill-in-the-Middle (FIM) capabilities in code language models. By teaching models to predict the number of remaining tokens, HLP significantly improves the planning and coherence of generated code, achieving up to 24% performance gains on diverse benchmarks without relying on restrictive post-processing methods. The enhanced planning capability acquired through HLP training also boosts models' performance on code reasoning tasks, suggesting that HLP may broadly improve language models' reasoning capabilities. Since HLP adds negligible training overhead and no inference cost, it is practical to deploy. This research marks a significant step toward more effective code language models for real-world applications.


Check out the Paper. All credit for this research goes to the researchers of this project.



Divyesh is a consulting intern at Marktechpost. He is pursuing a BTech in Agricultural and Food Engineering from the Indian Institute of Technology, Kharagpur. He is a data science and machine learning enthusiast who wants to integrate these leading technologies into the agricultural domain and solve its challenges.





