Large Language Models (LLMs) have gained significant attention in AI research due to their impressive capabilities, yet they remain limited at long-term planning and complex problem-solving. Explicit search methods such as Monte Carlo Tree Search (MCTS) have been used to improve decision-making in systems ranging from chess engines to game-playing algorithms, but they are difficult to apply to LLMs: the recursive use of value models during search accumulates errors and increases computational cost, especially on long-horizon tasks. This motivates enabling LLMs to predict and exploit future information without depending on explicit search, improving their performance on complex tasks that demand long-term planning and decision-making.
Existing approaches to these challenges include neural networks for chess, diffusion models, and world models. Chess AI has evolved from handcrafted search algorithms and heuristics to neural network-based methods; AlphaZero marked a significant shift by combining deep reinforcement learning with MCTS to develop its own heuristics. Diffusion models have emerged as a powerful class of generative models applied across image generation, text generation, and reinforcement learning. World models in model-based reinforcement learning aim to capture environment dynamics and predict future outcomes; however, conventional world models often rely on single-step prediction, which leads to compounding errors.
This paper introduces DIFFUSEARCH, a method that performs implicit search by predicting future states with discrete diffusion modeling. The method is applied to chess, a domain where explicit search has traditionally been considered essential. DIFFUSEARCH outperforms both searchless policies and policies enhanced by explicit search: it beats the one-step policy by 19.2% and the Monte Carlo Tree Search (MCTS)-enhanced policy by 14% in action accuracy, improves puzzle-solving by 30% over explicit search methods, and achieves a substantial 540-point Elo increase in game-playing strength.
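To make the idea concrete, below is a minimal, self-contained sketch of the kind of mask-based (absorbing-state) discrete diffusion decoding the paper describes: starting from a fully masked future trajectory, a bidirectional Transformer iteratively fills in action and state tokens, re-masking low-confidence positions between rounds. All names here (`denoiser`, `MASK_ID`, `HORIZON`) and the dummy model are illustrative assumptions, not the paper's actual implementation.

```python
import torch

MASK_ID = 0    # hypothetical [MASK] token id (assumption, not the paper's)
VOCAB   = 128  # hypothetical vocabulary over action/board tokens
HORIZON = 16   # future tokens covering (a_t, s_{t+1}, a_{t+1}, ...)

def denoiser(x, step):
    """Stand-in for the trained full-attention Transformer: returns
    per-position logits over the vocabulary. Replace with a real model."""
    return torch.randn(x.shape[0], x.shape[1], VOCAB)

@torch.no_grad()
def sample_future(state_tokens, steps=8):
    """Mask-predict style decoding: start from an all-[MASK] future and
    progressively commit the most confident predictions each round."""
    b = state_tokens.shape[0]
    x = torch.full((b, HORIZON), MASK_ID)             # fully masked future
    for step in reversed(range(steps)):
        logits = denoiser(torch.cat([state_tokens, x], dim=1), step)
        probs = logits[:, -HORIZON:].softmax(dim=-1)  # future positions only
        conf, pred = probs.max(dim=-1)
        k = HORIZON * step // steps                   # positions left masked
        if k > 0:
            # re-mask the k least confident positions for the next round
            low = conf.argsort(dim=-1)[:, :k]
            pred.scatter_(1, low, MASK_ID)
        x = pred
    return x  # decoded trajectory; its first action token is the move to play

# Usage: a batch of one encoded board state (token ids are placeholders)
future = sample_future(torch.randint(1, VOCAB, (1, 64)))
```

The key contrast with explicit search is visible in the loop: rather than expanding a tree and backing up values, the policy commits to a whole imagined future and refines it over a fixed number of denoising rounds.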
DIFFUSEARCH’s architecture is based on a decoder-only GPT-2 Transformer, modified to use full attention instead of causal attention. It is compared with three baseline Transformer models: (a) State-action (S-A), (b) State-value (S-V), and (c) Action-value (SA-V); the S-A and S-V models are also integrated into Monte Carlo Tree Search (MCTS) following the AlphaZero approach. Because diffusion models converge more slowly, DIFFUSEARCH and the other diffusion variants are trained for up to 200 epochs, allowing a rigorous comparison with existing approaches. Policies are evaluated on three metrics: Action Accuracy, Puzzle Accuracy, and Tournament Elo, with Elo ratings computed using BayesElo.
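The attention modification is straightforward to illustrate. Here is a minimal sketch (not the authors' code; layer sizes are placeholders) of a GPT-2-style self-attention layer where the causal mask is simply not applied, so every token can attend over the whole state-plus-future sequence; setting `causal=True` recovers standard GPT-2 behavior.

```python
import torch
import torch.nn.functional as F

class SelfAttention(torch.nn.Module):
    """GPT-2-style multi-head self-attention; `causal` toggles the mask."""
    def __init__(self, dim=64, n_heads=4, causal=False):
        super().__init__()
        self.n_heads, self.causal = n_heads, causal
        self.qkv = torch.nn.Linear(dim, 3 * dim)
        self.proj = torch.nn.Linear(dim, dim)

    def forward(self, x):
        b, s, d = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        # reshape each to (batch, heads, seq, head_dim)
        q, k, v = (t.view(b, s, self.n_heads, -1).transpose(1, 2)
                   for t in (q, k, v))
        # is_causal=False gives full (bidirectional) attention, as described
        # for DIFFUSEARCH; is_causal=True is the usual GPT-2 causal mask
        out = F.scaled_dot_product_attention(q, k, v, is_causal=self.causal)
        return self.proj(out.transpose(1, 2).reshape(b, s, d))

x = torch.randn(2, 10, 64)
full_attn = SelfAttention(causal=False)(x)  # every position sees all 10
```

Full attention is what lets the model denoise all future positions jointly, since each predicted token can condition on tokens both before and after it.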
DIFFUSEARCH demonstrates remarkable improvements over the baseline models in both prediction accuracy and playing strength. It outperforms the S-A model by a significant margin of 653 Elo points and 19% in action accuracy, highlighting the effectiveness of future forecasting for next-action prediction. It also achieves 10% higher action accuracy than the SA-V model despite using 20 times less training data. Compared to the MCTS-based agent, DIFFUSEARCH shows a 542-point Elo gain and a 14% improvement in action accuracy, underscoring its ability to simulate multi-step scenarios and exceed an MCTS-enhanced policy that relies on a carefully balanced combination of policy and value models.
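To put these Elo gaps in perspective, the standard Elo expected-score formula (a general chess-rating fact, not a calculation from the paper) translates rating differences into expected game outcomes:

```python
# Expected score of the stronger player under the standard Elo model
def expected_score(elo_diff):
    return 1.0 / (1.0 + 10 ** (-elo_diff / 400))

print(f"{expected_score(653):.3f}")  # ~0.977 vs the S-A baseline
print(f"{expected_score(542):.3f}")  # ~0.958 vs the MCTS-enhanced agent
```

In other words, gaps of this size correspond to winning roughly 96-98% of the available points against those baselines.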
In conclusion, the paper presents DIFFUSEARCH, a model that signals a potential shift from explicit search over one-step policies to implicit search within future-aware policies in the chess domain. Experiments and analyses show that DIFFUSEARCH outperforms both searchless policies and those enhanced by explicit search. The principles and techniques developed in this controlled task could carry over to natural language settings, improving current next-token prediction in LLMs. However, DIFFUSEARCH depends on an oracle (Stockfish) for future supervision, and integrating it with self-play techniques is an exciting direction for future work. The model’s search depth is also limited by context length, so adopting long-context models could enable more efficient training and deeper searches.
Check out the Paper. All credit for this research goes to the researchers of this project.
Sajjad Ansari is a final-year undergraduate at IIT Kharagpur. As a tech enthusiast, he delves into the practical applications of AI, with a focus on understanding AI technologies and their real-world implications. He aims to articulate complex AI concepts in a clear and accessible manner.