Building an Interactive Bilingual (Arabic and English) Chat Interface with Open Source Meraj-Mini by Arcee AI: Leveraging GPU Acceleration, PyTorch, Transformers, Accelerate, BitsAndBytes, and Gradio
In this tutorial, we implement a bilingual chat assistant powered by Arcee's Meraj-Mini model, deployed on Google Colab with a free T4 GPU. The walkthrough showcases the capabilities of open-source language models while providing practical, hands-on experience deploying a state-of-the-art AI solution within the constraints of free cloud resources. We'll use a powerful stack of tools, including:

  1. Arcee’s Meraj-Mini model
  2. Transformers library for model loading and tokenization
  3. Accelerate and bitsandbytes for efficient quantization
  4. PyTorch for deep learning computations
  5. Gradio for creating an interactive web interface
# Check GPU availability (nvidia-smi queries the GPU; make sure a GPU runtime is selected in Colab)
!nvidia-smi --query-gpu=name,memory.total --format=csv


# Install dependencies
!pip install -qU transformers accelerate bitsandbytes
!pip install -q gradio

First we check which GPU is available by querying its name and total memory with the nvidia-smi command. We then install the key Python libraries (transformers, accelerate, bitsandbytes, and gradio) needed for model loading, quantization, and the interactive interface.
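As an optional sanity check (this snippet is our own addition, not part of the original notebook), we can confirm that PyTorch itself sees the Colab GPU before loading any weights:

import torch

# Verify that PyTorch can see the T4 GPU before loading the model
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    print("Device:", props.name)
    print("Total memory (GB):", round(props.total_memory / 1e9, 1))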

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline, BitsAndBytesConfig


quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=True
)




model = AutoModelForCausalLM.from_pretrained(
    "arcee-ai/Meraj-Mini",
    quantization_config=quant_config,
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("arcee-ai/Meraj-Mini")

Then we configure 4-bit quantization with BitsAndBytesConfig for memory-efficient loading, load the "arcee-ai/Meraj-Mini" causal language model from Hugging Face along with its tokenizer, and let device_map="auto" place the weights on the available hardware automatically.
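To see what the 4-bit quantization buys us, a short check like the following (our sketch; get_memory_footprint is a standard transformers helper, and exact numbers vary by model) prints the loaded model's approximate GPU footprint and where Accelerate placed each module:

# Approximate memory occupied by the quantized weights
print(f"Model footprint: {model.get_memory_footprint() / 1e9:.2f} GB")

# How accelerate mapped modules across devices (populated when device_map is used)
print("Device map:", model.hf_device_map)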

chat_pipeline = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    max_new_tokens=512,
    temperature=0.7,
    top_p=0.9,
    repetition_penalty=1.1,
    do_sample=True
)

Here we create a text generation pipeline tailored for chat interactions using Hugging Face’s pipeline function. It configures maximum new tokens, temperature, top_p, and repetition penalty to balance diversity and coherence during text generation.
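Before wiring up the chat helpers, a one-off call (our illustrative smoke test, using the same ChatML-style delimiters adopted below) verifies that the pipeline generates text end to end:

# Smoke test with an illustrative ChatML-formatted prompt; output varies with sampling
test_prompt = "<|im_start|>user\nSay hello in Arabic and English.<|im_end|>\n<|im_start|>assistant\n"
print(chat_pipeline(test_prompt)[0]["generated_text"])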

def format_chat(messages):
    # Build a ChatML-style prompt with explicit newlines between turns
    prompt = ""
    for msg in messages:
        prompt += f"<|im_start|>{msg['role']}\n{msg['content']}<|im_end|>\n"
    prompt += "<|im_start|>assistant\n"
    return prompt


def generate_response(user_input, history=None):
    # Avoid a mutable default argument, which would leak state across calls
    if history is None:
        history = []
    history.append({"role": "user", "content": user_input})
    formatted_prompt = format_chat(history)
    output = chat_pipeline(formatted_prompt)[0]['generated_text']
    # Keep only the assistant's reply from the final turn of the raw output
    assistant_response = output.split("<|im_start|>assistant\n")[-1].split("<|im_end|>")[0]
    history.append({"role": "assistant", "content": assistant_response})
    return assistant_response, history

We define two helper functions for the conversational interface. format_chat serializes the chat history into a ChatML-style prompt with <|im_start|>/<|im_end|> delimiters, while generate_response appends the new user message, runs the text-generation pipeline, extracts the assistant's reply from the raw output, and updates the conversation history.
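For example, a two-turn exchange (our usage sketch; the output will vary because sampling is enabled) looks like this:

# First turn starts a fresh history; the second reuses it for context
reply, history = generate_response("ما هي عاصمة مصر؟ What is the capital of Egypt?")
print(reply)

reply, history = generate_response("And what language is spoken there?", history)
print(reply)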

import gradio as gr


with gr.Blocks() as demo:
    chatbot = gr.Chatbot()
    msg = gr.Textbox(label="Message")
    clear = gr.Button("Clear History")

    def respond(message, chat_history):
        # gr.Chatbot stores (user, assistant) tuples; convert them to the
        # role/content messages that generate_response expects
        messages = []
        for user_msg, assistant_msg in chat_history:
            messages.append({"role": "user", "content": user_msg})
            messages.append({"role": "assistant", "content": assistant_msg})
        response, _ = generate_response(message, messages)
        # Clear the textbox and append the new exchange to the chat window
        return "", chat_history + [(message, response)]

    msg.submit(respond, [msg, chatbot], [msg, chatbot])
    clear.click(lambda: None, None, chatbot, queue=False)


demo.launch(share=True)

Finally, we build a web-based chatbot interface using Gradio. It creates UI elements for the chat history, message input, and a clear-history button, and wires up a respond function that bridges Gradio's tuple-based chat history and the text-generation pipeline, clearing the textbox after each turn. The demo is launched with share=True, which exposes a temporary public URL for access outside Colab.


Here is the Colab Notebook.


Asif Razzaq is the CEO of Marktechpost Media Inc. As a visionary entrepreneur and engineer, Asif is committed to harnessing the potential of Artificial Intelligence for social good. His most recent endeavor is the launch of an Artificial Intelligence media platform, Marktechpost, which stands out for its in-depth coverage of machine learning and deep learning news that is both technically sound and easily understandable by a wide audience. The platform boasts over 2 million monthly views, illustrating its popularity among readers.


