IBM Releases Granite 3.3 8B: A New Speech-to-Text (STT) Model that Excels in Automatic Speech Recognition (ASR) and Automatic Speech Translation (AST)


As artificial intelligence continues to integrate into enterprise systems, the demand for models that combine flexibility, efficiency, and transparency has increased. Existing solutions often struggle to meet all these requirements. Open-source models may lack domain-specific capabilities, while proprietary systems sometimes limit access or adaptability. This shortfall is especially pronounced in tasks involving speech recognition, logical reasoning, and retrieval-augmented generation (RAG), where technical fragmentation and toolchain incompatibility create operational bottlenecks.

IBM Releases Granite 3.3 with Updates in Speech, Reasoning, and Retrieval

IBM has introduced Granite 3.3, a set of openly available foundation models engineered for enterprise applications. This release delivers upgrades across three domains: speech processing, reasoning capabilities, and retrieval mechanisms. Granite Speech 3.3 8B is IBM's first open speech-to-text (STT) and automatic speech translation (AST) model. It achieves higher transcription accuracy and improved translation quality compared to Whisper-based systems. The model is designed to handle long audio sequences while introducing fewer transcription artifacts, improving usability in real-world scenarios.

Granite 3.3 8B Instruct extends the capabilities of the core model with support for fill-in-the-middle (FIM) text generation and improvements in symbolic and mathematical reasoning. These enhancements show up in benchmark results: on the MATH500 dataset, the model outperforms Llama 3.1 8B and Claude 3.5 Haiku.
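To make the FIM idea concrete, the sketch below assembles a fill-in-the-middle prompt from a prefix and suffix. The sentinel token strings used here are placeholder assumptions, not Granite's actual special tokens; a real deployment would read them from the model's tokenizer configuration.

```python
# Illustrative fill-in-the-middle (FIM) prompt assembly.
# The sentinel tokens below are hypothetical placeholders, not
# Granite's actual vocabulary -- check the model's tokenizer config.
PREFIX_TOK = "<fim_prefix>"
SUFFIX_TOK = "<fim_suffix>"
MIDDLE_TOK = "<fim_middle>"

def build_fim_prompt(prefix: str, suffix: str) -> str:
    """Arrange the text around the gap so the model generates the middle."""
    return f"{PREFIX_TOK}{prefix}{SUFFIX_TOK}{suffix}{MIDDLE_TOK}"

# The model is asked to fill in the function body between prefix and suffix.
prompt = build_fim_prompt("def add(a, b):\n    return ", "\n\nprint(add(2, 3))")
```

In this layout the model sees both the code before and after the gap, which is what distinguishes FIM from plain left-to-right completion in tasks like document editing and code infilling.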

Technical Foundations and Architecture

Granite Speech 3.3 8B uses a modular architecture consisting of a speech encoder and LoRA-based audio adapters. This design allows for efficient domain-specific fine-tuning while retaining the generalization capacity of the base model. The model supports both transcription and translation tasks, enabling cross-lingual content processing.
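The general mechanics of a LoRA adapter can be sketched in a few lines. This is a minimal illustration of the technique, not IBM's implementation: a frozen base weight is augmented with a trainable low-rank update, so each adapter adds only r*(d_in + d_out) parameters.

```python
import numpy as np

# Minimal LoRA-adapted linear layer (illustrative sketch, not IBM's code).
rng = np.random.default_rng(0)
d_in, d_out, rank, alpha = 64, 64, 8, 16

W = rng.standard_normal((d_out, d_in))      # frozen base weight
A = rng.standard_normal((rank, d_in)) * 0.01  # trainable down-projection
B = np.zeros((d_out, rank))                 # trainable up-projection, zero-init

def lora_forward(x: np.ndarray) -> np.ndarray:
    # Base path plus scaled low-rank correction B @ A @ x.
    return W @ x + (alpha / rank) * (B @ (A @ x))

x = rng.standard_normal(d_in)
# With B zero-initialized, the adapter starts as an exact no-op,
# so fine-tuning begins from the base model's behavior.
assert np.allclose(lora_forward(x), W @ x)
```

Because only A and B are trained, several domain-specific adapters can share one frozen base model, which is the property the modular encoder-plus-adapter design exploits.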

The Granite 3.3 Instruct models incorporate fill-in-the-middle generation, supporting tasks such as document editing and code completion. Alongside, IBM introduces five LoRA adapters tailored for RAG workflows. These adapters support better integration of external knowledge, improving factual accuracy and contextual relevance during generation.
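The kind of retrieval-grounded prompting these adapters target can be sketched as follows. The template and helper name are illustrative assumptions, not the input format Granite's RAG adapters actually expect:

```python
# Hypothetical sketch of assembling a retrieval-grounded prompt.
# The instruction wording and citation convention are assumptions
# for illustration, not Granite's documented RAG format.
def build_rag_prompt(question: str, passages: list[str]) -> str:
    """Number the retrieved passages and instruct the model to stay grounded."""
    context = "\n\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    return (
        "Answer using only the passages below. Cite passage numbers.\n\n"
        f"{context}\n\nQuestion: {question}\nAnswer:"
    )

prompt = build_rag_prompt(
    "When was Granite 3.3 released?",
    ["IBM introduced Granite 3.3 in 2025.", "Granite models are open-source."],
)
```

Constraining generation to numbered passages is what the adapters' improved "retrieval integration and grounding" amounts to in practice: the model is rewarded for answering from the supplied context rather than its parametric memory.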

A notable addition is adaptive LoRA (aLoRA), which reuses the key-value (KV) cache across inference sessions. This leads to a reduction in memory consumption and latency, particularly in streaming or multi-hop retrieval environments. aLoRA is designed to offer better trade-offs between computational overhead and performance in retrieval-heavy workloads.
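A toy version of that cache-reuse idea is sketched below. This is an assumption-laden illustration of the concept, not IBM's implementation: the key/value projections computed by the base model for a shared prompt are cached once, and the adapter's low-rank correction is applied only to tokens processed after the adapter is switched on.

```python
import numpy as np

# Toy illustration of the aLoRA concept (a sketch, not IBM's code):
# the KV cache built by the base model for a shared prompt is reused
# when an adapter activates, so the prompt is not re-encoded per adapter.
rng = np.random.default_rng(1)
d = 16
Wk = rng.standard_normal((d, d))  # frozen key projection
Wv = rng.standard_normal((d, d))  # frozen value projection

def encode_prompt(prompt_embs: np.ndarray):
    # Base-model pass over the prompt: computed once, then cached.
    return prompt_embs @ Wk.T, prompt_embs @ Wv.T

def generate_step(kv_cache, new_emb: np.ndarray, adapter=None):
    # Only the NEW token's projections take the adapter path;
    # the cached prompt entries are reused untouched.
    K, V = kv_cache
    k_new, v_new = new_emb @ Wk.T, new_emb @ Wv.T
    if adapter is not None:
        A, B = adapter                      # low-rank pair (rank x d, d x rank)
        k_new = k_new + new_emb @ A.T @ B.T # correction on the new token only
    return np.vstack([K, k_new]), np.vstack([V, v_new])

prompt = rng.standard_normal((10, d))
cache = encode_prompt(prompt)               # one base-model prompt pass
rank = 4
adapter = (rng.standard_normal((rank, d)), rng.standard_normal((d, rank)))
cache = generate_step(cache, rng.standard_normal((1, d)), adapter)
```

Because the ten prompt positions are never recomputed when the adapter is enabled, switching between several RAG adapters over the same context avoids repeated prefill work, which is where the memory and latency savings come from.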

Benchmark Results and Platform Support

Granite Speech 3.3 8B demonstrates superior performance over Whisper-style baselines in transcription and translation across multiple languages. The model performs reliably on extended audio inputs, maintaining coherence and accuracy without significant drift.

In symbolic reasoning, Granite 3.3 Instruct shows improved accuracy on the MATH500 benchmark, outperforming comparable models at the 8B parameter scale. The RAG-specific LoRA and aLoRA adapters demonstrate enhanced retrieval integration and grounding, which are critical for enterprise applications involving dynamic content and long-context queries.

IBM has made all models, LoRA variants, and associated tools open-source and accessible via Hugging Face. Additionally, deployment options are available through IBM's watsonx.ai, as well as third-party platforms including Ollama, LM Studio, and Replicate.

Conclusion

Granite 3.3 marks a step forward in IBM’s effort to develop robust, modular, and transparent AI systems. The release targets critical needs in speech processing, logical inference, and retrieval-augmented generation by offering technical upgrades grounded in measurable improvements. The inclusion of aLoRA for memory-efficient retrieval, support for fill-in-the-middle tasks, and advancements in multilingual speech modeling make Granite 3.3 a technically sound choice for enterprise environments. Its open-source release further encourages adoption, experimentation, and continued development across the broader AI community.


Check out the Model Series on Hugging Face and the technical details.



Asif Razzaq is the CEO of Marktechpost Media Inc. As a visionary entrepreneur and engineer, Asif is committed to harnessing the potential of Artificial Intelligence for social good. His most recent endeavor is the launch of an Artificial Intelligence media platform, Marktechpost, which stands out for its in-depth coverage of machine learning and deep learning news that is both technically sound and easily understandable by a wide audience. The platform boasts over 2 million monthly views.


