OpenAI

1595 Articles
Advancing MLLM Alignment Through MM-RLHF: A Large-Scale Human Preference Dataset for Multimodal Tasks
OpenAI

Advancing MLLM Alignment Through MM-RLHF: A Large-Scale Human Preference Dataset for Multimodal Tasks

Multimodal Large Language Models (MLLMs) have gained significant attention for their ability to handle complex tasks involving vision, language, and audio integration. However,...

DeepSeek AI Introduces NSA: A Hardware-Aligned and Natively Trainable Sparse Attention Mechanism for Ultra-Fast Long-Context Training and Inference
OpenAI

DeepSeek AI Introduces NSA: A Hardware-Aligned and Natively Trainable Sparse Attention Mechanism for Ultra-Fast Long-Context Training and Inference

In recent years, language models have been pushed to handle increasingly long contexts. This need has exposed some inherent problems in the standard...

Moonshot AI Research Introduce Mixture of Block Attention (MoBA): A New AI Approach that Applies the Principles of Mixture of Experts (MoE) to the Attention Mechanism
OpenAI

Moonshot AI Research Introduce Mixture of Block Attention (MoBA): A New AI Approach that Applies the Principles of Mixture of Experts (MoE) to the Attention Mechanism

Efficiently handling long contexts has been a longstanding challenge in natural language processing. As large language models expand their capacity to read, comprehend,...

Microsoft AI Releases OmniParser V2: An AI Tool that Turns Any LLM into a Computer Use Agent
OpenAI

Microsoft AI Releases OmniParser V2: An AI Tool that Turns Any LLM into a Computer Use Agent

In the realm of artificial intelligence, enabling Large Language Models (LLMs) to navigate and interact with graphical user interfaces (GUIs) has been a...

Mistral AI Introduces Mistral Saba: A New Regional Language Model Designed to Excel in Arabic and South Indian-Origin Languages such as Tamil
OpenAI

Mistral AI Introduces Mistral Saba: A New Regional Language Model Designed to Excel in Arabic and South Indian-Origin Languages such as Tamil

As artificial intelligence (AI) continues to gain traction across industries, one persistent challenge remains: creating language models that truly understand the diversity of...

ViLa-MIL: Enhancing Whole Slide Image Classification with Dual-Scale Vision-Language Multiple Instance Learning
OpenAI

ViLa-MIL: Enhancing Whole Slide Image Classification with Dual-Scale Vision-Language Multiple Instance Learning

Whole Slide Image (WSI) classification in digital pathology presents several critical challenges due to the immense size and hierarchical nature of WSIs.  WSIs...

A Stepwise Python Code Implementation to Create Interactive Photorealistic Faces with NVIDIA StyleGAN2‑ADA
OpenAI

A Stepwise Python Code Implementation to Create Interactive Photorealistic Faces with NVIDIA StyleGAN2‑ADA

In this tutorial, we will do an in-depth, interactive exploration of NVIDIA’s StyleGAN2‑ADA PyTorch model, showcasing its powerful capabilities for generating photorealistic images....

All You Need to Know about Vision Language Models VLMs: A Survey Article
OpenAI

All You Need to Know about Vision Language Models VLMs: A Survey Article

Vision Language Models have been a revolutionizing milestone in the development of language models, which overcomes the shortcomings of predecessor pre-trained LLMs like...