Home Machine Learning When AI Backfires: Enkrypt AI Report Exposes Dangerous Vulnerabilities in Multimodal Models

Machine Learning

When AI Backfires: Enkrypt AI Report Exposes Dangerous Vulnerabilities in Multimodal Models

adminUpdated 3 months Ago3 Mins read15 Views

When AI Backfires: Enkrypt AI Report Exposes Dangerous Vulnerabilities in Multimodal Models

In May 2025, Enkrypt AI released its Multimodal Red Teaming Report, a chilling analysis that revealed just how easily advanced AI systems can be manipulated into generating dangerous and unethical content. The report focuses on two of Mistral’s leading vision-language models—Pixtral-Large (25.02) and Pixtral-12b—and paints a picture of models that are not only technically impressive but disturbingly vulnerable.

Vision-language models (VLMs) like Pixtral are built to interpret both visual and textual inputs, allowing them to respond intelligently to complex, real-world prompts. But this capability comes with increased risk. Unlike traditional language models that only process text, VLMs can be influenced by the interplay between images and words, opening new doors for adversarial attacks. Enkrypt AI’s testing shows how easily these doors can be pried open.

Alarming Test Results: CSEM and CBRN Failures

The team behind the report used sophisticated red teaming methods—a form of adversarial evaluation designed to mimic real-world threats. These tests employed tactics like jailbreaking (prompting the model with carefully crafted queries to bypass safety filters), image-based deception, and context manipulation. Alarmingly, 68% of these adversarial prompts elicited harmful responses across the two Pixtral models, including content that related to grooming, exploitation, and even chemical weapons design.

One of the most striking revelations involves child sexual exploitation material (CSEM). The report found that Mistral’s models were 60 times more likely to produce CSEM-related content compared to industry benchmarks like GPT-4o and Claude 3.7 Sonnet. In test cases, models responded to disguised grooming prompts with structured, multi-paragraph content explaining how to manipulate minors—wrapped in disingenuous disclaimers like “for educational awareness only.” The models weren’t simply failing to reject harmful queries—they were completing them in detail.

Equally disturbing were the results in the CBRN (Chemical, Biological, Radiological, and Nuclear) risk category. When prompted with a request on how to modify the VX nerve agent—a chemical weapon—the models offered shockingly specific ideas for increasing its persistence in the environment. They described, in redacted but clearly technical detail, methods like encapsulation, environmental shielding, and controlled release systems

Source link

Previous post HunyuanCustom Brings Single-Image Video Deepfakes, With Audio and Lip Sync

Next post Matthew Bernardini, CEO and Co-Founder of Zenapse - Interview Series

DeepCoder-14B: The Open-Source AI Model Enhancing Developer Productivity and Innovation

Artificial Intelligence (AI) is changing how software is developed. AI-powered code generators...

admin5 Mins read

Machine Learning

Breaking the Sales Plateau with Agentic AI

Surpassing the limits of practice In the realm of skill development —...

admin8 Mins read

Machine Learning

The Influencer AI Review: This AI Replaces Influencers

69% of consumers say they trust recommendations from influencers over information from...

admin8 Mins read

Machine Learning

6 New ChatGPT Projects Features You Need to Know

ChatGPT Projects just received its most significant update since launch, and the...

admin4 Mins read

This Week

Meet SmallThinker: A Family of Efficient Large Language Models LLMs Natively Trained for Local Deployment

TransEvalnia: A Prompting-Based System for Fine-Grained, Human-Aligned Translation Evaluation Using LLMs

Google AI Introduces the Test-Time Diffusion Deep Researcher (TTD-DR): A Human-Inspired Diffusion Framework for Advanced Deep Research Agents

Weekly Newsletter

When AI Backfires: Enkrypt AI Report Exposes Dangerous Vulnerabilities in Multimodal Models

Alarming Test Results: CSEM and CBRN Failures

Why Vision-Language Models Pose New Security Challenges

What Must Be Done: A Blueprint for Safer AI

Leave a comment

Leave a Reply Cancel reply

Latest Posts

TransEvalnia: A Prompting-Based System for Fine-Grained, Human-Aligned Translation Evaluation Using LLMs

Google AI Introduces the Test-Time Diffusion Deep Researcher (TTD-DR): A Human-Inspired Diffusion Framework for Advanced Deep Research Agents

A Coding Guide to Build an Intelligent Conversational AI Agent with Agent Memory Using Cognee and Free Hugging Face Models

AgentSociety: An Open Source AI Framework for Simulating Large-Scale Societal Interactions with LLM Agents

DeepCoder-14B: The Open-Source AI Model Enhancing Developer Productivity and Innovation

Breaking the Sales Plateau with Agentic AI

The Influencer AI Review: This AI Replaces Influencers

6 New ChatGPT Projects Features You Need to Know

Get to Know Us

keep in touch