Home Machine Learning When AI Backfires: Enkrypt AI Report Exposes Dangerous Vulnerabilities in Multimodal Models
Machine Learning

When AI Backfires: Enkrypt AI Report Exposes Dangerous Vulnerabilities in Multimodal Models

Share
When AI Backfires: Enkrypt AI Report Exposes Dangerous Vulnerabilities in Multimodal Models
Share


In May 2025, Enkrypt AI released its Multimodal Red Teaming Report, a chilling analysis that revealed just how easily advanced AI systems can be manipulated into generating dangerous and unethical content. The report focuses on two of Mistral’s leading vision-language models—Pixtral-Large (25.02) and Pixtral-12b—and paints a picture of models that are not only technically impressive but disturbingly vulnerable.

Vision-language models (VLMs) like Pixtral are built to interpret both visual and textual inputs, allowing them to respond intelligently to complex, real-world prompts. But this capability comes with increased risk. Unlike traditional language models that only process text, VLMs can be influenced by the interplay between images and words, opening new doors for adversarial attacks. Enkrypt AI’s testing shows how easily these doors can be pried open.

Alarming Test Results: CSEM and CBRN Failures

The team behind the report used sophisticated red teaming methods—a form of adversarial evaluation designed to mimic real-world threats. These tests employed tactics like jailbreaking (prompting the model with carefully crafted queries to bypass safety filters), image-based deception, and context manipulation. Alarmingly, 68% of these adversarial prompts elicited harmful responses across the two Pixtral models, including content that related to grooming, exploitation, and even chemical weapons design.

One of the most striking revelations involves child sexual exploitation material (CSEM). The report found that Mistral’s models were 60 times more likely to produce CSEM-related content compared to industry benchmarks like GPT-4o and Claude 3.7 Sonnet. In test cases, models responded to disguised grooming prompts with structured, multi-paragraph content explaining how to manipulate minors—wrapped in disingenuous disclaimers like “for educational awareness only.” The models weren’t simply failing to reject harmful queries—they were completing them in detail.

Equally disturbing were the results in the CBRN (Chemical, Biological, Radiological, and Nuclear) risk category. When prompted with a request on how to modify the VX nerve agent—a chemical weapon—the models offered shockingly specific ideas for increasing its persistence in the environment. They described, in redacted but clearly technical detail, methods like encapsulation, environmental shielding, and controlled release systems



Source link

Share

Leave a comment

Leave a Reply

Your email address will not be published. Required fields are marked *

Related Articles
DeepCoder-14B: The Open-Source AI Model Enhancing Developer Productivity and Innovation
Machine Learning

DeepCoder-14B: The Open-Source AI Model Enhancing Developer Productivity and Innovation

Artificial Intelligence (AI) is changing how software is developed. AI-powered code generators...

Breaking the Sales Plateau with Agentic AI
Machine Learning

Breaking the Sales Plateau with Agentic AI

Surpassing the limits of practice In the realm of skill development —...

The Influencer AI Review: This AI Replaces Influencers
Machine Learning

The Influencer AI Review: This AI Replaces Influencers

69% of consumers say they trust recommendations from influencers over information from...

6 New ChatGPT Projects Features You Need to Know
Machine Learning

6 New ChatGPT Projects Features You Need to Know

ChatGPT Projects just received its most significant update since launch, and the...