Alignment Drift in Multimodal LLMs: A Two-Phase, Longitudinal Evaluation of Harm Across Eight Model Releases

Multimodal AI safety isn’t as stable as you think

New research from Appen reveals how model updates can introduce unexpected safety shifts across modalities, even in leading LLMs.

Multimodal LLMs are rapidly advancing, but their safety behavior isn’t static.

In this study, Appen evaluated multiple generations of leading multimodal models using adversarial prompts and human review. What we found challenges a common assumption:

Newer models don’t always behave more safely.

Some improved.
Some regressed.
Most changed in ways that aren’t obvious without structured evaluation.

Alignment can drift across model updates
Model upgrades introduced measurable shifts in how systems respond to adversarial inputs — not always in the direction you’d expect.

Multimodal doesn’t mean more robust
In some cases, text-only adversarial prompts were more effective at eliciting harmful responses than multimodal ones. In others, the gap disappeared entirely depending on the model generation.

The pattern isn’t consistent, and that’s the risk.

“Safer” outputs can be misleading
Some models appear safer because they refuse more often, not because they generate better responses.

Understanding the difference is critical for real-world deployment.
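To make the distinction concrete, here is a minimal sketch of reporting the two signals separately. The records, field names, and 1–5 harm scale are hypothetical placeholders for illustration, not Appen's actual rating scheme:

```python
from statistics import mean

# Hypothetical per-response records: did the model refuse, and when it
# answered, how harmful was the answer (illustrative 1-5 human rating)?
records = [
    {"refused": True,  "harm_score": None},
    {"refused": False, "harm_score": 1.0},
    {"refused": False, "harm_score": 4.0},
    {"refused": True,  "harm_score": None},
]

def safety_summary(records):
    """Report refusal rate separately from the harm of answered prompts.

    A model can look 'safer' on refusal rate alone while the answers it
    does give become more harmful.
    """
    refusal_rate = mean(r["refused"] for r in records)
    answered = [r["harm_score"] for r in records if not r["refused"]]
    return {
        "refusal_rate": refusal_rate,
        "mean_harm_when_answering": mean(answered) if answered else None,
    }

print(safety_summary(records))
# -> {'refusal_rate': 0.5, 'mean_harm_when_answering': 2.5}
```

Tracking only the first number hides movement in the second, which is exactly where drift can do damage.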

If you’re building or deploying AI, this affects you

  • Safety behavior can change between model versions
  • Vulnerabilities can shift across modalities
  • Improvements in one area may introduce risks in another

Without consistent evaluation, these changes are easy to miss.
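One lightweight way to catch these shifts is to re-run a frozen adversarial prompt set on every release and compare aggregate harm. Below is a minimal sketch under that assumption; `query_model` and `rate_harm` are hypothetical stand-ins for your inference client and your human or automated rater, not a specific API:

```python
from statistics import mean

# A frozen, versioned prompt set: the same adversarial inputs every release.
ADVERSARIAL_PROMPTS = [
    "adversarial prompt 1 ...",
    "adversarial prompt 2 ...",
]

def query_model(version: str, prompt: str) -> str:
    # Hypothetical stand-in: call your real inference endpoint here.
    return f"[{version}] response to: {prompt}"

def rate_harm(output: str) -> float:
    # Hypothetical stand-in: human or automated harm rating on a 1-5 scale.
    return 1.0

def mean_harm(version: str) -> float:
    """Average rated harm for one model version over the fixed prompt set."""
    return mean(rate_harm(query_model(version, p)) for p in ADVERSARIAL_PROMPTS)

def check_drift(old: str, new: str, tolerance: float = 0.25) -> None:
    """Flag a release whose mean harm rises by more than `tolerance`."""
    before, after = mean_harm(old), mean_harm(new)
    flag = "REGRESSION" if after - before > tolerance else "ok"
    print(f"{old} -> {new}: mean harm {before:.2f} -> {after:.2f} [{flag}]")

check_drift("model-v1", "model-v2")
```

Because the prompt set is fixed, any change in the output is attributable to the model release itself, which is the core idea behind longitudinal testing.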

Download the full paper to explore:

  • A two-phase evaluation across multiple model generations
  • Adversarial testing across text and multimodal inputs
  • Human-rated harmfulness at scale
  • Model-by-model comparison of safety behavior
  • What alignment drift means for enterprise AI teams

Measure what actually changes
As AI systems evolve, static benchmarks aren’t enough.

Appen helps teams evaluate real-world model behavior through:

  • human-led red teaming
  • multimodal evaluation
  • longitudinal testing frameworks

See how safety actually changes across models

Download the full research report

White paper from Appen