Alignment Drift in Multimodal LLMs: A Two-Phase, Longitudinal Evaluation of Harm Across Eight Model Releases
Multimodal AI safety isn’t as stable as you think
New research from Appen reveals how model updates can introduce unexpected safety shifts across modalities, even in leading LLMs.
Multimodal LLMs are rapidly advancing, but their safety behavior isn’t static.
In this study, Appen evaluated multiple generations of leading multimodal models using adversarial prompts and human review. What we found challenges a common assumption:
Newer models don’t always behave more safely.
Some improved.
Some regressed.
Most changed in ways that aren’t obvious without structured evaluation.
Alignment can drift across model updates
Model upgrades introduced measurable shifts in how systems respond to adversarial inputs, and not always in the direction you'd expect.
Multimodal doesn’t mean more robust
In some cases, text-only prompts were more effective than multimodal ones. In others, the gap disappeared entirely depending on the model generation.
The pattern isn’t consistent, and that’s the risk.
“Safer” outputs can be misleading
Some models appear safer because they refuse more often, not because they generate better responses.
Understanding the difference is critical for real-world deployment.
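The distinction can be made concrete with a small metric sketch. The data and field names below are hypothetical illustrations, not figures from the paper: the point is simply that refusal rate and the harmfulness of the responses a model actually generates must be tracked as separate metrics.

```python
# Hypothetical rated outputs: each response is either a refusal or a
# completion with a human-assigned harm score (0 = benign, 1 = harmful).
ratings_v1 = [
    {"refused": False, "harm": 0.1},
    {"refused": False, "harm": 0.2},
    {"refused": True,  "harm": None},
    {"refused": False, "harm": 0.1},
]
ratings_v2 = [
    {"refused": True,  "harm": None},
    {"refused": True,  "harm": None},
    {"refused": True,  "harm": None},
    {"refused": False, "harm": 0.9},
]

def safety_profile(ratings):
    """Report refusal rate separately from mean harm among completions."""
    refusals = sum(r["refused"] for r in ratings)
    completions = [r["harm"] for r in ratings if not r["refused"]]
    refusal_rate = refusals / len(ratings)
    mean_harm = sum(completions) / len(completions) if completions else 0.0
    return refusal_rate, mean_harm

# v2 refuses more often, so it can look "safer" on a headline metric,
# yet the responses it does generate are more harmful.
print(safety_profile(ratings_v1))
print(safety_profile(ratings_v2))
```

A single blended "safety score" would hide exactly the regression this sketch surfaces: the second model's higher refusal rate masks worse behavior on the prompts it answers.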
If you’re building or deploying AI, this affects you
- Safety behavior can change between model versions
- Vulnerabilities can shift across modalities
- Improvements in one area may introduce risks in another
Without consistent evaluation, these changes are easy to miss.
Download the full paper to explore:
- A two-phase evaluation across multiple model generations
- Adversarial testing across text and multimodal inputs
- Human-rated harmfulness at scale
- Model-by-model comparison of safety behavior
- What alignment drift means for enterprise AI teams
Measure what actually changes
As AI systems evolve, static benchmarks aren’t enough.
Appen helps teams evaluate real-world model behavior through:
- human-led red teaming
- multimodal evaluation
- longitudinal testing frameworks
See how safety actually changes across models
Download the full research report
