Work-in-Progress: Towards Adaptive Contextual Safety in Multi-Modal LLMs

ABSTRACT

Multi-modal Large Language Models (MLLMs) have demonstrated strong capabilities across a broad spectrum of visual reasoning tasks; however, their susceptibility to safety risks remains a critical challenge. Existing efforts have largely concentrated on jailbreak defenses that aim to identify and refuse explicitly harmful inputs. Yet such approaches often fall short in addressing contextual safety, where models must distinguish between scenarios that are visually or linguistically similar but diverge fundamentally in user intent and safety implications. In this work, we introduce MM-SafetyBench++, a carefully constructed benchmark for evaluating contextual safety in multi-modal settings. For each unsafe image–text instance, we create a paired safe counterpart through minimal, intent-altering modifications that preserve the original contextual semantics. This paired design enables a controlled, fine-grained assessment of whether models adapt their safety behavior based on contextual understanding rather than surface-level cues. In addition, we propose EchoSafe, a training-free framework that equips MLLMs with a self-reflective memory mechanism for accumulating and retrieving safety-relevant insights from prior interactions. By incorporating contextually similar past experiences into the inference prompt, EchoSafe facilitates context-aware safety reasoning and supports the continual refinement of safety behavior during deployment. Extensive evaluations across multiple multi-modal safety benchmarks show that EchoSafe consistently improves contextual safety performance while maintaining response quality, establishing a strong and practical baseline for advancing contextual safety in MLLMs. All benchmark resources and code will be released publicly upon acceptance.
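The abstract describes EchoSafe's memory only at a high level. The following is a minimal sketch, under our own assumptions, of the general pattern it names: storing insights from past interactions and retrieving the most contextually similar ones to prepend to the inference prompt. It is not the authors' implementation. The toy bag-of-words `embed` function, the `SafetyMemory` class, and `build_prompt` are all hypothetical. A real system would presumably embed both image and text with a multi-modal encoder and generate its insights through model self-reflection.

```python
import math
from collections import Counter
from dataclasses import dataclass, field


def embed(text: str) -> Counter:
    """Toy bag-of-words vector (assumption): a stand-in for a real multi-modal encoder."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


@dataclass
class SafetyMemory:
    """Hypothetical memory of (context embedding, safety insight) pairs."""
    entries: list = field(default_factory=list)

    def add(self, context: str, insight: str) -> None:
        # Accumulate an insight together with the context it was learned from.
        self.entries.append((embed(context), insight))

    def retrieve(self, query: str, k: int = 2) -> list:
        # Rank stored insights by similarity of their context to the query.
        q = embed(query)
        ranked = sorted(self.entries, key=lambda e: cosine(q, e[0]), reverse=True)
        return [insight for _, insight in ranked[:k]]


def build_prompt(memory: SafetyMemory, query: str, k: int = 2) -> str:
    """Prepend the top-k retrieved insights to the user request."""
    lines = ["Relevant safety insights from past interactions:"]
    lines += [f"- {s}" for s in memory.retrieve(query, k)]
    lines += ["", f"User request: {query}"]
    return "\n".join(lines)


# Usage: two near-identical requests that differ in intent.
mem = SafetyMemory()
mem.add("how to clean a kitchen knife", "Knife-care questions are benign; answer helpfully.")
mem.add("how to hide a knife from security", "Concealing weapons signals harmful intent; refuse.")
mem.add("recipe for chocolate cake", "Cooking questions are benign.")
print(build_prompt(mem, "how to hide a knife at an airport", k=1))
```

Here the intent-bearing word "hide" makes the stored refusal insight the closest match, even though both knife-related contexts share most of their surface wording. This mirrors, in a very simplified form, why retrieval by contextual similarity rather than keyword matching matters for the paired safe/unsafe cases the benchmark targets.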

BIO

Ce Zhang is a Ph.D. student at the Robotics Institute, Carnegie Mellon University (CMU), advised by Prof. Katia Sycara. His research focuses on vision-language models, with particular emphasis on their safety, efficiency, and responsible deployment. Prior to his Ph.D., he received his M.S. in Machine Learning from CMU and his B.Eng. in Communication Engineering from Southern University of Science and Technology (SUSTech).

Submitted by Katie Dey on