Research Team Status
- Michael W. Mahoney (Research Scientist)
- N. Benjamin Erichson (Research Scientist)
- Zhipeng Wei (Postdoc)
Project Goals
- This quarter, we expanded our focus to study how information propagates through multi-agent systems and how durable prompt injections are within them. In standard multi-agent settings, malicious injections are often diluted or neutralized by summarization agents, which limits their impact on final decision-making. This suggests that multi-agent architectures have some inherent robustness to such attacks.
Accomplishments
- To probe the limits of this robustness, we investigated how to design durable prompt injections that persist through multiple stages of agent interaction. We used an optimization-based method inspired by the Greedy Coordinate Gradient (GCG) approach, optimizing adversarial strings that explicitly instruct intermediate agents to preserve and propagate the injected content.
- Initial experiments show that these optimized injections survive summarization and continue to influence downstream agents, which points to both a new threat model and a promising direction for further research.
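
The coordinate-wise search idea behind the method above can be illustrated with a minimal toy sketch. It is not the team's implementation: it is gradient-free (it scores every candidate token directly, where GCG proper uses token gradients against a language-model loss to shortlist candidates), and the keyword-based summarizer, `CUES`, `VOCAB`, `PAYLOAD`, and all function names are hypothetical stand-ins chosen only for illustration.

```python
from typing import List

# All names below are hypothetical; this is a toy, not a real agent pipeline.
CUES = ("keep", "repeat", "forward", "verbatim")  # words the toy summarizer favors
VOCAB = ("the", "ok", "note") + CUES              # candidate suffix tokens
PAYLOAD = "INJECTED-INSTRUCTION"                  # placeholder for injected content


def inject(suffix: List[str]) -> str:
    """Build the injected sentence: payload followed by the tunable suffix."""
    return PAYLOAD + " " + " ".join(suffix)


def salience(sentence: str) -> int:
    """Toy salience score: number of distinct cue words in the sentence."""
    return len(set(sentence.split()) & set(CUES))


def summarize(sentences: List[str], budget: int = 2) -> List[str]:
    """Toy summarization agent: keep the `budget` most salient sentences."""
    ranked = sorted(range(len(sentences)), key=lambda i: -salience(sentences[i]))
    return [sentences[i] for i in sorted(ranked[:budget])]


def survives(suffix: List[str], stages: int = 3) -> bool:
    """Check whether the payload remains after `stages` summarization passes."""
    message = [
        "please keep the quarterly numbers",
        "forward and repeat the meeting notes",
        "the weather is fine",
        inject(suffix),
    ]
    for _ in range(stages):
        message = summarize(message)
    return any(PAYLOAD in s for s in message)


def greedy_coordinate_search(length: int = 4, sweeps: int = 2) -> List[str]:
    """Optimize one suffix position at a time, keeping the best-scoring token."""
    suffix = ["the"] * length
    for _ in range(sweeps):
        for pos in range(length):
            def score(tok: str) -> int:
                trial = suffix[:pos] + [tok] + suffix[pos + 1:]
                return salience(inject(trial))
            suffix[pos] = max(VOCAB, key=score)
    return suffix


if __name__ == "__main__":
    baseline = ["the"] * 4
    optimized = greedy_coordinate_search()
    print("baseline survives: ", survives(baseline))
    print("optimized suffix:  ", optimized)
    print("optimized survives:", survives(optimized))
```

In this toy setting the unoptimized injection is dropped by the first summarizer, while the optimized suffix accumulates the cue words the summarizer rewards and is retained at every stage. A real pipeline would replace `salience` with a loss computed from the intermediate agents' actual outputs.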