Prompt Injection

An unsolved challenge in LLM applications

whoami

Donato Capitella
Software Engineer and Principal Security Consultant at WithSecure (Cyber Security Consultancy)
But mostly a tech enthusiast who likes to discover how things work by breaking them apart.
Recent YouTube channel on AI: LLM Chronicles

If building or testing LLM-powered applications, you'll learn:

How attackers leverage prompt injection / jailbreaking
What's the impact when LLMs are given access to tools/plugins
Guidelines to secure your LLM applications against injection/jailbreaking

Threat Landscape

MITRE ATLAS™

(Adversarial Threat Landscape for Artificial-Intelligence Systems)

Terminology

Adversarial Prompts

Google Docs - Summary

Browser Extensions

Access to Tools / Plugins

Prompt Injection can become a very serious issue when LLMs are given access to tools/plugins

*Yao, S. et al. (2022). ReAct: Synergizing Reasoning and Acting in Language Models https://arxiv.org/abs/2210.03629

Case Studies

Hands-on examples of Prompt Injection with synthetic Langchain applications.

Bank Agent

Example of Direct Prompt Injection with a chat agent that helps bank customers with their transactions.

Email Agent

Example of Indirect Prompt Injection with a chat agent that helps users with their mailbox.

Defense Strategies

Prompt Injection

UNSOLVED CHALLENGE?

References

Artificial Intelligence Threat Landscape:
- MITRE ATLAS™ (Adversarial Threat Landscape for Artificial-Intelligence Systems)
- OWASP Top 10 for Large Language Model Applications
Attack Techniques:
- Johann Rehberger.
  - Google Docs AI Features: Vulnerabilities and Risks
  - Prompt Injection in the Wild
- Kai Greshake at al. (2023) Not what you've signed up for: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection
- Donato Capitella. Synthetic Recollections - A Case Study in Prompt Injection for ReAct LLM Agents
- Zou, A. et al. (2023). Universal and Transferable Adversarial Attacks on Aligned Language Models
Defence Techniques:
- OWASP. Content Security Policy. Defence-in-depth measure to "sandbox" and control what resources and code a web page can load and run
- Paul Röttger. http://SafetyPrompts.com: catalogue of open datasets for evaluating and improving LLM safety
- Jingwei Yi et al. (2023) Benchmarking and Defending Against Indirect Prompt Injection Attacks on Large Language Models
WithSecure. Damn Vulnerable LLM Agent (GitHub, sample application to test prompt injection)