Researchers claim breakthrough in fight against AI’s frustrating security hole

April 23, 2025

Since 2022, prompt injection, a vulnerability in which malicious instructions hidden in an AI system's input override its intended behavior, has plagued large language models (LLMs). Researchers had found no reliable fix until Google DeepMind introduced CaMeL (CApabilities for MachinE Learning), a novel approach that does not rely on AI models to police themselves. Instead, CaMeL treats LLMs as untrusted components inside a secure software framework and applies established security principles such as Control Flow Integrity, Access Control, and Information Flow Control.

Prompt injections occur because LLMs cannot distinguish trusted user commands from malicious content in their context window, which enables exploits such as emails sent to the wrong recipient or other unauthorized actions. CaMeL addresses this with a dual-LLM architecture: a privileged LLM (P-LLM) turns the user's instructions into code, while a quarantined LLM (Q-LLM) parses untrusted data but has no ability to execute actions. Because the P-LLM never sees the untrusted content, malicious text cannot rewrite the plan it produces.

The P-LLM's plan is expressed as Python code and run by an interpreter that tracks where every piece of data came from and enforces security policies before any action is taken, much like tracing contaminated water so that it cannot spread through a plumbing system.
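To make the data-flow idea concrete, here is a minimal sketch in Python. It is not CaMeL's actual interpreter or API: every name below (`Tagged`, `combine`, `quarantined_extract`, `send_email`, `run_plan`, `PolicyViolation`) and the source labels `"user"` and `"email:inbox"` are hypothetical. Each value carries the set of sources it was derived from, the Q-LLM is replaced by a trivial parser, and the "plan" is written by hand rather than generated by a P-LLM. The point is that a tool call checks provenance before acting, so text injected into an email cannot quietly become a recipient address.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tagged:
    """A value plus the set of sources it was derived from (its provenance)."""
    data: str
    sources: frozenset[str]


TRUSTED = frozenset({"user"})  # in this sketch, only text typed by the user is trusted


class PolicyViolation(Exception):
    """Raised when a tool call would act on data from a disallowed source."""


def combine(*parts: Tagged, sep: str = " ") -> Tagged:
    # Data-flow tracking: a derived value inherits the sources of every input.
    return Tagged(sep.join(p.data for p in parts),
                  frozenset().union(*(p.sources for p in parts)))


def quarantined_extract(document: Tagged, name: str) -> Tagged:
    """Stand-in for the Q-LLM: reads untrusted text but has no tool access.
    A trivial 'name: value' parser replaces the model here."""
    for line in document.data.splitlines():
        key, _, value = line.partition(":")
        if key.strip().lower() == name.lower():
            # The extracted answer is still untrusted: it keeps the document's sources.
            return Tagged(value.strip(), document.sources)
    return Tagged("", document.sources)


def send_email(to: Tagged, body: Tagged) -> str:
    """Tool guarded by a policy: the recipient must come only from trusted sources.
    Letting the body contain untrusted text is an assumed policy choice."""
    untrusted = to.sources - TRUSTED
    if untrusted:
        raise PolicyViolation(f"recipient derived from untrusted sources {sorted(untrusted)}")
    return f"sent to {to.data}"


def run_plan(notes: Tagged) -> str:
    """Hand-written stand-in for code a P-LLM might emit for
    'Email Bob the summary from my meeting notes'."""
    recipient = Tagged("bob@example.com", TRUSTED)  # assumed to come from the user's request
    summary = quarantined_extract(notes, "summary")
    body = combine(Tagged("Summary:", TRUSTED), summary)
    return send_email(recipient, body)


# Untrusted input that contains an injected, instruction-like field.
notes = Tagged("summary: Q3 budget approved\nrecipient: attacker@evil.test",
               frozenset({"email:inbox"}))

print(run_plan(notes))  # sent to bob@example.com

# A plan that takes the recipient from the untrusted notes is stopped at the tool call.
try:
    send_email(quarantined_extract(notes, "recipient"), Tagged("hi", TRUSTED))
except PolicyViolation as err:
    print("blocked:", err)
```

The check happens at the moment of action rather than inside the model, which mirrors the article's point: the LLM is never trusted to notice the injection itself.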

Tested on the AgentDojo benchmark, CaMeL resisted prompt injection attacks that earlier defenses had failed to stop, and it showed potential to help mitigate insider threats and data exfiltration. However, it requires users to define and maintain security policies, and frequent approval requests could lead to user fatigue and complacent approvals. CaMeL is not perfect, but its principled approach marks a significant step toward secure AI assistants; further refinement will be needed to balance security and usability.
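The policy burden described above can be illustrated with a small, hypothetical sketch. It does not reproduce CaMeL's policy language; `Decision`, `email_policy`, `POLICIES`, `authorize`, and the source labels are all invented for illustration. A policy looks at where each tool argument came from and either allows the call, denies it, or asks the user, and tools without a policy are denied by default (an assumption of this sketch).

```python
from enum import Enum
from typing import Callable


class Decision(Enum):
    ALLOW = "allow"
    DENY = "deny"
    ASK_USER = "ask_user"


# Maps each argument name of a tool call to the sources its value was derived from.
ArgSources = dict[str, frozenset[str]]


def email_policy(args: ArgSources) -> Decision:
    """Hypothetical user-written policy for a send_email tool."""
    to = args.get("to", frozenset({"unknown"}))  # missing provenance counts as untrusted
    if to <= {"user"}:
        return Decision.ALLOW     # recipient came straight from the user
    if "web" in to:
        return Decision.DENY      # never email addresses scraped from web pages
    return Decision.ASK_USER      # anything else needs a human decision


POLICIES: dict[str, Callable[[ArgSources], Decision]] = {"send_email": email_policy}


def authorize(tool: str, args: ArgSources, ask_user: Callable[[str], bool]) -> bool:
    policy = POLICIES.get(tool)
    if policy is None:
        return False  # default-deny tools that nobody wrote a policy for
    decision = policy(args)
    if decision is Decision.ASK_USER:
        origins = sorted(frozenset().union(*args.values()))
        return ask_user(f"{tool} would use data from {origins}. Allow?")
    return decision is Decision.ALLOW


seen = []


def approve(message: str) -> bool:
    seen.append(message)  # record each prompt the user had to answer
    return True           # a fatigued user who approves everything


authorize("send_email", {"to": frozenset({"user"})}, approve)         # allowed, no prompt
authorize("send_email", {"to": frozenset({"email:inbox"})}, approve)  # prompts, then allowed
print(len(seen))  # 1
```

Every `ASK_USER` outcome becomes a question for a person. If most of those questions are routinely approved, as `approve` does here, the prompt stops protecting anything, which is the approval-complacency risk the article raises.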
