Ten independent research teams published papers this week showing that AI agents can be compromised through their plugins, memory stores, debug logs, reasoning chains, and phone interfaces — and that existing safety audits test none of these. The attacks work because agents are built to trust the tools and data they read. That trust is not verified anywhere in the current review process.
The pattern
AI agents are useful precisely because they delegate work outward — to plugins, to memory, to external tools — and trust what comes back. Every paper here found an attack that enters through one of those delegation points. The safety testing that approved these systems evaluated the model in isolation, in a chat window, with no external inputs. The architecture changed. The audit did not. What's missing is not a patch — it's an entire category of security review that does not yet exist as standard practice for agentic systems.
Watch: Whether any major AI company revises its pre-deployment security checklist to include supply chain and memory-layer testing before the next major agent release.