LLM coding agents can be hijacked through poisoned skill documentation — and defenses miss 2.5% of attacks

What happened

Researchers found that malicious code hidden in skill documentation can execute automatically when LLM agents use those skills for normal tasks, bypassing existing safeguards. This means a single compromised skill in an open marketplace can give an attacker control over file writes, shell commands, and network requests on the host system.

Why this matters

LLM coding agents are starting to be deployed in production systems where they execute code with system-level privileges. Right now, the skill marketplaces that extend their capabilities have no mandatory security review — anyone can publish. This paper shows that the attack surface is larger than the field assumed: you don't need to trick the agent with a prompt, you can hide malicious logic in the documentation examples the agent naturally reuses. The gap is real: static analysis catches most cases, but 2.5% of attacks evade both detection and alignment safeguards. As these agents move into production, that 2.5% becomes a liability problem for whoever deploys them.

The signal

What happens next

Watch whether major LLM platforms (OpenAI, Anthropic, Google) add mandatory security review or sandboxing to their skill marketplaces within the next 12 months, or whether the first reported supply-chain compromise of a deployed agent comes first.