The world is being quietly rearranged by people who write very long documents.


April 2, 2026
arXiv
The title they went with:
Thinking Wrong in Silence: Backdoor Attacks on Continuous Latent Reasoning
Noisy translates that to:

AI models that reason in silence can be hijacked through a single hidden vector — and token-level defenses miss it entirely


Researchers found a way to poison the internal reasoning of language models that work entirely in hidden states, leaving no visible tokens to audit. A single perturbation at the input layer gets amplified through the model's own reasoning process to reliably produce a chosen wrong answer, while appearing clean to every existing defense.
This exposes a fundamental vulnerability in the next generation of AI systems — the ones designed to reason privately, without showing their work. If a model's reasoning happens entirely in hidden space, you cannot see an attack happening, and you cannot defend against what you cannot see. The paper shows the attack survives fine-tuning, transfers to new tasks without retraining, and defeats every defense the authors tested. This matters because deployment of reasoning models is accelerating, and the security model everyone built assumes you can audit token outputs. You cannot audit what has no tokens.
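The amplification dynamic is easy to see in miniature. Here's a toy sketch (entirely illustrative, not the paper's actual construction): a model that "reasons" by iterating its own hidden state, where a tiny perturbation injected at the input layer grows with every silent reasoning step.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 64
# Random recurrent weights with gain > 1: iterated tanh dynamics in this
# regime are sensitive to small input differences.
W = rng.normal(scale=2.0 / np.sqrt(d), size=(d, d))

def latent_reasoning(h, steps=20):
    # "Silent" reasoning: the hidden state is fed back through the model
    # repeatedly, with no tokens emitted for anyone to audit.
    for _ in range(steps):
        h = np.tanh(W @ h)
    return h

x = rng.normal(size=d)
trigger = 0.01 * rng.normal(size=d)  # tiny backdoor-style perturbation at the input layer

clean = latent_reasoning(x)
poisoned = latent_reasoning(x + trigger)

print("trigger norm:   ", np.linalg.norm(trigger))          # small
print("final state gap:", np.linalg.norm(poisoned - clean))  # much larger
```

The point of the sketch: the defense-relevant signal (the trigger) is negligible where you could look for it, while the state that actually determines the answer has drifted far away — and none of that drift ever surfaces as a token.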
What happens next
Watch whether production deployments of reasoning models (OpenAI's o1 and similar systems) add interpretability layers or monitoring for latent-space anomalies before scaling into high-stakes domains like medical diagnosis or financial decisions.

If you insist
Read the original →