Large language models built into scientific writing tools produce citations with incorrect details roughly 50% of the time, even when they can search the web. A two-stage process that first generates citations and then revises them against bibliographic databases cuts that failure rate to 22%, meaning most citations now verify completely against the record.
Why this matters
Scientists increasingly use AI to write papers and generate reference lists. Right now, those AI systems confidently cite papers that don't exist, cite real papers with wrong volume numbers or page ranges, or cite the wrong paper entirely — mistakes that make the citing paper unreliable or unpublishable. This paper demonstrates the problem at scale (nearly 23,000 citation attempts) and shows a concrete fix: pipe AI outputs through a verification step against CrossRef and Zotero, bibliographic services librarians already rely on. The fix is mechanical, not architectural — it doesn't require a smarter model, just a deterministic lookup after generation. That matters because it suggests a whole class of AI-in-publishing problems might be solvable not by training better models but by building better handoffs between AI and databases that already hold the ground truth.
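The deterministic lookup can be sketched in a few lines. The function below is illustrative, not the paper's implementation: it assumes the authoritative record is a dict of fields like the JSON the CrossRef REST API returns under its `message` key (e.g. from `https://api.crossref.org/works/<DOI>`), and the function and field names are my own.

```python
# Sketch of the mechanical second stage: compare an LLM-generated
# citation against an authoritative record fetched from a bibliographic
# service such as CrossRef. Field names follow CrossRef's JSON schema;
# everything else is an illustrative assumption.

def normalize(value):
    """Lowercase and collapse whitespace so trivial formatting
    differences don't register as mismatches."""
    return " ".join(str(value).split()).lower()

def verify_citation(generated, authoritative,
                    fields=("title", "volume", "page", "container-title")):
    """Return the fields where the generated citation disagrees with
    the authoritative record; an empty dict means it checks out."""
    mismatches = {}
    for field in fields:
        gen, auth = generated.get(field), authoritative.get(field)
        if gen is not None and auth is not None and normalize(gen) != normalize(auth):
            mismatches[field] = {"generated": gen, "authoritative": auth}
    return mismatches

# Example: the model got the title and pages right but hallucinated
# the volume number -- exactly the kind of subtle error the paper counts.
generated = {"title": "Attention Is All You Need",
             "volume": "31", "page": "5998-6008"}
record = {"title": "Attention Is All You Need",
          "volume": "30", "page": "5998-6008"}
errors = verify_citation(generated, record)
```

The design point is that this stage needs no model at all: once the generated citation is resolved to a DOI, everything else is exact lookup and string comparison, which is why the authors can call the fix mechanical rather than architectural.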
The signal
Across nearly 23,000 citation attempts, single-pass generation produced citations with incorrect details about half the time; adding a deterministic revision pass against bibliographic databases dropped the failure rate to 22%.
What happens next
Track whether journals and preprint servers actually adopt two-stage verification pipelines in their submission systems — the tool exists, but adoption decides whether this stays academic or becomes infrastructure.