NOW BEING MEASURED
· 10 items
· April 9, 2026
Measurement Finally Reaching the Unmeasured
Across medicine, AI, law enforcement, and development economics, systems have been running on untested assumptions for decades. This week, several of those assumptions got their first rigorous tests — and most failed. The harder question is not what the measurements found, but who absorbed the cost of the error while no one was looking.
· arXiv: EU regulators just said 'accurate enough' is a choice, not a fact — and nobody agreed on what it means
· arXiv: Monitoring deforestation in real time cut murders in the Amazon by 15 percent
· arXiv: Your smartphone camera can now measure body fat as accurately as clinical machines
· arXiv: How to test if a medical AI is actually getting better, or just drifting with the data
· arXiv: Language models refuse to stereotype on obvious tests but reliably stereotype on hidden ones
· arXiv: AI coding agents score 20 points higher on today's tests than they actually perform in real software projects
· arXiv: A kidney disease screening tool built for South Asia actually works there — existing tools don't
· arXiv: Smallholder maize farmers in Ghana earn 5% more profit using tractor services — but most can't afford them
· arXiv: Telling people their neighbors have toilets actually works — Indian sanitation jumped 8 points
The pattern
The structural driver is the same in each case: a system was built, deployed, and evaluated using a proxy metric that was easier to collect than the thing it was supposed to represent. Triage scores stood in for fairness. Benchmark scores stood in for real-world performance. BMI stood in for metabolic risk. The proxies persisted not because they were accurate but because measuring the underlying reality was harder or more politically inconvenient. What remains unknown is how many current policies, medical protocols, and regulatory decisions are still resting on proxies that have simply not yet been tested against the thing they claim to measure.
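The failure mode described above can be made concrete with a toy simulation: a proxy that tracks the true quantity well in the population it was validated on can mislabel far more cases in a population it was never tested against. Everything in this sketch is synthetic; the distributions, the bias value, and the cutoff are illustrative assumptions, not numbers from any of the studies listed here.

```python
import random

random.seed(0)

def make_population(bias, n=10_000):
    """Synthetic cohort: a true quantity of interest, plus a cheaper-to-measure
    proxy that tracks it with noise and a population-specific offset."""
    data = []
    for _ in range(n):
        true_value = random.gauss(0.0, 1.0)                 # what we actually care about
        proxy = true_value + bias + random.gauss(0.0, 0.5)  # the stand-in metric
        data.append((true_value, proxy))
    return data

def misclassification_rate(data, cutoff=1.0):
    """Fraction of cases where a decision made on the proxy disagrees
    with the decision that the true value would have produced."""
    wrong = sum((proxy > cutoff) != (true > cutoff) for true, proxy in data)
    return wrong / len(data)

# Population the proxy was originally validated on (no systematic offset).
calibration = make_population(bias=0.0)
# Population it was deployed to without retesting (proxy runs systematically high).
shifted = make_population(bias=0.8)

print(f"error where validated: {misclassification_rate(calibration):.1%}")
print(f"error where deployed:  {misclassification_rate(shifted):.1%}")
```

Under these assumptions, the proxy looks acceptable on the calibration cohort but produces a much higher disagreement rate on the shifted one, even though nothing about the proxy itself changed; only the population did. That is the shape of the gap between benchmark scores and real projects, or BMI and metabolic risk, that the studies above measured directly.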
Watch: Track whether the FDA issues a formal methodology for evaluating medical AI update safety within the next 12 weeks — its absence would confirm that the measurement gap in that domain is structural, not accidental.