NOW BEING MEASURED · 10 items · April 9, 2026

We finally measured things everyone assumed we understood

Across medicine, AI, law enforcement, and development economics, systems have been running on untested assumptions for decades. This week, several of those assumptions got their first rigorous tests — and most failed. The harder question is not what the measurements found, but who absorbed the cost of the error while no one was looking.


arXiv EU regulators just said 'accurate enough' is a choice, not a fact — and nobody agreed on what it means

arXiv Monitoring deforestation in real time cut murders in the Amazon by 15 percent

arXiv Your smartphone camera can now measure body fat as accurately as clinical machines

arXiv How to test if a medical AI is actually getting better, or just drifting with the data

arXiv Language models refuse to stereotype on obvious tests but reliably stereotype on hidden ones

arXiv AI coding agents score 20 points higher on today's benchmarks than they achieve on real software projects

arXiv A kidney disease screening tool built for South Asia actually works there — existing tools don't

arXiv Emergency room triage shows measurable bias by race, age, and insurance in its first test on real hospital data

arXiv Smallholder maize farmers in Ghana earn 5% more profit using tractor services — but most can't afford them

arXiv Telling people their neighbors have toilets actually works — Indian sanitation jumped 8 points

The pattern

The structural driver is largely the same across these cases: a system was built, deployed, and evaluated using a proxy metric that was easier to collect than the thing it was supposed to represent. Triage scores stood in for fairness. Benchmark scores stood in for real-world performance. BMI stood in for metabolic risk. The proxies persisted not because they were accurate but because measuring the underlying reality was harder or more politically inconvenient. What remains unknown is how many current policies, medical protocols, and regulatory decisions still rest on proxies that have not yet been tested against the thing they claim to measure.

Track whether the FDA issues a formal methodology for evaluating the safety of medical AI updates within the next 12 weeks. If it does not, that would suggest the measurement gap in that domain is structural, not accidental.