Mental Models Explained: Metric Traps

Goodhart's Law

"When a measure becomes a target, it ceases to be a good measure." — Marilyn Strathern's well-known restatement of Goodhart

In Depth

A metric is "good" only because, under some natural distribution, it correlates strongly with the goal you actually care about. The moment you set it as a target and attach rewards, the people being measured start climbing the gradient of the metric, not the gradient of the true goal. In doing so they destroy the very correlation you relied on: optimizing the metric shifts the distribution that produced it, so the correlation breaks.

Non-trivial: (1) this is the same phenomenon as reward hacking in machine learning: an RLHF model that learns to please the reward model rather than be useful is structurally identical to a worker gaming a quota, one in silicon, one in carbon. (2) The distortion has distinct mechanisms: selecting extremes on a noisy metric mostly picks lucky noise (regression to the mean); pushing a metric to its extreme snaps a correlation that only held in the middle; and an adversary will reverse-engineer your metric on purpose. (3) Key corollary: the more singular the target, the stronger the reward, and the smarter the agent, the faster the correlation breaks.
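The first mechanism in point (2), selecting extremes on a noisy metric, can be checked with a small simulation. The sketch below is a toy model and not part of the original argument: it assumes each agent has a true skill drawn from a standard normal distribution and that the metric equals skill plus independent noise of the same size. The function name `select_top_by_metric` and all of its parameters are hypothetical.

```python
import random
import statistics


def select_top_by_metric(n_agents=100_000, top_fraction=0.01, noise_sd=1.0, seed=0):
    """Rank agents by a noisy metric and compare metric vs. true skill in the top group.

    Assumption: skill ~ N(0, 1) and metric = skill + N(0, noise_sd) noise.
    """
    rng = random.Random(seed)
    agents = []
    for _ in range(n_agents):
        skill = rng.gauss(0.0, 1.0)
        metric = skill + rng.gauss(0.0, noise_sd)
        agents.append((metric, skill))

    # Select the extreme: the top fraction as ranked by the metric alone.
    agents.sort(reverse=True)
    k = max(1, int(n_agents * top_fraction))
    top = agents[:k]

    mean_metric = statistics.fmean(m for m, _ in top)
    mean_skill = statistics.fmean(s for _, s in top)
    return mean_metric, mean_skill


mean_metric, mean_skill = select_top_by_metric()
print(f"top group, mean metric:     {mean_metric:.2f}")
print(f"top group, mean true skill: {mean_skill:.2f}")
```

Under these assumptions the selected group's average true skill comes out at roughly half of its average metric score. The rest of the gap is lucky noise, which is why the "winners" tend to look worse the next time they are measured.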

Practice: don't drive strong incentives off a single metric. Use a basket of counterweights (quantity paired with quality, speed paired with rework rate), and rotate or add noise so no one can stably optimize one number; more fundamentally, decouple the metric from large rewards — sense with metrics, don't steer with them.

Figure: Goodhart's Law. Under optimization pressure, the measured metric keeps rising while the true goal turns down.
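The shape described above, a metric that keeps rising while the true goal turns down, can be reproduced with a deliberately simple toy model. Everything here is an illustrative assumption: `metric` and `true_goal` are hypothetical functions chosen so that they agree for small values of `x` and diverge for large ones, and `climb_metric` is a greedy optimizer that only ever looks at the metric.

```python
def metric(x: float) -> float:
    """Hypothetical proxy: keeps rewarding more of x without limit."""
    return x


def true_goal(x: float) -> float:
    """Hypothetical goal: tracks the proxy for small x, then turns down."""
    return x - x * x / 10


def climb_metric(steps: int = 20, step_size: float = 0.5):
    """Greedy hill-climbing on the metric only; the true goal is never consulted."""
    x = 0.0
    history = [(x, metric(x), true_goal(x))]
    for _ in range(steps):
        candidates = (x - step_size, x + step_size)
        x = max(candidates, key=metric)  # move wherever the metric is higher
        history.append((x, metric(x), true_goal(x)))
    return history


history = climb_metric()
for x, m, g in history[::4]:
    print(f"x={x:4.1f}  metric={m:5.2f}  true goal={g:5.2f}")
```

In this sketch the goal peaks at x = 5 and is back to zero by x = 10, while the metric has risen the whole way. The correlation between the two held only in the lower range, which is the range where the metric was presumably validated.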

Classic example

The Soviet nail factory, an often-told and possibly apocryphal story: judged by weight, it produced giant heavy nails; judged by quantity, it produced tiny useless ones. The metric was always satisfied; the factory's real purpose (making usable nails) was always missed.

BigCat scenario

Judge a large model on a benchmark score and the team unconsciously aligns training and selection to that benchmark — MMLU climbs while real-task performance degrades (data contamination, leaderboard overfit). Same shape: measuring engineers by "lines of code / story points" breeds padded code; measuring a child's learning by test scores produces exam-only skills that collapse when the question changes. Whatever number you reward heavily, people sever its link to the real goal.

AI Prompt

English Prompt

I plan to use metric [metric] to measure/incentivize [goal or group]. Stress-test it with Goodhart's Law: 1. Over what range does this metric track the true goal, and where might it decouple? 2. If the agents are smart, what are 3 ways they could inflate the metric without advancing the real goal? 3. Give me a balanced basket of 2-3 counterweight metrics, and explain how to decouple measurement from large rewards.

Campbell's Law

"The more a quantitative social indicator is used for decision-making, the more corruption pressure it attracts, and the more it distorts the process it monitors." — Donald Campbell, 1976

In Depth

Campbell's Law is a close cousin of Goodhart, but it adds two crucial things: (1) the degree of distortion scales with the stakes you attach — the higher the decision weight (promotion, funding, life and death), the fiercer the corruption; (2) what gets corrupted is not just the metric but the process it was meant to monitor. Goodhart says "the number decouples"; Campbell says "the thing you wanted to measure gets destroyed by your measuring of it."

Non-trivial: (1) this is why, once exams, KPIs, or performance rankings are bound to major consequences, the accompanying cheating, teaching-to-the-test, and data fraud become systemic rather than isolated — the pressure is structural, not a matter of personal morality. (2) Corruption has two layers: a shallow one of gaming and fraud (faking the number) and a deep one of reverse-shaping (actually rebuilding hospitals, schools, and teams to "live for the metric," sacrificing what they should do). (3) Control-theory corollary: keep measurement loosely coupled from high-stakes decisions. Wiring a sensor straight to an actuator at high gain destabilizes any control system — organizations included.
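The control-theory corollary in point (3) can be illustrated with the simplest discrete-time feedback loop. This is a toy model under stated assumptions, not a model of any real organization: each period the state is corrected by `gain` times the measured gap to the target, so the error is multiplied by `(1 - gain)` every step and shrinks only when the gain lies between 0 and 2. The function name `simulate_loop` and the numbers used are hypothetical.

```python
def simulate_loop(gain: float, steps: int = 20, target: float = 100.0, start: float = 90.0):
    """Proportional control: each period, correct by gain * (target - measured value).

    Returns the absolute error after every step, starting with the initial error.
    """
    value = start
    errors = [abs(target - value)]
    for _ in range(steps):
        value += gain * (target - value)  # the sensor reading drives the actuator directly
        errors.append(abs(target - value))
    return errors


loose = simulate_loop(gain=0.5)  # loosely coupled: partial correction each period
tight = simulate_loop(gain=2.5)  # high gain: overcorrects past the target every period

print(f"gain 0.5: final error {loose[-1]:.6f}")
print(f"gain 2.5: final error {tight[-1]:.1f}")
```

With the low gain the error decays toward zero; with the high gain each correction overshoots by more than the last and the loop oscillates with growing amplitude. The organizational reading is that reacting harder to every reading of the number does not give tighter control past a certain point.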

Practice: treat metrics as a "dashboard," not a "steering wheel." In major decisions let the metric cast only one vote, alongside qualitative judgment, on-site observation, and peer review; and give the measured a channel to report what the metric can't see, or you'll only receive a world already filtered and distorted by it.

Classic example

Education dominated by standardized testing — schools narrow the curriculum to "teach what's tested," squeeze out untested subjects, and in extreme cases descend into mass exam-tampering. Scores rise while "education" itself is hollowed out. A hospital ER's hard "treat within four hours" target, meanwhile, pushes ambulances to queue outside without unloading patients, because "the clock hasn't started until they're through the door."

BigCat scenario

Bind promotion directly to "tickets closed / deploy count" and the engineering process gets corrupted — big tasks split into many small tickets, hard and invisible refactors go untouched, trivial changes pile up to pad the deploy count; the metric rises while real engineering health falls. Same in parenting: tie allowance and privileges to grades and the child optimizes "making the test look good," not "actually understanding it." The bigger the stakes, the more you are training people to game your metric.

AI Prompt

English Prompt

I'm binding metric [metric] to a high-stakes decision [promotion/funding/ranking]. Analyze with Campbell's Law: 1. As stakes rise, what do the two layers of corruption (gaming/fraud vs. reverse-shaping) look like here? 2. Which underlying process I actually care about is most likely to be damaged? 3. Give me a loose-coupling plan: how to lower this metric's weight in the decision and what qualitative inputs to add.

Surrogation

Mistaking the finger pointing at the moon for the moon — the proxy quietly replaces the true goal in your mind

In Depth

Goodhart and Campbell are about others gaming your metric; surrogation is what happens inside your own head — you unwittingly substitute the concrete metric for the abstract goal, then forget it was ever a proxy. A strategy of "make customers depend on us," once quantified as NPS (Net Promoter Score), silently turns the goal in everyone's mind into "maximize NPS." The map replaces the territory, and it happens cognitively, with no cheating motive required.

Non-trivial: (1) this is a subtler distortion than gaming — it happens even when no one wants to cheat, because the goal is abstract and the metric is concrete, and the mind is built to grasp the concrete. (2) It is structurally identical to the Zen image of mistaking the finger for the moon: the finger (metric) was only a convenient pointer to the moon (goal); cling to the finger and you lose the moon entirely. (3) The more frequently and conveniently a metric is reported, the more completely it takes over — stare at it daily and it starts to feel like reality itself.

Practice: periodically re-state the true goal in language independent of the metric, forcing yourself to separate "finger" from "moon." Ask one diagnostic question: "If this number rose but the thing I actually care about didn't, would I notice?" If not, you've already replaced the goal with its proxy. Then use several metrics from different angles, so no single number can monopolize the "goal" slot.
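The diagnostic question can be turned into a routine check, provided the true goal is assessed independently of the metric from time to time. The sketch below assumes such an independent rating exists (for example a blind self-test or a periodic qualitative review). The function name `unnoticed_gains` and the sample data are hypothetical.

```python
def unnoticed_gains(metric_series, goal_ratings, min_metric_rise=0.0):
    """Return indices of periods where the metric rose but the independent goal rating did not."""
    if len(metric_series) != len(goal_ratings):
        raise ValueError("series must have the same length")
    flagged = []
    for i in range(1, len(metric_series)):
        metric_rose = metric_series[i] - metric_series[i - 1] > min_metric_rise
        goal_rose = goal_ratings[i] > goal_ratings[i - 1]
        if metric_rose and not goal_rose:
            flagged.append(i)
    return flagged


# Hypothetical data: weekly "problems drilled" vs. a blind self-test score.
problems_drilled = [20, 25, 31, 40, 52]
self_test_score = [55, 60, 61, 61, 58]

print(unnoticed_gains(problems_drilled, self_test_score))  # → [3, 4]
```

The flagged periods are the ones worth a closer look: the number improved while the independent rating stayed flat or fell. The check is only as good as the independent rating, which must not be derived from the metric itself.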

Classic example

Wells Fargo replaced the goal of "building deep customer relationships" with the metric "accounts per customer." Everyone, top to bottom, fixated on account counts, and employees ended up opening up to an estimated 3.5 million unauthorized accounts. The goal was wholly consumed by its proxy: account count came to be treated as the relationship itself.

BigCat scenario

In personal growth this trap is at its sneakiest: "growth / learning" gets replaced by "books read / problems drilled / GitHub green-square streak." You start optimizing the dashboard instead of your life, phoning it in to protect a streak while capability stays flat and the numbers look great. Health gets replaced by "10,000 steps a day," so you pad steps instead of getting healthier. The more disciplined you are, the more easily you fall in, because your ability to execute on a metric is so strong.

AI Prompt

English Prompt

My team/I are mainly tracking metric [metric] to advance goal [true goal]. Help me check for surrogation: 1. Re-state my true goal in language entirely independent of this metric. 2. List 3 concrete cases where the metric rises but the true goal doesn't — would I notice each? 3. Suggest a multi-metric mix or review ritual that prevents any single number from monopolizing the "goal" slot.

Motivation Crowding

"Put a price on something loved and the love starts to vanish": extrinsic rewards and measurement crowd out intrinsic motivation (the overjustification effect)

In Depth

The first three models describe how metrics distort behavior; this one describes how they distort motivation itself. Add an extrinsic reward or assessment to something done out of love (intrinsic motivation), and you often lower rather than raise that intrinsic motivation. This is the overjustification effect, also called motivation crowding-out. Once "I do this because I enjoy it" is rewritten as "I do this for the reward/number," removing the reward can drop the behavior below its original baseline.

Non-trivial: (1) measurement is itself a form of extrinsic control — putting a number on something quietly changes what it means to you. (2) What matters is not the presence of a reward but whether it's experienced as "controlling" or "informational": feedback felt as judgment and manipulation crowds out intrinsic motivation; feedback felt as "information to help me improve" nourishes it. Same number, two framings, opposite results (Self-Determination Theory, Deci & Ryan). (3) Corollary: in the domains where intrinsic motivation is strongest (creation, research, parenting, practice), heavy assessment is the most destructive — you're killing the goose that lays the golden eggs.

Practice: for intrinsically driven activities, use informational feedback ("this helps you see your progress") over controlling reward and punishment ("hit the target and get rewarded, miss it and get penalized"); keep measurement small, quiet, and optional. And anchor the narrative of "why I do this" firmly in love and meaning, refusing to let the number rewrite it.

Classic example

An Israeli daycare fined parents for picking up children late — and lateness rose. The fine redefined "being late" from a moral lapse into a purchasable service, so parents arrived late with a clear conscience. Worse, after the fine was scrapped, lateness didn't return to its old level: a rewritten norm doesn't snap back.

BigCat scenario

Gamify a child's reading with stickers and points and it works in the short term, but in the long term it turns "reading" into "a means to points": remove the points and the appetite for reading can end up lower than at the start. The same goes for putting streaks and check-ins on your own writing, meditation, or exercise: what was once a love slowly becomes a KPI you have to clear. It holds in engineering too: micromanage and over-quantify developers and you crowd out the very craftsman's drive that produced the quality. Not all motivation responds to metrics the same way. For intrinsically driven things, measure like salt: a little, and optional.

AI Prompt

English Prompt

I plan to use a reward/streak/review mechanism [mechanism] to drive [activity/person], which is currently intrinsically loved. Analyze with motivation crowding: 1. How likely is this mechanism to rewrite "doing it because I love it" into "doing it for the reward"? How big is the risk? 2. How exactly would I redesign it from controlling to informational feedback? 3. Give me one meaning-anchoring narrative to protect intrinsic motivation even with measurement present.