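Evaluate a large model by its benchmark scores and the team will unconsciously align training and model selection to that benchmark: MMLU climbs while performance on real tasks degrades (data contamination, overfitting to the leaderboard). The same structure shows up elsewhere. Measuring engineers by lines of code or story points breeds padded code; measuring whether a child has "learned" by exam scores produces a test-taking skill that collapses as soon as the questions change. Whichever number you reward heavily, people cut the link between that number and the real goal.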
English Summary
Goodhart's Law — "When a measure becomes a target, it ceases to be a good measure." A metric is only good because it correlates with the true goal under a natural distribution. The act of optimizing the metric shifts that distribution, so agents climb the proxy's gradient instead of the goal's, decoupling the two. This is the human-org version of reward hacking in RLHF. Failure modes vary (regressional, extremal, adversarial); the sharper the single target, the stronger the reward, and the smarter the agent, the faster it breaks. Defense: use a balanced basket of metrics, rotate or add noise, and decouple measurement from large rewards — sense with metrics, don't steer with them.
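The regressional failure mode mentioned above can be made concrete with a small simulation. This is only a sketch under stated assumptions: the proxy is modeled as the true goal plus independent Gaussian noise, and the function name `simulate_selection`, the population size, and the noise level are all hypothetical choices for illustration. It does not model adversarial gaming, where agents actively manipulate the number.

```python
import random
import statistics


def simulate_selection(n_candidates=10_000, top_k=100, noise_sd=1.0, seed=0):
    """Pick the top_k candidates by proxy score; return (mean proxy, mean true goal)."""
    rng = random.Random(seed)
    # Assumption: each candidate has a true goal value, and the proxy is that
    # value plus independent measurement noise.
    goals = [rng.gauss(0.0, 1.0) for _ in range(n_candidates)]
    proxies = [g + rng.gauss(0.0, noise_sd) for g in goals]

    # Rank on the proxy alone, as a process that rewards the metric would.
    ranked = sorted(range(n_candidates), key=lambda i: proxies[i], reverse=True)
    chosen = ranked[:top_k]

    mean_proxy = statistics.mean(proxies[i] for i in chosen)
    mean_goal = statistics.mean(goals[i] for i in chosen)
    return mean_proxy, mean_goal


if __name__ == "__main__":
    for k in (1000, 100, 10):
        proxy, goal = simulate_selection(top_k=k)
        print(f"top {k:>4}: mean proxy = {proxy:.2f}, mean true goal = {goal:.2f}")
```

In this toy setup the selected group always looks better on the proxy than it is on the true goal, and the gap widens as selection gets stricter, which is one way to read "the sharper the single target, the faster it breaks".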
AI Prompts
Chinese Prompt (中文提示词)
我打算用指标 [指标] 来衡量/激励 [目标或人群]。请用古德哈特定律压力测试:
① 这个指标和我真正在意的目标,在哪段范围相关、从哪里开始可能脱钩?
② 如果被考核者很聪明,他能用哪 3 种方式把指标做高却不推进真实目标?
③ 给我一组 2-3 个互相制衡的替代指标,并说明如何把测量和重奖解耦。
English Prompt
I plan to use metric [metric] to measure/incentivize [goal or group]. Stress-test it with Goodhart's Law:
1. Over what range does this metric track the true goal, and where might it decouple?
2. If the agents are smart, what are 3 ways they could inflate the metric without advancing the real goal?
3. Give me a balanced basket of 2-3 counterweight metrics, and explain how to decouple measurement from large rewards.
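Campbell's Law
"The more any quantitative social indicator is used for social decision-making, the more subject it will be to corruption pressures and the more apt it will be to distort and corrupt the social processes it is intended to monitor." (Donald Campbell, 1976)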
Campbell's Law: the more weight a quantitative indicator carries in high-stakes decisions, the more corruption pressure it attracts, and the more it distorts the very process it was meant to monitor. It adds two things to Goodhart: (1) distortion scales with the stakes attached, and (2) what gets corrupted is the underlying process, not just the number. Gaming and data fraud are then systemic rather than individual moral failures, because the pressure is structural. Control-theory reading: wiring a sensor straight to an actuator at high gain tends to destabilize a system whenever there is lag in the loop. Keep measurement loosely coupled to consequential decisions; let metrics be a dashboard, not a steering wheel, and give people a channel to report what the metric can't see.
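The control-theory reading can be illustrated with a toy feedback loop. This is a minimal sketch, not a model of any real organization: it assumes the decision-maker corrects the system using a reading that is one step old, and the function name `simulate_feedback` and the gain values 0.3 and 1.5 are hypothetical. In this particular loop the error follows x[t+1] = x[t] - gain * x[t-1], which decays when the gain is below 1 and grows when it is above 1.

```python
def simulate_feedback(gain, steps=40, initial_error=1.0):
    """Return the error trajectory of a loop that corrects on a one-step-old reading."""
    # Start with two equal readings so a stale value is available from the first step.
    errors = [initial_error, initial_error]
    for _ in range(steps):
        stale_reading = errors[-2]          # the measurement lags by one step
        correction = -gain * stale_reading  # stronger coupling = larger gain
        errors.append(errors[-1] + correction)
    return errors


if __name__ == "__main__":
    for gain in (0.3, 1.5):
        trajectory = simulate_feedback(gain)
        recent_peak = max(abs(e) for e in trajectory[-5:])
        print(f"gain = {gain}: peak |error| over the last 5 steps = {recent_peak:.3g}")
```

With the low gain the error dies out; with the high gain each correction overshoots and the oscillation grows. That is the sense in which a metric wired tightly to a high-stakes decision can make the monitored process less stable rather than more.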
AI Prompts
Chinese Prompt (中文提示词)
我正在把指标 [指标] 绑定到高利害决策 [晋升/拨款/排名]。请用坎贝尔定律分析:
① 随着赌注升高,这个指标会被腐蚀的两层路径(造假博弈 / 逆向塑形)分别长什么样?
② 它最可能损害我真正在意的哪个底层过程?
③ 给我一个"松耦合"方案:如何降低这个指标在决策中的权重、补上哪些定性输入。
English Prompt
I'm binding metric [metric] to a high-stakes decision [promotion/funding/ranking]. Analyze with Campbell's Law:
1. As stakes rise, what do the two layers of corruption (gaming/fraud vs. reverse-shaping) look like here?
2. Which underlying process I actually care about is most likely to be damaged?
3. Give me a loose-coupling plan: how to lower this metric's weight in the decision and what qualitative inputs to add.
Surrogation — while Goodhart and Campbell are about others gaming your metric, surrogation is what happens inside your own head: you quietly substitute the concrete measure for the abstract goal and forget it was ever a proxy. A strategy of "make customers depend on us" silently becomes "maximize NPS" in everyone's mind. It needs no cheating incentive — it happens because goals are abstract and metrics are concrete, and the mind grabs the concrete. Structurally identical to mistaking the finger for the moon. Defense: periodically re-state the true goal in language independent of the metric, and ask: "If this number rose but the thing I care about didn't, would I notice?" If not, the goal has already been replaced by its proxy.
AI Prompts
Chinese Prompt (中文提示词)
我团队/我自己现在主要盯着指标 [指标] 来推进目标 [真实目标]。请帮我查代理指标失真:
① 用与这个指标完全无关的语言,把我的真实目标重新讲清楚;
② 列出 3 个"指标涨了但真实目标没涨"的具体情形,我是否察觉得到?
③ 建议一个让单一数字无法独占目标位置的多指标组合或复盘习惯。
English Prompt
My team (or I) mainly track metric [metric] to advance goal [true goal]. Help me check for surrogation:
1. Re-state my true goal in language entirely independent of this metric.
2. List 3 concrete cases where the metric rises but the true goal doesn't — would I notice each?
3. Suggest a multi-metric mix or review ritual that prevents any single number from monopolizing the "goal" slot.
Motivation Crowding (Overjustification) — while the first three models distort behavior, this one distorts motivation itself. Adding extrinsic rewards or measurement to an intrinsically motivated activity often lowers intrinsic motivation; remove the reward and behavior drops below baseline. Measurement is itself a form of extrinsic control — putting a number on something quietly changes its meaning. What matters is whether feedback is experienced as controlling (judgment) or informational (mastery support) — same number, opposite effect (Self-Determination Theory, Deci & Ryan). Corollary: heavy metrics are most destructive precisely where intrinsic drive is strongest (creation, research, parenting, practice). Use informational feedback over controlling rewards; keep measurement small, quiet, and optional — like salt.
AI Prompts
Chinese Prompt (中文提示词)
我打算用奖励/打卡/考核 [机制] 来推动 [活动/对象],这件事原本带有内在热爱。请用动机挤出分析:
① 这个机制最可能把"因为喜欢而做"改写成"为了奖励而做"吗?风险有多大?
② 把它从"控制型"重新设计成"信息型反馈",具体该怎么改?
③ 给我一句锚定意义的叙事,帮我(或对方)在有度量的情况下守住内在动机。
English Prompt
I plan to use a reward/streak/review mechanism [mechanism] to drive [activity/person], an activity that is currently done out of intrinsic enjoyment. Analyze it through motivation crowding:
1. How likely is this mechanism to rewrite "doing it because I love it" into "doing it for the reward"? How big is the risk?
2. How exactly would I redesign it from controlling to informational feedback?
3. Give me one meaning-anchoring narrative to protect intrinsic motivation even with measurement present.