古德哈特定律 · Goodhart's Law

"When a measure becomes a target, it ceases to be a good measure." (Marilyn Strathern's well-known paraphrase of Goodhart)

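A metric is "good" because, under some natural distribution, it correlates strongly with the goal you actually care about. Once you make it the evaluation target and attach rewards, the people being evaluated climb the gradient of the metric rather than the gradient of the true goal. The correlation that used to make the two coincide is taken apart by your own hand: the act of optimizing the metric changes the distribution that generates it, and the correlation stops holding.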

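Non-obvious points: ① This is reward hacking in machine learning: in RLHF the model learns to please the reward model rather than to be genuinely useful, which is in essence the same thing as workers gaming their evaluation, one running on silicon and the other on carbon. ② The distortion has distinct mechanisms: select extremes on a noisy metric and most of what you pick is lucky noise (regression to the mean); push a metric to its extreme and a correlation that held in the normal range breaks in the tail; put an adversary in the loop and he will actively reverse-engineer your metric. ③ Key corollary: the more singular the metric, the stronger the reward, and the smarter the people being evaluated, the faster the distortion sets in.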
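The first mechanism in point ② (picking extremes on a noisy metric mostly picks lucky noise) can be illustrated with a small simulation. This is a minimal sketch under assumptions that are not in the text: true quality and measurement noise are independent standard normals, and `simulate_selection`, `top_k`, and `noise_sd` are hypothetical names chosen for illustration.

```python
import random
import statistics


def simulate_selection(n=10_000, top_k=100, noise_sd=1.0, seed=0):
    """Pick the top_k candidates by a noisy metric and compare two averages."""
    rng = random.Random(seed)
    # Assumption: true quality ~ N(0, 1); metric = true quality + independent noise.
    true_values = [rng.gauss(0.0, 1.0) for _ in range(n)]
    metrics = [t + rng.gauss(0.0, noise_sd) for t in true_values]
    # Rank by the metric (what the incentive rewards), not by true quality.
    selected = sorted(range(n), key=lambda i: metrics[i], reverse=True)[:top_k]
    mean_metric = statistics.fmean(metrics[i] for i in selected)
    mean_true = statistics.fmean(true_values[i] for i in selected)
    return mean_metric, mean_true


mean_metric, mean_true = simulate_selection()
print(f"top group: mean metric = {mean_metric:.2f}, mean true value = {mean_true:.2f}")
```

Under these assumptions half of the metric's variance is noise, so the selected group's average true value should come out well below its average metric value: the harder you select on the number, the more of what you are selecting is luck.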

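In practice: do not attach strong incentives to a single metric. Use a set of metrics that check one another (quantity paired with quality, speed paired with rework rate), and rotate them or add noise periodically so that nobody can steadily optimize for one number. More fundamentally, decouple "metrics" from "heavy rewards and penalties": use metrics to sense, not to hand out rewards directly.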

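[Figure: level versus optimization pressure. After the metric is made the target, the measured metric keeps rising while the true goal turns downward; the curves separate a correlated phase from a decoupled phase.]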
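Classic example

The Soviet nail factory, as the story is usually told: evaluated by weight, it turned out giant iron nails; evaluated by count, it turned out tiny useless ones. The metric was always met, and the factory's real mission (making usable nails) was never served.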

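Scenario · BigCat

Evaluate a large model by benchmark scores and the team will, without intending to, align training and model selection to that benchmark: MMLU climbs while real tasks regress (data contamination, overfitting to the leaderboard). The same structure shows up when "lines of code / story points" measure engineers and produce padded code, or when exam scores measure whether a child "has learned" and produce test-taking skill that collapses as soon as the question format changes. Whichever number you reward heavily, people cut the link between that number and the true goal.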


Goodhart's Law — "When a measure becomes a target, it ceases to be a good measure." A metric is only good because it correlates with the true goal under a natural distribution. The act of optimizing the metric shifts that distribution, so agents climb the proxy's gradient instead of the goal's, decoupling the two. This is the human-org version of reward hacking in RLHF. Failure modes vary (regressional, extremal, adversarial); the sharper the single target, the stronger the reward, and the smarter the agent, the faster it breaks. Defense: use a balanced basket of metrics, rotate or add noise, and decouple measurement from large rewards — sense with metrics, don't steer with them.

中文提示词
我打算用指标 [指标] 来衡量/激励 [目标或人群]。请用古德哈特定律压力测试: ① 这个指标和我真正在意的目标,在哪段范围相关、从哪里开始可能脱钩? ② 如果被考核者很聪明,他能用哪 3 种方式把指标做高却不推进真实目标? ③ 给我一组 2-3 个互相制衡的替代指标,并说明如何把测量和重奖解耦。
English Prompt
I plan to use metric [metric] to measure/incentivize [goal or group]. Stress-test it with Goodhart's Law: 1. Over what range does this metric track the true goal, and where might it decouple? 2. If the agents are smart, what are 3 ways they could inflate the metric without advancing the real goal? 3. Give me a balanced basket of 2-3 counterweight metrics, and explain how to decouple measurement from large rewards.

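坎贝尔定律 · Campbell's Law

"The more any quantitative social indicator is used for social decision-making, the more subject it will be to corruption pressures and the more apt it will be to distort and corrupt the social processes it is intended to monitor." (Donald Campbell, 1976)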

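Campbell's Law is a close relative of Goodhart's Law, but it says two further things that matter: ① the distortion scales with the stakes you hang on the metric: the more weight it carries in a decision (promotion, funding, survival), the fiercer the corruption; ② what gets corrupted is not only the metric but the very process the metric was meant to monitor. Goodhart says "the number decouples"; Campbell says "the thing you wanted to measure gets ruined by your measuring it".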

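Non-obvious points: ① This explains why, once the gaokao (China's college entrance exam), KPIs, and performance rankings are tied to major consequences, the accompanying cheating, teaching to the test, and data fabrication become systemic rather than isolated: the pressure is structural, not a matter of individual morality. ② The corruption has two layers: the shallow layer is fraud and gaming (changing the numbers); the deep layer is reverse-shaping (actually remaking the hospital, school, or team into something that "exists for the metric", at the expense of what it was supposed to do). ③ Control-theory corollary: keep measurement loosely coupled to high-stakes decisions. Wire the sensor (metric) straight to the actuator (reward and punishment) and turn the gain all the way up, and a feedback loop tends to oscillate and go unstable; organizations are no different.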
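The control-theory point in ③ can be seen in a toy discrete-time loop. This is a minimal sketch, assuming a one-step proportional correction `error[t+1] = error[t] - gain * error[t]`; `run_loop` and `gain` are hypothetical names, and a real organization is of course not a linear system.

```python
def run_loop(gain, steps=20, start=1.0):
    """Iterate error[t+1] = error[t] - gain * error[t]; return the error trace."""
    error = start
    trace = [error]
    for _ in range(steps):
        # The "actuator" applies a correction of gain times the measured error.
        error = error - gain * error
        trace.append(error)
    return trace


for gain in (0.5, 1.8, 2.5):
    trace = run_loop(gain)
    print(f"gain={gain}: error after {len(trace) - 1} steps = {trace[-1]:.3g}")
```

In this toy model the error shrinks smoothly when the gain is below 1, overshoots and flips sign on every step when the gain is between 1 and 2, and grows without bound once the gain exceeds 2. Lowering the gain corresponds to the loose coupling recommended here: the metric still informs the correction, but it no longer drives it at full strength.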

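In practice: position the metric as a dashboard, not a steering wheel. In major decisions give the metric only one vote, alongside qualitative judgment, on-site observation, and peer review; and give the people being evaluated a channel for telling you what the metric cannot see, otherwise all you receive is a distorted world filtered through the metric.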

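Classic examples

Education dominated by standardized testing: schools narrow the curriculum to "teach what is tested", squeeze out untested subjects, and in extreme cases slide into organized tampering with answer sheets. Scores go up while "education" itself is hollowed out. The hard "handled within four hours" target for emergency departments in Britain's public health service, meanwhile, led to ambulances queuing outside without unloading patients, because "the clock does not start until the patient is through the door".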

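Scenario · BigCat

Tie promotion directly to "tickets closed / number of releases" and the engineering process gets corrupted: large tasks are split into piles of small tickets, nobody touches the hard and invisible refactoring, and trivial changes pile up to inflate the release count. The metric rises while real engineering health falls. Parenting works the same way: tie pocket money and privileges to grades and the child optimizes for "making the paper look good", not for "actually understanding". The higher the stakes, the more you are training the other side to exploit your metric.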


Campbell's Law — the more weight a quantitative indicator carries in high-stakes decisions, the more corruption pressure it attracts, and the more it distorts the very process it was meant to monitor. Two additions beyond Goodhart: (1) distortion scales with the stakes attached, and (2) what gets corrupted is the underlying process, not just the number. Gaming and data fraud become systemic, not moral failures — the pressure is structural. Control-theory reading: wiring a sensor straight to an actuator at high gain destabilizes any system. Keep measurement loosely coupled from consequential decisions; let metrics be a dashboard, not a steering wheel, and give people a channel to report what the metric can't see.

中文提示词
我正在把指标 [指标] 绑定到高利害决策 [晋升/拨款/排名]。请用坎贝尔定律分析: ① 随着赌注升高,这个指标会被腐蚀的两层路径(造假博弈 / 逆向塑形)分别长什么样? ② 它最可能损害我真正在意的哪个底层过程? ③ 给我一个"松耦合"方案:如何降低这个指标在决策中的权重、补上哪些定性输入。
English Prompt
I'm binding metric [metric] to a high-stakes decision [promotion/funding/ranking]. Analyze with Campbell's Law: 1. As stakes rise, what do the two layers of corruption (gaming/fraud vs. reverse-shaping) look like here? 2. Which underlying process I actually care about is most likely to be damaged? 3. Give me a loose-coupling plan: how to lower this metric's weight in the decision and what qualitative inputs to add.

代理指标失真 · Surrogation

Mistaking the finger that points at the moon for the moon: the proxy quietly takes the place of the true goal in your mind

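Goodhart and Campbell are about other people gaming your metric; surrogation is about what happens inside your own head. Without noticing, you substitute the concrete metric for the abstract goal and then forget that it was only a proxy. The strategy was "make customers unable to do without us"; once it is quantified as NPS (Net Promoter Score), the goal in everyone's head quietly becomes "push NPS up". The map replaces the territory, and it happens at the level of cognition, with no motive to cheat required.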

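Non-obvious points: ① This distortion is more hidden than gaming: it happens even when nobody wants to exploit anything, because goals are abstract, metrics are concrete, and the mind grips the concrete far more easily than the abstract. ② It has the same structure as the Buddhist image of taking the finger for the moon: the finger (metric) is an expedient for pointing at the moon (goal), and clinging to the finger as if it were the moon loses the moon entirely. ③ The more convenient a metric is and the more often it is reported, the more complete the substitution: you stare at it every day, and it comes to look like reality itself.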

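In practice: periodically restate the true goal in language that has nothing to do with the metric, forcing yourself to separate the "finger" from the "moon". Ask yourself one diagnostic question: "If this number went up but the thing I really care about did not, would I notice?" If you would not, you have already replaced the goal with the metric. Then add several metrics that look from different angles, so that no single number can monopolize the position of "the goal".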

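Classic example

Wells Fargo replaced the goal of "building deep relationships with customers" with the metric "accounts opened per customer". Everyone from top to bottom watched the account count, and employees ended up opening roughly 3.5 million fake accounts. The goal was swallowed whole by its proxy: they really did treat the number of accounts as the relationship itself.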

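Scenario · BigCat

In personal growth this trap is at its most hidden: "growing / having learned" is replaced by "books read / practice problems done / consecutive green squares on GitHub". You start optimizing the dashboard rather than your life: to protect the streak you do low-quality, perfunctory work, so ability does not grow while the numbers look great. Health is replaced by "ten thousand steps a day", so you pad the step count instead of getting healthier. The more disciplined a person is, the more easily they fall in, because their ability to execute on a metric is so strong.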


Surrogation — while Goodhart and Campbell are about others gaming your metric, surrogation is what happens inside your own head: you quietly substitute the concrete measure for the abstract goal and forget it was ever a proxy. A strategy of "make customers depend on us" silently becomes "maximize NPS" in everyone's mind. It needs no cheating incentive — it happens because goals are abstract and metrics are concrete, and the mind grabs the concrete. Structurally identical to mistaking the finger for the moon. Defense: periodically re-state the true goal in language independent of the metric, and ask: "If this number rose but the thing I care about didn't, would I notice?" If not, the goal has already been replaced by its proxy.

中文提示词
我团队/我自己现在主要盯着指标 [指标] 来推进目标 [真实目标]。请帮我查代理指标失真: ① 用与这个指标完全无关的语言,把我的真实目标重新讲清楚; ② 列出 3 个"指标涨了但真实目标没涨"的具体情形,我是否察觉得到? ③ 建议一个让单一数字无法独占目标位置的多指标组合或复盘习惯。
English Prompt
My team/I are mainly tracking metric [metric] to advance goal [true goal]. Help me check for surrogation: 1. Re-state my true goal in language entirely independent of this metric. 2. List 3 concrete cases where the metric rises but the true goal doesn't — would I notice each? 3. Suggest a multi-metric mix or review ritual that prevents any single number from monopolizing the "goal" slot.

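内在动机扭曲 · Motivation Crowding

"Put a price on a passion and the passion starts to fade": extrinsic rewards and measurement crowd out intrinsic motivation (the overjustification effect)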

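The first three models describe how metrics distort behavior; this one describes how they distort motivation itself. Take something done out of love (intrinsic motivation), add an extrinsic reward or evaluation, and intrinsic motivation often goes down rather than up. This is the overjustification effect, also called motivation crowding-out. Once "I do this because I like it" has been rewritten as "I do this for that reward or number", removing the reward can leave the behavior at a lower level than where it started.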

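Non-obvious points: ① Measurement is itself a form of external control: hanging a number on something quietly changes what it means to you. ② What matters is not whether a reward exists but whether it is experienced as "control" or as "information": feedback perceived as judgment and manipulation crowds out intrinsic motivation, while feedback perceived as "information that helps me improve" nourishes it. Same number, two framings, opposite results (Self-Determination Theory, Deci and Ryan). ③ Corollary: heavy evaluation is most destructive in the domains where intrinsic motivation is strongest (creative work, research, parenting, contemplative practice); you are killing the goose that lays the golden eggs.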

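In practice: for intrinsically driven activities, use informational feedback ("this helps you see your progress") rather than controlling rewards and penalties ("hit the target and get rewarded, miss it and get punished"); keep measurement sparse, low-key, and something the person can switch off. Then anchor the story of "why I do this" firmly in love and meaning, and do not let the numbers rewrite it.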

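Classic example

Israeli daycare centers began fining parents who picked up their children late, and late pickups went up rather than down: the fine redefined "being late" from a source of moral guilt into a service that could be bought, and parents were late more often with a clear conscience. Worse, lateness did not return to its old level after the fine was removed: a rewritten norm does not go back to what it was.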

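Scenario · BigCat

Gamify a child's reading with stickers and points and it works in the short run, but in the long run "reading" turns into "a means of earning points": take the points away and enthusiasm for reading ends up lower than at the start. Hanging streaks and check-ins on your own writing, meditation, or exercise does the same thing; what began as love gradually becomes a KPI to be turned in. It holds in engineering too: manage developers airtight and quantify everything, and you crowd out the craftsman's inner drive that produced the high-quality work in the first place. Not every kind of motivation responds to metrics in the same way: for intrinsically driven work, measurement should be like salt, used sparingly and optional.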


Motivation Crowding (Overjustification) — while the first three models distort behavior, this one distorts motivation itself. Adding extrinsic rewards or measurement to an intrinsically motivated activity often lowers intrinsic motivation; remove the reward and behavior drops below baseline. Measurement is itself a form of extrinsic control — putting a number on something quietly changes its meaning. What matters is whether feedback is experienced as controlling (judgment) or informational (mastery support) — same number, opposite effect (Self-Determination Theory, Deci & Ryan). Corollary: heavy metrics are most destructive precisely where intrinsic drive is strongest (creation, research, parenting, practice). Use informational feedback over controlling rewards; keep measurement small, quiet, and optional — like salt.

中文提示词
我打算用奖励/打卡/考核 [机制] 来推动 [活动/对象],这件事原本带有内在热爱。请用动机挤出分析: ① 这个机制最可能把"因为喜欢而做"改写成"为了奖励而做"吗?风险有多大? ② 把它从"控制型"重新设计成"信息型反馈",具体该怎么改? ③ 给我一句锚定意义的叙事,帮我(或对方)在有度量的情况下守住内在动机。
English Prompt
I plan to use a reward/streak/review mechanism [mechanism] to drive [activity/person], which is currently intrinsically loved. Analyze with motivation crowding: 1. How likely is this mechanism to rewrite "doing it because I love it" into "doing it for the reward"? How big is the risk? 2. How exactly would I redesign it from controlling to informational feedback? 3. Give me one meaning-anchoring narrative to protect intrinsic motivation even with measurement present.