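Earlier installments covered how to think; this one covers how to turn good thinking into executable action. The four tools here span the full life cycle of a major decision: the pre-mortem stress-tests a plan before you start, the checklist guards known checkpoints during execution, red teaming institutionalizes challenging the conclusion, and the decision journal feeds outcomes back into your judgment afterward. They share one underlying logic: do not trust "I'll remember," "I'll be objective," or "I'll reflect"; harden those intentions into a process that cannot be bypassed.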

Checklists

"The volume and complexity of what we know has exceeded our individual ability to deliver its benefits correctly, safely, or reliably." (Atul Gawande)

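The most common misunderstanding about checklists: they are not for remembering things you don't know, but for countering the things you know perfectly well yet skip under fatigue, pressure, or distraction. They target execution failures on known items, not knowledge gaps. A senior surgeon does not lack knowledge of sterilization; on the eighth operation of the day, at two in the morning, they unconsciously drop a step. The checklist moves that step from "rely on memory" to "rely on verification."

Non-trivial points: ① Checklists come in two types: DO-CONFIRM (do the work from experience, then pause at a checkpoint to confirm nothing was missed) and READ-DO (read each step as you do it, like a recipe). Experts use the former and novices the latter; a checklist of the wrong type gets rejected by the people it is meant for. ② A checklist must be short: 5–9 items per pause point, containing only the "killer items," the steps that are fatal if skipped and most easily forgotten. A checklist stuffed with everything that "should be done" is a checklist nobody uses. Cutting items is harder, and more important, than adding them. ③ The hidden function is social, not mnemonic: a checklist creates a legitimate pause point that gives lower-status team members (nurses, junior engineers) permission to say "Doctor, you missed a step." It is a socio-technical intervention, not just a memo.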

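Classic Example

The World Health Organization Surgical Safety Checklist has only 19 items across 3 pause points (before anesthesia, before incision, before the patient leaves the operating room). In a pilot at 8 hospitals worldwide, surgery-related deaths fell by about 47% and complications by about a third. The key was not that doctors learned anything new; it was that steps everyone understands but often skips in the rush, such as "confirm everyone has stated their name and role" and "confirm antibiotics have been given," were now verified by force of process.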

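Scenario · BigCat

Write a deploy checklist for shipping an AI system: not "be careful," but the handful of checkpoints that cause an incident if missed and are most likely to be skipped when racing a release: "rollback script verified," "token cost cap set," "prompt version tagged," "sensitive data masked." Keep it to 7 items or fewer. The same holds at home: post a leaving-for-school checklist by the door, replacing "Mom remembers" with "the child checks it themselves," and hand responsibility and capability back to the child along the way. That is the checklist's social function in its family version.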
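
One way to make the deploy checklist "impossible to bypass" is to encode it as a DO-CONFIRM gate that blocks the release while any killer item is unconfirmed. The sketch below is only an illustration: `KillerItem`, `run_pause_point`, and the `release_state` flags are hypothetical names, and in a real pipeline each flag would come from an actual check rather than a hard-coded dictionary.

```python
from dataclasses import dataclass
from typing import Callable

MAX_ITEMS = 7  # the cap suggested above for a deploy checklist


@dataclass(frozen=True)
class KillerItem:
    label: str
    confirmed: Callable[[], bool]  # returns True once the step is verified


def run_pause_point(items: list[KillerItem]) -> list[str]:
    """DO-CONFIRM gate: return the labels of items not yet confirmed."""
    if len(items) > MAX_ITEMS:
        raise ValueError(f"{len(items)} items; trim to {MAX_ITEMS} or fewer")
    return [item.label for item in items if not item.confirmed()]


# Hypothetical release state, hard-coded here for illustration only.
release_state = {
    "rollback_script_verified": True,
    "token_cost_cap_set": False,
    "prompt_version_tagged": True,
    "sensitive_data_masked": True,
}

pre_deploy = [
    KillerItem("Rollback script verified", lambda: release_state["rollback_script_verified"]),
    KillerItem("Token cost cap set", lambda: release_state["token_cost_cap_set"]),
    KillerItem("Prompt version tagged", lambda: release_state["prompt_version_tagged"]),
    KillerItem("Sensitive data masked", lambda: release_state["sensitive_data_masked"]),
]

missing = run_pause_point(pre_deploy)
if missing:
    print("Deploy blocked, unconfirmed:", missing)
else:
    print("All killer items confirmed")
```

The length check is deliberate: refusing to run an overlong list forces the "cut items" discipline instead of leaving it to goodwill.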


Checklists — they don't help you remember what you don't know; they guard against skipping what you do know under fatigue, pressure, or distraction. They target execution failures on known items, not knowledge gaps. Two types: DO-CONFIRM (act from memory, pause at checkpoints to verify) for experts, and READ-DO (read-then-do) for novices. Keep them short — 5–9 "killer items" per pause point; cramming in everything makes a checklist nobody uses. The hidden lever is social, not mnemonic: a checklist creates a legitimate pause and gives lower-status members permission to flag a missed step. It's a socio-technical intervention, not a memo.

Chinese Prompt
我要为 [流程/操作,如某系统上线、某次手术准备、某类家务] 设计一份检查清单。请: ① 区分这个场景该用 DO-CONFIRM 还是 READ-DO,并说明理由; ② 只挑出 5–9 个"killer items"——漏掉会致命、且在忙乱中最易被跳过的步骤,删掉所有"锦上添花"项; ③ 标出每个暂停点的位置,以及哪一项是给"地位较低成员"开口纠错用的。
English Prompt
I want to design a checklist for [process/operation]. Please: 1. Decide whether this case calls for DO-CONFIRM or READ-DO, and explain why. 2. Select only 5–9 "killer items" — steps that are fatal if skipped and most likely to be dropped under pressure; cut every nice-to-have. 3. Mark the pause points, and flag which item exists to give a lower-status member permission to catch an error.

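Pre-mortem

"Imagine the plan has already failed completely, then look back and explain why it failed." (method proposed by Gary Klein)

A regular retrospective is a post-mortem: you look for the cause of death only after something has gone wrong. A pre-mortem moves the clock into the future: assume the project has already failed badly, and have everyone write down why it failed. The same worry, voiced in a different tense, carries very different force.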

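Non-trivial points: ① What it really breaks is group conformity pressure. In a normal meeting, saying "I'm worried this will fall through" reads as being a buzzkill, not a team player, someone talking the team down, so nobody says it. The pre-mortem turns finding causes of failure into a formally authorized task, even a contest to see who finds the most; it flips doubt from social punishment to social reward. ② The psychological mechanism is prospective hindsight: rewriting "might fail" as "has failed" switches the brain from abstract probability to concrete narrative. Research shows this "it has already happened" framing improves people's ability to identify reasons for future outcomes by about 30%. ③ It is the team-based, time-anchored implementation of inversion (see Day 1): not just "think backward," but thinking backward equipped with a process, roles, and a time anchor.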

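Classic Example

The method was proposed by decision researcher Gary Klein and brought into the mainstream by Kahneman's strong endorsement. The procedure is minimal: at the project kickoff, the facilitator says, "It is one year from now and this project has failed completely. Take 2 minutes to write down, independently, every possible cause of death," and then everyone reads theirs aloud in turn. It works because independent writing sidesteps conformity, and the "already failed" premise lets the most senior pessimistic voices, the ones that most need to be heard, finally speak up.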

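Scenario · BigCat

Before releasing an AI agent product, gather the team for a pre-mortem: "Six months from now this product is dead. Why?" Faster than any upbeat pitch would allow, you will hear the real problems: "hallucinations destroyed trust," "token costs crushed gross margin," "users simply can't write prompts." Individuals can run a solo version: before a major technology choice, write "a eulogy for this architecture, despised by the whole team two years from now." It is even structurally the same as the Buddhist practice of mindfulness of death: contemplating the end to force today's priorities, so that what truly matters surfaces.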


Pre-mortem — before launching, assume the project has already failed catastrophically, then have everyone independently write down why. What it really defeats is conformity pressure: in a normal meeting, voicing "I'm worried this will fail" reads as disloyal, so no one does; the pre-mortem reframes finding failure causes as an authorized, even competitive task, flipping dissent from social punishment to social reward. The mechanism is prospective hindsight — rewriting "might fail" as "has failed" shifts the brain from abstract probability to concrete narrative, improving the ability to identify causes of outcomes by roughly 30%. It's the team-and-time-anchored implementation of inversion.

Chinese Prompt
我即将推进 [项目/决策/计划],目标是 [描述]。请帮我做一次预先验尸: ① 设定"一年后它已经彻底失败",列出 6–8 个最可能的"死因",按"杀伤力 × 发生概率"排序; ② 指出其中哪些是团队此刻因为"不想扫兴"而不会主动说出口的; ③ 针对排名前 2 的死因,各给一个现在就能采取的预防动作。
English Prompt
I'm about to pursue [project/decision/plan], aiming to [describe]. Run a pre-mortem: 1. Assume it's one year later and the plan has failed completely; list 6–8 likely "causes of death," ranked by impact × probability. 2. Identify which of these the team would avoid voicing now to not seem like a downer. 3. For the top 2 causes, give one preventive action I can take right now for each.
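
The "impact × probability" ranking in step 1 of the prompt can be sketched in a few lines. This is a minimal illustration, not a prescribed scoring method: the `FailureCause` type, the 1–5 impact scale, and every number below are made-up assumptions, with the cause descriptions borrowed from the AI agent scenario above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FailureCause:
    description: str
    impact: int         # 1 (minor) to 5 (kills the project); illustrative scale
    probability: float  # rough estimate between 0 and 1

    @property
    def risk(self) -> float:
        return self.impact * self.probability


def rank_causes(causes: list[FailureCause], top_n: int = 2) -> list[FailureCause]:
    """Return the top_n causes by impact x probability, highest first."""
    return sorted(causes, key=lambda c: c.risk, reverse=True)[:top_n]


# Made-up estimates, as a team might collect them after independent writing.
causes = [
    FailureCause("Hallucinations erode user trust", impact=5, probability=0.5),
    FailureCause("Token costs crush gross margin", impact=4, probability=0.4),
    FailureCause("Users cannot write prompts", impact=3, probability=0.6),
]

top = rank_causes(causes)
for cause in top:
    print(f"{cause.risk:.1f}  {cause.description}")
```

The top two entries are the ones that step 3 of the prompt asks a preventive action for.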

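Red Teaming

Originating in military war-gaming and cybersecurity: assign a group whose job is to break your plan.

A red team is a group formally assigned to attack your plan, system, or conclusion. It differs from a pre-mortem in that a pre-mortem is a one-off divergent exercise, while a red team is a standing, staffed adversarial role.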

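Non-trivial points: ① A red team's real value lies not in which hole it found but in institutionalizing adversarial thinking: it turns "challenging the authoritative conclusion" from an act of individual heroism that takes courage into routine work with a post, a mandate, even KPIs, which erases the social cost of dissent. ② It differs fundamentally from a devil's advocate: a devil's advocate is one person temporarily asked to argue the other side, everyone knows "they're just playing a part," and the opposition becomes a formality; a real red team has structural independence and a genuine incentive to win. ③ The deadliest failure mode is co-optation: if the red team reports to the blue team, or wants to please the party under attack, it loses all its value at once. A red team must be independent and rewarded for winning. ④ This is highly current in the LLM era: AI red teaming is now standard practice in alignment and safety, with dedicated people trying every way to jailbreak a model or induce harmful output in order to expose gaps in its safeguards.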

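Classic Example

The Israeli military's "Tenth Man Rule," reportedly born of the intelligence misjudgment before the Yom Kippur War: if nine people look at the same intelligence and reach the same conclusion, the tenth is obliged to take the opposite view, even if only for the sake of opposing. It uses a mandatory rule to eliminate conformity and guarantees, institutionally, that "one voice is always attacking the consensus." That is red-team thinking at its minimal core.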

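Scenario · BigCat

Set up a red team for your own technical design: find a colleague who is outside the project and who "gets praised for finding problems," and have them take your architectural assumptions apart. Independence and incentive are both required; without either, it degrades into going through the motions. Parenting decisions can use a lightweight version: when the whole family agrees "we should sign the child up for this class," deliberately appoint one person (or play the part yourself) as a sincere opponent, to force out questions like "Are we just acting on anxiety?" and "Has the child's real wish been ignored?" The key is always the same: dissent must be rewarded, not merely tolerated.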


Red Teaming — a group formally tasked with attacking your plan, system, or conclusion. Unlike a one-shot pre-mortem, it's a standing, adversarial role. Its real value isn't the specific flaw found but institutionalizing adversarial thinking: it turns "challenging authority" from an act of personal courage into an authorized job, erasing the social cost of dissent. It differs from a devil's advocate (a temporary, often performative contrarian) by requiring structural independence and a genuine incentive to win. The fatal failure mode is co-optation — a red team that reports to or seeks to please the blue team is worthless. Highly current in the LLM era via AI red teaming for safety and alignment.

Chinese Prompt
请你扮演我的红队,任务是击溃以下方案,而不是改良它:[描述方案/结论]。请: ① 找出 3 个最致命的隐含假设,并说明各自在什么条件下会崩; ② 设计一个最可能让它失败的对抗场景(攻击者视角); ③ 诚实评估:如果我反驳你,你最强的反击是什么?最后说明哪个攻击我无法轻易化解。
English Prompt
Act as my red team. Your job is to defeat the following, not improve it: [describe plan/conclusion]. Please: 1. Surface the 3 most fatal hidden assumptions and the conditions under which each collapses. 2. Design the adversarial scenario most likely to make it fail (attacker's view). 3. Honestly assess: if I push back, what's your strongest counter? End by naming the one attack I cannot easily neutralize.

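Decision Journal

At the moment of decision, freeze your actual state of knowledge, to counter the way memory rewrites it.

When making a major decision, write down four things: what you expect to happen, your core reasoning, your emotional state at the time, and your confidence level (e.g., "70% sure"). Afterward, check them against the real outcome.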

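Non-trivial points: ① It counters two invisible killers: hindsight bias (feeling afterward that "I knew it all along") and outcome bias (inferring whether the decision was right from whether the result was good). Without a journal, your brain quietly rewrites your memory of the moment, leaving you forever wise after the event; you never learn decision quality, only the emotion of outcomes. The journal freezes the real state at decision time into evidence you cannot deny. ② The core distinction: a good decision ≠ a good outcome. Under uncertainty you must judge the process, not the result: a good bet with a 70% chance of winning is still a good decision when it loses. The journal is your only data source for separating skill from luck. ③ It is also a training ground for Bayesian calibration (see Day 7): pull out everything you marked "70% sure" and see whether about 70% of it actually happened. This turns vague confidence into a forecasting ability that can be measured and improved. ④ Recording emotion is not idle detail either: you will find that you systematically make bad decisions in certain states (tired, offended, impatient for results).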
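
The calibration check in point ③ amounts to grouping journal entries by stated confidence and comparing each group's observed hit rate with the number you wrote down. A minimal sketch, assuming each entry has been reduced to a `(stated confidence, came true?)` pair; the function name and the journal data are invented for illustration.

```python
from collections import defaultdict


def calibration_table(entries: list[tuple[float, bool]]) -> dict[float, tuple[int, float]]:
    """Group (stated confidence, came true?) pairs by confidence level.

    Returns {confidence: (number of entries, observed hit rate)}.
    """
    buckets: dict[float, list[bool]] = defaultdict(list)
    for confidence, came_true in entries:
        buckets[round(confidence, 1)].append(came_true)
    return {
        level: (len(hits), sum(hits) / len(hits))
        for level, hits in sorted(buckets.items())
    }


# Made-up journal: ten calls at "70% sure", ten at "90% sure".
journal = (
    [(0.7, True)] * 5 + [(0.7, False)] * 5
    + [(0.9, True)] * 9 + [(0.9, False)]
)

table = calibration_table(journal)
for level, (n, hit_rate) in table.items():
    gap = hit_rate - level
    print(f"stated {level:.0%}  n={n}  observed {hit_rate:.0%}  gap {gap:+.0%}")
```

In this invented data the "70% sure" calls came true only half the time, which would point to overconfidence at that level, while the "90% sure" calls are on target. With only a handful of entries per bucket the gaps are mostly noise, so the check becomes meaningful only as the journal grows.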

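[Figure: decision moment (reasoning, emotion, confidence) → real outcome (revealed over time) → review against the record (process ≠ outcome) → calibrate and update judgment, feeding outcomes back into judgment (closed loop). Caption: Decision journal = freeze the real intent of the decision, then calibrate with outcomes in a closed loop.]

Classic Example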

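Ray Dalio of Bridgewater Associates systematically records the principles and rationale behind each major decision and reviews them afterward, gradually distilling reusable "principles." Kahneman likewise stresses repeatedly that people's memory of their own past judgments is highly unreliable; only by writing things down on the spot can you escape the distortion of hindsight later and truly calibrate your judgment.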

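Scenario · BigCat

Keep a journal for major architecture decisions: "Chose Kafka over RabbitMQ, reasoning X, confidence 65%, racing the Q3 deadline and a bit anxious at the time." Look back six months later: if the outcome was good but the reasoning was all wrong, that was luck, so do not learn from it; if the reasoning was right and the outcome bad, the process was sound, so keep going. The same applies to investment decisions: it helps you kick the dangerous illusion of "I made money, so I must know what I'm doing." For anyone pursuing the "AI super-individual," this is key infrastructure for turning personal judgment into an iterable system: without frozen inputs, there is no trustworthy feedback signal.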
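
The journal entry and the six-month review can be sketched as a frozen record plus a small process/outcome grid. This is one possible shape, not a standard format: `JournalEntry`, `review`, the date, and the expectation text are placeholders, and the fourth branch (reasoning wrong, outcome bad) is added here only to round out the grid.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)  # frozen: the entry cannot be edited after the fact
class JournalEntry:
    decided_on: date
    decision: str
    expectation: str
    reasoning: str
    emotion: str
    confidence: float  # 0.0 to 1.0


def review(reasoning_held_up: bool, outcome_good: bool) -> str:
    """Classify a past decision by process quality and outcome."""
    if reasoning_held_up and outcome_good:
        return "skill: sound process, good outcome"
    if reasoning_held_up:
        return "bad luck: sound process, keep doing it"
    if outcome_good:
        return "good luck: do not learn from this"
    return "bad process: revisit the reasoning"


entry = JournalEntry(
    decided_on=date(2024, 7, 1),             # placeholder date
    decision="Kafka over RabbitMQ",
    expectation="(what you expect to happen)",  # placeholder text
    reasoning="X",                           # left as X, as in the example above
    emotion="racing the Q3 deadline, a bit anxious",
    confidence=0.65,
)

# Six months later: the outcome was good, but the reasoning did not hold up.
print(entry.decision, "->", review(reasoning_held_up=False, outcome_good=True))
```

Freezing the dataclass mirrors the point of the journal itself: attempting to assign to a field later raises an error, so the recorded state cannot be quietly rewritten.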


Decision Journal — at the moment of a major decision, record four things: what you expect to happen, your core reasoning, your emotional state, and your confidence level (e.g., "70% sure"). Later, compare against the real outcome. It defeats two hidden killers: hindsight bias ("I knew it all along") and outcome bias (judging decision quality by results). Without it, your brain rewrites the memory and you only learn the emotion of outcomes, never decision quality. Core distinction: a good decision ≠ a good outcome — under uncertainty you must judge process, not result. The journal is your only data source for separating skill from luck, and a training ground for Bayesian calibration: check whether things you marked "70% sure" actually happened ~70% of the time.

Chinese Prompt
这是我决策当时记录的日记条目:预期 [X],理由 [Y],情绪 [Z],置信度 [N%]。现在的真实结果是 [描述]。请帮我复盘: ① 分离实力与运气——结果的好坏有多少来自我的理由正确,多少来自偶然? ② 我标的置信度是否校准(过度自信还是过度保守)? ③ 我的情绪状态是否系统性影响了这次判断,未来该设什么触发器提醒自己?
English Prompt
Here is the journal entry from when I decided: expectation [X], reasoning [Y], emotion [Z], confidence [N%]. The actual outcome is now [describe]. Please review: 1. Separate skill from luck — how much of the outcome came from correct reasoning vs chance? 2. Was my stated confidence calibrated (over- or under-confident)? 3. Did my emotional state systematically bias this decision, and what trigger should I set to flag it next time?