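Suppose you observe that "engineers who use a certain AI tool produce more." That is rung one. But it may well be that stronger engineers are the ones who try new tools first (self-selection confounding). To climb to rung two you need an A/B test that randomly assigns who gets the tool first. Otherwise a hasty company-wide rollout may yield nothing at all. Parenting works the same way: "children who love reading get better grades" is a rung-one observation; "would making this child read more raise their scores?" is the rung-two intervention you actually want to know about, and the two answers need not agree.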
English Summary
The Ladder of Causation (Pearl) sorts every "does X cause Y?" question into three rungs, each needing information the rung below cannot supply. Association — P(Y|X), seeing X makes Y likelier; pure observation, where curve-fitting and most ML stop. Intervention — P(Y|do(X)), what happens if I actively set X; generally ≠ P(Y|X) because the latter can be confounded. Counterfactual — "had X not happened, would Y differ?", the realm of attribution, blame, regret. You cannot climb the ladder from data alone: observational P(Y|X) never yields P(Y|do(X)) without an injected causal model. "Big data" ≠ "causal knowledge." First locate which rung a claim stands on.
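The gap between P(Y|X) and P(Y|do(X)) can be made concrete with a small simulation. This is a minimal sketch under invented assumptions: the variable names, the 0.5 "true effect," and the way skill drives both tool adoption and output are all hypothetical, chosen only to mirror the self-selection story.

```python
import random

TRUE_EFFECT = 0.5  # assumed causal lift from the tool (illustrative only)

def mean_gap(randomize: bool, n: int = 100_000, seed: int = 0) -> float:
    """Mean output of tool users minus mean output of non-users."""
    rng = random.Random(seed)
    total = {True: 0.0, False: 0.0}
    count = {True: 0, False: 0}
    for _ in range(n):
        skill = rng.gauss(0.0, 1.0)  # hidden confounder
        if randomize:
            uses_tool = rng.random() < 0.5  # rung 2: a coin flip sets X
        else:
            uses_tool = skill + rng.gauss(0.0, 1.0) > 0.0  # rung 1: strong people opt in
        output = 2.0 * skill + TRUE_EFFECT * uses_tool + rng.gauss(0.0, 1.0)
        total[uses_tool] += output
        count[uses_tool] += 1
    return total[True] / count[True] - total[False] / count[False]

observed_gap = mean_gap(randomize=False)      # association: compares self-selected groups
experimental_gap = mean_gap(randomize=True)   # intervention: compares randomized groups
print(f"observed gap     : {observed_gap:.2f}")
print(f"experimental gap : {experimental_gap:.2f}")
```

Under these assumptions the observed gap comes out several times larger than the experimental one, which sits close to the assumed true effect. The data-generating process is identical in both runs; only the assignment rule for X changes.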
AI Prompts
中文提示词
我看到一个结论:[X 与 Y 的关系论断 / 数据/标题]。请用因果阶梯帮我审视:
① 它实际站在第几层——关联、干预,还是反事实?给出判断依据;
② 若它被当作因果(第二层)使用,列出 2 个最可能的混杂因素,它们能让相关看起来像因果;
③ 给出一个能把它"升一层"验证的最小方案(随机实验 / 准实验 / 需要哪张因果图)。
English Prompt
Here is a claim I encountered: [assertion about X and Y / a dataset / a headline]. Apply the Ladder of Causation:
1. Which rung does it actually stand on — association, intervention, or counterfactual? Justify.
2. If it's being used causally (rung 2), list 2 likely confounders that could make mere correlation look causal.
3. Propose the minimal design to "climb one rung" and verify it (randomized experiment / natural experiment / which causal diagram is required).
Counterfactual Reasoning, the top rung: for an event that already happened, ask "had X not occurred, would Y differ?" Attribution, blame, regret, and fairness are all counterfactual judgments. Its core difficulty is missing data: for one individual you observe only the factual outcome, never the counterfactual half (the "fundamental problem of causal inference"), so individual counterfactuals are estimated, not seen. Distinguish the trigger from the root cause: the last straw only completes a load that was already nearly sufficient, whereas the accumulated load is the necessary condition behind the collapse. Practical test: remove the factor and ask whether the outcome would still occur. If yes, the factor is not a but-for cause and is usually mere accompaniment (the legal "but-for" test); the exception is overdetermination, where a backup cause would have produced the same outcome anyway.
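The but-for test can be sketched on a toy structural model of the last-straw story. Everything here is hypothetical: the function names, the 100 kg capacity, and the `sunny` flag, which is included only as an example of a factor that accompanies the event without causing it.

```python
def back_breaks(load_kg: float, straw_kg: float, sunny: bool,
                capacity_kg: float = 100.0) -> bool:
    """Toy structural equation: weather is recorded but plays no causal role."""
    return load_kg + straw_kg > capacity_kg

def is_but_for_cause(model, factual: dict, factor: str, removed_value) -> bool:
    """True if the outcome happened and would not have happened without `factor`."""
    happened = model(**factual)
    counterfactual_world = {**factual, factor: removed_value}  # change one factor only
    return happened and not model(**counterfactual_world)

factual = {"load_kg": 99.9, "straw_kg": 0.2, "sunny": True}
straw_matters = is_but_for_cause(back_breaks, factual, "straw_kg", 0.0)
load_matters = is_but_for_cause(back_breaks, factual, "load_kg", 0.0)
sun_matters = is_but_for_cause(back_breaks, factual, "sunny", False)
print(straw_matters, load_matters, sun_matters)  # True True False
```

In this toy model both the straw and the accumulated load pass the test, while the weather fails it. The test screens out accompaniments, but it does not by itself say which of the surviving causes is the trigger and which is the root; that judgment needs the model of how the load built up.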
AI Prompts
中文提示词
我要给这个结果做归因:[事件/故障/成败的结果],我目前归因于 [我认定的原因]。请用反事实推理压力测试:
① 做"若非检验":抽掉我认定的原因,结果还会发生吗?据此判断它是真正的因还是只是伴随/扳机;
② 区分这里的"必要负载"与"最后一根稻草",指出我是否把扳机错当成了病根;
③ 列出 1-2 个被我忽略、但抽掉后结果会改变的更深层原因。
English Prompt
I'm attributing this outcome: [event / incident / success or failure], currently to [the cause I believe]. Stress-test with counterfactual reasoning:
1. Run the but-for test: remove my proposed cause — would the outcome still happen? Decide if it's a real cause or mere accompaniment / trigger.
2. Separate the "necessary load" from the "last straw" here; tell me if I've mistaken a trigger for the root cause.
3. Name 1–2 deeper causes I'm overlooking whose removal would actually change the outcome.
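Instrumental Variables
"When you can't run an experiment, find a 'crowbar' that nature has already randomized for you."
Detailed Explanation
When a hidden confounder sits between X and Y (U affects both X and Y), the coefficient from regressing Y on X directly is biased. An instrumental variable (IV) is a clever crowbar: find a variable Z that satisfies three conditions. (1) Relevance: Z affects X. (2) Exclusion: Z affects Y only through X, by no other path. (3) Independence: Z is unrelated to the confounder U (as if Z were randomly assigned). The part of the variation in X that Z induces is then "clean," and using it to explain Y yields an estimate of the causal effect of X on Y.
Intuition: Z is like a natural experiment that nature runs for you. It nudges X at random without touching the dirty confounders. By using only the Z-induced variation in X to explain Y, you shut the confounding out.
Non-trivial points: (1) The exclusion assumption is the most fragile one and cannot be tested from data; you can only argue from domain knowledge that Z really has no back door. Once Z has a second path to Y, the IV estimate collapses. (2) Weak instruments: if Z affects X only weakly, the estimate is swamped by amplified bias and variance; a weak instrument is more dangerous than none. (3) IV estimates a Local Average Treatment Effect (LATE): it holds only for the people whose X was changed by Z, and it need not extrapolate to everyone.
You want to know whether using an AI assistant inside the company really improves performance. The confounder: ambitious people both use AI more and perform better. Look for an instrument. For example, the company hands out licenses in batches, with the order decided at random by the last digit of the employee ID or by department. Activation timing is then exogenous, like a lottery, and unrelated to personal motivation, so it can pry out the causal effect of AI rather than the self-selection illusion of "the strong get stronger." The key is to argue first that early versus late activation truly has no other back door to performance.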
English Summary
Instrumental Variables (IV) — when an unseen confounder U drives both X and Y, the raw X→Y regression is biased. An instrument Z is a lever satisfying three conditions: (1) relevance — Z affects X; (2) exclusion — Z affects Y only through X; (3) independence — Z is unrelated to U, as if randomly assigned. The Z-induced variation in X is "clean," recovering the true causal effect. Caveats: the exclusion restriction is the most fragile assumption and is untestable by data — defend it with domain knowledge; weak instruments (Z barely moves X) are worse than none; IV estimates a Local Average Treatment Effect (LATE), valid only for those whose X was moved by Z.
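A small simulation shows how an instrument recovers an effect that a naive regression overstates. This is a sketch under stated assumptions, not a recipe: the coefficients, the lottery-style instrument, and the constant treatment effect are all invented, and with a constant effect the LATE coincides with the overall effect, which real data need not do. The estimator used is the simple ratio cov(Z, Y) / cov(Z, X), the single-instrument form of IV.

```python
import random

def cov(a, b):
    """Sample covariance of two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((p - ma) * (q - mb) for p, q in zip(a, b)) / (len(a) - 1)

TRUE_EFFECT = 1.0  # assumed causal effect of X on Y (illustrative only)
rng = random.Random(42)
z, x, y = [], [], []
for _ in range(100_000):
    u = rng.gauss(0.0, 1.0)                    # unobserved confounder, e.g. ambition
    zi = 1.0 if rng.random() < 0.5 else 0.0    # instrument: early access by lottery
    xi = 0.8 * zi + u + rng.gauss(0.0, 1.0)    # usage is moved by both Z and U
    yi = TRUE_EFFECT * xi + 2.0 * u + rng.gauss(0.0, 1.0)  # Z reaches Y only via X
    z.append(zi); x.append(xi); y.append(yi)

naive_ols = cov(x, y) / cov(x, x)      # biased: absorbs the U -> Y path
first_stage = cov(z, x) / cov(z, z)    # relevance check: how hard Z pushes X
iv_estimate = cov(z, y) / cov(z, x)    # uses only the Z-induced variation in X
print(f"naive OLS   : {naive_ols:.2f}")
print(f"first stage : {first_stage:.2f}")
print(f"IV estimate : {iv_estimate:.2f}")
```

With these settings the naive slope lands well above the assumed effect, while the IV ratio lands near it. Shrinking the 0.8 first-stage coefficient toward zero makes the denominator small and the IV estimate erratic, which is the weak-instrument problem in miniature.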
AI Prompts
中文提示词
我想估计 [X] 对 [Y] 的因果效应,但不能做随机实验,担心混杂 [可能的混杂因素]。请帮我找工具变量:
① 头脑风暴 2-3 个候选"准随机外生冲击"(政策生效、地理边界、抽签、分批开通等)作为工具 Z;
② 逐一检验三条件:相关性、排他性(最关键,是否有别的后门)、独立性,指出哪个最可疑;
③ 提醒我估出的是哪部分人群的效应(LATE),能否外推到我真正关心的对象。
English Prompt
I want the causal effect of [X] on [Y], but can't randomize, and worry about confounding by [suspected confounders]. Help me find an instrument:
1. Brainstorm 2–3 candidate quasi-random exogenous shocks (policy start dates, geographic borders, lotteries, staggered rollout) as instrument Z.
2. Check each against the three conditions — relevance, exclusion (most critical: any back door?), independence — and flag the weakest.
3. Remind me which subpopulation the estimate applies to (LATE) and whether it generalizes to my real target.
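You A/B test two model versions. Overall, B converts better, so you want to ship B to everyone. Split first: stratify by user type. It may well be that A wins within every segment, and B's traffic merely happened to land on highly active users who convert easily (uneven traffic split = confounding). Ship without stratifying and you roll out a model that is actually worse. The same goes for a child's grades: don't watch only the rise and fall of the class average. Broken down by ability band, the trend may be entirely reversed.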
English Summary
Simpson's Paradox — a trend that holds within every subgroup can reverse once the groups are pooled (e.g., each department admits women at an equal-or-higher rate, yet the school total favors men). It's not an arithmetic error but a lurking grouping variable. The deep point: data alone can't tell you whether to look pooled or split — only the causal structure can. If the grouping variable is a confounder, stratify; if it's a mediator, stratifying wrongly blocks the real effect. Same numbers, different causal story, opposite correct answer. The cure is a causal model, not more statistics: draw who-affects-whom first, then decide what to control for.
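The reversal is easy to reproduce with a few counts. The numbers below are made up to mirror the A/B-test story, and the adjustment step assumes user activity is a confounder (fixed before assignment), which is exactly the causal judgment the data cannot make for you.

```python
# (conversions, users) per arm within each stratum; all counts are invented.
data = {
    "high_activity": {"A": (90, 100), "B": (800, 1000)},
    "low_activity": {"A": (300, 1000), "B": (20, 100)},
}

def stratum_rate(stratum: str, arm: str) -> float:
    conversions, users = data[stratum][arm]
    return conversions / users

def pooled_rate(arm: str) -> float:
    """Conversion rate after lumping all strata together."""
    conversions = sum(data[s][arm][0] for s in data)
    users = sum(data[s][arm][1] for s in data)
    return conversions / users

def adjusted_rate(arm: str) -> float:
    """Stratum rates reweighted by each stratum's share of all users."""
    total_users = sum(users for s in data for _, users in data[s].values())
    result = 0.0
    for s in data:
        stratum_users = sum(users for _, users in data[s].values())
        result += stratum_rate(s, arm) * stratum_users / total_users
    return result

for s in data:
    print(s, f"A={stratum_rate(s, 'A'):.2f}", f"B={stratum_rate(s, 'B'):.2f}")
print("pooled  ", f"A={pooled_rate('A'):.2f}", f"B={pooled_rate('B'):.2f}")
print("adjusted", f"A={adjusted_rate('A'):.2f}", f"B={adjusted_rate('B'):.2f}")
```

A wins in both strata and after adjustment, yet B wins in the pooled table, because most of B's traffic came from the high-activity segment. If the grouping variable were a mediator instead, the pooled comparison would be the one to trust.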
AI Prompts
中文提示词
我看到一个整体趋势/对比结论:[描述数据和结论,如 B 方案整体优于 A]。请帮我排查辛普森悖论:
① 列出 2-3 个最可能"在子群内讲相反故事"的潜藏分组变量(用户类型、科系、时间段等);
② 对每个变量判断它是混杂还是中介——据此决定该不该分层;
③ 给出一个最小的分层核对方案,告诉我若结论反转,正确的行动该是什么。
English Prompt
I see an aggregate trend / comparison: [describe the data and conclusion, e.g. B beats A overall]. Help me screen for Simpson's Paradox:
1. List 2–3 lurking grouping variables most likely to tell the opposite story within subgroups (user type, department, time period).
2. For each, decide whether it's a confounder or a mediator — and thus whether to stratify.
3. Give a minimal stratified-check plan, and tell me the correct action if the conclusion reverses.