Calibration vs Resolution: forecast quality has two distinct virtues. Calibration: when you say 70%, the event happens about 70% of the time, so your probabilities tell the truth. Resolution (often loosely called sharpness): you dare to leave the base rate and commit to decisive probabilities rather than hugging 50%. In practice they pull against each other: a forecaster who always reports the base rate is perfectly calibrated but useless, while shouting 95% to seem bold destroys calibration. Elite forecasters achieve both, being decisive only when the evidence warrants it. The Brier score decomposes into a calibration (reliability) term, a resolution term, and an uncertainty term that depends only on the base rate, so debriefs should ask two separate questions: were my probabilities honest, and did I dare to move off the middle? Calibration is cured by feedback; resolution by domain knowledge.
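The two debrief questions can be put into numbers. The sketch below uses the standard Murphy decomposition of the Brier score (Brier = reliability − resolution + uncertainty), grouping forecasts by their stated probability. The function name and both ledgers are hypothetical, and the code assumes probabilities are stated to two decimals; treat it as an illustration rather than a finished tool.

```python
from collections import defaultdict


def murphy_decomposition(probs, outcomes):
    """Split the Brier score into (reliability, resolution, uncertainty).

    Forecasts are grouped by their stated probability (rounded to 2 decimals),
    so brier = reliability - resolution + uncertainty holds up to float rounding.
    """
    n = len(probs)
    base_rate = sum(outcomes) / n
    buckets = defaultdict(list)
    for p, o in zip(probs, outcomes):
        buckets[round(p, 2)].append(o)

    reliability = 0.0  # calibration error: stated probability vs realized hit rate
    resolution = 0.0   # how far each bucket's hit rate moves away from the base rate
    for stated_p, hits in buckets.items():
        hit_rate = sum(hits) / len(hits)
        reliability += len(hits) * (stated_p - hit_rate) ** 2
        resolution += len(hits) * (hit_rate - base_rate) ** 2
    return reliability / n, resolution / n, base_rate * (1 - base_rate)


# Hypothetical ledgers: the same 20 outcomes, forecast two different ways.
outcomes = [1] * 9 + [0] + [1] + [0] * 9
decisive = [0.9] * 10 + [0.1] * 10  # commits, and is right about as often as claimed
timid = [0.5] * 20                  # always reports the base rate

decisive_parts = murphy_decomposition(decisive, outcomes)
timid_parts = murphy_decomposition(timid, outcomes)
print("decisive:", decisive_parts)
print("timid:   ", timid_parts)
```

In this made-up data both forecasters have zero reliability error, so both are calibrated, but only the decisive one earns a resolution credit. That is the sense in which the base-rate forecaster is honest yet useless.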
AI Prompts
中文提示词
这是我最近对 [领域/项目] 做的一批带概率的预测:[列出"事件 + 我给的概率 + 实际结果"]。请:
① 估计我的校准——把预测按概率分桶,对比每桶的实际兑现率,判断我是过度自信还是过度保守;
② 估计我的锐度——我是否大量预测都挤在 40%–60%,不敢果断?
③ 分别给出提升校准和提升锐度的一条具体动作。
English Prompt
Here is a batch of my recent probabilistic forecasts about [domain/project]: [list "event + my probability + actual outcome"]. Please:
1. Assess my calibration — bucket forecasts by probability, compare each bucket's stated probability to its realized hit rate, and judge whether I'm overconfident or underconfident.
2. Assess my resolution — are most of my forecasts clustered in 40%–60%, afraid to commit?
3. Give one concrete action to improve calibration and one to improve resolution.
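The log loss (cross-entropy) commonly used to train classifiers is itself a proper scoring rule and a close cousin of the Brier score: it forces the model not only to pick the right class but also to set its confidence honestly, which is the mathematical reason a model can be "trained into calibration." Applied to personal decisions, it meshes with the decision journal from issue 42: the journal records the probability and the reasoning at the time, and the Brier score gives that journal a quantifiable report card. Keep it up for a year and your view of your own judgment shifts from "feels okay" to "on the record."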
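A small numerical check of what "proper" means for both rules: if your true belief is 70%, the report that minimizes your expected penalty is 70%, under either the Brier score or log loss. The function names and the 1% search grid are illustrative assumptions, not part of any library.

```python
import math


def expected_brier(report, belief):
    """Expected Brier penalty if the event truly occurs with probability `belief`."""
    return belief * (1 - report) ** 2 + (1 - belief) * report ** 2


def expected_log_loss(report, belief):
    """Expected log loss (cross-entropy) under the same assumption."""
    return -(belief * math.log(report) + (1 - belief) * math.log(1 - report))


def best_report(expected_loss, belief):
    """Search reports from 1% to 99% for the one with the lowest expected penalty."""
    grid = [i / 100 for i in range(1, 100)]
    return min(grid, key=lambda r: expected_loss(r, belief))


belief = 0.7
brier_best = best_report(expected_brier, belief)
log_best = best_report(expected_log_loss, belief)
print(brier_best, log_best)
```

Shading the report up to look bold, or down to look safe, only raises the expected penalty under either rule.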
English Summary
Brier Score: collapses a probabilistic forecast into one number, (your probability − outcome)², where outcome is 1 if the event happened and 0 if not; average over many forecasts, and lower is better. It is a proper scoring rule: the math shows that the only way to minimize your expected score is to report your true belief, so honesty becomes the optimal, ungameable strategy. The penalty grows with the square of the miss: being confidently wrong (a 95% call that fails costs 0.9025) is punished far more than a hedged miss (a 60% call that fails costs 0.36), so it suppresses overconfidence. It replaces binary right/wrong grading, which rewards the loud and punishes nuance. Practice: keep a forecast ledger, score it, and compare against a dumb base-rate baseline. If you can't beat the baseline, your "insight" is noise.
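The practice above can be sketched in a few lines. The ledger entries, the 0.8 confidence threshold, and the function names are all hypothetical. Note that the baseline here uses the base rate of the same ledger, which makes it a slightly generous opponent; a base rate taken from earlier data would be a fairer test.

```python
def brier_score(probs, outcomes):
    """Mean squared gap between stated probability and the 0/1 outcome."""
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)


def base_rate_baseline(outcomes):
    """Brier score of a forecaster who always reports the observed base rate."""
    rate = sum(outcomes) / len(outcomes)
    return brier_score([rate] * len(outcomes), outcomes)


def confidently_wrong(ledger, threshold=0.8):
    """Return (event, penalty) for forecasts that leaned the wrong way with high confidence."""
    flagged = []
    for event, p, o in ledger:
        confidence = max(p, 1 - p)
        leaned_wrong = (p > 0.5) != (o == 1)
        if confidence >= threshold and leaned_wrong:
            flagged.append((event, (p - o) ** 2))
    return flagged


# Hypothetical ledger: (event, stated probability, outcome).
ledger = [
    ("feature ships on time", 0.95, 0),
    ("vendor renews contract", 0.60, 0),
    ("test suite passes first run", 0.80, 1),
    ("candidate accepts offer", 0.30, 0),
    ("demo works live", 0.70, 1),
]
probs = [p for _, p, _ in ledger]
outcomes = [o for _, _, o in ledger]

my_score = brier_score(probs, outcomes)
baseline = base_rate_baseline(outcomes)
flagged = confidently_wrong(ledger)
print(my_score, baseline, flagged)
```

In this made-up ledger the forecaster loses to the base-rate baseline, and a single confidently wrong call accounts for most of the total penalty.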
AI Prompts
中文提示词
我在为 [决策/项目] 做一组预测,想用布里尔分数评估。这是我的预测:[事件 + 概率 + 已知结果]。请:
① 算出我的平均布里尔分数;
② 用"永远报基础率"作为笨基准,对比我有没有超过它;
③ 指出哪几条是"自信地错了"(高概率却落空),它们贡献了多少惩罚,下次该如何收敛。
English Prompt
I'm forecasting for [decision/project] and want to evaluate with the Brier score. Here are my forecasts: [event + probability + known outcome]. Please:
1. Compute my average Brier score.
2. Compare it against a dumb baseline that always reports the base rate — did I beat it?
3. Identify the "confidently wrong" forecasts (high probability that failed), how much penalty they contributed, and how I should rein in next time.
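The ensemble-learning analogy comes most naturally to engineers: don't put blind faith in "one elegant grand theory that explains the whole field" (for example, "scaling laws explain everything about AI"); that is the hedgehog trap. Robust judgment comes from weighting and combining multiple perspectives. Parenting works the same way: don't convert to any single school (attachment parenting, tiger parenting, Montessori). Take what is useful from each and keep adjusting the weights according to how your child actually responds; that is what a fox-style parent does. The more certain you are that one framework explains everything, the more you should suspect you are overfitting.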
English Summary
Fox vs Hedgehog: "The fox knows many things; the hedgehog knows one big thing." The hedgehog holds one grand theory, forces everything into it, and is confident and media-friendly. The fox is eclectic, runs many competing models, doubts itself, and updates often. A landmark 20-year study found that foxes forecast far better than hedgehogs, and that the more famous and telegenic the expert, the worse the accuracy, because TV rewards simplicity and certainty, which are forecasting poison. The fox's edge is essentially an ensemble model: like a random forest beating a single tree, averaging many imperfect views whose errors are partly independent cancels much of the noise (a bias that every view shares does not cancel). The hedgehog fails not from stupidity but from a strong prior that refuses to update, explaining away every counter-example. Alarm bell: when one theory explains everything, you've become a hedgehog.
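The ensemble claim can be checked with a toy simulation. Everything here is an assumption made for illustration: seven "views" per question, each equal to the true probability plus independent Gaussian noise, scored with the Brier penalty. Real viewpoints share errors, so the real-world gain is smaller than in this idealized setup.

```python
import random


def brier(p, outcome):
    return (p - outcome) ** 2


def simulate(n_questions=500, n_views=7, noise=0.25, seed=0):
    """Compare the average single view with the averaged (ensemble) forecast."""
    rng = random.Random(seed)
    single_total = 0.0
    ensemble_total = 0.0
    for _ in range(n_questions):
        true_p = rng.random()
        outcome = 1 if rng.random() < true_p else 0
        # Each view is the truth plus its own independent error, clipped to [0, 1].
        views = [min(1.0, max(0.0, true_p + rng.gauss(0, noise)))
                 for _ in range(n_views)]
        single_total += sum(brier(v, outcome) for v in views) / n_views
        ensemble_total += brier(sum(views) / n_views, outcome)
    return single_total / n_questions, ensemble_total / n_questions


avg_single, ensemble = simulate()
print(avg_single, ensemble)
```

Because the squared penalty is convex, the averaged forecast can never score worse than the average of the individual views; how much better it does depends on how independent the views' errors are.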
AI Prompts
中文提示词
我对 [议题] 的判断目前主要建立在这一套核心理论/框架上:[描述]。请帮我做"狐狸化"压力测试:
① 找出 3 个与我不同、甚至冲突的解释视角,各自会如何预测结局;
② 指出我是否在把反例都解释成"例外"(刺猬的典型症状);
③ 给出一个把这几种视角加权集成的综合判断,而不是单一断言。
English Prompt
My judgment on [issue] currently rests mainly on this core theory/framework: [describe]. Run a "foxification" stress test:
1. Surface 3 different, even conflicting, explanatory lenses and how each would forecast the outcome.
2. Point out whether I'm explaining away counter-examples as "exceptions" (the classic hedgehog symptom).
3. Give a combined judgment that weighs and ensembles these lenses, rather than a single confident claim.
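Take scheduling an AI feature. The team's inside view is "the requirements are clear this time, two weeks is enough." The outside view: pull up the actual delivery times of the past 10 sprints, and the base rate may turn out to be "work we called two weeks took a median of five." That is your prior; adjust it modestly for what is genuinely special this time. This picks up directly from the Bayesian thinking in issue 7: the base rate is the prior. Estimating the failure rate of a class of nodes in a distributed system works the same way: don't reason from scratch about "this server"; look at the historical failure distribution across the whole fleet of same-model nodes, which is a far more reliable starting point. The outside view isn't pessimism; it is putting the prior back where it belongs.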
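As a rough sketch of the sprint example, the outside-view prior can be read off the history as a median overrun ratio. The sprint data and function name below are invented for illustration; a real reference class would come from your own delivery records.

```python
from statistics import median

# Hypothetical reference class: (estimated_weeks, actual_weeks) for 10 past sprints.
past_sprints = [(2, 3), (2, 4), (2, 4), (2, 5), (2, 5),
                (2, 5), (2, 6), (2, 6), (2, 8), (2, 9)]


def outside_view_estimate(inside_estimate_weeks, history):
    """Scale an inside-view estimate by the median historical overrun ratio."""
    overrun_ratios = [actual / estimated for estimated, actual in history]
    return inside_estimate_weeks * median(overrun_ratios)


prior_weeks = outside_view_estimate(2, past_sprints)
print(prior_weeks)
```

The result is only the starting point; any adjustment for this sprint's genuine differences is applied on top of it, not instead of it.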
English Summary
Outside View & Base Rate: the inside view reasons from a case's specifics ("my project is special, three weeks"). The outside view first finds a reference class and its base rate ("of 10 similar projects, how many finished on time?"). The superforecaster's first commandment: anchor on the base rate, then adjust for this case's specifics; never start from a blank slate. We ignore base rates because the inside story is vivid and the reference class is dull, which is the root of the planning fallacy. Structurally this is Bayesian updating: the base rate is the prior, and the case-specific evidence enters through the likelihood. Starting from the inside view discards the prior and inflates the variance of your estimate. The key skill is choosing the right reference class: one that is similar on the structural features that decide success. The outside view isn't pessimism; it's putting the prior back where it belongs.
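The "anchor, then adjust" step can be written as a Bayesian update in odds form: posterior odds = prior odds × likelihood ratio. The numbers below (a 20% on-time base rate and evidence judged twice as likely in on-time projects) are hypothetical, as is the function name.

```python
def update_from_base_rate(base_rate, likelihood_ratio):
    """Adjust a base-rate prior by case-specific evidence, using odds.

    likelihood_ratio = P(evidence | success) / P(evidence | failure);
    a ratio of 1.0 means the evidence is uninformative and the prior stands.
    """
    prior_odds = base_rate / (1 - base_rate)
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1 + posterior_odds)


# Hypothetical: 20% of similar projects finished on time, and this team's
# clear requirements are judged twice as likely among on-time projects.
posterior = update_from_base_rate(0.20, 2.0)
print(round(posterior, 3))
```

Even fairly favorable evidence moves a 20% prior only to about one in three, which is the discipline the outside view imposes: the specifics adjust the base rate rather than replace it.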
AI Prompts
中文提示词
我要预测/估计 [具体事件:结果、时长或成本]。我的内视判断是 [我的直觉估计 + 理由]。请帮我切换到外视:
① 提出 2–3 个合适的参照类,说明各自的相似性在哪;
② 给出每个参照类的历史基础率作为先验起点;
③ 从基础率出发,按我这件的真实特殊性做有节制的调整,给出最终概率/区间,并提醒我别滑回内视。
English Prompt
I need to forecast/estimate [specific event: outcome, duration, or cost]. My inside-view take is [my gut estimate + reasoning]. Help me switch to the outside view:
1. Propose 2–3 suitable reference classes and explain the relevant similarity of each.
2. Give the historical base rate of each as a prior starting point.
3. Starting from the base rate, make a disciplined adjustment for this case's genuine specifics, give a final probability/range, and warn me if I'm sliding back into the inside view.