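① Evaluation meetings and materials: if your probability estimate for the decision has not moved at all after a two-hour meeting, those two hours reduced your uncertainty by zero bits. They were pure ritual. ② AI: a large language model's perplexity is the exponential of its cross-entropy, that is, of its average surprise at each actual next token. Lower perplexity means the model assigns higher probability to the text it actually sees, so it is less uncertain about the language. ③ Parenting: when a child says "I don't know," that may be an honest signal of high entropy. Pressuring her into a confident, low-entropy answer manufactures false information instead. First see how large the uncertainty really is, then decide whether you need to rush to eliminate it.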
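This is a minimal Python sketch of item ②. It assumes we already have the probability the model assigned to each actual next token. The lists below are made-up toy inputs, not output from any real model. Perplexity is the exponential of the mean surprise, which equals the inverse geometric mean of those probabilities:

```python
import math

def perplexity(token_probs):
    """Perplexity = exp(cross-entropy), where cross-entropy is the mean
    surprise -ln p over the probabilities given to the actual next tokens."""
    if not token_probs:
        raise ValueError("need at least one token probability")
    cross_entropy = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(cross_entropy)

# Toy inputs, not real model output.
uniform_4 = [0.25] * 10            # always a 1-in-4 guess
confident = [0.9, 0.8, 0.95, 0.7]  # high probability on the actual tokens

print(perplexity(uniform_4))   # about 4: "as unsure as a fair 4-way choice"
print(perplexity(confident))   # about 1.2: much less "perplexed"
```

The result does not depend on the log base. Exponentiating a cross-entropy in nats with `exp` gives the same number as raising 2 to a cross-entropy in bits.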
English Summary
Shannon Entropy: entropy isn't "disorder." It is the average surprise of a probability distribution, so the less certain you are about an outcome, the higher its entropy. H = −Σ p·log₂ p measures how much uncertainty is resolved, on average, when the outcome is revealed (in bits, with log base 2). Key point: information ≠ meaning, because Shannon deliberately stripped out semantics. The surprise of a single outcome is −log₂ p, so rare events carry more information. For a fixed number of outcomes, entropy is maximal under the uniform distribution. It has the same mathematical form as thermodynamic (Gibbs-Boltzmann) entropy, up to a constant factor. Predictive-processing theories describe the brain as working to minimize this same surprise. Practical test of any message: by how much did it shrink your uncertainty? If your probability estimate didn't move, the message carried zero information for you.
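This small Python sketch implements the definitions above, using log base 2 so the results are in bits. The "before/after" belief lists are hypothetical numbers for the meeting example. Scoring a message as entropy before minus entropy after is one simple reading of the practical test, not the only one:

```python
import math

def surprise(p):
    """Surprise (self-information) of an outcome with probability p, in bits."""
    return -math.log2(p)

def entropy(dist):
    """Shannon entropy H = -sum p log2 p in bits; zero-probability terms add 0."""
    return sum(p * surprise(p) for p in dist if p > 0)

# Rare events carry more information than common ones.
print(surprise(0.5), surprise(1 / 8))     # 1.0 3.0

# With four outcomes, the uniform distribution has the highest entropy.
print(entropy([0.25, 0.25, 0.25, 0.25]))  # 2.0
print(entropy([0.7, 0.1, 0.1, 0.1]))      # about 1.357

# Hypothetical beliefs about a yes/no decision around a meeting.
before = [0.5, 0.5]   # 50/50 going in
after = [0.5, 0.5]    # still 50/50 coming out
print(entropy(before) - entropy(after))   # 0.0 bits gained: pure ritual
```

For a single message this difference can even be negative, because news can leave you more uncertain. Only the average over all possible messages is guaranteed to be non-negative.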
AI Prompts
中文提示词
我要评估 [一份报告/一次会议/一个数据源] 的信息价值。请用香农熵的视角帮我:
① 在接触它之前,我对 [关键问题] 的判断分布大致是什么?接触之后又是什么?
② 据此估计它真正消除了多少不确定性(高/中/零);
③ 如果信息量接近零,指出它是"纯仪式"还是"只在确认已知",并给出一个能带来高信息量的替代来源。
English Prompt
Help me assess the information value of [a report / meeting / data source] through the lens of Shannon entropy:
1. Before encountering it, what was my belief distribution over [the key question]? After?
2. Estimate how much uncertainty it actually removed (high / medium / zero).
3. If the information content is near zero, say whether it's pure ritual or mere confirmation of the known, and propose one higher-information alternative source.
English Summary
Channel Capacity: every noisy channel has a hard ceiling C. Transmit below it and coding can drive the error rate arbitrarily low; transmit above it and no code, however clever, can remove the errors. Shannon's most counterintuitive result is that near-perfect communication over an unreliable channel is possible, right up to that wall. In the limit of long codes this is a sharp phase transition, not a gradual decay. Reliability is bought with redundancy plus latency (longer codes). For a band-limited channel with Gaussian noise, C = B·log₂(1 + S/N) bits per second. At high signal-to-noise ratios capacity grows only logarithmically with S/N, so brute-forcing power has fast-diminishing returns. The concept transfers to any noise-limited transport, such as teaching, organizational communication, and human-AI collaboration. When communication keeps failing, first ask: is the content wrong, or is the channel over capacity?
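This quick Python check illustrates the diminishing returns. It assumes the idealized Shannon-Hartley setting (additive white Gaussian noise, with S/N given as a linear power ratio rather than in decibels) and a normalized bandwidth of 1 Hz:

```python
import math

def shannon_hartley(bandwidth_hz, snr):
    """Capacity in bits/s of an AWGN channel: C = B * log2(1 + S/N).
    snr is the linear power ratio S/N, not a value in decibels."""
    return bandwidth_hz * math.log2(1 + snr)

# Spectral efficiency with B = 1 Hz as signal power climbs.
for snr in [1, 3, 15, 255]:
    print(f"S/N = {snr:>3}: C = {shannon_hartley(1, snr):.1f} bits/s/Hz")
# 1.0, 2.0, 4.0, 8.0: doubling capacity from 4 to 8 takes 17x the power.

# At low S/N the curve is nearly linear: log2(1 + x) is close to x / ln 2.
low = shannon_hartley(1, 0.01)
print(f"{low:.4f} vs {0.01 / math.log(2):.4f}")  # 0.0144 vs 0.0144
```

At high S/N, each extra bit per second per hertz roughly doubles the required power. At low S/N, capacity is nearly proportional to power. The diminishing-returns argument therefore applies to the high-S/N regime.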
AI Prompts
中文提示词
我在 [某个沟通/教学/协作场景] 里反复失败。请用信道容量的视角诊断:
① 这条"信道"的容量大致由什么限制(注意力/工作记忆/带宽/上下文窗口)?
② 我是不是在超容量发送(一次塞太多)?给出"降速率"和"加冗余"两条具体改法;
③ 我有没有在用"提功率"(说得更大声/更频繁)去解一个容量问题?指出回报为什么递减。
English Prompt
I keep failing at [a communication / teaching / collaboration setting]. Diagnose it via channel capacity:
1. What limits this "channel's" capacity (attention / working memory / bandwidth / context window)?
2. Am I transmitting over capacity (too much at once)? Give one "lower the rate" and one "add redundancy" fix.
3. Am I throwing "more power" (louder / more often) at what is really a capacity problem? Explain why the returns diminish.
编码与压缩 · Coding & Compression
"Compression is comprehension." Building on the source-coding theorem, this view equates compression with understanding.
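As early as 1838, Morse code applied the core idea of efficient coding by intuition. The most frequent letter, E, got the shortest code, a single dot "·", while rare letters such as Q got longer sequences. This agrees with the conclusion of Shannon's source-coding theorem a century later: give short codes to frequent symbols. Good coding makes the common case cheap.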
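Morse code was designed by hand, not by an algorithm. For illustration, here is a Python sketch of Huffman coding, a 1952 technique swapped in here because it builds the shortest-average-length prefix code for coding one symbol at a time. It runs on made-up toy frequencies:

```python
import heapq
import math
from itertools import count

def huffman_code(freqs):
    """Build a Huffman prefix code: frequent symbols get short codewords."""
    if len(freqs) == 1:  # a lone symbol still needs a 1-bit codeword
        return {sym: "0" for sym in freqs}
    tiebreak = count()   # unique counter so the heap never compares dicts
    heap = [(f, next(tiebreak), {sym: ""}) for sym, f in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, low = heapq.heappop(heap)   # the two least frequent subtrees
        f2, _, high = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in low.items()}
        merged.update({s: "1" + c for s, c in high.items()})
        heapq.heappush(heap, (f1 + f2, next(tiebreak), merged))
    return heap[0][2]

# Toy frequencies for illustration, not real English letter counts.
freqs = {"E": 0.5, "T": 0.25, "A": 0.125, "Q": 0.125}
code = huffman_code(freqs)
avg_len = sum(freqs[s] * len(code[s]) for s in freqs)
entropy = -sum(p * math.log2(p) for p in freqs.values())
print(code)              # {'E': '0', 'T': '10', 'A': '110', 'Q': '111'}
print(avg_len, entropy)  # 1.75 1.75
```

When every probability is a power of 1/2, as here, the average length equals the entropy exactly. Otherwise a Huffman code lands within 1 bit per symbol of it.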
Scenarios · BigCat
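① Note-taking: good notes aren't a complete transcript but a lossy compression. Writing a summary forces you to find the structure, and without structure there is nothing to compress. Failing to compress is itself an honest signal that you don't understand yet. ② AI: a large language model can be viewed as a lossy compression of the internet. In the information-theoretic sense, "understanding" amounts to the ability to compress. That is also why "can recite it" and "can compress it into one sentence" are two different abilities. ③ Knowledge systems: if after reading 50 papers you can draw a map in which a few principles generate most of the conclusions, you have compressed the field. If you can only recount them paper by paper, you are still at zero compression. Only what you can compress do you truly own.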
English Summary
Coding & Compression: compression means removing redundancy, with short codes for frequent symbols and long codes for rare ones. By Shannon's source-coding theorem, the average code length is bounded below by the entropy. Lossless compression can never beat the entropy on average, so truly random data is incompressible. The deeper claim: compression = comprehension. A model that compresses data well has found its structure; memorization is zero compression, and understanding is high compression. Kolmogorov complexity pushes this to the limit: an object's complexity is the length of the shortest program, for a fixed universal machine, that generates it. Abstraction is lossy compression (throwing away detail you don't need). Science, guided by Occam's razor, seeks the shortest encoding of observations, and generalization in learning is compression. Test of real understanding: can you compress it to one sentence without distortion?
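This rough Python illustration of "random data is incompressible" uses the standard library's `zlib` (DEFLATE), a practical compressor rather than an entropy-optimal one. Kolmogorov complexity itself is uncomputable, so a compressor's output length can serve only as a crude upper-bound stand-in:

```python
import os
import zlib

def compression_ratio(data: bytes) -> float:
    """Compressed size divided by original size, using zlib at max level."""
    return len(zlib.compress(data, 9)) / len(data)

n = 100_000
sentence = b"the cat sat on the mat. "          # 24 bytes of regular text
structured = sentence * (n // len(sentence))   # highly repetitive input
random_bytes = os.urandom(n)                   # no structure to find

print(f"structured: {compression_ratio(structured):.3f}")
print(f"random:     {compression_ratio(random_bytes):.3f}")
```

Exact ratios vary from run to run. The regular text shrinks to a tiny fraction of its size, while the random bytes come out slightly larger than they went in because of format overhead.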
AI Prompts
中文提示词
我想检验自己是否真懂 [某个概念/领域/系统]。请用"压缩=理解"的框架考我:
① 让我先把它压成一句话的"生成规则",再指出我压掉的是不是真正的冗余、有没有失真;
② 如果我只能罗列细节、压不短,诊断我是卡在"零压缩的记忆"阶段,还是抓错了核心;
③ 给出一个把它压得更短、又不丢关键结构的更优"编码"。
English Prompt
I want to test whether I really understand [a concept / field / system]. Use the "compression = comprehension" frame:
1. Have me compress it to a one-sentence generative rule, then judge whether what I dropped was real redundancy or a distortion.
2. If I can only list details and can't compress, diagnose whether I'm stuck at "zero-compression memorization" or grasping the wrong core.
3. Offer a shorter "encoding" that loses no essential structure.
互信息 · Mutual Information
I(X;Y): how much knowing one variable reduces your uncertainty about another
Detailed Explanation
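Mutual information I(X;Y) measures how much knowing X reduces your uncertainty about Y: I(X;Y) = H(Y) − H(Y|X). It is symmetric and non-negative, and it is zero if and only if X and Y are independent.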
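This minimal Python sketch computes I(X;Y) = H(Y) − H(Y|X) directly from a joint probability table. The table layout `joint[x][y]` is an assumption of this sketch. The three 2×2 tables are toy examples chosen to show the independent, deterministic, and noisy cases:

```python
import math

def entropy(probs):
    """Shannon entropy in bits; zero-probability terms contribute nothing."""
    return sum(p * math.log2(1 / p) for p in probs if p > 0)

def mutual_information(joint):
    """I(X;Y) = H(Y) - H(Y|X) in bits, for a table joint[x][y] = P(X=x, Y=y)."""
    p_x = [sum(row) for row in joint]
    p_y = [sum(col) for col in zip(*joint)]
    # H(Y|X) = sum over x of P(x) * H(Y | X = x)
    h_y_given_x = sum(px * entropy([pxy / px for pxy in row])
                      for px, row in zip(p_x, joint) if px > 0)
    return entropy(p_y) - h_y_given_x

def transpose(joint):
    """Swap the roles of X and Y."""
    return [list(col) for col in zip(*joint)]

independent = [[0.25, 0.25], [0.25, 0.25]]  # X says nothing about Y
exact_copy = [[0.5, 0.0], [0.0, 0.5]]       # Y is an exact copy of X
noisy = [[0.4, 0.1], [0.1, 0.4]]            # Y agrees with X 80% of the time

for name, joint in [("independent", independent), ("exact_copy", exact_copy),
                    ("noisy", noisy)]:
    print(name, round(mutual_information(joint), 3),
          round(mutual_information(transpose(joint)), 3))
# independent 0.0 0.0 / exact_copy 1.0 1.0 / noisy 0.278 0.278
```

Transposing the table swaps X and Y and gives the same value, which reflects the symmetry of mutual information.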
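Non-obvious points: ① Mutual information is a thorough upgrade of the correlation coefficient. Correlation captures only linear relationships, so two variables can have zero correlation yet be strongly dependent (e.g., Y = X² with X symmetric about zero). Mutual information captures dependence of any form, which makes it the definitive test of whether X carries any information about Y at all. ② The value of a signal or metric equals its mutual information with the target you actually care about. KPIs fail (Goodhart's law, see D50) precisely because, in optimizing the proxy, you yourself destroy its mutual information with the real target. ③ Mathematically, channel capacity is the maximum of mutual information over input distributions: C = max over p(x) of I(X;Y). This stitches the four models into one, because communication, coding, uncertainty, and dependence all speak the same language. ④ It shares roots with representation learning. The Information Bottleneck principle says good learning compresses away the irrelevant detail in the input X while retaining as much mutual information with the target Y as possible. Brains and neural networks arguably both do this: they build internal representations that share as much information as possible with "what will be useful later."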
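This Python sketch illustrates points ① and ③ under toy assumptions. For the Y = X² case, X is uniform on {−1, 0, 1}. For the capacity case, the channel is a binary symmetric channel with a hypothetical flip probability of 0.1. The maximum over input distributions is found by a simple grid search rather than analytically:

```python
import math

def entropy(probs):
    """Shannon entropy in bits; zero-probability terms contribute nothing."""
    return sum(p * math.log2(1 / p) for p in probs if p > 0)

# Point 1: X uniform on {-1, 0, 1} and Y = X**2.
xs = [-1, 0, 1]
ys = [x * x for x in xs]
mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / len(xs)
print("covariance:", cov)  # 0.0, so the Pearson correlation is 0 too
# Y is a function of X, so H(Y|X) = 0 and I(X;Y) = H(Y).
p_y = [ys.count(v) / len(ys) for v in sorted(set(ys))]  # [1/3, 2/3]
mi_square = entropy(p_y)
print("I(X;Y) in bits:", round(mi_square, 3))  # 0.918

# Point 3: capacity = max over input distributions of I(X;Y).
def bsc_mutual_information(q, eps):
    """I(X;Y) for a binary symmetric channel that flips each bit with
    probability eps, when the input has P(X=1) = q."""
    p_y1 = q * (1 - eps) + (1 - q) * eps
    return entropy([p_y1, 1 - p_y1]) - entropy([eps, 1 - eps])  # H(Y) - H(Y|X)

eps = 0.1
grid = [i / 1000 for i in range(1001)]
best_q = max(grid, key=lambda q: bsc_mutual_information(q, eps))
capacity = bsc_mutual_information(best_q, eps)
print(best_q, round(capacity, 3))  # 0.5 0.531
```

The grid search lands on the uniform input. It reproduces the known closed form 1 − H(0.1) for a binary symmetric channel.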
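Scenarios · BigCat
① Choosing metrics: fixate on a proxy with low mutual information with the final outcome (say, "lines of code" as a proxy for "software value"), and the more you optimize it, the further you drift. That is the information-theoretic root of Goodhart's law. ② Allocating attention: in an age of information overload, what is scarce is not information but information with high mutual information with your decisions. Filtering, at heart, means ranking information by mutual information. ③ Asking questions: a good question is the one whose answer has the highest mutual information with what you really want to know. Most ineffective communication is spent on questions whose mutual information is close to zero. More information is not better; higher mutual information is.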
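This toy Python sketch illustrates "filtering = ranking by mutual information" using a plug-in estimate from paired samples. The eight-project data and the signal names are invented for illustration. With so few samples the plug-in estimator is noisy and biased upward, so treat this as a demonstration of the idea rather than a measurement method:

```python
import math
from collections import Counter

def empirical_mi(xs, ys):
    """Plug-in estimate of I(X;Y) in bits from paired discrete samples."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    # sum of p(x,y) * log2(p(x,y) / (p(x) p(y))), written with raw counts
    return sum((c / n) * math.log2(c * n / (px[x] * py[y]))
               for (x, y), c in joint.items())

# Hypothetical data: 1 = project succeeded / signal was "high".
outcome   = [1, 1, 0, 0, 1, 0, 1, 0]
loc_high  = [1, 1, 1, 1, 0, 0, 0, 0]   # lines of code: unrelated to success
retention = [1, 1, 0, 0, 1, 0, 1, 1]   # user retention: tracks success closely

signals = {"lines_of_code_high": loc_high, "retention_high": retention}
ranked = sorted(signals, key=lambda s: empirical_mi(signals[s], outcome),
                reverse=True)
for name in ranked:
    print(f"{name}: {empirical_mi(signals[name], outcome):.3f} bits")
# retention_high: 0.549 bits
# lines_of_code_high: 0.000 bits
```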
English Summary
Mutual Information: I(X;Y) = H(Y) − H(Y|X) measures how much knowing X reduces your uncertainty about Y. It is symmetric, non-negative, and zero iff X and Y are independent. It is a strict upgrade over correlation. Correlation catches only linear dependence: Y = X², with X symmetric about zero, has zero correlation yet complete dependence. Mutual information catches dependence of any form. The value of any signal or metric equals its mutual information with the outcome you actually care about. That is exactly why proxy metrics fail (Goodhart's Law) once optimizing them destroys that shared information. Mathematically, channel capacity is the maximum of mutual information over input distributions, which unites all four models. The Information Bottleneck frames learning as compressing X into a representation while preserving that representation's mutual information with the target Y. Before collecting data or asking a question, ask: how high is its mutual information with what I must decide?
AI Prompts
中文提示词
我在用 [某个指标/数据源/问题] 来支撑 [某个决策]。请用互信息的视角审查:
① 这个信号与我真正关心的结果之间,互信息估计是高、中,还是接近零?
② 它有没有"线性相关看着不错、其实依赖很弱"或反之的情况?
③ 若互信息低,指出我是否落入了古德哈特陷阱,并提出一个与目标互信息更高的替代信号。
English Prompt
I'm using [a metric / data source / question] to support [a decision]. Audit it via mutual information:
1. Is this signal's mutual information with the outcome I truly care about high, medium, or near zero?
2. Is there a "looks linearly correlated but barely dependent" mismatch, or the reverse?
3. If it's low, tell me whether I've fallen into a Goodhart trap, and propose a substitute signal with higher mutual information with the goal.