Day 12 · Mental Models Deep Dive

Scientific Method

2026-05-11

MODEL 01

Hypothesis Testing

From vague intuition to testable claim.

Deep Dive

Hypothesis testing is the central loop of the scientific method: state a testable hypothesis, design an experiment, gather data, and judge whether the evidence supports or refutes the claim. The value of the loop is not "proving things right" — it is approaching truth through relentless attempts to falsify. The cognitive payoff: it forces a hazy intuition into a precise statement that reality can defeat. "I think this product has a market" is not a hypothesis. "Conversion rate exceeds 3% within one week on channel A" is.

The non-trivial insight: the most common error is not "the hypothesis failed" but "we tested the wrong hypothesis." People spend enormous effort validating "does this solution work?" without testing the more fundamental premise — "does the problem actually exist?" In a startup, founders carefully validate feature feasibility but skip the deeper assumption "do users really have this pain?" The correct order in hypothesis testing is bottom-up: test the most fundamental, cheapest-to-overturn assumptions first.

How to apply it: (1) decompose any decision into 3-5 layered hypotheses ranked from foundational to derived, and start at the bottom; (2) define a "killer metric" for each hypothesis up front: a number with a threshold fixed in advance, below which you abandon the hypothesis and above which you commit, which eliminates after-the-fact rationalization; (3) cultivate the habit of pre-registration: write down your predicted outcome before the experiment so you cannot "reinterpret" the data after seeing it (the HARKing trap, short for Hypothesizing After the Results are Known).
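A killer metric fixed in advance can be sketched in a few lines. The example below is a minimal illustration, not a prescribed procedure: it reuses the "conversion rate exceeds 3%" claim from above, and the names PREREGISTERED, binomial_tail, and decide, the 5% evidence bar, and the extra "inconclusive" outcome are all assumptions added for the sketch.

```python
from math import exp, lgamma, log


def binomial_tail(successes: int, trials: int, p0: float) -> float:
    """P(X >= successes) for X ~ Binomial(trials, p0), with 0 < p0 < 1.

    Computed in log space so large trial counts do not overflow.
    """
    def log_pmf(k: int) -> float:
        return (
            lgamma(trials + 1) - lgamma(k + 1) - lgamma(trials - k + 1)
            + k * log(p0) + (trials - k) * log(1 - p0)
        )
    return sum(exp(log_pmf(k)) for k in range(successes, trials + 1))


# Written down BEFORE the experiment runs. All values are illustrative.
PREREGISTERED = {
    "claim": "Conversion rate on channel A exceeds 3% within one week",
    "threshold": 0.03,  # the killer metric
    "alpha": 0.05,      # assumed evidence bar, also fixed in advance
}


def decide(conversions: int, visitors: int, plan: dict = PREREGISTERED) -> str:
    """Apply the pre-registered rule to the observed counts."""
    observed = conversions / visitors
    if observed <= plan["threshold"]:
        return "abandon"
    # One-sided p-value: chance of seeing this many conversions
    # if the true rate were exactly at the threshold.
    p_value = binomial_tail(conversions, visitors, plan["threshold"])
    if p_value < plan["alpha"]:
        return "commit"
    return "inconclusive"  # above threshold, but could still be noise


if __name__ == "__main__":
    for conversions in (20, 33, 60):
        print(conversions, "of 1000 ->", decide(conversions, 1000))
```

The rule lives in data that exists before any result does, so the only thing the experiment can change is which branch fires.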

Classic Example

In the 19th century, Hungarian physician Ignaz Semmelweis observed that puerperal fever mortality differed wildly between wards. He proposed the hypothesis: "doctors going directly from the autopsy room to delivery carry 'cadaverous particles' on their hands, which are the lethal cause." Experiment: require doctors to wash with chlorinated lime water before delivery. Result: mortality in the ward where doctors washed their hands plunged from 12% to 1.3%. It is a textbook hypothesis-test loop: form a testable conjecture, then let the data speak.

Scenario · BigCat

BigCat is considering replacing a manual team process with an AI workflow. Rather than switching over all at once, the hypothesis-testing approach is:
Hypothesis: "The AI workflow can cut this task's processing time by 60% without lowering quality."
Experiment: pick one week of real tasks, run dual-track (manual vs. AI), and log time plus error rate.
Decision: compare the data. If the gain is under 30%, revisit the hypothesis — wrong tool, or wrong task for AI?
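The decision step can be written down before the week starts. The sketch below is one possible encoding under stated assumptions: the function name evaluate_pilot, the verdict labels, the sample numbers, and the treatment of gains between 30% and 60% as "partial" are all hypothetical and not part of the scenario.

```python
from statistics import mean


def evaluate_pilot(manual_minutes, ai_minutes, manual_error_rates, ai_error_rates,
                   target=0.60, revisit_below=0.30):
    """Compare one week of dual-track logs against thresholds fixed in advance.

    Returns (time_reduction, verdict). Thresholds mirror the scenario:
    60% is the hypothesized gain, 30% is the "revisit" floor.
    """
    time_reduction = 1 - mean(ai_minutes) / mean(manual_minutes)
    quality_held = mean(ai_error_rates) <= mean(manual_error_rates)

    if not quality_held:
        verdict = "quality_dropped"  # the hypothesis requires no loss of quality
    elif time_reduction >= target:
        verdict = "supported"
    elif time_reduction < revisit_below:
        verdict = "revisit"          # wrong tool, or wrong task for AI?
    else:
        verdict = "partial"          # assumed middle case: extend the test
    return time_reduction, verdict


if __name__ == "__main__":
    # Made-up logs: minutes per task and daily error rates for each track.
    reduction, verdict = evaluate_pilot(
        manual_minutes=[50, 40, 60],
        ai_minutes=[15, 10, 20],
        manual_error_rates=[0.04, 0.05, 0.03],
        ai_error_rates=[0.03, 0.04, 0.03],
    )
    print(f"time reduction: {reduction:.0%}, verdict: {verdict}")
```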

English Summary

Hypothesis testing is the engine of scientific progress: form a specific, falsifiable prediction, design an experiment, collect data, and update your beliefs accordingly. In business and investment contexts, it translates to disciplined small-scale testing before committing major resources, replacing "I think this will work" with "here's how we'll know if it works."

AI Prompts

English Prompt

I need to test this hypothesis: [your hypothesis]. Design the minimum viable experiment: (1) convert the hypothesis into a measurable, specific prediction, (2) outline the simplest possible test, (3) define clear pass/fail criteria in advance, and (4) identify the confounding variables most likely to corrupt the results.

MODEL 02

Falsifiability

Popper's criterion — only refutable theories are scientific.

Deep Dive

Karl Popper argued that to be scientific a theory must be falsifiable — there must exist some possible observation that could refute it. A theory that no evidence can refute is not science; it is faith. The key distinction: a scientific claim says "I have not been refuted yet" (provisional), while a pseudoscientific claim says "I can never be refuted" (eternally true). This is why "the market may go up or down tomorrow" is worthless — it can be falsified by no outcome, so it contains no predictive information.

The non-trivial insight: falsifiability is not only a yardstick for scientific theories but also a sharp blade for testing the quality of personal beliefs. The more non-falsifiable a person's belief system, the more fragile their cognitive system — because it has refused all feedback from reality. The most dangerous narratives in investing share the non-falsifiable feature: "it will rise in the long run" — no matter how long it drops, you can always say "not long enough." Falsifiable thinking demands you answer one question before forming a belief: what evidence would make me give up this belief? If you cannot answer, the belief has slipped out of the rational domain.

How to apply it: (1) for every important judgment you hold, write a "falsification checklist" — concrete, observable, time-bounded ("if X does not occur within 6 months, I abandon this judgment"); (2) watch for "non-falsifiability drift" — when you find yourself constantly revising the conditions to fit new data, your belief may have slid from science into faith; (3) when evaluating others' predictions or theories, the first question is always "under what circumstances would this prediction be proven wrong?" — if they cannot answer, the prediction carries no information.

Classic Example

Popper used "all swans are white" to illustrate falsifiability: the statement is scientific because finding a single black swan would refute it. In 1697, Europeans discovered black swans in Australia — the claim was falsified. Contrast that with "fate has a plan": no observation can refute it, and it has no scientific value. Falsifiability draws the boundary that says "this is the evidence that could overturn this belief."

Scenario · BigCat

BigCat holds a long-term bullish thesis on an AI chip company. Falsifiability demands an explicit statement: "If, over the next two quarters, the company's data-center business grows under 25% while competitors gain more than five percentage points of share, my core thesis must be re-evaluated." That statement is falsifiable — it draws the boundary of the belief and prevents the "explain anything" trap of confirmation bias.
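A falsification statement like this one is precise enough to be written as a check. The sketch below is only an illustration of that precision: the class TwoQuarterObservation, the function thesis_needs_reevaluation, and the sample figures are hypothetical, and the thresholds are copied from the statement above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoQuarterObservation:
    """What was actually observed over the two-quarter window."""
    datacenter_growth: float      # growth as a fraction, e.g. 0.31 for 31%
    competitor_share_gain: float  # percentage points of share gained by competitors


def thesis_needs_reevaluation(obs: TwoQuarterObservation,
                              min_growth: float = 0.25,
                              max_share_gain: float = 5.0) -> bool:
    """True when BOTH falsifying conditions from the statement are met."""
    return (obs.datacenter_growth < min_growth
            and obs.competitor_share_gain > max_share_gain)


if __name__ == "__main__":
    # Made-up numbers for illustration only.
    healthy = TwoQuarterObservation(datacenter_growth=0.31, competitor_share_gain=2.0)
    broken = TwoQuarterObservation(datacenter_growth=0.18, competitor_share_gain=6.5)
    print(thesis_needs_reevaluation(healthy))
    print(thesis_needs_reevaluation(broken))
```

If a belief cannot be reduced to a function like this, with inputs that reality will eventually supply, that is a hint it has drifted toward the unfalsifiable.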

English Summary

Falsifiability, proposed by Karl Popper, is the criterion that separates science from pseudoscience: a genuine theory must make predictions that could, in principle, be proven wrong. In investing, strategy, and everyday reasoning, falsifiability forces you to define in advance what evidence would change your mind — preventing the intellectual trap of theories that can explain everything and therefore predict nothing.

AI Prompts

English Prompt

I hold this view or thesis: [your view/investment thesis/belief]. Apply Popper's falsifiability test: (1) restate it as a concrete, testable prediction, (2) specify 3 observable outcomes that would force me to abandon or significantly revise this view, and (3) assess whether this belief has genuine predictive power or is merely an unfalsifiable narrative.

MODEL 03

Controlled Variables

Hold everything else constant to see the true effect.

Deep Dive

Controlling variables is the foundational move of experimental science: when you want to know A's effect on B, every other factor must be held constant — otherwise you cannot distinguish whether the change came from A or from something else. The mathematical basis is the partial derivative — to measure A's effect on B, you fix all other variables and take the derivative. This principle is routinely ignored in everyday decisions because real-world variables move at the same time, and the brain naturally treats "happens together" as "happens because."

The non-trivial insight: the biggest difficulty in controlling variables in practice is not "knowing what to control" but "not knowing what you didn't control": the unknown confounders. That is why the randomized controlled trial (RCT) is the gold standard of causal inference: randomization does not require you to know every confounder, because random assignment balances unknown variables across treatment and control groups on average, and the remaining chance imbalance shrinks as the sample grows. When an RCT is not possible (investing, education), the fallbacks are natural experiments and difference-in-differences (DiD), which use quasi-random historical events as the basis for causal inference.
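The balancing effect of randomization can be seen in a small simulation. This is a toy model under invented assumptions: a single hidden confounder u, a true treatment effect of 1.0, and the function names simulate, self_selected, and randomized are all made up for the sketch.

```python
import random
from statistics import mean


def simulate(assign, n=10_000, true_effect=1.0, seed=0):
    """Return (confounder gap, naive effect estimate) for one assignment rule.

    Each unit has a hidden confounder u that also raises the outcome.
    The naive estimate is mean(outcome | treated) - mean(outcome | control).
    """
    rng = random.Random(seed)
    treated_u, control_u, treated_y, control_y = [], [], [], []
    for _ in range(n):
        u = rng.gauss(0, 1)                      # the variable nobody measured
        treated = assign(u, rng)
        y = 2.0 * u + (true_effect if treated else 0.0) + rng.gauss(0, 1)
        (treated_u if treated else control_u).append(u)
        (treated_y if treated else control_y).append(y)
    return (mean(treated_u) - mean(control_u),
            mean(treated_y) - mean(control_y))


def self_selected(u, rng):
    """Units with a high confounder are more likely to opt in."""
    return u + rng.gauss(0, 1) > 0


def randomized(u, rng):
    """A coin flip that ignores the confounder entirely."""
    return rng.random() < 0.5


if __name__ == "__main__":
    for name, rule in (("self-selected", self_selected), ("randomized", randomized)):
        gap, estimate = simulate(rule)
        print(f"{name}: confounder gap={gap:.2f}, estimated effect={estimate:.2f}")
```

Under self-selection the groups differ on the hidden variable and the naive estimate overshoots the true effect; under the coin flip the gap is close to zero and the estimate lands near 1.0, without the analyst ever measuring u.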

How to apply it: (1) when you change something in life or work, change only one variable at a time and observe for 2-4 weeks before introducing the next; (2) when multiple variables change together (the real-world default), at least list every variable you can identify and admit "I cannot tell which one is the cause"; (3) when evaluating someone else's causal claim, ask "what's the control group?" — a conclusion with no control group is an anecdote, not evidence.

Classic Example

In 1747, British naval surgeon James Lind, facing scurvy ravaging crews, designed one of the earliest controlled experiments in medical history. He took 12 sailors with similar symptoms, split them into 6 pairs, and gave each pair a different supplement (citrus, vinegar, seawater, cider, and so on), keeping the rest of the diet identical. Only the citrus pair recovered quickly. Because he held the other variables constant, he could attribute the recovery to the citrus itself; the active nutrient was only later identified as vitamin C.

Scenario · BigCat

BigCat notices his child's math score rose 15 points in a month after using a new learning app. But during the same period the math teacher also changed, screen time dropped, and a daily practice habit kicked in. So is the gain the app's, the teacher's, or the practice habit's? Without controlling variables, this "experiment" can only show correlation; it cannot attribute. Real judgment requires isolating variables and testing one at a time.

English Summary

Controlled Variables is the experimental principle of isolating cause from effect: to understand the impact of variable A on outcome B, everything else must be held constant. Without this discipline, we mistake correlation for causation and attribute effects to the wrong causes. In product testing, investing, and parenting — wherever you're trying to learn what actually works — controlling variables is the difference between knowledge and noise.

AI Prompts

English Prompt

I'm trying to understand what's driving [outcome/phenomenon]. Help me apply controlled variable thinking: (1) list all variables that could plausibly influence the outcome, (2) design an experiment that isolates the key variable, and (3) identify which confounding variables in my current analysis might be leading me to the wrong causal conclusion.

MODEL 04

Correlation ≠ Causation

Moving together is not the same as causing together.

Deep Dive

Correlation ≠ causation is among the most basic statistical principles and among the most frequently violated: two variables moving together in the data (correlation) does not mean one causes the other. A correlation can arise from three causal structures: (1) A really does cause B; (2) B in fact causes A (reverse causation); (3) an invisible third variable C drives both A and B (confounder). It can also be pure coincidence, with no causal link at all. The brain is wired to read correlation as causation, because rapidly inferring cause and effect ("the bush moved → predator may be near") gave a huge survival edge to our ancestors, even with occasional false alarms. But in a data-driven modern world this instinct becomes a systematic cognitive trap.

The non-trivial insight: big data has not shrunk this trap; it has widened it. Test enough pairs of variables and some will show a statistically significant correlation by chance alone (the multiple-comparisons problem), and with a large enough sample even a negligible correlation becomes statistically significant. More subtle is the hidden common cause: we see "people who meditate make better decisions" and roll out a meditation program, without testing whether a third variable like "high self-discipline" drives both meditation habits and good decisions. Judea Pearl's ladder of causation makes the point: moving from association (observation) to intervention to counterfactuals requires fundamentally different reasoning tools at each level; pure data mining is stuck on the first rung.
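The multiple-comparisons problem is easy to reproduce with pure noise. The sketch below is an illustration under assumptions: the function names, the choice of 200 candidate variables and 50 samples, and the cutoff of 0.279 (an approximate two-sided 5% critical value for a correlation at 50 samples) are all chosen for the example.

```python
import random
from math import sqrt

N_SAMPLES = 50
CRITICAL_R = 0.279  # approx. two-sided 5% cutoff for |r| when n = 50


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def count_spurious_hits(n_features=200, seed=1):
    """Correlate many pure-noise variables with a pure-noise outcome.

    Nothing here is related to anything else, so every "significant"
    correlation counted is a false positive.
    """
    rng = random.Random(seed)
    outcome = [rng.gauss(0, 1) for _ in range(N_SAMPLES)]
    hits = 0
    for _ in range(n_features):
        feature = [rng.gauss(0, 1) for _ in range(N_SAMPLES)]
        if abs(pearson_r(feature, outcome)) > CRITICAL_R:
            hits += 1
    return hits


if __name__ == "__main__":
    hits = count_spurious_hits()
    print(f"{hits} of 200 noise variables look 'significant' at the 5% level")
```

At a 5% threshold, roughly one noise variable in twenty is expected to clear the bar, so screening 200 of them should turn up around ten "discoveries" that mean nothing.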

How to apply it: (1) for any "A and B are correlated" finding, immediately sketch all three causal diagrams (A→B, B→A, C→A+B) and ask which is most plausible; (2) ask "if I intervene on A (rather than passively observe), does B still change?", which is the key question for separating correlation from causation; (3) in life and investment decisions, prioritize RCT-based conclusions and stay highly skeptical of causal claims from purely observational studies.

Classic Example

Statistics show ice-cream sales are strongly positively correlated with drowning incidents. The conclusion "eating ice cream causes drowning" is obviously absurd. The real cause is a common third variable, summer heat: when it's hot, people buy more ice cream and swim more, so drownings rise too. This is the textbook example of a spurious correlation, a reminder that correlation is only the starting point of causal inquiry, never the endpoint.
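The difference between observing and intervening can be simulated with this example. The model below is a toy with invented coefficients: temperature drives both series, ice-cream sales have no effect on drownings, and the function names and numbers are assumptions made for the sketch.

```python
import random
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))


def simulate_days(intervene, n_days=5_000, seed=7):
    """Return the correlation between ice-cream sales and drownings.

    intervene=False: sales follow the weather (passive observation).
    intervene=True:  sales are set at random, ignoring the weather.
    """
    rng = random.Random(seed)
    sales, drownings = [], []
    for _ in range(n_days):
        temperature = rng.uniform(10, 35)            # the common cause
        if intervene:
            ice_cream = rng.uniform(200, 700)        # forced, weather ignored
        else:
            ice_cream = 20 * temperature + rng.gauss(0, 50)
        # Drownings depend on temperature only, never on ice cream.
        drowned = 0.3 * temperature + rng.gauss(0, 1.5)
        sales.append(ice_cream)
        drownings.append(drowned)
    return pearson_r(sales, drownings)


if __name__ == "__main__":
    print(f"observed correlation:       {simulate_days(intervene=False):.2f}")
    print(f"correlation under intervention: {simulate_days(intervene=True):.2f}")
```

In the observational run the two series move together strongly; once sales are set independently of the weather, the correlation collapses toward zero, which is what the C→A+B diagram predicts and the A→B diagram does not.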

Scenario · BigCat

BigCat notices that people who meditate daily tend to make higher-quality investment decisions. Does meditation improve decision quality (causal), or do already-disciplined people both meditate and make more rational investment decisions (third variable: self-discipline)? Before adopting a "meditation training program" as a personal practice, BigCat must ask: is this correlation backed by an RCT?

English Summary

Correlation ≠ Causation is perhaps the most important statistical principle for clear thinking. Just because two variables move together doesn't mean one causes the other — there may be a third confounding variable, or the relationship may be pure coincidence. In the age of big data, distinguishing genuine causal signals from spurious correlations is one of the highest-value cognitive skills.

AI Prompts

English Prompt

I've observed a correlation between [A and B] and am tempted to conclude that A causes B. Rigorously challenge this causal inference: (1) propose 3 alternative explanations (confounders, reverse causation, coincidence), (2) describe the empirical tests that could distinguish true causation from mere correlation, and (3) assess the current state of evidence for and against a causal relationship.