Mental Models: Algorithmic Thinking

Divide and Conquer

"To solve a problem, find a way to make smaller versions of itself."

In Depth

Divide-and-conquer is not the shallow idea of "break the big problem into smaller ones." It requires that subproblems be structurally isomorphic to the original AND relatively independent. Only when both hold does recursion actually reduce complexity. Otherwise you've just moved code around without saving any work.

Non-trivial: (1) the merge step decides everything. The Master Theorem makes it formal — if merge cost ≥ subproblem cost, divide-and-conquer doesn't pay. This is why "split-then-combine" sounds appealing but often delivers no speedup. (2) The dominant failure mode is "pseudo-divide": subproblems share state, call back into each other, look split but are tightly coupled. Diagnostic: can two subproblems be solved by two people in two rooms without speaking? If no, the boundary is wrong. (3) Structurally isomorphic to the Buddhist deconstruction of self: what looks like one "big problem" is an aggregate; see it as a small number of independent strands and difficulty drops an order of magnitude.

Practice: before dividing, answer two questions — "How expensive is my merge?" and "How many hidden dependencies cross the subproblem boundary?" If either is high, stop dividing and pick a different strategy.

Classic example

Merge sort, O(n log n): split the array in half, recurse, then linearly merge. Merge cost O(n) < subproblem cost → the Master Theorem yields O(n log n). MapReduce is the engineering-scale incarnation — Google indexed petabytes by ensuring "map is independent, reduce is cheap." The moment reduce becomes expensive (cross-shard joins, complex aggregations), the divide-advantage evaporates and the framework turns into a performance killer.

BigCat scenario

(1) Engineering: split an "AI system overhaul" into data layer / model layer / application layer / evaluation layer. If those four layers need a weekly 3-hour sync meeting to move forward, that's pseudo-divide — the boundary is wrong. True divide = the four teams take interface contracts and run independently for 90 days. (2) Learning a new field: split "introduction to quantum mechanics" into math basics / physical intuition / experimental facts / philosophical interpretations. The strands are relatively independent (interpretations don't change the math) and the merge is cheap (draw one concept map at the end). (3) The earliest symptom of pseudo-divide is meeting growth — if cross-subteam meetings climb exponentially, the subproblems aren't independent enough. Redraw boundaries; don't schedule more syncs.

AI Prompt

English Prompt

I'm applying divide-and-conquer to [problem/project], split into subproblems [sub1, sub2, ...]. Stress-test me: 1. Are these subproblems truly "structurally isomorphic + relatively independent"? Flag the two most likely hidden couplings. 2. Estimate the merge-step cost — will it eat the parallelism gain? 3. If you detect pseudo-divide, propose one re-partition that makes subproblems genuinely independent.

Dynamic Programming & Memoization

"Those who cannot remember the past are condemned to recompute it." — paraphrasing Santayana

In Depth

The surface story is "trade space for time" — that's the appearance. The real requirement is that the problem simultaneously exhibits overlapping subproblems (the same computation is needed many times) AND optimal substructure (the global optimum can be assembled from subproblem optima). Miss either and DP degenerates into ordinary recursion or greedy.

Non-trivial: (1) everything hinges on state definition. Pick the wrong state and no clever transition can save you — you'll hit state-space explosion, going from O(n²) to O(n⁴). (2) Top-down memoization vs bottom-up tabulation have different mental shapes — the former is "cache on demand," lazy but close to natural thinking; the latter is "systematically fill the table," memory-hungry but more controllable. Beginners graduate from the first to the second. (3) Non-trivial observation: most life problems are not DP problems — they lack optimal substructure (you don't keep revisiting identical states; the environment drifts). Misapplying DP to life means polishing answers to past states that are no longer relevant, locking yourself into path dependence. (4) Structurally isomorphic to the Bellman equation in reinforcement learning: intelligence = learning which subproblems are worth memoizing and which are noise that shouldn't be cached.

Practice: before coding, ask "what's the dimensionality of my state space?" Each new dimension grows the table exponentially. If state exceeds 3 dimensions, compress or abstract first, then DP.

Classic example

0-1 knapsack: naive recursion is O(2ⁿ) and becomes uncomputable at 30 items. DP compresses it to O(n·W) via a 2D table (item index × remaining capacity) — milliseconds. The same subproblem "best value using items 1..i with capacity w" is hit millions of times; without memoization you re-climb the same tree over and over. Reinforcement learning's Q-learning is DP extended to stochastic environments — the Q-table is noisy memoization.

BigCat scenario

(1) AI agent engineering: every LLM call thinks from scratch = no memoization, exponential token waste. Add a "decision cache" layer to your agent — record "for similar past queries, what did I reason and conclude?" — collapsing repeat queries from O(K tokens) to O(1) lookup. Prerequisite: your queries actually have overlapping subproblems. (2) Personal decision lookup table: write recurring life decisions (take this project? attend this meeting?) as "criteria + past precedents" — next time a similar situation arises, look up, not re-derive, and decision fatigue collapses. (3) But beware: when the problem distribution drifts, the DP table expires — your child enters a new developmental stage, the AI paradigm shifts, the industry reorganizes — old tables instantly become stale. Healthy practice is periodic "rebuild" and dating each cache entry with an expiry.

AI Prompt

English Prompt

I want to apply DP / memoization to [problem/workflow]. Please evaluate: 1. Does this problem truly satisfy both "overlapping subproblems" AND "optimal substructure"? If either is missing, name it explicitly. 2. If DP applies, propose the most parsimonious state definition and estimate the state-space dimensionality — if > 3 dimensions, suggest how to compress. 3. Flag which sub-results should not be cached (distribution drift, one-off events, out-of-distribution inputs).

Greedy & Local Optima Trap

"Each step looks optimal — and that's exactly why you're stuck on the wrong hill."

In Depth

Greedy = pick the locally best move at every step. Tempting because it's simple, fast, and "often good enough." But it's strictly optimal only on the rare problems with matroid structure. Most real problems are not matroids — greedy will reliably ship you to a local-optimum trap, with the far-off global peak forever invisible.

Non-trivial: (1) the mountain metaphor: greedy is "always step uphill," and you finish at the nearest hilltop to your starting point. The hilltop is not the highest peak in the range. (2) Engineering escapes from local optima come in three families — simulated annealing (allow occasional descent), beam search (keep K candidates), random restart (greedy from many starts). All three share the same move: tolerate temporary worsening. (3) The inverse cognitive trap from sunk-cost: sunk-cost is reluctance to let go of the past; greedy local optima is blindness to the far — one is rear-view blindness, the other is far-view blindness. (4) Structurally isomorphic to the Buddhist concept of avidyā (ignorance): blinded by the brightest nearby lamp, you can't see the global terrain. Awareness = switching from "best at each step" to "best overall" as the optimization target.

Greedy hill-climbing only reaches the nearest hilltop; the global peak often requires going downhill first

Classic example

Dijkstra's shortest path is greedy AND globally optimal — because shortest-path on graphs has matroid structure (a confirmed shortest distance can't get shorter). But greedy on 0-1 knapsack fails — "pack the highest value-per-weight item first" misses the optimum at certain capacities. Coin change: greedy works for [1, 5, 10, 25] cents, but on [1, 3, 4] coins to make 6 it returns 4+1+1 (three coins) instead of optimal 3+3 (two coins). Whether greedy works depends on the problem's skeleton, not on your intuition.

BigCat scenario

(1) Career choice: jumping every year to "the next job with the biggest raise" is classic greedy. But the career function has no matroid structure — three years in management dulls technical depth; three years in pure technical work closes the management track. Step-wise local optimality locks you onto a small hilltop. (2) Antidote: deliberately "simulate-anneal" — accept one descent (a temporary pay cut, a hard pivot into AI, going abroad, founding something) to buy the option of jumping out of the local optimum. (3) Parenting: every day choosing "the path that triggers the least fuss" (more screen time, more toys) is greedy. You'll grind into a local optimum where the child permanently avoids hard things. The short-term uncomfortable "do the hard task together" move is the parenting version of simulated annealing. (4) Non-trivial AI corollary: a large language model is itself a giant greedy machine — token-by-token sampling of the most probable next move — which is precisely why beam search, temperature, and long-horizon planning exist as counter-greedy mechanisms.

AI Prompt

English Prompt

My current strategy in [domain/decision] is [describe how I pick each step]. Diagnose: 1. Am I running a greedy algorithm? Does this problem have matroid structure that guarantees greedy is optimal? 2. Which local-optimum hill am I likely stuck on? What does a higher global peak look like from afar? 3. Design one concrete "simulated-annealing" move — which temporary descent to accept, and which cross-peak opportunity it buys.

Heuristics & Approximation

"Perfect is the enemy of good — and in NP-hard problems, perfect doesn't even exist in your lifetime."

In Depth

When a problem is provably NP-hard, the optimum is unreachable — not "try harder" unreachable, but "the universe will end first" unreachable. You must trade optimality for tractability. But how you trade matters: good heuristics carry approximation guarantees (e.g., 1.5-approximate TSP); bad heuristics are guesses dressed up as method.

Non-trivial: (1) the closer a heuristic function tracks the true optimum, the cheaper the search. A*'s efficiency lives or dies by it — a bad heuristic degrades A* into BFS; a perfect heuristic linearizes the search. (2) Heuristic strength = depth of structural understanding of the problem. Counter-intuitive: heuristics aren't shortcuts around knowledge, they are compressed knowledge. (3) Structurally isomorphic to "intuition": human experience-based intuition is a heuristic function learned over a lifetime by the brain. But beware — a good heuristic is valid only inside its training distribution; out-of-distribution (new domain, new context), it silently fails. This is why senior experts crash on cross-domain moves. (4) Monte Carlo Tree Search (MCTS) + neural value function = the strongest general-purpose heuristic of our era (AlphaGo compressed the 10^170 Go search space into a playable game). And this is the lifeline of LLMs too: they are essentially "approximators that learned vast heuristics from human language" — strong inside familiar distributions, failing in the sparse long tail.

Practice: in a new domain, first admit your heuristic is empty and do honest slow search; only after enough data has accumulated should you trust the fast take. Skip this step and you'll exponentially amplify a wrong heuristic.

Classic example

The Traveling Salesman Problem is NP-hard — 50 cities means 50! routes, more than the universe's age can enumerate. Christofides' algorithm gives a 1.5-approximation guarantee (route length ≤ 1.5 × optimum), good enough in practice. AlphaGo doesn't enumerate Go: it's MCTS (heuristic sampling) + a learned value network (the heuristic function), turning the intractable into the tractable. Modern SAT solvers handle billion-variable formulas via stacked heuristics (DPLL, CDCL) that suppress exponential blow-up.

BigCat scenario

(1) AI agent planning: asking an LLM for "the complete optimal plan" is shoving an NP-hard problem into the context window — the result is hallucination collage. The better move is to hand it a set of good heuristic functions ("list your assumptions first," "ask 'why' at each step," "prune immediately on failure") and let it run heuristic search instead of brute force. (2) Major personal decisions: when the problem is too large to enumerate ("will I be satisfied 30 years from now?"), stop hunting for the optimum and switch to "approximately acceptable + preserve optionality" — add a 1.5× buffer around heuristic conclusions to absorb bias. (3) Parenting and education: abandon "the optimal education path" — it's NP-hard and the distribution will shift (the world in 10 years is unrecognizable). Design a small set of good heuristics ("prioritize meta-abilities," "preserve optionality," "periodically retrain your own heuristic function"). (4) Meta-insight: "experience" is essentially a heuristic function — measure experience not by years, but by approximation ratio and applicable distribution. Actively retraining when crossing domains is how senior people stay effective.

AI Prompt

English Prompt

The problem I face: [describe]. My intuition/experience says the answer is [describe]. Audit this heuristic: 1. Is this problem NP-hard or near-NP-hard? If so, name plainly that the optimum is unreachable. 2. Which training distribution did my current heuristic learn from? Is the present problem inside that distribution? Flag 2 out-of-distribution warning signs. 3. Propose one strategy with explicit approximation-ratio awareness — what factor of optimum am I targeting, and how much buffer do I need against bias?