Mental Models: Mechanism Design & Incentives

Incentive Compatibility

"A good rule makes each person's selfish calculation lead exactly to the result you want."

In Depth

Mechanism design is "reverse game theory": game theory takes the rules as given and predicts how people will play, while mechanism design starts from the outcome you want and engineers rules under which self-interested play produces it. Its core goal is incentive compatibility: designing the rules of the game so that the action each participant takes in pursuit of their own interest is exactly the action you want them to take. Telling the truth, exerting effort, and not exploiting loopholes become their dominant strategy, not a moral demand.

Non-trivial: (1) It swaps "rely on people's conscience" for "rely on structure": you needn't assume participants are noble, only make betrayal unprofitable. (2) There's a powerful revelation principle: any outcome that some mechanism can achieve in equilibrium can also be achieved by a direct mechanism in which participants simply report their private information and truth-telling is optimal, so the designer can focus on one question: how to make honesty pay. (3) Its failure mode is Goodhart's Law: "When a measure becomes a target, it ceases to be a good measure." Once incentives are misaligned, people optimize the metric itself rather than the real purpose behind it, which is the textbook symptom of a failed mechanism.

Classic example

The "I cut, you choose" rule for splitting a cake: the one cutting knows the other picks first, so to avoid shortchanging themselves they must cut as evenly as possible. No one preaches fairness, yet fairness happens automatically — the rule channels "selfishness" into "justice." That's incentive compatibility at its simplest.
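The cake rule can be checked with a few lines of Python. This is only a minimal sketch under an assumption the text does not state: both people value every crumb equally, so a "cut" is just the fraction of the cake in the first piece. The function names are hypothetical and chosen for illustration.

```python
def chooser_pick(cut: float) -> float:
    """The chooser moves second and simply takes the larger piece.

    `cut` is the fraction of the cake in the first piece (0.0 to 1.0).
    """
    return max(cut, 1.0 - cut)


def cutter_share(cut: float) -> float:
    """The cutter is left with whatever the chooser did not take."""
    return 1.0 - chooser_pick(cut)


def best_cut(steps: int = 100) -> float:
    """Grid-search the cut that maximizes the cutter's own share."""
    candidates = [i / steps for i in range(steps + 1)]
    return max(candidates, key=cutter_share)


if __name__ == "__main__":
    print(best_cut())          # the cutter's purely selfish optimum
    print(cutter_share(0.7))   # an uneven cut leaves the cutter the small piece
```

Under these assumptions the cutter's selfish optimum is the even split: any other cut only shrinks the piece the cutter ends up with.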

BigCat scenario

(1) When designing KPIs/OKRs, ask first: "If the team optimized only this number and ignored my true intent entirely, what would happen?" If the answer is absurd, the incentive isn't compatible and will be gamed (for example, case counts soar while quality collapses). (2) Designing an RLHF reward model is essentially a mechanism-design problem: if the reward can be hacked, the model learns to please the scorer rather than be genuinely useful. Half the difficulty of alignment is designing a hard-to-game, incentive-compatible reward.

AI Prompt

English Prompt

I'm designing incentives/metrics for [team / product / collaboration], aiming for [the outcome I actually want], with candidate metrics [current metrics]. Use Incentive Compatibility to stress-test: 1. If participants coldly optimize this metric while ignoring my true intent, how would they game it? (Goodhart's Law) 2. How can I redesign the rules so that telling the truth / genuinely exerting effort becomes their dominant strategy, not a matter of goodwill? 3. Give me one more incentive-compatible alternative, and name what it trades off.

Principal-Agent Problem

"You hire someone to act for you, but what they can see — and what they want — differ from yours."

In Depth

The principal-agent problem: a principal hires an agent to act on their behalf, but two cracks sit between them: misaligned goals (the agent has private interests) and asymmetric information (the principal can't see whether the agent truly exerted effort). These structural cracks breed two classic risks: ex-post moral hazard (shirking or self-dealing after the contract is signed, because the principal can't observe it) and ex-ante adverse selection (agents hide their true "type" before the contract is signed, and the least suitable candidates are often the keenest to apply).

Non-trivial: (1) The root isn't "bad people" but "invisibility": if the principal could fully observe the agent's actions and information, the contract could simply pay for the right action and the agency problem would vanish. (2) The mainstream fix ties the agent's payoff to observable results (commission, equity, pay-for-performance), but results also depend on luck, so this pushes risk onto the agent. There is therefore a permanent trade-off between strong incentives and overloading the agent with risk. (3) Agency chains nest: shareholders → board → CEO → middle managers → staff. Each link loses some incentive alignment, so the longer the chain, the further goals drift.

If effort is invisible, tie pay to visible results — at the cost of pushing risk onto the agent
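One way to make the incentive-versus-risk trade-off concrete is the standard textbook linear-contract model, which is not part of the original text. The sketch below assumes pay = base + share × output, output = effort + random noise, a quadratic effort cost, and an agent whose dislike of risk is captured by a single risk-aversion number. All names and parameters are illustrative assumptions.

```python
def agent_effort(share: float, effort_cost: float) -> float:
    """Effort the agent chooses: maximizes share*e - effort_cost*e^2/2."""
    return share / effort_cost


def total_surplus(share: float, effort_cost: float,
                  risk_aversion: float, noise_var: float) -> float:
    """Expected output minus effort cost minus the agent's risk premium."""
    effort = agent_effort(share, effort_cost)
    risk_premium = 0.5 * risk_aversion * share ** 2 * noise_var
    return effort - 0.5 * effort_cost * effort ** 2 - risk_premium


def optimal_share(effort_cost: float, risk_aversion: float,
                  noise_var: float) -> float:
    """Closed form in this model: share = 1 / (1 + r * c * sigma^2)."""
    return 1.0 / (1.0 + risk_aversion * effort_cost * noise_var)


if __name__ == "__main__":
    # Risk-neutral agent: hand over the full result ("sell them the firm").
    print(optimal_share(effort_cost=1.0, risk_aversion=0.0, noise_var=1.0))
    # Risk-averse agent, noisier results: the best share shrinks.
    print(optimal_share(effort_cost=1.0, risk_aversion=2.0, noise_var=1.0))
    print(optimal_share(effort_cost=1.0, risk_aversion=2.0, noise_var=4.0))
```

In this model, the noisier the observable result or the more risk-averse the agent, the weaker the incentive that is worth offering, which is the trade-off described above.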

Classic example

Shareholders vs. professional managers: shareholders want long-term value; the manager may prefer short-term optics, a personal bonus, and status. Shareholders can't watch every decision, so they use stock options to bind the manager's wallet to the share price — making "good for the company" also "good for me."

BigCat scenario

(1) You delegate a complex task to an AI agent; its implicit goals (finish fast, burn fewer tokens) need not equal yours (do it thoroughly) — this is AI alignment as a principal-agent problem in miniature. The patch is the same: "tie to outcomes + add monitoring" — define verifiable success criteria and require it to leave inspectable traces at key steps. (2) Outsourcing, hiring, and remote work follow suit: rather than micromanage (information cost is huge), structure pay so the agent benefits most precisely when doing the right thing.

AI Prompt

English Prompt

As the principal, I'm delegating [task / responsibility] to [agent: employee / contractor / AI agent / partner]; what I truly want is [goal]. Use the Principal-Agent model to design: 1. What do the goal misalignment and information asymmetry look like here? Where might moral hazard and adverse selection show up? 2. Which outcomes can I observe? How do I tie pay/acceptance to those observable signals so the agent benefits most by doing the right thing? 3. How much risk does strong incentive shift onto the agent? Give me a design balancing incentive strength against risk-bearing.

Auction Theory

"A well-designed auction extracts true valuations bidders might not even want to state."

In Depth

An auction is far more than "highest bidder wins"; it's a price-discovery and information-revelation machine: when no one wants to show their hand, the rules "force out" the true valuations scattered across everyone's minds. The most counterintuitive move is the second-price auction (sealed bids, highest bidder wins but pays the second-highest bid): because your bid decides only whether you win, not what you pay, neither lowballing nor inflating helps, and bidding your true value becomes the dominant strategy. That's incentive compatibility applied to pricing.
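The dominant-strategy claim can be checked by brute force. The following is a minimal sketch, assuming a single item, sealed bids, and a simplifying rule (not from the text) that a tied bidder loses. The function names are hypothetical.

```python
from typing import Sequence


def second_price_payoff(my_bid: float, my_value: float,
                        rival_bids: Sequence[float]) -> float:
    """Payoff in a sealed-bid second-price auction (a tie counts as a loss)."""
    top_rival = max(rival_bids)
    if my_bid > top_rival:
        return my_value - top_rival  # winner pays the highest losing bid
    return 0.0


def truthful_is_never_worse(my_value: float, rival_bids: Sequence[float],
                            alternative_bids: Sequence[float]) -> bool:
    """Check that bidding my true value does at least as well as any alternative."""
    truthful = second_price_payoff(my_value, my_value, rival_bids)
    return all(
        truthful >= second_price_payoff(bid, my_value, rival_bids)
        for bid in alternative_bids
    )


if __name__ == "__main__":
    print(second_price_payoff(10, 10, [7, 4]))  # truthful bid wins, pays 7 -> 3
    print(second_price_payoff(6, 10, [7, 4]))   # lowballing loses the item -> 0.0
    print(second_price_payoff(12, 10, [11]))    # overbidding wins at a loss -> -1
    grid = [x / 2 for x in range(41)]
    print(all(truthful_is_never_worse(v, [r], grid) for v in grid for r in grid))
```

Lowballing can only lose an item you would have won at a profit, and overbidding can only win an item at a loss, so the truthful bid is never beaten on this grid.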

Non-trivial: (1) The Revenue Equivalence Theorem: under the standard assumptions (risk-neutral bidders whose private values are drawn independently from the same distribution), English, Dutch, first-price, and second-price formats give the seller the same expected revenue. The format war matters less than imagined; what truly counts is attracting enough serious bidders. (2) The Winner's Curse: when the true value is unknown to all (common value, as with oil fields or M&A), the winner is often exactly whoever overestimated most wildly, so winning the auction is itself the bad news that "you bid too high." The rational response is to shade your bid downward in advance.
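Both points can be illustrated with a small simulation. This is a sketch under textbook assumptions that go beyond the text: for revenue equivalence, private values are drawn independently from Uniform(0, 1), in which case the equilibrium first-price bid is (n-1)/n times one's value; for the Winner's Curse, every bidder sees the common value plus independent Gaussian noise and naively bids that estimate. The value 100.0 and all function names are illustrative.

```python
import random
from typing import Tuple


def simulate_revenue(n_bidders: int, trials: int, seed: int = 0) -> Tuple[float, float]:
    """Average seller revenue under first-price and second-price rules.

    Assumes independent Uniform(0, 1) private values and risk-neutral bidders.
    """
    rng = random.Random(seed)
    first_total = 0.0
    second_total = 0.0
    for _ in range(trials):
        values = sorted(rng.random() for _ in range(n_bidders))
        # First-price: the winner pays their own shaded equilibrium bid.
        first_total += (n_bidders - 1) / n_bidders * values[-1]
        # Second-price: everyone bids truthfully, winner pays the runner-up's bid.
        second_total += values[-2]
    return first_total / trials, second_total / trials


def naive_winner_profit(n_bidders: int, noise_sd: float,
                        trials: int, seed: int = 0) -> float:
    """Average profit of the winner when everyone bids their raw estimate."""
    rng = random.Random(seed)
    true_value = 100.0  # illustrative common value, unknown to the bidders
    total = 0.0
    for _ in range(trials):
        estimates = [true_value + rng.gauss(0.0, noise_sd) for _ in range(n_bidders)]
        # The most optimistic estimate wins and is paid in full.
        total += true_value - max(estimates)
    return total / trials


if __name__ == "__main__":
    first, second = simulate_revenue(n_bidders=4, trials=20000)
    print(first, second)  # both should be close to (n-1)/(n+1) = 0.6
    print(naive_winner_profit(n_bidders=5, noise_sd=10.0, trials=20000))  # negative
```

The two revenue averages land close to each other, while the naive common-value winner loses money on average, which is why shading the bid in advance is the rational response.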

Classic example

Real-time bidding in internet advertising: every time you open a page, many advertisers bid in milliseconds for that impression. Platforms have long used second-price-style rules precisely so advertisers can bid their true value with confidence without second-guessing rivals, although several major ad exchanges have since moved to first-price formats. Mechanism design becomes the bedrock of a multi-hundred-billion-dollar business.

BigCat scenario

(1) Cloud spot instances behave like an auction-style market for idle compute: capacity is cheap but can be reclaimed when demand or prices spike. Understanding that mechanism lets you place fault-tolerant tasks there to save cost and keep critical tasks on stable resources. (2) To allocate scarce internal resources (GPU quota, expert hours, plum projects), rather than relying on grabbing or a boss's fiat, design a lightweight auction (e.g., bidding with virtual credits) so whoever truly needs it most reveals themselves. (3) In M&A, bidding wars, and talent fights, always beware the Winner's Curse: if you won, it may simply be because you overestimated it more than everyone else.

AI Prompt

English Prompt

I need to decide how to allocate or price a scarce resource: [describe the resource, the participants, and the outcome I want: max efficiency / max revenue / the one who needs it most gets it]. Use Auction Theory to design: 1. Which auction format fits, and why? Would a second-price design let people bid their true value safely? 2. Is there a Winner's Curse risk here (unknown common value, won by whoever bids most aggressively)? How much should I shade bids down? 3. Beyond agonizing over format, what matters more for the outcome (e.g., attracting more serious bidders)?

Signaling

"A credible signal is one a faker can't afford to imitate — its value lies in the cost of faking it."

In Depth

When one party holds private information (I'm truly skilled / my product truly lasts) that the other can't directly verify, how do you transmit it credibly? The answer lies not in "what you say" but in "what you do", specifically something a faker can't imitate. The key is cost asymmetry: a signal is credible only when it's cheap for the genuine article and expensive enough that faking it isn't worth it, which lets the market tell the genuine from the fake (a separating equilibrium). Anyone can make verbal promises, so they're worthless; a fake can't copy costly actions, so they carry value.
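The cost-asymmetry condition can be written as a tiny check. This is a simplified sketch, assuming just two types (genuine and fake), a single signal, and one fixed premium the market pays to anyone who sends it. The names and numbers are hypothetical.

```python
def sends_signal(premium: float, signal_cost: float) -> bool:
    """A type sends the signal only if the premium more than covers its cost."""
    return premium - signal_cost > 0


def is_separating(premium: float, cost_genuine: float, cost_fake: float) -> bool:
    """The signal separates types when the genuine type sends it and the fake does not."""
    return sends_signal(premium, cost_genuine) and not sends_signal(premium, cost_fake)


if __name__ == "__main__":
    # Cheap for the genuine type, prohibitive for the fake: credible signal.
    print(is_separating(premium=10, cost_genuine=3, cost_fake=15))   # True
    # Cheap for everyone: the fake imitates, so the signal says nothing.
    print(is_separating(premium=10, cost_genuine=3, cost_fake=5))    # False
    # Too costly even for the genuine type: nobody sends it.
    print(is_separating(premium=10, cost_genuine=12, cost_fake=15))  # False
```

Only the first case carries information: the premium sits between the two costs, so sending the signal is worthwhile for the genuine type and a losing move for the fake.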

Non-trivial: (1) The famous education signaling view: a considerable part of a degree's value lies not in what you learned but in the fact that "being able to endure it" screens out those lacking ability or grit. It's a hard-to-fake ability signal, even if the course content is long forgotten. (2) This explains why so much "seemingly wasteful" behavior is in fact rational: warranties and no-questions-asked returns signal quality; a founder staking their own net worth signals skin in the game. (3) A new meaning in the AI era: when text, code, and work can all be mass-generated by large models at near-zero cost, "cheap signals" depreciate en masse, while cost-asymmetric, hard-to-mass-fake signals (real long-term investment, verifiable track records) become more valuable.

Classic example

The peacock's huge, heavy, predator-attracting tail: precisely because it's a "handicap," a weak peacock simply can't afford it, so it becomes an honest signal of health — peahens choosing by it won't be deceived. Biology calls this the "handicap principle": a signal's credibility comes exactly from a cost too high for fakes to bear.

BigCat scenario

(1) For a technologist, a solid body of open-source contributions and a verifiable project track record is a stronger ability signal than any self-introduction — it's hard to fake because it's the sediment of real long-term investment. (2) In hiring and partnerships, rather than listen to what someone says, watch what cost-asymmetric action they'll take: will they build a small sample first, will they share the risk? (3) When you need to establish credibility amid AI-flooded content, stop piling up cheap output — signal with what others can't imitate: original depth of insight, a public and verifiable long-term record.

AI Prompt

English Prompt

I want to credibly signal [a hard-to-verify quality of mine: skill / quality / sincerity / capability] to [audience: employer / client / partner / market], but talk alone won't do. Use Signaling to design: 1. Which signals are cost-asymmetric — cheap for the real me, prohibitively expensive for a fake to imitate? 2. Are my current signals "cheap signals" (anyone can claim / mass-produce them) and therefore depreciating? 3. Give me 2 hard-to-fake high-credibility signals (e.g., skin in the game, a verifiable track record, doing a small sample first), and rank them by ROI.