A podcast is the only medium that crawls into someone's ear and keeps them company while they wash dishes or commute. No visuals, no rewind, pausable at any second, yet it's among the fastest-growing forms of expression of our time. Writing for audio follows different rules than writing for the page: you aren't writing words, you're designing a stretch of breathing, spoken language for one person's ear. This week: four tools from the backrooms of This American Life, Fresh Air, and Serial.
Principle 01
Conversation Design: Write for the Ear, Not the Eye
Write to be heard, not read
Spoken script · Intimate medium
The Principle
Print lets readers reread, skim, stop to look up a word; the ear gets none of that. Listeners get one pass, and one wandering thought loses them. So every line of an audio script must be one idea, sayable in a single breath, sounding like a real person. Don't write "in conclusion"—write "so." Don't nest three relative clauses—split them. And always read it aloud: if it's awkward to say, it's awkward to hear.
In Their Words
"Writing is talking to someone else on paper. Write with the same clarity and warmth you'd use to talk to a friend."
— William Zinsser, On Writing Well
Audio pushes this line to its limit: you really are talking to one person—only the paper has become an ear.
Why It Works
The brain processes "seeing" and "hearing" through different channels. Written sentences build structure from punctuation and paragraphs; the ear can't catch a comma—it only follows tone, pause, and rhythm. So writing for the ear means translating visual structure into aural structure: short sentences instead of long, repetition instead of pronouns (don't say "the former," just say the word again), signposts ("first, one thing") instead of headings. Picture phoning one specific person, not addressing a crowd in a square.
Revision in Action
Before: The aforementioned methodology yields a substantial improvement in user retention metrics.
After: So here's what happened. We changed one thing, and people stopped leaving. Retention jumped.
Before: Growth this quarter demonstrated a marked upward trajectory, principally attributable to channel optimization.
After: We grew fast this quarter. Why? We changed one thing: where we ran the ads.
When to Use · Common Traps
✓ Podcast narration, voice memos, audio content, meeting openers—anything meant to be heard, not seen
✗ Reading a blog post or doc aloud verbatim: written prose is hard to follow and tiring on the ear
Trap: a sentence with three stacked clauses, where by the end the listener has forgotten the start
Trap: "the aforementioned," "the latter," "said process"—written pronouns keep dropping the thread
This Week's Practice · Reflection
Take a recent tech explanation or email and record yourself reading it aloud. Play it back: which sentence broke your breath, and which one left you unsure of the subject? Shorten those sentences, rewrite them as speech, and record again. Reflection: where's the line between "sounds like talking" and "imprecise"? Does going spoken cost you accuracy?
Principle 02
Interview Prep: A Good Question Makes Them Tell the Story
The question is the craft
Terry Gross · Question design
The Principle
The quality of an interview is decided before you hit record—by your list of questions. Bad questions buy you "yes" and "it was fine"; good ones send the person back to the scene, into detail and feeling. Three moves: ask open, not yes/no; ask for the scene, not the verdict; then chase the "why." The best questions are usually the shortest.
In Their Words
"The best, most reliable interview question I know is also the simplest: 'Tell me about yourself.' It gives people room to lead you to what matters to them."
— Terry Gross (host, Fresh Air), on the art of the interview
Why It Works
Closed questions ("Were you nervous?") lock the answer into one word, and they often smuggle in the answer you want, nudging the person to agree. Open questions ("Walk me back to that moment") hand them the wheel, and the details they give are usually better than any you'd have guessed. Silence is the most underrated tool: when they finish, don't rush to fill the gap. Wait three seconds. People can't stand a void; they fill it themselves, and what spills out is often the truest thing they say. Half of Terry Gross's decades-long craft is the question; the other half is holding her tongue.
Revision in Action
Before: You must have felt so proud, right? (closed + leading, only "yes")
After: Take me back to the morning you shipped: where were you, what did you do first? (open + scene, pulls detail)
Before: Were you nervous before the launch? (yes/no)
After: Walk me through the last hour before you shipped. What were you thinking? (scene + emotion)
When to Use · Common Traps
✓ Podcast interviews, user research, 1:1s, gathering material before a promo case, journalistic digging
Trap: stuffing your own view into the question—they can only nod; you're interviewing yourself
Trap: firing three questions at once—they answer only the last
Trap: they hand you a golden line, and you kill the best silence by rushing in
This Week's Practice · Reflection
Find one person and interview them for 10 minutes about one thing they lived through. Rule: no yes/no questions, and force yourself to count three seconds before speaking after each answer. Afterward, replay and mark their most vivid line—which question (or which silence) hooked it out? Reflection: last time you "asked" at work, were you really asking, or waiting for them to agree with you?
Principle 03
Sound Montage: Let Them Hear It, Don't Narrate It
Show with sound, don't narrate
Made to Stick · Concreteness
The Principle
Audio's sharpest weapon isn't narration—it's actuality: ambient sound, the subject's own voice, a real recorded exchange. "The lab was tense" is a report; the listener must imagine it. Swap in the frantic keyboard, someone's sharp intake of breath, the person saying "my palms were soaked"—and tension goes straight into the ear. The rule: if sound can let people hear it firsthand, don't use narration to hand them the conclusion.
In Their Words
"Abstraction makes it harder to understand an idea and to remember it. Abstraction is the luxury of the expert."
— Chip & Dan Heath, Made to Stick
Sound montage is anti-abstraction: don't give the conclusion, give the concrete, hearable evidence.
Why It Works
Narration "tells"; sound "shows"—the aural version of show, don't tell. When narration says "he was proud," you have to trust the narrator; when the subject's own voice shakes with feeling, you believe it involuntarily, because that's evidence, not a verdict. Montage goes further: juxtapose two sounds and let them collide into meaning. A worker says "this job fed three generations," cut immediately to the crashing steel gate of the plant shutting down—no word of explanation needed, the gap speaks for itself. Sound's credibility comes precisely from not passing through the hand of a narrator.
Revision in Action
Before: (Narration) The factory floor was chaotic and loud, and the workers were under pressure.
After: [SFX: clanging metal, a barked order] Worker (on tape): "In here you can't even hear yourself think." Don't say "loud"; make them feel deafened.
Before: (Narration) The team was overjoyed when the numbers came in.
After: [TAPE: a gasp, then a room erupting] Engineer, voice cracking: "We... we actually did it."
When to Use · Common Traps
✓ Narrative podcasts, documentaries, product stories, team retros (use a real meeting clip instead of a paraphrase)
✗ Pure news/tutorials—here clean narration beats a pile of sound effects
Trap: sprinkling effects as decoration, unrelated to the story—just noise added
Trap: capturing great tape, then having narration restate it—repetition means you don't trust the sound
This Week's Practice · Reflection
Recall a scene you'd like to tell (a launch, an argument, the moment of a decision). List: if it were audio, which 3 sounds would you capture? Turn at least one "narrated conclusion" into "let them hear it." Reflection: with no visuals, how does sound establish a sense of place—so a listener knows in one second where they are?
Principle 04
Narrative Arc: Action Hooks Them, Reflection Keeps Them
Anecdote & the moment of reflection
Ira Glass · The story engine
The Principle
Ira Glass breaks a "story" into two building blocks. The first is the anecdote—a sequence of "and then?" actions that naturally pulls you forward. The second is the moment of reflection—a pause where you tell the listener what all this means. Only action is a laundry list; only reflection is a lecture. Good audio alternates: an anecdote hooks you, a line of reflection lights it up, then into the next anecdote.
In Their Words
"There are two building blocks of a story. The first is the anecdote — a sequence of actions where one thing leads to another. The second is the moment of reflection: a moment where you say, here's why this was worth your time."
— Ira Glass, This American Life (on storytelling)
Why It Works
The action sequence runs on the suspense of "and then?"—once the brain enters an unfinished action, it's hard to leave midway. That's exactly why Serial ends each episode on a cliffhanger. But pure action leaves people asking "so what?"—and the moment of reflection is the answer to that "so what." Openings matter most: skip the table-of-contents intro ("today, three points"), which is written for the eye; use a concrete anecdote to drag them through the door first, and let the points come out slowly, tucked behind the story.
Action (and then?) → Action (and then?) → Reflect (here's why…) → Action (and then?) → Reflect (so that's it)
Ira Glass's story engine: the action sequence builds suspense (hooks), the moment of reflection supplies meaning (keeps)—the two alternate, looping forward.
Revision in Action
Before: In this episode, we'll cover three principles of behavioral economics.
After: A man once paid $100 to NOT eat a chocolate bar. He wasn't crazy; he was running an experiment on himself. And what he found changes how you should think about willpower. (action first, points later)
Before: Today we'll walk through three technical points about blockchain.
After: In 2010, a programmer bought two pizzas for ten thousand bitcoin. Today those pizzas are worth hundreds of millions. What did he miss?
When to Use · Common Traps
✓ Narrative podcasts, keynotes, product launches, the opener of a promo case—anywhere you must "grab first, argue later"
Trap: all anecdote, no reflection—fun to hear, but "so what?" never gets answered
Trap: all reflection, no action—lecturing from minute one, and no one stays
Trap: opening with the menu ("three points today"), wasting the very spot that should hook
This Week's Practice · Reflection
Pick a point you want to share—don't state it. First write a 60-second "anecdote": one specific person, a chain of actions, one suspense. Then write one line of "reflection" that reveals what it means. Hide the point behind the story. Reflection: last time you gave a report or talk, did you open with an "anecdote" or a "table of contents"?
Going Deeper
If audio scripts must "sound like talking," doesn't that clash with technical precision?
No clash—the trick is separating "spoken" from "imprecise." Going spoken changes the syntax: short sentences, few clauses, "so" instead of "in conclusion." Rigor guards the facts and logic: not one datum, cause, or caveat may drop. Precise content can absolutely be delivered in speech: instead of "the approach is effective in most scenarios," say "this works most of the time—but there's one exception, I'll get to it." The second is both spoken and fully qualified. The real enemy isn't speech—it's prose-voice vagueness: words like "significant," "correlated," "to some degree" sound rigorous but often say nothing.
Where do Chinese and English audio scripts differ in rhythm?
English gets its rhythm from stress and linking; sentences can build suspense with rising intonation, and runs of short monosyllables ("So. Here's. The. Thing.") have a built-in drumbeat. Chinese is a tonal language, so its rhythm leans more on pauses and syllable counts: four-character phrases, parallelism, silence. When writing Chinese audio, exploit the "beat" of a stop: the pause of a single comma, or the silence after a rhetorical question, lands harder than in English. Another gap: literary Chinese idioms sound jarring in the ear and must become plain speech, just as English's long Latinate words (utilize, facilitate) should revert to use, help. Both languages obey the same iron law: read it aloud; if it isn't smooth to say, it won't be smooth to hear.
Same story material—podcast, talk, short video: how do you tune each?
Podcast: pure audio, most reliant on sound montage and reflection to "fill in the invisible picture"; pace can be slow, silence and detail allowed. Talk: you're present, slides are there, sound is just one layer—so hold back: one beat per slide, riding live energy and pauses (see Duarte Day 3, Writing for the Ear Day 42). Short video: visuals steal half the attention, so the first 3 seconds must be your strongest anecdote hook, and reflection compresses into a one-line caption. One thread runs through all three: the less a medium can be rewound and the more it depends on grabbing attention instantly, the more Ira Glass's "action first" matters.
AI can already clone voices and generate podcast dialogue—do we still need people for audio?
AI solves "making sound," not "worth hearing." It can generate a fluent exchange in seconds, but it doesn't know who to interview, which second to leave silent, which pause you can't cut, or which raw voice colliding with which crash actually means something—these are narrative judgments, the very core of this week's four cards. Ira Glass has a much-quoted line: the hardest stretch for a beginner is when your taste is already high but your craft hasn't caught up, and the only way out is to finish a large volume of work. AI happens to let you sprint through the "craft" gate (transcription, editing, voicing), freeing people for what machines can't: judging which story is worth telling, and how to tell it so it moves. The better AI gets at making sound, the scarcer the people who can direct it.