…ai-ml, mental-models, system-design, cs-papers-deepread: adding one page of in-depth content per day (or every other day, or weekly), as a standalone Chinese HTML file plus a standalone English one.
…, Giscus comments, an image lightbox, a language toggle and TTS narration are all provided by shared JavaScript; content pages don't carry a single line of it themselves.
The core idea in one sentence: I only maintain each site's roadmap (TOPICS.md); production, publishing, post-processing, aggregation, and inspection are all automated, with the "write the content" step delegated to Claude agents running on a schedule in the cloud.
┌─ Human (me) ────────────────────────────────┐
│ curate TOPICS.md roadmaps · set caps · │
│ review monthly suggestions │
└──────────────┬──────────────────────────────┘
▼
┌── Generation: claude.ai cloud routines (20 triggers) ──┐
│ staggered cron → repo mounted → follow CLAUDE.md spec │
│ pick topic (fs-idempotent) → write zh+en pages → │
│ update index → publish.sh │
└──────────────┬─────────────────────────────────────────┘
▼ git push
┌── Repos: 20+ content repos (GitHub Pages) ─────────────┐
│ TOPICS.md · CLAUDE.md · .maxchars · publish.sh gate │
└──────────────┬─────────────────────────────────────────┘
▼ on push
┌── Post-processing: per-repo GitHub Actions ────────────┐
│ inject shared JS · (mental-models) Azure TTS bake │
└──────────────┬─────────────────────────────────────────┘
▼ daily cron
┌── Aggregation: hub repo GitHub Actions ────────────────┐
│ refresh-hub.yml re-renders index · │
│ build-search.yml rebuilds Pagefind index │
└──────────────┬─────────────────────────────────────────┘
▼
┌── Governance ──────────────────────────────────────────┐
│ auto-pause finished routines + English-page leak scan │
│ · monthly frontier-refresh meta-routine │
└────────────────────────────────────────────────────────┘
The layering principle: each layer trusts only the artifacts of the layer below it, never that layer's process. Generation can go wrong, so the repo layer has a validation gate; the gate can miss things, so the governance layer patrols.
Besides the HTML pages, each content repo contains exactly four things, each with one job.
TOPICS.md is the only part of the system that needs my ongoing attention. It lists, in order, every topic the site will cover; several sites also state a hard cap (mental models is capped at issue 68). The key design decision is one-way authority: the routine may only read it, and publish.sh flatly rejects any commit that modifies TOPICS.md. When topics run out, the routine is not allowed to extend the roadmap itself; it can only send me a push notification asking for a refill.
Drawing the line here is what keeps the whole system on course: the AI decides how to write; the human decides what gets written.
CLAUDE.md holds each site's execution spec. Sites that need precise control over layout and depth (system design, paper deep-reads, daily book deep-reads) carry a detailed one. It fixes the target reader (senior engineers), a length band, the required section structure (the paper site's nine-part skeleton: one-liner → glossary → context → problem & motivation → core idea → key results → impact → limitations & critiques → takeaways), the color palette (each site has its own visual signature; system design is dark cyan, papers are amber-copper), and honesty requirements: uncertain quotes must be marked as paraphrase, and limitations and counterarguments must be written.
The most useful trick in these specs: anchor on a published page. The prompt points at a specific article ("match read1 for depth, format, and voice") and says: do it like that. A concrete exemplar is far more stable than any prose description of style.
.maxchars is a one-line file containing a single number (3500, 4000, or 5000, depending on the site). publish.sh counts the CJK characters of every new page and enforces this ceiling. This rule was learned the hard way. LLM-generated series exhibit a "length ratchet": each article comes out slightly longer than the last because the model anchors on the most recent pages, and after a few dozen days the pages have bloated out of control. The fix clamps from both sides: a target band in the prompt, a hard ceiling at the gate.
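A minimal Python sketch of the length check follows. The real gate is bash; the Unicode ranges counted as CJK here (the basic CJK Unified Ideographs block plus Extension A) and the function names are assumptions.

```python
import re
from pathlib import Path

# Assumed definition of "CJK character": Extension A plus the basic block.
CJK = re.compile(r"[\u3400-\u4dbf\u4e00-\u9fff]")
TAG = re.compile(r"<[^>]+>")

def cjk_chars(html: str) -> int:
    """Count CJK characters in the text, ignoring anything inside tags."""
    return len(CJK.findall(TAG.sub("", html)))

def read_ceiling(path: str = ".maxchars") -> int:
    """.maxchars holds a single number on one line."""
    return int(Path(path).read_text().strip())

def check_length(html: str, ceiling: int) -> tuple[bool, int]:
    """Return (passes, count) so the gate can report how far over a page is."""
    n = cjk_chars(html)
    return n <= ceiling, n
```

Because the count is taken on the finished file rather than trusted from the prompt, a ratcheting series hits a wall instead of drifting.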
All repos share the same ~150-line bash script; the routine must publish through it. What it checks:
- CJK character count of every new page within .maxchars (no bloat)
- …<div> tags (no truncated HTML; truncated LLM output really does happen)
- git add/commit/push, with the commit message normalized to Add #N: title
That last convention is no small thing: the commit message itself becomes a machine-readable publishing record, and downstream completion detection and hub badges all work by parsing it.
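The checks can be re-expressed as a small Python sketch. It is hypothetical throughout: the real gate is bash, "balanced" here simply means equal counts of opening and closing <div> tags, and it also folds in two rules the article attributes to publish.sh elsewhere (TOPICS.md is read-only; shared script tags must not be hard-coded).

```python
import re

# Shared scripts that only the injection Action may add (names from the article).
SHARED_SCRIPTS = ("comments.js", "search.js", "i18n-tts.js",
                  "index-button.js", "lightbox.js")

def div_balanced(html: str) -> bool:
    """Truncated output usually stops mid-document, leaving <div>s unclosed."""
    opens = len(re.findall(r"<div\b", html, flags=re.IGNORECASE))
    closes = len(re.findall(r"</div\s*>", html, flags=re.IGNORECASE))
    return opens == closes

def gate_errors(changed_files: list[str], pages: dict[str, str]) -> list[str]:
    """Return reasons to reject the publish; an empty list means 'go live'."""
    errors = []
    if "TOPICS.md" in changed_files:
        errors.append("TOPICS.md is read-only for routines")
    for name, html in pages.items():
        if not div_balanced(html):
            errors.append(f"{name}: unbalanced <div> tags (truncated?)")
        for script in SHARED_SCRIPTS:
            # Require / or a quote right before the name, so research.js is fine.
            if re.search(r"<script[^>]*[/\"']" + re.escape(script), html):
                errors.append(f"{name}: hard-codes {script}")
    return errors

def commit_message(issue: int, title: str) -> str:
    """Normalized message that downstream badge logic parses."""
    return f"Add #{issue}: {title}"
```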
Content is produced by scheduled agents ("routines") running on claude.ai — currently 20 triggers. Each trigger's job consists of:
- …Bash / Read / Write / Edit / Glob / Grep / WebSearch / WebFetch / PushNotification: enough and no more
Take the daily book deep-read routine. Its prompt is five steps:
- …ls *-read*.html. The filesystem is the database: there is no state store anywhere, and repeated firings are naturally idempotent, because each run derives the next topic from the current file listing. A run that published moves the queue forward; a run that failed leaves it where it was.
- …{slug}-read{N}.html + {slug}-read{N}.en.html, each required to read natively rather than as a stiff translation, cross-linked via a language bar; both language index pages get updated too.
- …./publish.sh; passing the gate means going live.
The prompt's last sentence is "complete autonomously, wait for no confirmation": nobody is present when a cloud routine runs, so any step that waits for approval is a deadlock.
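Step one's "filesystem is the database" idea fits in a few lines. This sketch assumes pages named {slug}-read{N}.html and a TOPICS.md written as numbered "N. Title" lines; the roadmap format, the example slug, and the function names are all hypothetical.

```python
import re
from typing import Optional

def next_issue(filenames: list[str], slug: str) -> int:
    """Highest N among {slug}-read{N}.html (Chinese pages only), plus one."""
    pattern = re.compile(rf"^{re.escape(slug)}-read(\d+)\.html$")
    nums = [int(m.group(1)) for name in filenames if (m := pattern.match(name))]
    return max(nums, default=0) + 1

def topic_for(issue: int, topics_md: str) -> Optional[str]:
    """Look up topic N in a roadmap written as 'N. Title' lines (assumed format)."""
    for line in topics_md.splitlines():
        m = re.match(r"\s*(\d+)\.\s+(.+)", line)
        if m and int(m.group(1)) == issue:
            return m.group(2).strip()
    return None  # roadmap exhausted: push a notification asking for a refill

# Usage, from the repo root:
#   from pathlib import Path
#   n = next_issue([p.name for p in Path(".").glob("*-read*.html")], "stoic")
```

Nothing is written anywhere to record progress; the published pages are the progress.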
Beyond the 20 content routines there are two meta-routines:
- …ROADMAP-SUGGESTIONS.md: suggestions only; it never edits any TOPICS.md directly. Classical fields (philosophy, Buddhism, mathematics) default to "nothing new this month." Every suggestion must cite a source verified to exist; when in doubt, leave it out.
The routine clocks out after pushing, but the page isn't in final form yet. Each repo's GitHub Actions take over for two kinds of post-processing.
Comments, search, bilingual TTS, navigation buttons, and the lightbox are all provided by shared scripts hosted in the hub repo (comments.js, search.js, i18n-tts.js, index-button.js, lightbox.js). Content pages are forbidden from hard-coding these script tags (publish.sh blocks it); instead, an injection Action scans the HTML after each push, adds whatever is missing, and auto-commits as Auto-inject shared scripts.
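The injection pass can be sketched as an idempotent string transform. The hub URL, the defer attribute, and placing tags before </body> are assumptions, not the fleet's actual markup.

```python
HUB = "https://example.github.io/hub"  # hypothetical hub URL
SHARED = ["comments.js", "search.js", "i18n-tts.js", "index-button.js", "lightbox.js"]

def inject(html: str) -> str:
    """Add any missing shared <script> tags just before </body>."""
    missing = [s for s in SHARED if f"{HUB}/{s}" not in html]
    if not missing:
        return html  # already complete: rerunning changes nothing
    tags = "".join(f'<script src="{HUB}/{s}" defer></script>\n' for s in missing)
    idx = html.lower().rfind("</body>")
    if idx == -1:
        return html + tags  # no </body>: append at the end
    return html[:idx] + tags + html[idx:]
```

Since the pass only adds what is missing, running it over every page on every push is safe, which is what lets old pages pick up a newly added script.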
Why injection instead of baking the tags into the generation template? Because infrastructure must be able to evolve independently of content. With 20+ repos and hundreds of pages, script tags hard-wired into templates would mean rewriting 20 prompts and re-touching hundreds of pages just to upgrade the search script. With injection, you change the shared script once and the whole fleet picks it up on the next pass, and old pages get retrofitted for free.
Every section of the mental-models site is click-to-listen, and the audio is pre-baked:
- …bake-tts.yml → runs bake-tts.py
- …<h2> section → hashed → check whether audio/zh/<hash>.mp3 already exists → call the Azure Speech REST API only when it doesn't
- …data-tts-zh="<hash>" back onto the HTML element; the front-end i18n-tts.js resolves audio by that attribute. Sites without pre-baked audio degrade gracefully to the browser's Web Speech API
- …[skip bake], breaking the "commit triggers bake, bake makes a commit" loop
Hash addressing makes the whole chain idempotent: unchanged content costs zero API quota, and editing one section re-bakes only that section. The site has accumulated 500+ mp3 files, about 1.2 GB, all sitting in the git repo and served directly by Pages at zero extra storage cost.
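The hash-addressed cache is the heart of the bake. This sketch assumes SHA-256 truncated to 16 hex characters and takes the TTS call as a plain function standing in for the Azure Speech REST request; none of these names come from bake-tts.py.

```python
import hashlib
from pathlib import Path
from typing import Callable

def section_hash(text: str) -> str:
    # Hash algorithm and truncation length are assumptions.
    return hashlib.sha256(text.encode("utf-8")).hexdigest()[:16]

def bake_sections(sections: list[str], audio_dir: Path,
                  synthesize: Callable[[str], bytes]) -> list[str]:
    """Return one hash per section; call the TTS backend only for new hashes."""
    hashes = []
    for text in sections:
        h = section_hash(text)
        mp3 = audio_dir / f"{h}.mp3"
        if not mp3.exists():  # unchanged sections cost zero API calls
            mp3.write_bytes(synthesize(text))
        hashes.append(h)      # later written back as data-tts-zh="<hash>"
    return hashes

def should_bake(commit_message: str) -> bool:
    # The bot's own commit carries [skip bake], so it doesn't retrigger the bake.
    return "[skip bake]" not in commit_message
```

In the described layout, audio_dir would be audio/zh and synthesize would wrap the speech request.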
(This TTS chain originally ran on Volcano Engine and was later migrated wholesale to Azure; the old script survives as a .bak fossil.)
The hub repo is fully automated too, via two daily Actions.
Runs generate_hub.py, a textbook case of "one source of truth, two language renders": card metadata (titles, bilingual blurbs, palettes, sections) lives in a single CARDS array, and both the Chinese and English pages render from it — there are never two HTML files to keep in sync.
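A toy version of the single-source pattern: one hypothetical CARDS list, one render function, two outputs. Field names, sample text, and markup are illustrative; the real array also holds palettes and sections.

```python
from html import escape

# Hypothetical card data; the real CARDS array also carries palettes and sections.
CARDS = [
    {"slug": "mental-models",
     "title": {"zh": "心智模型", "en": "Mental Models"},
     "blurb": {"zh": "每天一个思维工具", "en": "One thinking tool a day"}},
    {"slug": "system-design",
     "title": {"zh": "系统设计", "en": "System Design"},
     "blurb": {"zh": "写给资深工程师", "en": "Written for senior engineers"}},
]

def render(lang: str) -> str:
    """Render the full hub page for one language from the shared CARDS data."""
    items = "\n".join(
        f'<article class="card" data-site="{c["slug"]}">'
        f'<h2>{escape(c["title"][lang])}</h2>'
        f'<p>{escape(c["blurb"][lang])}</p></article>'
        for c in CARDS
    )
    html_lang = "zh-CN" if lang == "zh" else "en"
    return (f'<!doctype html>\n<html lang="{html_lang}"><body>\n'
            f"{items}\n</body></html>\n")
```

Both language pages are regenerated from the same list on every run, so adding a site means adding one entry, not editing two HTML files.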
The dynamic parts come from the GitHub REST API:
- …-dayN/-weekN/-readN.html), and render that date on the card, so the badge reflects "content last updated," not a "last commit" date polluted by the bot's injection commits.
- …Add #N from commit messages, and when a site reaches its cap, swaps the date for a "✓ Completed" badge. This is the downstream payoff of the commit-message convention: no one has to mark a site finished; each site announces its own graduation.
Timing-wise, this Action runs about 45 minutes after all content routines, so the day's new pages always make it to the landing page.
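Badge logic reduces to two parses over the commit log. This sketch assumes commits arrive newest first with the paths they added attached (fetching them from the GitHub REST API is left out), and it treats the newest commit that adds a matching page as the content date, which is one plausible reading of the rule above. The Commit shape is hypothetical.

```python
import re
from typing import NamedTuple, Optional

CONTENT_PAGE = re.compile(r"-(day|week|read)\d+(\.en)?\.html$")
ADD_MSG = re.compile(r"^Add #(\d+):")

class Commit(NamedTuple):
    date: str          # ISO date, e.g. "2025-03-02"
    message: str
    added: list[str]   # paths the commit added (injection commits add none)

def badge(commits: list[Commit], cap: Optional[int]) -> str:
    """commits are newest first; returns a date, "✓ Completed", or ""."""
    issues = [int(m.group(1)) for c in commits if (m := ADD_MSG.match(c.message))]
    if cap is not None and issues and max(issues) >= cap:
        return "✓ Completed"
    for c in commits:
        # Bot commits only modify pages, so they never count as new content.
        if any(CONTENT_PAGE.search(path) for path in c.added):
            return c.date
    return ""
```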
Static sites have no backend, so search is Pagefind: every day, clone all content repos into _src/ as one tree, run Pagefind to build a sharded index (Chinese and English indexed separately), and publish it to /pagefind/ in the hub repo. The front-end search.js provides a floating search button and opens the search overlay in the page's language. Footers, navigation, and comment containers are excluded from indexing as noise.
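The build step amounts to a list of shell commands. The GitHub owner is a placeholder, and the sketch leaves out the noise exclusion; Pagefind's data-pagefind-ignore attribute is one documented way to do that, though the article doesn't say which mechanism the hub uses.

```python
OWNER = "example-user"  # placeholder GitHub account, not the real one

def build_commands(repos: list[str], src: str = "_src",
                   out: str = "pagefind") -> list[list[str]]:
    """Shallow-clone every content repo into one tree, then index that tree."""
    cmds = [["git", "clone", "--depth", "1",
             f"https://github.com/{OWNER}/{repo}.git", f"{src}/{repo}"]
            for repo in repos]
    # Pagefind reads each page's <html lang> and builds one index per language.
    cmds.append(["npx", "-y", "pagefind", "--site", src, "--output-path", out])
    return cmds
```

Each command list can be passed to subprocess.run(cmd, check=True); keeping them as data makes the job easy to dry-run.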
One special case: the Thinker Roundtable site (thinker-arena) is client-rendered; its content lives in JSON, invisible to a crawler. The fix is render_search_snapshots.py, which runs just before indexing and renders the JSON debates into plain HTML snapshots that exist only for Pagefind to consume: SSR for the crawler's benefit, done as a daily batch job.
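The snapshot step is essentially a template over JSON. The debate schema (a title plus turns with speaker and text) is hypothetical; the point is that every string is escaped and ends up in plain, indexable HTML.

```python
import json
from html import escape

def snapshot(debate_json: str, lang: str = "en") -> str:
    """Render one debate (hypothetical schema) to static HTML for the indexer."""
    d = json.loads(debate_json)
    turns = "\n".join(
        f"<p><strong>{escape(t['speaker'])}</strong>: {escape(t['text'])}</p>"
        for t in d["turns"]
    )
    title = escape(d["title"])
    return (f'<!doctype html>\n<html lang="{lang}"><head><title>{title}</title></head>\n'
            f"<body><h1>{title}</h1>\n{turns}\n</body></html>\n")
```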
Getting it running is the easy part; running unattended for the long haul is the hard part. The lines of defense:
Every site's roadmap is finite: when the material is learned, the site should graduate, not pad itself for the sake of a daily streak. The cap is recorded in three places: TOPICS.md (visible to the routine), the hub's CAPS table (for badges), and a local scheduled inspection task. Once a week, that task checks each capped site's highest published issue, and any site that has reached its cap gets its cloud trigger flipped to enabled=false via the API.
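The weekly check, together with the pause call and the two interlocks the next paragraph describes, might look like this. The trigger API's shape is unknown here, so get_trigger and update_trigger are injected stand-ins and the job_config.repo field path is hypothetical.

```python
from typing import Any, Callable

def sites_to_retire(highest: dict[str, int], caps: dict[str, int]) -> list[str]:
    """Capped sites whose highest published issue has reached the cap."""
    return sorted(site for site, cap in caps.items() if highest.get(site, 0) >= cap)

def pause_trigger(trigger_id: str, expected_repo: str,
                  get_trigger: Callable[[str], dict[str, Any]],
                  update_trigger: Callable[[str, dict[str, Any]], None]) -> None:
    # Interlock 1: read the trigger back and confirm which repo it mounts.
    trigger = get_trigger(trigger_id)
    repo = trigger.get("job_config", {}).get("repo")  # hypothetical field path
    if repo != expected_repo:
        raise RuntimeError(f"{trigger_id} mounts {repo!r}, not {expected_repo!r}")
    # Interlock 2: send the enabled flag and nothing else; the update replaces
    # job_config wholesale, so a partial one would wipe prompt, repo and model.
    update_trigger(trigger_id, {"enabled": False})
```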
The pause operation has two safety interlocks, both paid for with real mistakes:
- get the trigger and confirm that the repo it mounts really is the one being retired: protection against a trigger ID mapped to the wrong site;
- …{"enabled": false}, because this API's update replaces job_config wholesale rather than merging; sending a partial job_config would wipe out the prompt, the mounted repo, and the model configuration.
The most common failure mode of bilingual generation is Chinese leaking into the English page's template slots (subtitles, tags, name fields). The same inspection task runs a set of fingerprint greps (like class="en">[一-鿿]) over every repo's *.en.html and reports any hit. The fingerprints are chosen carefully: legitimate Chinese, such as Buddhist scripture alongside its English translation or glossed terms, doesn't use those classes and never triggers a false positive.
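The leak scan is a grep over *.en.html files. This Python sketch carries only the article's example fingerprint (the same U+4E00 to U+9FFF range as [一-鿿]); the real set is larger and site-specific, and the function names are hypothetical.

```python
import re
from pathlib import Path

# Chinese inside an English-only template slot; further fingerprints not shown.
FINGERPRINTS = [re.compile(r'class="en">[\u4e00-\u9fff]')]

def scan_text(name: str, html: str) -> list[str]:
    """One report line per fingerprint that matches this page."""
    return [f"{name}: {p.pattern}" for p in FINGERPRINTS if p.search(html)]

def scan_repo(root: Path) -> list[str]:
    """Scan every English page under root and collect all hits."""
    hits = []
    for page in sorted(root.rglob("*.en.html")):
        hits += scan_text(str(page), page.read_text(encoding="utf-8"))
    return hits
```

Scripture quoted in a blockquote is not matched, because the fingerprint keys on the class that marks an English slot rather than on Chinese text in general.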
- ….maxchars hard ceiling plus the prompt's target band, as above
- …ls; the publishing record = commit messages; completion status = parsed from commits. There is no independently maintained state store anywhere, so nothing can drift out of sync with reality.
- …Add #N commit format, the {slug}-day{N}.html naming, the one-line .maxchars file: components talk through these humble conventions. There isn't a JSON schema anywhere, yet every convention has at least two consumers.
One person's attention is the most expensive resource in this system. The design goal of the pipeline was never "full automation" for its own sake. It was to free my attention from production and operations and spend all of it on the only thing worth spending it on: deciding what to learn next.