Turn Claude Code from "a chat box that writes code" into your personal engineering runtime.
Day 3 argued the harness is the LLM's OS — and Claude Code is today's public SOTA coding harness. Yet 90% of people use only its shallowest layer ("chat + edit files"), treating a programmable platform as fancy autocomplete. This issue isn't "what is Claude Code"; it's how to open up its entire configuration surface: CLAUDE.md is layered memory that costs tokens every turn, not a README; a subagent's real value is context isolation, not "more assistants"; slash commands have merged into skills, so long procedures can load on demand at near-zero resident cost; and claude -p turns the whole harness into a scriptable unix tool, with MCP wiring external systems in as native tools. Master these four layers and your Claude Code becomes a different class of leverage than the default.
Many people write CLAUDE.md like a project README, stuffing in entire architecture docs, style guides, decision logs. That's an anti-pattern: CLAUDE.md is injected and held resident in context at the start of every session. The longer it is, the more it (1) eats your usable context and (2) dilutes adherence — past a certain instruction density, the model starts dropping things. Anthropic officially recommends keeping a single CLAUDE.md under ~200 lines, holding only "high-frequency, cross-task, stable" facts and constraints.
Claude Code's memory is a four-tier hierarchy, merged by priority at startup: (1) enterprise / managed policy; (2) project ./CLAUDE.md or .claude/CLAUDE.md (git-shared with the team); (3) project-local CLAUDE.local.md (gitignored, personal prefs); (4) user-level ~/.claude/CLAUDE.md (across all projects). A large monorepo can also place a CLAUDE.md in subdirectories, loaded only when Claude works on files in that directory. This is the key lever for big repos: let context appear on demand per work area, instead of dumping every repo-wide rule at the root.
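To make the per-work-area idea concrete, here is a minimal sketch of which project CLAUDE.md files would be in play for a given file, assuming the layering described above: the root file is always resident, and a subdirectory file only matters once work reaches that subtree. The repo layout, `REPO_FILES`, and `memory_files_for` are hypothetical names for illustration; this models the layering only and is not Claude Code's actual loader.

```python
from pathlib import PurePosixPath

# Hypothetical monorepo: CLAUDE.md files that exist, relative to the repo root.
REPO_FILES = {
    "CLAUDE.md",
    "packages/core/CLAUDE.md",
    "packages/api/CLAUDE.md",
}


def memory_files_for(target, existing=REPO_FILES):
    """List the CLAUDE.md files on the path from the repo root down to `target`.

    The root file comes first (always resident); nested files follow in order
    of increasing depth (relevant only when work reaches that directory).
    """
    found = []
    directory = PurePosixPath(".")
    # Walk root -> leaf: ".", "packages", "packages/api", ...
    for part in (None, *PurePosixPath(target).parent.parts):
        if part is not None:
            directory = directory / part
        candidate = (directory / "CLAUDE.md").as_posix()
        if candidate in existing:
            found.append(candidate)
    return found


print(memory_files_for("packages/api/src/routes.ts"))
```

Working in packages/api never pulls in the rules written for packages/core, which is the whole point of pushing memory down the tree.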
The @import syntax (@docs/style.md) pulls in external files, but note a counterintuitive point: import is inline expansion — it does NOT save tokens. Imported content still counts against the active window. Import's value is reuse/organization, not compression. What actually saves context is moving "procedures" into an on-demand skill (see §3), not importing them into resident CLAUDE.md.
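The "import is inline expansion" point can be sketched in a few lines. This is a simplified model, not Claude Code's real parser: it assumes an import is a line consisting only of `@path`, uses a hypothetical in-memory `FILES` dict instead of the filesystem, and the depth cap is an illustrative parameter. What it shows is that a 3-line CLAUDE.md importing a 300-line style guide is resident as 302 lines.

```python
import re

# Hypothetical in-memory "repo": file path -> contents.
FILES = {
    "CLAUDE.md": "# Rules\n- Use pnpm\n@docs/style.md\n",
    "docs/style.md": "\n".join(f"- style rule {i}" for i in range(1, 301)) + "\n",
}

IMPORT_LINE = re.compile(r"^@(\S+)$")
LINE_BUDGET = 200  # the ~200-line guideline mentioned earlier


def resident_lines(path, files, depth=0, max_depth=5):
    """Return the lines that end up resident after inline-expanding @imports."""
    out = []
    for line in files[path].splitlines():
        match = IMPORT_LINE.match(line.strip())
        if match and match.group(1) in files and depth < max_depth:
            # Inline expansion: the imported file's lines replace the @ line.
            out.extend(resident_lines(match.group(1), files, depth + 1, max_depth))
        else:
            out.append(line)
    return out


def over_budget(path, files, budget=LINE_BUDGET):
    """True when the expanded file exceeds the line budget."""
    return len(resident_lines(path, files)) > budget


print(len(FILES["CLAUDE.md"].splitlines()))     # 3 lines on disk
print(len(resident_lines("CLAUDE.md", FILES)))  # 302 lines resident
```

A check like `over_budget` fits naturally in a pre-commit hook: it measures what is actually resident rather than what the file looks like on disk.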
A "good" root CLAUDE.md: short, constraints not narration, pushes long content to skills:
# CLAUDE.md — only high-frequency, stable facts and red lines
## Commands
- Test: pnpm test (run after every change)
- Build: pnpm build
- Never use npm / yarn, this repo standardizes on pnpm
## Style red lines
- TS strict, no any; new code must ship with unit tests
- API layer errors return Result<T>, never throw
## Repo map
- Business logic in packages/core, HTTP in packages/api
- Complex release flow: see /release (skill, loads on demand, not inlined here)
Use the /memory command to edit any tier of CLAUDE.md mid-session, or browse the auto memory Claude saved from your corrections. The rule: facts in CLAUDE.md (resident), procedures in skills (on demand).
Pitfalls: (1) Assuming @import saves tokens; it's just inline expansion, fully billed. (2) A big monorepo using only a root CLAUDE.md, which gets dumber as it grows; push memory down to subdirectories for on-demand loading.
Day 3 noted a subagent is "a child process with its own context window"; here's the engineering test. Subagents are markdown files in .claude/agents/*.md (project-level, team-shared) or ~/.claude/agents/*.md (user-level), each with its own system prompt, tool permissions, even its own model. The /agents command creates them interactively.
A subagent's most valuable use comes down to one theme: isolating operations that produce large output. Running a full test suite, reading thousands of log lines, fetching a long doc: each emits a huge volume of tokens. Done in the main conversation, they quickly pollute the main context past usability. Hand them to a subagent instead: the verbose output stays in the subagent's isolated context, and only the distilled conclusion returns to the main thread. You are trading a "throwaway isolated context" for the main thread's cleanliness, the same token economics as §1.
Conversely, subagents are wrong for tasks needing shared main-thread global state. A subagent can't see the main conversation (that's the cost of isolation), so you must re-feed it enough background in its prompt, or it works in an information vacuum. The "context drift between agents" that Cognition warns about in Don't Build Multi-Agents is exactly this pitfall — isolation is double-edged.
A high-utility subagent: run tests in an isolated context and return only a summary, so the main thread isn't drowned in thousands of lines of test output.
# .claude/agents/test-runner.md
---
name: test-runner
description: Run the test suite and diagnose failures. Use proactively after code changes.
tools: Bash, Read, Grep
model: haiku # noisy work on a cheap model; cost in Day 16
---
You are a test diagnosis expert. Steps:
1. Run `pnpm test`, capture full output (it'll be long — fine, it stays in YOUR context).
2. Only on failure: locate failing cases, read relevant source, give a root-cause hypothesis.
3. Return to the main thread ONLY a structured summary:
   { passed: N, failed: M, failures: [{test, file:line, root_cause_guess}] }
Do not return raw test logs to the main thread.
The key is the last line together with model: haiku. A subagent is a mechanism to outsource the dirty work, and the point of outsourcing is that the main thread receives only the clean conclusion, plus the token savings from the cheap model.
Pitfall: an overly broad description, so Claude invokes the subagent when it shouldn't.
A major 2026 change: custom slash commands have merged into skills. Both .claude/commands/deploy.md and .claude/skills/deploy/SKILL.md create /deploy and behave the same; existing commands/ files keep working. Skills add: a directory for supporting files, frontmatter to control "who invokes it", and the ability for Claude to auto-load them when relevant.
The real engineering payoff is context economics: unlike CLAUDE.md, a skill's body loads only when invoked. So those "long but low-frequency" multi-step procedures (release, migration, compliance checklists) — putting them in CLAUDE.md wastefully burns resident budget; putting them in a skill costs nothing until needed. This is the mechanism underneath §1's "facts in memory, procedures in skills" rule.
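A back-of-the-envelope sketch of that trade-off, under stated assumptions: the procedure costs its full size every session when it lives in CLAUDE.md, while as a skill only a short description stays resident and the body is paid once per invocation. The function names and all numbers are hypothetical.

```python
def resident_cost(sessions, procedure_tokens):
    """Tokens spent when the procedure lives in CLAUDE.md: paid every session."""
    return sessions * procedure_tokens


def skill_cost(sessions, invocations, procedure_tokens, description_tokens):
    """Tokens spent when the procedure is a skill.

    Assumes only the short description is resident each session and the
    full body is loaded once per invocation.
    """
    return sessions * description_tokens + invocations * procedure_tokens


# Hypothetical numbers: a 1,500-token release checklist, used in 2 of 100 sessions.
in_memory = resident_cost(sessions=100, procedure_tokens=1_500)
as_skill = skill_cost(sessions=100, invocations=2,
                      procedure_tokens=1_500, description_tokens=30)

print(in_memory)  # 150000
print(as_skill)   # 6000
```

The gap widens the rarer the procedure is. For something you run in nearly every session the two costs converge, and plain CLAUDE.md is the simpler home.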
Key pieces: in the frontmatter, description (decides when Claude auto-invokes) and allowed-tools (restrict which tools the procedure can touch); in the body, the $ARGUMENTS placeholder (receives args after /command). You can also control whether a skill is manual /trigger only or whether Claude may invoke it autonomously.
A /commit skill that bakes in your team's commit convention — no more restating Conventional Commits rules each time:
# .claude/skills/commit/SKILL.md
---
description: Create one git commit following team convention
allowed-tools: Bash(git add:*), Bash(git commit:*), Bash(git diff:*), Bash(git status:*)
---
Create a commit for the current changes:
1. Run `git status` and `git diff` to see exactly what changed.
2. Write the message in Conventional Commits: type(scope): summary
   type ∈ feat|fix|refactor|docs|test|chore
3. If the user passed an argument, use it as a scope hint: $ARGUMENTS
4. Stage the relevant files and commit. Do not commit unrelated, unreviewed changes.
Calling /commit api runs the whole flow with api as the scope hint. Note allowed-tools narrows this skill to a git subset — even if the flow goes off the rails it can't touch rm or push.
Pitfalls: (1) Too many skills with overlapping descriptions, so Claude auto-invokes the wrong one; this has the same root as Day 4's tool-selection degradation, so prefer few and orthogonal. (2) Forgetting allowed-tools, so a read-only skill gets write permission.
claude -p turns the whole harness into a scriptable unix tool; MCP wires external systems in as native tools. Together they're what "automation" actually means.
Headless mode: claude -p "<prompt>" runs with no interactive UI, prints the result to stdout, and exits. A few flags make it CI/cron-ready: --output-format json for machine-readable output (with tokens and cost), --allowedTools to pre-authorize tools without prompts, and for unattended runs --dangerously-skip-permissions to bypass all gates. The exit code drives pipeline pass/fail. Typical uses: auto-fixing lint in CI, batch-editing a class of files, nightly regression runs.
MCP (Model Context Protocol): wires external systems — GitHub, databases, Notion, internal APIs — in as tools Claude can call directly. claude mcp add --transport http <name> <url> connects a remote server; --transport stdio ... -- <cmd> connects a local process. Config can land in .mcp.json and be shared with the whole team via the repo. MCP is Day 18's main topic; here we just see its power combined with headless.
Together: in CI, run claude -p + a GitHub MCP server, and Claude can read the PR diff, run tests, and post review comments back to the PR — a fully automated engineering pipeline. But the boundary of automation is permissions: the more hands-off headless is, the heavier the security burden.
# Let Claude review a PR in CI: headless + GitHub MCP
# Assumes GH_TOKEN and PR_NUM are provided by the CI environment.
claude mcp add --transport http github https://api.githubcopilot.com/mcp/ \
  --header "Authorization: Bearer $GH_TOKEN"
claude -p "Review PR #$PR_NUM: read the diff, run pnpm test,
leave inline comments on bugs at their line; approve if clean." \
  --allowedTools "Bash(pnpm test:*)" "mcp__github__*" "Read" "Grep" \
  --output-format json > review.json
# Use exit code + json fields to decide whether CI passes
test $? -eq 0 || exit 1
Note --allowedTools is a whitelist: only pnpm test, GitHub MCP, and read-only tools are granted. Anything ungranted would normally trigger a permission prompt, and under headless there is nobody to answer it, so the call is refused, which in practice is a hard ban. This is the most reliable permission guardrail when unattended.
Pitfalls: (1) Using --dangerously-skip-permissions on untrusted input (external PRs, user-submitted content): this is the #1 prompt-injection entry point; a malicious diff can coax Claude into running arbitrary commands (Day 24). Unattended runs require a locked whitelist + sandbox. (2) Connecting a swarm of MCP servers "for completeness": tool count explodes and triggers selection degradation (Day 4), and each server's schema eats context. (3) Running long headless tasks with no timeout or max-turns, looping forever and burning tokens.
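One way to guard against the runaway-task pitfall and to actually use "exit code + json fields" is a thin wrapper around the CLI. The sketch below is a minimal illustration, not an official client: `build_command`, `ci_verdict`, and `run_headless` are hypothetical helper names, the `is_error` JSON field is an assumption about the output shape that you should check against your CLI version, and `--max-turns` is assumed to be available in print mode.

```python
import json
import subprocess


def build_command(prompt, allowed_tools, max_turns):
    """Assemble a headless invocation with an explicit tool whitelist and turn cap."""
    return [
        "claude", "-p", prompt,
        "--output-format", "json",
        "--max-turns", str(max_turns),
        "--allowedTools", *allowed_tools,
    ]


def ci_verdict(returncode, stdout):
    """Return (passed, reason) from the exit code plus the JSON payload.

    The `is_error` field name is an assumption about the JSON shape.
    """
    if returncode != 0:
        return False, f"claude exited with code {returncode}"
    try:
        payload = json.loads(stdout)
    except json.JSONDecodeError:
        return False, "output was not valid JSON"
    if not isinstance(payload, dict):
        return False, "output was not a JSON object"
    if payload.get("is_error"):
        return False, "run reported is_error"
    return True, "ok"


def run_headless(prompt, allowed_tools, max_turns=10, timeout_s=600):
    """Run claude -p with a wall-clock timeout so a stuck run cannot loop forever."""
    cmd = build_command(prompt, allowed_tools, max_turns)
    try:
        proc = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        return False, f"timed out after {timeout_s}s"
    return ci_verdict(proc.returncode, proc.stdout)
```

The wrapper never adds --dangerously-skip-permissions; the whitelist is the only way a tool gets through, and both the turn cap and the timeout bound the spend.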
Pick a mid-sized repo you use often and spend half an hour laying down all four config layers — feel the gap between default and "engineered" Claude Code:
Subagent: create test-runner with model: haiku, returning only a structured summary. Compare how much cleaner the main context stays.
Skills: write them with allowed-tools narrowing permissions. Verify they don't sit in context until /triggered.
Headless: run claude -p "summarize this week's merged PRs" --output-format json. Read the tokens / cost in the json; that's your baseline for putting it in cron.
Permissions: in .claude/settings.json, deny rm -rf / force push / --no-verify (Day 3's config). Every headless automation runs inside this guardrail.
After this, you'll have an intuition: Claude Code's ceiling isn't the model, it's how much engineering discipline you've baked into its config surface. Same model, configured vs not, is two different levels of leverage.