DAY 18 / PHASE 2 · APPLICATIONS & SYSTEMS

MCP

Three Primitives · stdio vs HTTP · Build a Server · Context Tax

2026-06-01 · BigCat

MCP isn't another function-calling wrapper — it collapses the N×M integration problem into N+M.

// WHY THIS MATTERS

You already write tool schemas and wire up APIs. MCP's engineering value isn't "now I can call tools"; it's that it changes the topology of integration: instead of writing glue for every model × every data source (N×M), you write a server once and every client reuses it (N+M). But whether MCP actually works for you depends on three points most people overlook: the control axis of the three primitives (what belongs in a Tool vs a Resource), the process model of the transport (a single stray write to stdout can silently corrupt your whole JSON-RPC stream), and MCP's biggest production failure mode, the context tax of tool definitions (connect 10 servers and you burn thousands of tokens before the user says a word). This issue assumes you know what MCP is; it goes straight to engineering it and avoiding the traps.

// 01

Three Primitives: Not Naming, but Control

Claim: putting a capability in Tool / Resource / Prompt decides "who initiates the call" — MCP's most misused design point.

Background & Principle

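An MCP server can expose three primitives: Tools, Resources, and Prompts. The difference isn't function but control axis, that is, who initiates the call: Tools are model-controlled, Resources are application-controlled, and Prompts are user-controlled.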

Most people make everything a Tool, turning "read a config file" into a model tool call — wasting a decision and adding noise to tool selection. The right rule: read-only data → Resource, side-effecting action → Tool, user-initiated workflow → Prompt.

This control axis is also a trust axis: model-controlled Tools carry the highest risk (the model may fire side effects when you didn't expect it), so hosts broadly put permission gates on Tools; application-controlled Resources are host-curated — what gets injected and when is under control; user-controlled Prompts are human-initiated, the highest trust. Misplacing an action as a Resource bypasses an approval that should exist; misplacing data as a Tool drops a zero-risk read into the high-risk, approval-gated channel. Primitive choice isn't just UX — it's the permission boundary.

Control axis: who initiates this call?

MODEL decides ──────────▶ TOOLS      (actions / side effects)     e.g. create_issue, send_email
APP decides   ──────────▶ RESOURCES  (read-only / into context)   e.g. file://config.yaml, db://schema
USER decides  ──────────▶ PROMPTS    (templates / slash command)  e.g. /code-review, /summarize-pr

Hands-on

# Classify before you design the server — pin to the top of the file
# Does it change external state?         → yes → Tool
# Does it just "let the model see" data?  → yes → Resource
# Does the user have to click to fire it? → yes → Prompt
# Unsure: default to Resource (cheapest, no tool-list pollution)
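The checklist above can also be written as a tiny decision function. This is only a sketch: `Capability` and `classify` are hypothetical names for illustration, not part of any MCP SDK, and the ordering (side effects win over everything else) is an assumption about how to break ties.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Capability:
    """Hypothetical description of one thing a server could expose."""
    name: str
    changes_external_state: bool  # does it write, send, or delete anything?
    user_must_trigger: bool       # is it a template the user fires explicitly?


def classify(cap: Capability) -> str:
    """Apply the checklist in order; a side effect always makes it a Tool."""
    if cap.changes_external_state:
        return "tool"
    if cap.user_must_trigger:
        return "prompt"
    # Read-only data, and anything we are unsure about, defaults to Resource.
    return "resource"


if __name__ == "__main__":
    for cap in (
        Capability("create_issue", True, False),
        Capability("db_schema", False, False),
        Capability("code_review", False, True),
    ):
        print(f"{cap.name}: {classify(cap)}")
```

Running the classification once per capability before writing any decorator keeps the permission boundary an explicit decision rather than an accident of habit.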
Failure mode: exposing read-only data as a Tool — the model must "think" to call it, often forgets, or calls it when it shouldn't, polluting selection. Conversely, making a side-effecting action a Resource lets the host auto-inject and trigger it before the user authorized anything. Wrong control axis, wrong permission model.
Further: Anthropic · Introducing MCP (2024-11 launch, motivation for the three primitives) · MCP Spec · Prompts
// 02

A 30-Line Server: the docstring IS the prompt

Claim: an MCP server's tool descriptions go verbatim into model context — the moment you write a server you're writing a prompt, not API docs.

Background & Principle

The official Python SDK's FastMCP automates JSON-RPC, schema generation, and transport; you write three decorators. The key engineering point carries over from Day 4 Tool Use: parameter descriptions > parameter names, and the docstring is read by the model, not by humans. Type hints auto-convert to JSON Schema and docstrings auto-become tool descriptions — these two directly decide whether the model picks the right tool and passes the right args.

Know the limits of the automation: simple type hints (str / int / list[str]) convert to clean schema, but complex parameters can't express their constraints through a type name alone (value ranges, formats, mutual exclusions) — those must go into the docstring or be added via pydantic's Field descriptions. In other words, auto-generated schema saves boilerplate, not the cognitive work of spelling out constraints for the model — and that latter work is exactly the dividing line of server quality. A server that writes each parameter's bounds, units, and examples into its descriptions will have a meaningfully higher call-success rate than one with bare type names.
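To make that gap concrete, the sketch below compares a bare schema with one that spells out its constraints. Both dictionaries are hand-written JSON Schema fragments for illustration, not the exact output FastMCP generates, and `check_args` is a hypothetical helper that only understands the few keywords used here.

```python
# What a bare `days: int` hint can express: a type and nothing else.
BARE = {
    "type": "object",
    "properties": {"city": {"type": "string"}, "days": {"type": "integer"}},
    "required": ["city"],
}

# The same parameters with bounds, a default, and descriptions spelled out.
RICH = {
    "type": "object",
    "properties": {
        "city": {"type": "string", "description": "City name in English, e.g. 'Tokyo'"},
        "days": {
            "type": "integer",
            "minimum": 1,
            "maximum": 7,
            "default": 3,
            "description": "Number of forecast days, 1-7",
        },
    },
    "required": ["city"],
}


def check_args(schema: dict, args: dict) -> list[str]:
    """Return violations; supports only required, string/integer, min/max."""
    problems = [f"missing required argument: {n}"
                for n in schema.get("required", []) if n not in args]
    for name, value in args.items():
        spec = schema["properties"].get(name)
        if spec is None:
            problems.append(f"unknown argument: {name}")
        elif spec["type"] == "string":
            if not isinstance(value, str):
                problems.append(f"{name} must be a string")
        elif spec["type"] == "integer":
            if isinstance(value, bool) or not isinstance(value, int):
                problems.append(f"{name} must be an integer")
            elif "minimum" in spec and value < spec["minimum"]:
                problems.append(f"{name} must be >= {spec['minimum']}")
            elif "maximum" in spec and value > spec["maximum"]:
                problems.append(f"{name} must be <= {spec['maximum']}")
    return problems


if __name__ == "__main__":
    call = {"city": "Tokyo", "days": 30}
    print("bare:", check_args(BARE, call))
    print("rich:", check_args(RICH, call))
```

With the bare schema, `days=30` passes validation and only fails later inside the tool; with the constrained schema the problem is visible both to a validator and, through the description, to the model before it ever makes the call.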

Hands-on

# pip install "mcp[cli]"
from mcp.server.fastmcp import FastMCP

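mcp = FastMCP("weather")


def _call_weather_api(city: str, days: int) -> str:
    # Placeholder (hypothetical helper): swap in a real weather API call.
    return f"{city}: {days}-day forecast unavailable (stub)"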

@mcp.tool()
def get_forecast(city: str, days: int = 3) -> str:
    """Get the weather forecast for a city.

    Args:
        city: City name in English, e.g. 'Tokyo'
        days: Number of days, 1-7, default 3
    """
    return _call_weather_api(city, days)

@mcp.resource("config://units")
def units() -> str:
    """Current temperature unit preference (read-only; the host decides when it enters context)"""
    return "celsius"

@mcp.prompt()
def trip_brief(city: str) -> str:
    """Generate a travel weather brief (user-triggered template)"""
    return f"In one sentence, is {city} good for outdoor activity?"

if __name__ == "__main__":
    mcp.run(transport="stdio")
Failure mode: writing the docstring as a terse human-facing "Get forecast" — the model won't know city must be English or the days range, so it passes 'Tokyo, Japan' or days=30 and errors. Don't hoard tools either: cram 30 tools into one server and selection accuracy collapses (same degradation curve as Day 4).
Further: modelcontextprotocol/python-sdk (FastMCP quickstart) · modelcontextprotocol.io (official docs + SDK list)
// 03

stdio vs Streamable HTTP: transport decides the process model

Claim: transport isn't a deployment detail — it decides whether the server is a local subprocess or remote service, whether it's multi-tenant, and where the auth boundary sits.

Background & Principle

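MCP uses JSON-RPC 2.0 over two standard transports: stdio, where the client spawns the server as a local subprocess and exchanges messages over stdin/stdout (1:1, no auth), and Streamable HTTP, where the server is a remote service reached over HTTP POST/GET with an optional SSE stream (multi-tenant, needs auth).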

Both transports run the same protocol lifecycle: after connecting, an initialize handshake runs — client and server exchange protocol versions and negotiate the capabilities each supports (the server declares whether it has tools / resources / prompts, and whether it supports dynamic list-change notifications). This means different clients connecting to the same server may see different capability surfaces; it also means your server must honestly declare its capabilities, or the client won't fetch the corresponding lists. Capability negotiation is the root of MCP's "N+M reuse": the client needn't know in advance what a server looks like — it asks once at handshake.
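The handshake described above can be sketched as plain JSON-RPC dictionaries. The method name and field layout follow the MCP spec as I understand it; the protocol revision string, the client and server names, and the `lists_to_fetch` helper are assumptions made up for this example.

```python
import json

# Assumption: an example protocol revision string; use whatever your SDK negotiates.
PROTOCOL_VERSION = "2025-06-18"

initialize_request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "initialize",
    "params": {
        "protocolVersion": PROTOCOL_VERSION,
        "capabilities": {},
        "clientInfo": {"name": "example-client", "version": "0.1.0"},
    },
}

# A server that declares tools (with list-change notifications) and resources,
# but no prompts.
initialize_response = {
    "jsonrpc": "2.0",
    "id": 1,
    "result": {
        "protocolVersion": PROTOCOL_VERSION,
        "capabilities": {"tools": {"listChanged": True}, "resources": {}},
        "serverInfo": {"name": "weather", "version": "0.1.0"},
    },
}

LIST_METHODS = {
    "tools": "tools/list",
    "resources": "resources/list",
    "prompts": "prompts/list",
}


def lists_to_fetch(response: dict) -> list[str]:
    """Hypothetical client helper: only ask for lists the server declared."""
    declared = response["result"]["capabilities"]
    return [method for key, method in LIST_METHODS.items() if key in declared]


if __name__ == "__main__":
    print(json.dumps(initialize_request, indent=2))
    print(lists_to_fetch(initialize_response))
```

Because this server never declared `prompts`, a client following this logic never sends `prompts/list`, which is why an undeclared capability is effectively invisible.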

stdio                                     Streamable HTTP

        ┌──────────┐                              ┌──────────┐
client ─┤subprocess├ stdin/stdout         client ─┤ POST/GET ├─▶ https://host/mcp
        └──────────┘                              └──────────┘   (optional SSE stream)

local · 1:1 · no auth                     remote · multi-tenant · needs auth
lowest latency · ties to client           scales independently · over network

Hands-on

// Local stdio: Claude Desktop / Claude Code config
{
  "mcpServers": {
    "weather": {
      "command": "python",
      "args": ["weather_server.py"]
    }
  }
}
Failure mode (classic bug): in stdio mode, anything written to stdout is treated as a JSON-RPC message. A single print("debug") or a library's progress bar silently corrupts the entire protocol stream — the client reports "invalid JSON" with no traceable source. Iron rule: stdio server logs always go to stderr (logging defaults to stderr, but don't slip a print in).
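A minimal simulation of that bug, assuming a client that parses stdout strictly as newline-delimited JSON. `parse_stream` is a hypothetical stand-in for the client's reader, not code from any SDK; the logging setup at the end is the standard-library way to keep diagnostics on stderr.

```python
import json
import logging
import sys


def parse_stream(lines):
    """Parse newline-delimited JSON-RPC; collect line numbers that fail."""
    messages, bad_lines = [], []
    for number, line in enumerate(lines, start=1):
        try:
            messages.append(json.loads(line))
        except json.JSONDecodeError:
            bad_lines.append(number)
    return messages, bad_lines


REPLY = '{"jsonrpc": "2.0", "id": 1, "result": {}}'
clean = [REPLY]
polluted = ["debug", REPLY]  # a stray print("debug") landed on stdout first

# Safe alternative: route all diagnostics to stderr explicitly.
logging.basicConfig(stream=sys.stderr, level=logging.INFO)
log = logging.getLogger("weather")

if __name__ == "__main__":
    log.info("server starting")  # goes to stderr, never touches the protocol stream
    _, bad = parse_stream(polluted)
    log.info("unparseable lines in polluted stream: %s", bad)
```

The client only sees that line 1 is not JSON; nothing in the error points back at the `print` that produced it, which is what makes the bug so hard to trace.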
Further: MCP Spec · Transports (stdio / Streamable HTTP spec, incl. SSE deprecation)
// 04

The Context Tax: MCP's biggest production failure mode

Claim: most clients dump every server's full tool definitions into context at the start — connect 10 servers and you burn thousands of tokens before the user speaks.

Background & Principle

MCP's convenience has a hidden cost, and the cost is twofold. First, preloading: clients typically dump the full tool schemas of every connected server into context at the start, a fixed overhead independent of what the user asks. Second, and more insidious, intermediate-result accumulation: in an agentic loop every tool call's return value stays in context for later turns, so one call to a tool with a large return permanently nails thousands of lines of JSON into every subsequent turn's input. The two compound: more tools make the opening expensive, more calls make the process bloat, and by the late stage of a long task the context is packed with schemas and intermediate data that are no longer relevant, burning money and diluting the model's attention (back to Day 2 lost-in-the-middle).

Anthropic's 2025-11 engineering blog Code execution with MCP offers a counterintuitive fix: expose MCP servers as code APIs, let the agent write code to call them, import only the tools it needs on demand, and process intermediate data in the execution environment before returning. They report a workflow that previously consumed about 150k tokens dropping to about 2k tokens (~98.7% reduction). Core insight: the tool catalog shouldn't live in context; it should be progressively disclosed (discovered on demand).
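A back-of-the-envelope version of the arithmetic, as a sketch. The 150k and 2k figures are the ones reported above; the per-server and per-tool numbers are hypothetical placeholders, so measure your own schemas with a real tokenizer before trusting any estimate.

```python
def preload_tokens(servers: int, tools_per_server: int, tokens_per_tool: int) -> int:
    """Fixed opening cost if every tool schema is loaded up front."""
    return servers * tools_per_server * tokens_per_tool


def reduction_pct(before: int, after: int) -> float:
    """Percentage of tokens saved going from `before` to `after`."""
    return (before - after) / before * 100


# Hypothetical setup: 10 servers, 5 tools each, ~150 tokens per tool definition.
opening_cost = preload_tokens(servers=10, tools_per_server=5, tokens_per_tool=150)

# Figures reported for the code-execution workflow: ~150k tokens down to ~2k.
reported = reduction_pct(150_000, 2_000)

if __name__ == "__main__":
    print(f"opening cost: {opening_cost} tokens before the first user message")
    print(f"reported reduction: {reported:.1f}%")
```

The opening cost is paid on every turn of every conversation, so even a modest per-tool figure multiplies quickly.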

Hands-on

# Three savings you can make today without code execution:
# 1. Connect only the servers this task needs; close when done
#    (Claude Code: don't put all MCP servers in global settings)
# 2. Don't pile dozens of tools in one server — split by
#    responsibility, or merge similar ones
# 3. Trim/summarize large return values server-side; don't let
#    thousands of lines of raw JSON pass through context
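Saving 3 can be sketched as a small server-side helper that trims a large result before it is returned from a tool. The names `trim_result`, `keep_fields`, and `limit` are hypothetical; which fields and how many rows to keep depends entirely on what the model needs for the task.

```python
import json


def trim_result(rows: list[dict], keep_fields: tuple[str, ...], limit: int = 5) -> str:
    """Return a compact JSON summary instead of the raw payload."""
    trimmed = [
        {field: row[field] for field in keep_fields if field in row}
        for row in rows[:limit]
    ]
    summary = {"total": len(rows), "shown": len(trimmed), "rows": trimmed}
    # Compact separators avoid spending tokens on whitespace.
    return json.dumps(summary, separators=(",", ":"))


if __name__ == "__main__":
    raw = [{"id": i, "title": f"issue {i}", "body": "x" * 1000} for i in range(100)]
    compact = trim_result(raw, keep_fields=("id", "title"), limit=3)
    print(len(json.dumps(raw)), "characters raw vs", len(compact), "characters trimmed")
    print(compact)
```

Reporting `total` alongside `shown` lets the model know there is more data and ask for it explicitly, instead of receiving everything by default.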
Failure mode (counterintuitive): "more servers = stronger agent" is wrong. Every added server raises opening tokens, selection difficulty, and latency, while the model is more likely to mis-pick among 30+ tools. Past a point, more tools is a net negative — MCP's composability tempts you into over-connecting.
Further: Anthropic · Code execution with MCP (quantified context tax + progressive disclosure) · Anthropic · Advanced tool use

// DEEPER THINKING

Compared to writing function calling directly, when is MCP over-engineering?
When the integration is one-off, single-client, non-reusable, MCP's protocol overhead (spawning a process, JSON-RPC round-trips, schema negotiation) is pure burden — just def a function as a tool in your code. MCP's ROI comes from N+M reuse: you have multiple hosts (Claude Desktop + Cursor + a custom agent) sharing the same capabilities, or you distribute capabilities for others to use. For a single agent, single script, fixed tools, function calling is enough. Test: will a second client reuse this capability? No → skip MCP.
Resource is the most neglected of the three primitives — why do almost all servers only expose Tools?
Two reasons. First, Tools are model-controlled; "the model calls it itself" is intuitive, so developers needn't think about injection timing. Resources are application-controlled, requiring the host to decide when to inject into context — but many hosts' Resource support is weaker than Tools and the UI is less polished. Second, making data a Tool "works," so developers don't dig deeper — at the cost of one model decision and selection pollution per read. It's a mismatch between protocol design and host implementation maturity: the spec encourages three layers, the ecosystem leans Tool.
Code execution keeps tools out of context — does that mean MCP's "tool list" abstraction itself gets replaced?
Not replaced, layered. MCP remains the discovery and description layer (servers declare what exists), but "preload everything into context" was just one naive client implementation. Code execution projects MCP servers as a filesystem/code modules the agent imports on demand — the protocol is unchanged; what changes is how the client exposes the protocol to the model. The trend is to keep only the entry point in context (how to discover tools) and fetch real schemas on demand. Expect clients to default to progressive disclosure rather than filling context at once.
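One way to picture "only the entry point in context" is a catalog that hands out names cheaply and full schemas on request. This is a hypothetical in-memory sketch of the idea, not how any particular client or the code-execution approach is implemented; `ToolCatalog` and its methods are invented for illustration.

```python
class ToolCatalog:
    """Hypothetical catalog: names stay in context, schemas load on demand."""

    def __init__(self, schemas: dict[str, dict]):
        self._schemas = schemas
        self.loaded: list[str] = []  # which schemas actually entered context

    def index(self) -> list[str]:
        """Cheap entry point: tool names only, no schemas."""
        return sorted(self._schemas)

    def load(self, name: str) -> dict:
        """Fetch one full schema only when the agent decides it needs it."""
        self.loaded.append(name)
        return self._schemas[name]


if __name__ == "__main__":
    catalog = ToolCatalog({
        "get_forecast": {"description": "Weather forecast for a city"},
        "create_issue": {"description": "Open a tracker issue"},
    })
    print(catalog.index())            # what the model sees up front
    catalog.load("get_forecast")      # schema pulled in only for this task
    print(catalog.loaded)
```

The server side is untouched in this picture; only the client's policy for what to put in front of the model changes.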
An MCP server is a dual entry point for untrusted code and untrusted data — how do you do defense in depth?
Two risk classes: the server itself being malicious (tool poisoning: injection instructions hidden in descriptions to fool the model), and server-returned data carrying indirect prompt injection. Defend in layers: 1) only connect trusted-source servers, prefer local stdio (no network surface); 2) permission-gate sensitive actions at the host layer, don't rely on the model's discretion; 3) treat server return values as untrusted content and isolate them — don't let them rewrite system instructions; 4) add auth + least-privilege scope to remote HTTP servers. This connects to Day 24 Prompt Injection — MCP widens the attack surface; isolation and permissions are non-negotiable.
stdio's 1:1 process model vs HTTP's multi-tenancy — which way does the local personal agent go?
Short term, local agents lean stdio: zero network surface, zero auth, lowest latency, matching the privacy needs of a "personal toolchain." But once you want to share the same capabilities and state across phone, laptop, and cloud, stdio's process binding becomes a shackle — then Streamable HTTP + a personal gateway (see Day 23 Personal AI Infra) fits better. A likely compromise: run high-frequency private tools over local stdio, run cross-device/shared-state tools over remote HTTP, with the host connecting both. Transport choice is fundamentally a privacy-vs-reachability trade-off.

// FURTHER READING