Think of multi-agent systems as an internal Slack channel — each Agent is a channel member and all collaboration happens via messages. You don't write state machines or explicit control flow; you define who can join, who initiates, who's allowed to reply, and let the conversation evolve toward a result. Backend analogy: trade orchestrated RPC for an event-driven message bus.
A single Agent has limited context and limited skills. For a task like "read code + write tests + run tests + fix bug," you can stuff it all into one Agent, but the prompt swells and attention dilutes. AutoGen (Microsoft, open-sourced 2023, rewritten as v0.4 in 2024) splits the task across specialized Agents that collaborate through conversation. Two core abstractions only:
ConversableAgent: a node that can receive messages, send messages, and call tools; under the hood it's just LLM + tools + memory.
GroupChat + GroupChatManager: the "moderator" that picks who speaks next. Common strategies: round-robin, let the LLM choose, or rule-based by role.

These are the v0.2 class names; the v0.4 AgentChat API exposes the same ideas as AssistantAgent plus team classes such as RoundRobinGroupChat.

The whole system is essentially the actor model: each Agent is an actor, and messages are the only coupling. v0.4 takes this further with a real async message bus and distributed runtime.
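The "moderator picks who speaks next" idea can be sketched without any framework. The following is a minimal illustration, not AutoGen's implementation: `ToyAgent`, `Message`, and `run_round_robin` are hypothetical names, and the agents are plain functions standing in for LLM calls.

```python
from dataclasses import dataclass
from itertools import cycle
from typing import Callable, List


@dataclass
class Message:
    source: str
    content: str


@dataclass
class ToyAgent:
    """Hypothetical stand-in for an LLM-backed agent: `reply` maps history to text."""
    name: str
    reply: Callable[[List[Message]], str]


def run_round_robin(agents, task, stop_word, max_turns=10):
    """Moderator loop: agents speak in fixed order until a stop word or the turn cap."""
    history = [Message("user", task)]
    for agent in cycle(agents):
        if len(history) - 1 >= max_turns:  # safety net against endless chats
            break
        text = agent.reply(history)
        history.append(Message(agent.name, text))  # messages are the only coupling
        if stop_word in text:
            break
    return history


# Toy behaviours: the coder numbers its drafts, the reviewer approves the second one.
coder = ToyAgent("coder", lambda h: f"draft v{sum(m.source == 'coder' for m in h) + 1}")
reviewer = ToyAgent("reviewer", lambda h: "APPROVED" if h[-1].content == "draft v2" else "needs work")

log = run_round_robin([coder, reviewer], "Write binary search.", stop_word="APPROVED")
for m in log:
    print(f"[{m.source}] {m.content}")
```

Swapping `cycle(agents)` for a function that inspects the history and returns the next speaker would give the "let the LLM choose" or rule-based strategies.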
    import asyncio

    from autogen_agentchat.agents import AssistantAgent
    from autogen_agentchat.base import TaskResult
    from autogen_agentchat.conditions import TextMentionTermination
    from autogen_agentchat.teams import RoundRobinGroupChat
    from autogen_ext.models.openai import OpenAIChatCompletionClient

    model = OpenAIChatCompletionClient(model="gpt-4o")  # reads OPENAI_API_KEY from the environment

    # Each Agent does one thing: the system prompt defines its "persona"
    coder = AssistantAgent(
        "coder",
        model_client=model,
        system_message="You are a Python engineer. Output runnable code only, no prose.",
    )
    reviewer = AssistantAgent(
        "reviewer",
        model_client=model,
        system_message="You are a code reviewer. Point out bugs/improvements; reply 'APPROVED' when satisfied.",
    )

    # Group chat: take turns until a message mentions APPROVED.
    # v0.4 expects a termination-condition object here, not a lambda.
    team = RoundRobinGroupChat(
        [coder, reviewer],
        termination_condition=TextMentionTermination("APPROVED"),
        max_turns=10,  # hard cap in case APPROVED never appears
    )

    async def main():
        async for msg in team.run_stream(task="Write binary search that handles empty arrays."):
            if isinstance(msg, TaskResult):  # the final item of the stream is the run summary
                print(f"Stopped: {msg.stop_reason}")
            else:
                print(f"[{msg.source}] {msg.content}")

    asyncio.run(main())
Always set max_turns as a safety net, so the chat still stops if the reviewer never replies "APPROVED".

If AutoGen is a "Slack channel," CrewAI is a "project team with a PM": each Agent has a clear Role, Goal, and Backstory; Tasks are explicitly assigned to specific Agents; execution can be Sequential or Hierarchical (with a Manager Agent). Backend analogy: shift from "message-driven" back to an explicit workflow engine, more like Airflow and Slack mixed together.
AutoGen's conversational style is flexible but unpredictable: the same task may take wildly different conversational paths across runs, which is hostile to production. CrewAI (open-sourced 2024, now one of the most popular multi-agent frameworks) takes the opposite philosophy: make the flow explicit, keep the roles stable. You define "3 Agents + 5 Tasks as a directed graph" upfront, then run it. Two execution modes: Sequential, where Tasks run in a fixed order, and Hierarchical, where a Manager Agent delegates Tasks to the other Agents.
"Backstory" isn't decoration: it shapes the model's tone and decisions. Writing "you are a senior financial analyst with 15 years of experience, known for being rigorous and conservative" tends to produce more focused output than "you are an analyst", although the size of the effect varies by model and task.
    from crewai import Agent, Task, Crew, Process
    from crewai_tools import SerperDevTool

    # Any search tool works here; SerperDevTool expects SERPER_API_KEY in the environment.
    search_tool = SerperDevTool()

    researcher = Agent(
        role="Market Researcher",
        goal="Find 3 latest trends for {topic} with sources",
        backstory="Industry analyst with 10 years of primary-source research experience",
        tools=[search_tool],
    )
    writer = Agent(
        role="Content Editor",
        goal="Rewrite the research into a tight 800-word brief",
        backstory="Ex-Economist editor; prizes fact density and readability",
    )

    # Tasks are explicitly bound to Agents; `context` declares dependencies
    research = Task(
        description="Research AI Agent trends for 2026",
        agent=researcher,
        expected_output="3 trends + citation links",
    )
    write = Task(
        description="Rewrite as a brief",
        agent=writer,
        context=[research],  # receives the output of `research`
        expected_output="800-word markdown",
    )

    crew = Crew(agents=[researcher, writer], tasks=[research, write], process=Process.sequential)
    result = crew.kickoff(inputs={"topic": "AI Agent"})  # fills the {topic} placeholder
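One way to picture what the `context` dependency expresses is as a small DAG that gets executed in dependency order. The sketch below uses only the standard library and is not CrewAI's internal implementation; `run_in_order`, `fake_run`, and the extra `fact_check` task are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical task graph: each task maps to the set of tasks it depends on,
# mirroring what `context=[research]` declares for the `write` task.
dependencies = {
    "research": set(),
    "write": {"research"},
    "fact_check": {"write"},
}


def run_in_order(deps, run_task):
    """Run tasks so that each one receives the outputs of the tasks it depends on."""
    outputs = {}
    for name in TopologicalSorter(deps).static_order():
        context = {d: outputs[d] for d in deps[name]}  # upstream results only
        outputs[name] = run_task(name, context)
    return outputs


def fake_run(name, context):
    # Stand-in for an LLM call; records which upstream outputs were visible.
    return f"{name}(" + ",".join(sorted(context)) + ")"


results = run_in_order(dependencies, fake_run)
print(results)
```

Making the graph explicit up front is what gives this style its predictability: the order of work is fixed before any model is called.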
Splitting one Agent into several roles is the backend world's "monolith → microservices" move — not because "division of labor" is virtuous, but because small context + focused prompt + restricted toolset together cut a single Agent's cognitive load, making each inference sharper. The costs mirror microservices too: communication, debugging, and overall consistency all get harder.
Hang 20 tools + 5 paragraphs of system prompt + full task context on one Agent and you get intent drift: it starts writing code, switches to explaining architecture, then begins interrogating the user. Practitioner experience with agent systems generally points the same way: narrowing role scope improves task completion. Three mechanisms behind it: a smaller context, a more focused prompt, and a restricted toolset.
Simple heuristic for "should I split this?": when an Agent's failures cluster around "did the wrong kind of thing" (wrote code when it should advise, summarized when it should expand) rather than "didn't do it well enough," it's time to split.
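The heuristic above can be turned into a rough check over labeled failures. This is a minimal sketch under stated assumptions: the two labels, the `should_split` name, and the 0.5 threshold are all hypothetical, and the labels are assumed to come from a human reviewer or an eval script.

```python
from collections import Counter

# Hypothetical failure labels, assigned by a human reviewer or an eval script.
WRONG_KIND = "wrong_kind"    # did a different type of task than asked
LOW_QUALITY = "low_quality"  # right type of task, poor result


def should_split(failure_labels, threshold=0.5):
    """Return True when 'wrong kind of thing' failures dominate.

    `threshold` is an arbitrary illustrative cutoff, not a measured value.
    """
    counts = Counter(failure_labels)
    total = counts[WRONG_KIND] + counts[LOW_QUALITY]
    if total == 0:
        return False  # no failures recorded, nothing to decide
    return counts[WRONG_KIND] / total > threshold


labels = [WRONG_KIND, WRONG_KIND, LOW_QUALITY, WRONG_KIND]
print(should_split(labels))
```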
    # Assumed to exist elsewhere: model clients `opus` and `haiku`, plus the tool
    # functions write_code, run_tests, lint, format_code, git, and search_docs.
    from autogen_agentchat.agents import AssistantAgent

    # Anti-pattern: a generalist Agent that writes, reviews, and runs
    generalist = AssistantAgent(
        "engineer",
        model_client=opus,
        system_message="You are an engineer. You write code, review code, run tests, fix bugs, write docs...",
        tools=[write_code, run_tests, lint, format_code, git, search_docs],
    )

    # Better: three specialized roles, each with a short prompt and tight tools
    coder = AssistantAgent(
        "coder",
        model_client=opus,
        system_message="Write the implementation only. No tests, no docs.",
        tools=[write_code, search_docs],
    )
    tester = AssistantAgent(
        "tester",
        model_client=haiku,  # cheaper model
        system_message="Write pytest cases and run them. Report pass/fail.",
        tools=[run_tests],
    )
    reviewer = AssistantAgent(
        "reviewer",
        model_client=opus,
        system_message="Review code. List the 1-3 most serious issues. Don't rewrite.",
        tools=[lint],
    )
How multiple Agents "exchange info and reach agreement" is the LLM-era version of classic distributed systems coordination — Paxos, leader election, gossip, blackboard pattern all have analogs here. Today's multi-agent frameworks are essentially combinations of these few patterns.
With N Agents, you have two questions to answer: (1) Who speaks/acts when? (control flow) (2) How is shared state managed? (data flow). Four mainstream protocols: Sequential, Hierarchical, Debate, and Blackboard.
Which to use depends on the task: splittable + mergeable → Hierarchical; clear order dependency → Sequential; multi-perspective debate → Debate; any subset needs any intermediate result → Blackboard. Production systems typically mix — outer Sequential, with Debate inside one step.
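The "outer Sequential, with Debate inside one step" mix can be sketched as ordinary function composition. All names here (`sequential`, `debate`, and the toy steps) are hypothetical, and the string-manipulating lambdas stand in for LLM calls.

```python
from typing import Callable, List

Step = Callable[[str], str]


def sequential(steps: List[Step]) -> Step:
    """Outer protocol: run steps in a fixed order, passing each result forward."""
    def pipeline(task: str) -> str:
        result = task
        for step in steps:
            result = step(result)
        return result
    return pipeline


def debate(debaters: List[Step], judge: Callable[[List[str]], str], rounds: int = 1) -> Step:
    """Inner protocol: every debater responds to the draft, then a judge merges the views."""
    def step(draft: str) -> str:
        for _ in range(rounds):
            opinions = [d(draft) for d in debaters]
            draft = judge(opinions)
        return draft
    return step


# Toy stand-ins for LLM-backed agents.
research = lambda task: f"{task} [researched]"
optimist = lambda draft: f"pro: {draft}"
skeptic = lambda draft: f"con: {draft}"
judge = lambda opinions: " | ".join(opinions)
write = lambda draft: f"brief({draft})"

pipeline = sequential([research, debate([optimist, skeptic], judge), write])
result = pipeline("plan")
print(result)
```

Because a debate is wrapped up as a single `Step`, the outer pipeline does not need to know that one of its stages involves several agents.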
2024-2025 also brought new protocol layers: Google's A2A (Agent-to-Agent) aims to standardize cross-vendor Agent interop (like Day 6's MCP did for tools); Anthropic's Claude adds Computer Use (an Agent operating a GUI) and sub-agents (an Agent spawning sub-Agents for parts of a task). Both are still engineering iterations on these four patterns.
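    # LangGraph: graph + shared state to implement hybrid protocols (production favorite)
    import operator
    from typing import Annotated, TypedDict

    from langgraph.graph import END, StateGraph

    # Hypothetical stand-ins for real search / LLM calls, so the graph runs end to end.
    def search_web(question): return f"notes on {question}"
    def critique(notes): return f"critique of {len(notes)} note(s)"
    def synthesize(notes): return " / ".join(notes)
    def good_enough(s): return len(s["research"]) >= 2

    class State(TypedDict):
        question: str
        research: Annotated[list, operator.add]  # blackboard shared region (appended, not overwritten)
        answer: str

    def researcher(s): return {"research": [search_web(s["question"])]}
    def critic(s): return {"research": [critique(s["research"])]}
    def writer(s): return {"answer": synthesize(s["research"])}

    g = StateGraph(State)
    # Node names must not collide with state keys, hence "researcher" rather than "research".
    g.add_node("researcher", researcher)
    g.add_node("critic", critic)
    g.add_node("writer", writer)
    g.add_edge("researcher", "critic")  # sequential
    g.add_conditional_edges(            # conditional routing
        "critic",
        lambda s: "writer" if good_enough(s) else "researcher",
        {"writer": "writer", "researcher": "researcher"},
    )
    g.add_edge("writer", END)
    g.set_entry_point("researcher")

    app = g.compile()
    result = app.invoke({"question": "AI Agent trends in 2026?", "research": []})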
total tokens = Σ over Agents of (system_prompt + cumulative conversation history + tool descriptions) × number of times that Agent is invoked.

Most underestimated hidden costs:
(a) conversation history bloat: AutoGen GroupChat by default has all Agents share the full history; 5 Agents × 10 rounds = 50 LLM calls, each carrying the full history so far, so total tokens grow O(N²) in the number of turns N;
(b) tool description duplication: one tool used by 3 Agents has its schema in each Agent's prompt;
(c) retry on failure: when an Agent's output is rejected for format errors, that call's tokens are spent twice;
(d) unnecessary "polite check-ins": Agents acknowledging each other ("got it, starting now") burn 5-10% of tokens.

Cost-cutting moves: prompt caching (mentioned Day 6), periodic summarization of history into memory, kill politeness prompts ("deliver result directly, no acknowledgement"), prefer Hierarchical over Debate (with N Agents, the former is O(N) communication, the latter O(N²)).

It is common for a typical 5-agent task to go from 50K tokens to 8K tokens, which cuts single-task token cost by about 84%.
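To see why shared history dominates the bill, the formula can be worked through numerically. This is a back-of-the-envelope sketch: it counts input tokens only, ignores the cost of producing summaries, and every number in it (prompt sizes, message length, summary size) is an illustrative assumption rather than a measurement.

```python
def shared_history_tokens(n_agents, rounds, system_tokens, tool_tokens, tokens_per_message):
    """Total input tokens when every call carries the full shared history."""
    total = 0
    history = 0
    for _ in range(n_agents * rounds):      # one LLM call per agent per round
        total += system_tokens + tool_tokens + history
        history += tokens_per_message       # each reply is appended to the shared history
    return total


def summarized_history_tokens(n_agents, rounds, system_tokens, tool_tokens,
                              tokens_per_message, summary_tokens, keep_last):
    """Same loop, but history is capped: a fixed-size summary plus the last few messages."""
    total = 0
    messages = 0
    for _ in range(n_agents * rounds):
        recent = min(messages, keep_last) * tokens_per_message
        summary = summary_tokens if messages > keep_last else 0
        total += system_tokens + tool_tokens + summary + recent
        messages += 1
    return total


# Illustrative numbers only: 5 agents, 10 rounds, 100-token replies.
full = shared_history_tokens(5, 10, system_tokens=200, tool_tokens=300, tokens_per_message=100)
capped = summarized_history_tokens(5, 10, system_tokens=200, tool_tokens=300,
                                   tokens_per_message=100, summary_tokens=200, keep_last=3)
print(full, capped, f"{1 - capped / full:.0%} fewer input tokens")
```

In the first function the history term grows with every call, so the total is quadratic in the number of calls; capping it turns the total into a linear function of the number of calls.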