A teardown of a self-extending coding agent

How Pi is built, and why its author tore everything else down first.

Pi is a minimal coding-agent harness by Mario Zechner (badlogic, of libGDX). Four tools, a sub-1000-token system prompt, and a loop that "just loops." Its real thesis isn't smallness — it's malleability: software you ask to rewrite itself. Here's the architecture, the design patterns, and the arguments behind them.

The provocation

01 What he hated about the others

Pi exists because its author got fed up. The critiques are specific, and they map one-to-one onto the design choices that follow.

"Claude Code has turned into a spaceship with 80% of functionality I have no use for. The system prompt and tools also change on every release, which breaks my workflows."

Mario Zechner, "pi — a coding agent" (mariozechner.at, Nov 2025)

A. Opaque harnesses

"Existing harnesses make [inspecting context] extremely hard or impossible by injecting stuff behind your back." He wanted to know exactly what hits the model.

→ Pattern: total context transparency

B. Accumulated baggage

Claude Code, opencode, Codex "accumulated baggage along the way, which shows in the developer experience." Velocity multiplied bugs.

→ Pattern: tiny core, "add as few features as possible"

C. Leaky SDK abstractions

Other harnesses "rely on libraries like the Vercel AI SDK, which … doesn't support tool calling well with self-hosted models."

→ Pattern: own the provider layer (pi-ai)

D. MCP context tax

"Playwright MCP has 21 tools using 13.7k tokens (6.8% of context)… That many tools will confuse your agent."

→ Pattern: no MCP; CLI tools + Bash

The throughline: every feature you don't fully understand is a liability the model inherits. So instead of configuration, Pi gives you primitives — and a way to build the rest yourself.

Library, not framework

02 Four packages, each usable alone

The monorepo is a stack of thin, independently-consumable libraries. You can build a Slack bot on the agent core without ever touching the terminal UI. That separation is the rebuttal to the monolith.

pi-ai (provider layer)
Unified multi-provider LLM API. Normalizes message shape across Anthropic, OpenAI, Google, xAI, Groq, Bedrock, OpenRouter… without hiding provider capability.
depends on → provider SDKs directly · zero agent knowledge

pi-agent-core (runtime / loop)
The agent runtime: the loop, tool calling, state. Every policy (permissions, steering, compaction) is an injected function, not baked in.
depends on → pi-ai · zero UI knowledge

pi-coding-agent (the product)
The actual CLI app + the extension system. Four tools (read/write/edit/bash), sub-1000-token prompt, 30+ typed lifecycle events.
depends on → pi-agent-core, pi-tui, pi-ai

pi-tui (rendering)
Terminal UI with differential rendering. "It doesn't flicker, doesn't consume a lot of memory, doesn't randomly break." (Ronacher)
standalone · consumed by the CLI

"Pi itself is written like excellent software. It doesn't flicker, it doesn't consume a lot of memory, it doesn't randomly break, it is very reliable."

Armin Ronacher, "Pi: The Minimal Agent" (lucumr.pocoo.org, Jan 2026)

The seams

03 Seven extensibility patterns

Pi is "aggressively extensible so it doesn't have to dictate your workflow." Each pattern below is a seam where you plug in real code — no marketplace, no IPC, no manifest schema.

1. Library, not framework

Four layers you adopt à la carte. Pay only for what you use; never inherit a monolith.

workspaces/ → independent npm packages

2. Normalize shape, expose capability

One Message type for portability, but two entry points: streamSimple() (unified) and stream() (full provider options), plus a thinkingLevelMap and compat fields. You're never trapped under the abstraction.

pi-ai/api-registry.ts · stream.ts

3. Typed event hooks

Events carry their own result type as a phantom. observe() = read-only; on() = participate (transform / block / cancel). No stringly-typed shell hooks.

pi-agent/docs/hooks.md

4. Extensions = in-process TS

A default-export factory (pi) => void, loaded via jiti, hot-reloadable with /reload. Same API the core uses. Pi dogfoods it: its own widgets live in .pi/extensions/.

registerTool · registerProvider · registerCommand

5. Loop = thin mechanism, injected policy

The loop is a plain while. Everything that varies — convert, validate, steer, stop — is a function on AgentLoopConfig. Inversion of control.

pi-agent/agent-loop.ts

6. Tools + pluggable operations

A tool is just {name, parameters, execute}. Built-ins take an injected ReadOperations/BashOperations — swap local FS for SSH or a sandbox without rewriting the tool.

core/tools/read.ts → examples/ssh.ts

7. Schema-first + streaming everywhere

TypeBox schemas = one source of truth for runtime validation and static types. Everything returns an event stream, not Promise<string> — enabling steering & partial render.

Static<TParams> · AssistantMessageEventStream
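
To make the "event stream, not Promise<string>" point from pattern 7 concrete, here is a minimal sketch built on an async generator. The event shapes and the names `StreamEvent`, `streamReply`, and `collect` are hypothetical and chosen for this illustration; they are not pi-ai's actual AssistantMessageEventStream.

```typescript
// Hypothetical event shapes; not pi-ai's real stream types.
type StreamEvent =
  | { type: "text_delta"; delta: string }
  | { type: "done"; text: string };

// Stand-in for a provider call: yields partial text, then one final event.
async function* streamReply(chunks: string[]): AsyncGenerator<StreamEvent> {
  let text = "";
  for (const delta of chunks) {
    text += delta;
    yield { type: "text_delta", delta };
  }
  yield { type: "done", text };
}

// A consumer sees each delta as it arrives (partial render) and could
// stop early by breaking out of the loop.
async function collect(
  stream: AsyncIterable<StreamEvent>,
  onDelta: (delta: string) => void,
): Promise<string> {
  for await (const event of stream) {
    if (event.type === "text_delta") onDelta(event.delta);
    else return event.text;
  }
  throw new Error("stream ended without a done event");
}
```

A caller that only wants the final string can still await `collect(...)`, so the streaming shape costs little for simple consumers while leaving room for partial rendering.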

The payoff

These seams compose into one capability: the agent can write its own tools and reload them live. Extensibility becomes self-extension.

→ see §06

// pi-agent/docs/hooks.md: the event carries its own result type
declare const HookResult: unique symbol;   // assumed declaration so the excerpt type-checks
interface HookEvent<TType extends string, TResult = void> {
  type: TType;
  readonly [HookResult]?: TResult;   // phantom: no result map needed
}

// a tool_call handler can transform input, or block execution
interface ToolCallEvent extends HookEvent<"tool_call", { block?: boolean; reason?: string }> {
  type: "tool_call"; toolName: string; input: Record<string, unknown>;
}

pi.on("tool_call", async (event, ctx) => {
  const command = event.input.command;   // typed unknown, so narrow before calling string methods
  if (event.toolName === "bash" && typeof command === "string" && command.includes("rm -rf")) {
    if (!await ctx.ui.confirm("Dangerous!", "Allow rm -rf?"))
      return { block: true, reason: "Blocked by user" };  // veto
  }
});

// pi-agent: the loop is a mechanism; ALL policy is injected
interface AgentLoopConfig {
  model: Model<any>;
  convertToLlm:        (m: AgentMessage[]) => Message[];   // history → provider
  transformContext?:   (m, signal) => Promise<AgentMessage[]>; // compaction / RAG
  beforeToolCall?:     (ctx, signal) => Promise<Result>;        // permission gate
  afterToolCall?:      (ctx, signal) => Promise<Result>;        // post-process
  getSteeringMessages?:() => Promise<AgentMessage[]>;          // mid-run input
  shouldStopAfterTurn?:(ctx) => boolean;                       // stop policy
  prepareNextTurn?:    (ctx) => TurnUpdate;                    // swap model/thinking
}
// the loop itself: "just loops until the agent says it's done."
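
// core/tools/read.ts: the IO backend is an injected interface
// (excerpt; ReadToolOptions and defaultReadOperations are declared elsewhere)
import { resolve } from "node:path";
import { Type } from "typebox";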
export interface ReadOperations {
  readFile: (absolutePath: string) => Promise<Buffer>;
  access:   (absolutePath: string) => Promise<void>;
}

export function createReadToolDefinition(cwd: string, opts?: ReadToolOptions) {
  const ops = opts?.operations ?? defaultReadOperations;  // ← swap for SSH / sandbox
  return {
    name: "read", label: "read",
    parameters: Type.Object({ path: Type.String() }),  // TypeBox = types + validation
    async execute(id, { path }, signal, onUpdate, ctx) {
      const buf = await ops.readFile(resolve(cwd, path));
      return { content: [{ type: "text", text: buf.toString() }] };
    },
  };
}
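
To show what swapping the backend looks like in practice, here is a minimal sketch with an in-memory backend standing in for SSH or a sandbox. It is simplified on purpose: `ReadOps` returns strings rather than Buffers, and `memoryOps` and `makeReadTool` are hypothetical names, not part of Pi.

```typescript
// Simplified stand-in for the ReadOperations interface (string instead of Buffer).
interface ReadOps {
  readFile: (absolutePath: string) => Promise<string>;
  access: (absolutePath: string) => Promise<void>;
}

// Hypothetical in-memory backend; an SSH or sandbox backend would
// implement the same two functions.
function memoryOps(files: Record<string, string>): ReadOps {
  const lookup = (p: string): string => {
    const text = files[p];
    if (text === undefined) throw new Error(`ENOENT: ${p}`);
    return text;
  };
  return {
    readFile: async (p) => lookup(p),
    access: async (p) => { lookup(p); },
  };
}

// The tool only talks to `ops`, so it does not change when the backend does.
function makeReadTool(cwd: string, ops: ReadOps) {
  return {
    name: "read",
    async execute(path: string) {
      const abs = path.startsWith("/") ? path : `${cwd}/${path}`;
      await ops.access(abs); // fail early if the file is missing
      return { content: [{ type: "text" as const, text: await ops.readFile(abs) }] };
    },
  };
}
```

The same pattern makes the tool easy to test: pass a fake backend and no real filesystem is touched.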

// ~/.pi/agent/extensions/greet.ts: a whole extension is one factory
import type { ExtensionAPI } from "@earendil-works/pi-coding-agent";
import { Type } from "typebox";

export default function (pi: ExtensionAPI) {
  pi.registerTool({
    name: "greet", label: "Greet",
    description: "Greet someone by name",
    parameters: Type.Object({ name: Type.String() }),
    async execute(id, { name }) {
      return { content: [{ type: "text", text: `Hello, ${name}!` }], details: {} };
    },
  });
}
// drop the file in, hit /reload, the model can call it. No build, no restart.

"The loop just loops"

04 The agent loop, stepped through

There is no max-steps knob: "I never found a use case for that." The loop runs until the model stops calling tools. Each turn includes the two injected escape hatches (steering and follow-up) that keep it from being a dumb while(true).

Turn flow: user input → stream from model → tool calls? → execute (par/seq) → append results → steering / follow-up
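
As a rough illustration of the turn described above, here is a minimal sketch of a loop with no step limit and two injected escape hatches. The names `Msg`, `LoopConfig`, `runLoop`, `callModel`, `runTool`, and `getFollowUpMessages` are hypothetical and simplified for this page; they are not Pi's actual API, and the exact point at which Pi delivers steering messages may differ.

```typescript
// Hypothetical, simplified types: not Pi's real interfaces.
interface ToolCall { name: string; input: string }

type Msg =
  | { role: "user"; text: string }
  | { role: "assistant"; text: string; toolCalls: ToolCall[] }
  | { role: "tool"; name: string; text: string };

interface LoopConfig {
  callModel: (history: Msg[]) => Promise<{ text: string; toolCalls: ToolCall[] }>;
  runTool: (call: ToolCall) => Promise<string>;
  getSteeringMessages?: () => Promise<Msg[]>;  // user input that arrived mid-run
  getFollowUpMessages?: () => Promise<Msg[]>;  // queued work for when the model is done
}

async function runLoop(history: Msg[], cfg: LoopConfig): Promise<Msg[]> {
  // No max-steps counter: the loop ends when the model stops calling tools
  // and nothing else is queued.
  while (true) {
    const reply = await cfg.callModel(history);
    history.push({ role: "assistant", text: reply.text, toolCalls: reply.toolCalls });

    for (const call of reply.toolCalls) {
      history.push({ role: "tool", name: call.name, text: await cfg.runTool(call) });
    }

    // Steering: anything the user typed during the turn joins the history.
    const steering = (await cfg.getSteeringMessages?.()) ?? [];
    history.push(...steering);
    if (reply.toolCalls.length > 0 || steering.length > 0) continue;

    // Follow-up: only consulted once the model would otherwise stop.
    const followUps = (await cfg.getFollowUpMessages?.()) ?? [];
    if (followUps.length === 0) return history;
    history.push(...followUps);
  }
}
```

Everything that varies lives on the config object, so the loop body itself has no knowledge of permissions, UI, or providers.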

"By default, pi gives the model four tools: read, write, edit, and bash. … These four tools are all you need for an effective coding agent. All the frontier models have been RL-trained up the wazoo, so they inherently understand what a coding agent is."

Mario Zechner

4 built-in tools
<1k tokens: prompt + tool spec
30+ typed lifecycle events
0 MCP servers · max-steps

The shared thesis

05 Code is the universal interface

This is where Zechner and Ronacher converge hardest. The model already knows Bash and a programming language fluently. So the best "tool API" isn't 30 rigid MCP schemas — it's a shell and the ability to write code.

MCP-everywhere

  • Tool defs eat context before you ask anything (8k–18k tokens)
  • "That many tools will confuse your agent"
  • Not composable without inference — every chain costs a round-trip
  • APIs change under you; your usage hints become a hindrance
  • Hard to extend; you don't own the surface

CLI tools + Bash + throwaway code

  • A README costs ~225 tokens — pay only when needed
  • "The command line … is a series of tools composed through bash"
  • Write a script once, run it 200× with no further inference
  • Agent reshapes a tool's output format in under a minute
  • You — and the agent — own and maintain it
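
The context arithmetic behind the two lists can be checked in a few lines. This is a back-of-the-envelope sketch: the 200,000-token window is an assumption (it is roughly what the quoted "13.7k tokens (6.8% of context)" implies), and `contextShare` is a hypothetical helper name.

```typescript
// Share of the context window consumed by a fixed block of tokens.
function contextShare(tokens: number, windowSize: number): number {
  return tokens / windowSize;
}

const WINDOW = 200000;      // assumption: implied by 13.7k being about 6.8%
const MCP_TOKENS = 13700;   // Playwright MCP tool definitions, per the quote above
const README_TOKENS = 225;  // one README-documented CLI tool, per the list above

const mcpShare = contextShare(MCP_TOKENS, WINDOW);       // paid on every request
const readmeShare = contextShare(README_TOKENS, WINDOW); // paid only when the README is read

// How many README-documented tools fit in the budget of that one MCP server.
const readmesPerMcp = Math.floor(MCP_TOKENS / README_TOKENS);
```

Under these assumptions the MCP definitions take about 6.85% of the window, a single README about 0.11%, and roughly 60 READMEs fit in the same budget.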

"The interface to the MCP is now not just individual tools it has never seen — it's a programming language that it understands very well. … Once the script is written, I can execute it 100, 200, or even 300 times without requiring any further inference."

Armin Ronacher, "Tools: Code Is All You Need" & "Your MCP Doesn't Need 30 Tools"

Ronacher's rule of thumb — "anything can be a tool: a shell script, an MCP server, a log file" and "I really only start using MCP if the alternative is too unreliable" — is exactly Pi's default posture: no MCP; build CLI tools with READMEs, or write an extension if you truly need it.

The real point

06 Software that is malleable like clay

Smallness is a means. The end is an agent that builds more of itself. Because extensions are typed TS modules using a documented in-process API, the agent can author a new capability and /reload it in the same session — gated by the same beforeToolCall policy you injected.

"Pi's entire idea is that if you want the agent to do something that it doesn't do yet, you don't go and download an extension or a skill. You ask the agent to extend itself. It celebrates the idea of code writing and running code. … It makes you live that idea of using software that builds more software."

Armin Ronacher on Pi

"Pi isn't a sealed product. If you need a command, tool, provider, workflow, or UI tweak, just ask Pi to build it. Have Pi manipulate itself in place, hit /reload, and keep going. … If pi doesn't fit your needs, I implore you to fork it. I truly mean it."

Mario Zechner

The analogy he uses: a hammer that reshapes itself for each job. The same harness becomes a bespoke harness — the agent modifies itself to fit the task, instead of you bending to the tool. registerTool() works at load and at runtime, so self-modification isn't a special mode; it's the same path the core already uses.
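
One way to picture "the same path at load and at runtime" is a registry whose only entry point is registerTool, with every call passing through an injected gate. This is a sketch under assumptions: `ToolRegistry`, `Gate`, and `call` are hypothetical names, and the synchronous string-in, string-out tools are a simplification, not Pi's tool shape.

```typescript
// Hypothetical registry: illustrates the idea, not Pi's implementation.
interface Tool {
  name: string;
  execute: (input: string) => string;
}

// Injected policy, in the spirit of beforeToolCall.
type Gate = (toolName: string, input: string) => { block: boolean; reason?: string };

class ToolRegistry {
  private tools = new Map<string, Tool>();
  private beforeToolCall: Gate;

  constructor(beforeToolCall: Gate) {
    this.beforeToolCall = beforeToolCall;
  }

  // Same entry point at startup and mid-session; re-registering a name replaces it.
  registerTool(tool: Tool): void {
    this.tools.set(tool.name, tool);
  }

  call(name: string, input: string): string {
    const tool = this.tools.get(name);
    if (!tool) return `error: unknown tool ${name}`;
    const verdict = this.beforeToolCall(name, input);
    if (verdict.block) return `blocked: ${verdict.reason ?? "no reason given"}`;
    return tool.execute(input);
  }
}
```

Because a tool added mid-session goes through the same `call` path, it is subject to the same gate as a built-in; there is no separate trust level for self-authored tools in this sketch.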

If you're designing your own agent

07 The transferable principles

01
Make the loop a thin mechanism; inject all policy.

Permissions, previews, undo, compaction belong in hooks (beforeToolCall/afterToolCall), not branches inside the loop. One loop, many surfaces.

02
Normalize message shape, never capability.

Wrap providers behind a registry with a simple and a full path. Keep a capability map. Don't get locked to a lowest-common-denominator SDK.
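
A minimal sketch of that principle follows, assuming a registry keyed by provider name. The shapes here are hypothetical: `Provider`, `registerProvider`, and the string-array "stream" are stand-ins, and the `thinkingLevelMap` layout is an assumption rather than pi-ai's actual definition.

```typescript
// Hypothetical shapes for illustration; not pi-ai's actual types.
interface Message { role: "user" | "assistant"; text: string }

type ThinkingLevel = "off" | "low" | "high";

interface Provider<TOptions> {
  // Full path: the caller passes provider-specific options untouched.
  stream: (messages: Message[], options: TOptions) => string[];
  // Capability map: how the portable knob translates for this provider.
  thinkingLevelMap: Record<ThinkingLevel, TOptions>;
}

const registry = new Map<string, Provider<any>>();

function registerProvider<TOptions>(name: string, provider: Provider<TOptions>): void {
  registry.set(name, provider);
}

// Simple path: one portable signature, translated through the capability map.
function streamSimple(name: string, messages: Message[], thinking: ThinkingLevel): string[] {
  const provider = registry.get(name);
  if (!provider) throw new Error(`unknown provider: ${name}`);
  return provider.stream(messages, provider.thinkingLevelMap[thinking]);
}
```

The message shape is shared, but the options type is not: a caller who needs a provider-only feature can fetch the provider and call `stream` with its native options instead of going through `streamSimple`.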

03
Prefer code + CLI over rigid tool schemas.

The model speaks Bash and Python fluently. A README-documented CLI tool is more composable, cheaper on context, and self-maintainable than a wall of MCP tools.

04
Type your hooks; let the event own its result.

The phantom-result-type pattern beats stringly-typed shell hooks: transform / block / observe become statically checkable.
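
The sketch below shows how a dispatcher can recover each handler's result type from the event itself. It is an illustration under assumptions: `Hooks`, `ResultOf`, `Handler`, and `TurnEndEvent` are hypothetical names, and the "first non-void result wins" rule is one possible policy, not necessarily Pi's.

```typescript
// Phantom key: exists only at the type level, never set at runtime.
declare const HookResult: unique symbol;

interface HookEvent<TType extends string, TResult = void> {
  type: TType;
  readonly [HookResult]?: TResult;
}

interface ToolCallEvent extends HookEvent<"tool_call", { block?: boolean; reason?: string }> {
  toolName: string;
}
interface TurnEndEvent extends HookEvent<"turn_end"> {
  turn: number;
}
type AnyEvent = ToolCallEvent | TurnEndEvent;

// Recover the result type from the event; no separate result map.
type ResultOf<E> = E extends HookEvent<string, infer R> ? R : never;
type Handler<E> = (event: E) => ResultOf<E> | void;

class Hooks {
  private handlers = new Map<string, Array<(event: any) => unknown>>();

  on<T extends AnyEvent["type"]>(
    type: T,
    handler: Handler<Extract<AnyEvent, { type: T }>>,
  ): void {
    this.handlers.set(type, [...(this.handlers.get(type) ?? []), handler]);
  }

  // Returns the first non-void result, so a single handler can veto.
  emit<E extends AnyEvent>(event: E): ResultOf<E> | undefined {
    for (const handler of this.handlers.get(event.type) ?? []) {
      const result = handler(event);
      if (result !== undefined) return result as ResultOf<E>;
    }
    return undefined;
  }
}
```

With this in place, a "tool_call" handler that returns something other than a block/reason object is rejected by the compiler, which is the static check a shell hook cannot offer.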

05
Let the agent extend itself — through the same API the core uses.

If new capability means "ask the agent to write a tool and reload," extensibility and self-improvement collapse into one mechanism.

06
Keep what hits the context window legible.

"Context engineering is paramount." Split tool output into a model portion and a UI portion. If you can't see the context, you can't debug the agent.