Pi is a minimal coding-agent harness by Mario Zechner (badlogic, of libGDX). Four tools, a sub-1000-token system prompt, and a loop that "just loops." Its real thesis isn't smallness — it's malleability: software you ask to rewrite itself. Here's the architecture, the design patterns, and the arguments behind them.
Pi exists because its author got fed up. The critiques are specific, and they map one-to-one onto the design choices that follow.
"Claude Code has turned into a spaceship with 80% of functionality I have no use for. The system prompt and tools also change on every release, which breaks my workflows."
"Existing harnesses make [inspecting context] extremely hard or impossible by injecting stuff behind your back." He wanted to know exactly what hits the model.
Claude Code, opencode, and Codex "accumulated baggage along the way, which shows in the developer experience." Shipping fast multiplied the bugs.
Other harnesses "rely on libraries like the Vercel AI SDK, which … doesn't support tool calling well with self-hosted models."
"Playwright MCP has 21 tools using 13.7k tokens (6.8% of context)… That many tools will confuse your agent."
The throughline: every feature you don't fully understand is a liability the model inherits. So instead of configuration, Pi gives you primitives — and a way to build the rest yourself.
The monorepo is a stack of thin, independently-consumable libraries. You can build a Slack bot on the agent core without ever touching the terminal UI. That separation is the rebuttal to the monolith.
"Pi itself is written like excellent software. It doesn't flicker, it doesn't consume a lot of memory, it doesn't randomly break, it is very reliable."
Pi is "aggressively extensible so it doesn't have to dictate your workflow." Each pattern below is a seam where you plug in real code — no marketplace, no IPC, no manifest schema.
Four layers you adopt à la carte. Pay only for what you use; never inherit a monolith.
One Message type for portability, but two entry points: streamSimple() (unified) and stream() (full provider options), plus a thinkingLevelMap and compat fields. You're never trapped under the abstraction.
Events carry their own result type as a phantom. observe() = read-only; on() = participate (transform / block / cancel). No stringly-typed shell hooks.
A default-export factory (pi) => void, loaded via jiti, hot-reloadable with /reload. Same API the core uses. Pi dogfoods it: its own widgets live in .pi/extensions/.
The loop is a plain while. Everything that varies — convert, validate, steer, stop — is a function on AgentLoopConfig. Inversion of control.
A tool is just {name, parameters, execute}. Built-ins take an injected ReadOperations/BashOperations — swap local FS for SSH or a sandbox without rewriting the tool.
TypeBox schemas = one source of truth for runtime validation and static types. Everything returns an event stream, not Promise<string> — enabling steering & partial render.
These seams compose into one capability: the agent can write its own tools and reload them live. Extensibility becomes self-extension.
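The simple-path / full-path provider split is the one pattern in this list without a snippet of its own, so here is a minimal sketch of the idea. Everything except the names streamSimple(), stream() and thinkingLevelMap is an assumption made for illustration: the `Provider` shape, the `ThinkingLevel` values, the `budgetTokens` option and the fake provider are hypothetical, not Pi's actual types.

```typescript
// Hypothetical types for illustration; not Pi's actual API.
type ThinkingLevel = "off" | "low" | "high";

interface Message { role: "user" | "assistant"; text: string }

interface Provider<TOptions extends object> {
  name: string;
  // Capability map: translates the portable thinking level into provider options.
  thinkingLevelMap: Record<ThinkingLevel, Partial<TOptions>>;
  // Full path: the caller passes raw provider options.
  stream(messages: Message[], options: TOptions): Iterable<string>;
}

// Unified path: portable arguments only, translated through the capability map.
function streamSimple<TOptions extends object>(
  provider: Provider<TOptions>,
  messages: Message[],
  thinking: ThinkingLevel,
  defaults: TOptions,
): Iterable<string> {
  const options: TOptions = { ...defaults, ...provider.thinkingLevelMap[thinking] };
  return provider.stream(messages, options);
}

// A fake provider that echoes its effective options, so the mapping is observable.
interface FakeOptions { budgetTokens: number; temperature: number }

const fakeProvider: Provider<FakeOptions> = {
  name: "fake",
  thinkingLevelMap: {
    off: { budgetTokens: 0 },
    low: { budgetTokens: 1024 },
    high: { budgetTokens: 16384 },
  },
  *stream(messages, options) {
    yield `budget=${options.budgetTokens}`;
    yield `temperature=${options.temperature}`;
    yield `last=${messages[messages.length - 1]?.text ?? ""}`;
  },
};

const history: Message[] = [{ role: "user", text: "hi" }];
const simpleChunks = Array.from(
  streamSimple(fakeProvider, history, "high", { budgetTokens: 0, temperature: 1 }),
);
const fullChunks = Array.from(
  fakeProvider.stream(history, { budgetTokens: 42, temperature: 0.2 }),
);
```

The simple path never hides the full one: both end in the same stream() call, so dropping down a level costs one extra argument, not a rewrite.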
// pi-agent/docs/hooks.md: the event carries its own result type
interface HookEvent<TType extends string, TResult = void> {
  type: TType;
  readonly [HookResult]?: TResult; // phantom: no result map needed
}
// a tool_call handler can transform input, or block execution
interface ToolCallEvent extends HookEvent<"tool_call", { block?: boolean; reason?: string }> {
  type: "tool_call";
  toolName: string;
  input: Record<string, unknown>;
}
pi.on("tool_call", async (event, ctx) => {
  const command = event.input.command; // typed unknown, so narrow before using string methods
  if (event.toolName === "bash" && typeof command === "string" && command.includes("rm -rf")) {
    if (!(await ctx.ui.confirm("Dangerous!", "Allow rm -rf?"))) {
      return { block: true, reason: "Blocked by user" }; // veto
    }
  }
});
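How the phantom field pays off is easier to see with a dispatcher around it. The following is a sketch under assumptions, not Pi's implementation: `HookBus`, `ResultOf`, `emit()` and the `turn_end` event are hypothetical names introduced here. The point is only that on() can derive the allowed return type from the event it subscribes to, with no separate result map.

```typescript
// Hypothetical sketch; HookBus, ResultOf, emit() and TurnEndEvent are illustrative.
declare const HookResult: unique symbol;

interface HookEvent<TType extends string, TResult = void> {
  type: TType;
  readonly [HookResult]?: TResult; // never set at runtime; exists only for the type checker
}

interface ToolCallEvent extends HookEvent<"tool_call", { block?: boolean; reason?: string }> {
  toolName: string;
  input: Record<string, unknown>;
}
interface TurnEndEvent extends HookEvent<"turn_end"> {
  turn: number;
}
type AnyEvent = ToolCallEvent | TurnEndEvent;

// Recover the result type from the phantom field.
type ResultOf<E> = E extends HookEvent<string, infer R> ? R : never;
type Handler<E> = (event: E) => ResultOf<E> | void;

class HookBus {
  private handlers = new Map<string, Array<(event: any) => unknown>>();

  // on(): the handler's return type is tied to the event type it subscribes to.
  on<T extends AnyEvent["type"]>(
    type: T,
    handler: Handler<Extract<AnyEvent, { type: T }>>,
  ): void {
    const list = this.handlers.get(type) ?? [];
    list.push(handler);
    this.handlers.set(type, list);
  }

  // emit(): run handlers in order and collect whatever results they returned.
  emit<E extends AnyEvent>(event: E): ResultOf<E>[] {
    const results: ResultOf<E>[] = [];
    for (const handler of this.handlers.get(event.type) ?? []) {
      const result = handler(event);
      if (result !== undefined) results.push(result as ResultOf<E>);
    }
    return results;
  }
}

const bus = new HookBus();

bus.on("tool_call", (event) => {
  const command = event.input.command;
  if (event.toolName === "bash" && typeof command === "string" && command.includes("rm -rf")) {
    return { block: true, reason: "Blocked by policy" }; // checked against ToolCallEvent's result
  }
  return undefined;
});

const verdicts = bus.emit<ToolCallEvent>({
  type: "tool_call", toolName: "bash", input: { command: "rm -rf /tmp/x" },
});
const allowed = bus.emit<ToolCallEvent>({
  type: "tool_call", toolName: "bash", input: { command: "ls" },
});
```

Returning `{ blok: true }` from that handler would be a compile error, which is the difference from a shell hook that communicates through exit codes and strings.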
// pi-agent: the loop is a mechanism; ALL policy is injected
interface AgentLoopConfig {
  model: Model<any>;
  convertToLlm: (m: AgentMessage[]) => Message[];             // history → provider
  transformContext?: (m, signal) => Promise<AgentMessage[]>;  // compaction / RAG
  beforeToolCall?: (ctx, signal) => Promise<Result>;          // permission gate
  afterToolCall?: (ctx, signal) => Promise<Result>;           // post-process
  getSteeringMessages?: () => Promise<AgentMessage[]>;        // mid-run input
  shouldStopAfterTurn?: (ctx) => boolean;                     // stop policy
  prepareNextTurn?: (ctx) => TurnUpdate;                      // swap model/thinking
}
// the loop itself: "just loops until the agent says it's done."
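To make "mechanism, not policy" concrete, here is a minimal sketch of such a loop driven by a scripted fake model. It is a synchronous simplification with hypothetical names (`runLoop`, `LoopConfig`, `ModelTurn`, `LogEntry`); only beforeToolCall and shouldStopAfterTurn echo fields from the config above, and their signatures here are assumed.

```typescript
// Hypothetical, synchronous simplification; not Pi's implementation.
interface ToolCall { name: string; input: string }
interface ModelTurn { text: string; toolCalls: ToolCall[] }
interface LogEntry { kind: "assistant" | "tool_result" | "blocked"; text: string }

interface LoopConfig {
  callModel: (log: LogEntry[]) => ModelTurn;   // stand-in for the provider call
  runTool: (call: ToolCall) => string;         // stand-in for tool execution
  beforeToolCall?: (call: ToolCall) => { block: boolean; reason?: string }; // permission gate
  shouldStopAfterTurn?: (turn: number) => boolean;                          // stop policy
}

function runLoop(config: LoopConfig): LogEntry[] {
  const log: LogEntry[] = [];
  let turn = 0;
  while (true) {
    const reply = config.callModel(log);
    log.push({ kind: "assistant", text: reply.text });
    // The only built-in exit: the model made no tool calls.
    if (reply.toolCalls.length === 0) break;
    for (const call of reply.toolCalls) {
      const verdict = config.beforeToolCall?.(call);
      if (verdict?.block) {
        log.push({ kind: "blocked", text: verdict.reason ?? "blocked" });
        continue;
      }
      log.push({ kind: "tool_result", text: config.runTool(call) });
    }
    turn += 1;
    if (config.shouldStopAfterTurn?.(turn)) break;
  }
  return log;
}

// A scripted model: two tool-calling turns, then a final answer.
const script: ModelTurn[] = [
  { text: "listing files", toolCalls: [{ name: "bash", input: "ls" }] },
  { text: "cleaning up", toolCalls: [{ name: "bash", input: "rm -rf build" }] },
  { text: "done", toolCalls: [] },
];
let cursor = 0;

const transcript = runLoop({
  callModel: () => script[cursor++] ?? { text: "done", toolCalls: [] },
  runTool: (call) => `ran: ${call.input}`,
  beforeToolCall: (call) =>
    call.input.includes("rm -rf") ? { block: true, reason: "rm -rf denied" } : { block: false },
});
```

The loop body contains no permission logic and no step limit; both arrive as functions, so a different policy means a different config, not a different loop.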
// core/tools/read.ts: the IO backend is an injected interface
export interface ReadOperations {
  readFile: (absolutePath: string) => Promise<Buffer>;
  access: (absolutePath: string) => Promise<void>;
}
export function createReadToolDefinition(cwd: string, opts?: ReadToolOptions) {
  const ops = opts?.operations ?? defaultReadOperations; // ← swap for SSH / sandbox
  return {
    name: "read", label: "read",
    parameters: Type.Object({ path: Type.String() }), // TypeBox = types + validation
    async execute(id, { path }, signal, onUpdate, ctx) {
      const buf = await ops.readFile(resolve(cwd, path));
      return { content: [{ type: "text", text: buf.toString() }] };
    },
  };
}
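What swapping the backend might look like, as a sketch: an in-memory object stands in for SSH or a sandbox. The interface is simplified from the one above (text instead of Buffer, no cwd resolution), and `createReadTool` and `inMemoryOperations` are hypothetical names, not Pi exports.

```typescript
// Simplified from the snippet above; createReadTool and inMemoryOperations are hypothetical.
interface ReadOperations {
  readFile: (absolutePath: string) => Promise<string>;
  access: (absolutePath: string) => Promise<void>;
}

// An in-memory backend, standing in for SSH or a sandbox.
function inMemoryOperations(files: Record<string, string>): ReadOperations {
  const lookup = (path: string): string => {
    const text = files[path];
    if (text === undefined) throw new Error(`ENOENT: ${path}`);
    return text;
  };
  return {
    readFile: async (path) => lookup(path),
    access: async (path) => { lookup(path); },
  };
}

// The tool only talks to the interface, so it never learns which backend it got.
function createReadTool(operations: ReadOperations) {
  return {
    name: "read",
    async execute(path: string) {
      await operations.access(path);
      const text = await operations.readFile(path);
      return { content: [{ type: "text" as const, text }] };
    },
  };
}

const sandboxedRead = createReadTool(inMemoryOperations({ "/repo/README.md": "# hello" }));
const readResult = sandboxedRead.execute("/repo/README.md");
const missingResult = sandboxedRead
  .execute("/repo/nope.txt")
  .then(() => "no error", (error: Error) => error.message);
```

The same shape also makes the tool trivially testable: the test backend is a plain object, with no temp directories or mocks.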
// ~/.pi/agent/extensions/greet.ts: a whole extension is one factory
import type { ExtensionAPI } from "@earendil-works/pi-coding-agent";
import { Type } from "typebox";

export default function (pi: ExtensionAPI) {
  pi.registerTool({
    name: "greet", label: "Greet",
    description: "Greet someone by name",
    parameters: Type.Object({ name: Type.String() }),
    async execute(id, { name }) {
      return { content: [{ type: "text", text: `Hello, ${name}!` }], details: {} };
    },
  });
}
// drop the file in, hit /reload, the model can call it. No build, no restart.
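The execute() above returns both `content` and `details`. A sketch of why a result might carry two parts, assuming `content` is what enters the model's context and `details` is what only the UI renders (that reading is an assumption here, as are the hypothetical `grepResult`, `toModelContext` and `toUiLine` helpers):

```typescript
// Hypothetical sketch mirroring the { content, details } shape used above.
interface ToolResult<TDetails> {
  content: { type: "text"; text: string }[]; // assumed: the portion the model sees
  details: TDetails;                          // assumed: the portion only the UI sees
}

interface GrepMatch { file: string; line: number }
interface GrepDetails { matches: GrepMatch[]; elapsedMs: number }

function grepResult(matches: GrepMatch[], elapsedMs: number): ToolResult<GrepDetails> {
  // Keep the model portion terse: it stays in context for every later turn.
  const text = matches.length === 0
    ? "no matches"
    : matches.map((match) => `${match.file}:${match.line}`).join("\n");
  return { content: [{ type: "text", text }], details: { matches, elapsedMs } };
}

// Only `content` is appended to the conversation.
function toModelContext(result: ToolResult<unknown>): string {
  return result.content.map((part) => part.text).join("\n");
}

// The UI can show richer information without spending context on it.
function toUiLine(result: ToolResult<GrepDetails>): string {
  const fileCount = new Set(result.details.matches.map((match) => match.file)).size;
  return `${result.details.matches.length} matches in ${fileCount} files (${result.details.elapsedMs} ms)`;
}

const grep = grepResult(
  [{ file: "a.ts", line: 3 }, { file: "a.ts", line: 9 }, { file: "b.ts", line: 1 }],
  12,
);
const modelText = toModelContext(grep);
const uiText = toUiLine(grep);
```

Because the two portions are separate fields, you can print exactly what the model received and compare it with what the screen showed.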
There is no max-steps knob: "I never found a use case for that." The loop runs until the model stops calling tools. Two injected escape hatches, steering and follow-up, keep it from being a dumb while(true).
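A minimal sketch of where the two escape hatches could sit in such a loop. The semantics are assumptions made for this example: steering messages are picked up between turns while the agent is still working, and follow-ups are consumed only when the agent would otherwise stop. `runWithQueues`, `Queues` and the fake model are hypothetical.

```typescript
// Hypothetical sketch; queue semantics are assumed, not taken from Pi's source.
interface Turn { text: string; toolCalls: string[] }
interface Queues { steering: string[]; followUp: string[] }

function runWithQueues(
  callModel: (inbox: string[]) => Turn,
  queues: Queues,
  onTurn: (turnIndex: number) => void = () => {},
): string[] {
  const trace: string[] = [];
  let inbox: string[] = [];
  for (let turnIndex = 0; ; turnIndex++) {
    const turn = callModel(inbox);
    inbox = [];
    trace.push(`assistant: ${turn.text}`);
    for (const call of turn.toolCalls) trace.push(`tool: ${call}`);
    onTurn(turnIndex); // lets the caller enqueue messages mid-run

    // Escape hatch 1: steering input is injected before the next turn.
    if (queues.steering.length > 0) {
      inbox = queues.steering.splice(0);
      for (const message of inbox) trace.push(`steer: ${message}`);
      continue;
    }
    if (turn.toolCalls.length > 0) continue; // still working

    // Escape hatch 2: the model is done, so a queued follow-up starts another turn.
    const next = queues.followUp.shift();
    if (next !== undefined) {
      inbox = [next];
      trace.push(`follow-up: ${next}`);
      continue;
    }
    return trace; // no tool calls and nothing queued: the loop ends
  }
}

// A fake model that reacts to what lands in its inbox.
const fakeModel = (inbox: string[]): Turn => {
  if (inbox.includes("use pnpm instead")) return { text: "switching to pnpm", toolCalls: [] };
  if (inbox.includes("now run the tests")) return { text: "tests pass", toolCalls: [] };
  return { text: "installing with npm", toolCalls: ["bash: npm install"] };
};

const queues: Queues = { steering: [], followUp: ["now run the tests"] };
const trace = runWithQueues(fakeModel, queues, (turnIndex) => {
  if (turnIndex === 0) queues.steering.push("use pnpm instead"); // user types while a tool runs
});
```

Neither queue adds a stop condition to the loop. They only change what the model sees next, which is why a step limit is not needed to keep the run controllable.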
"By default, pi gives the model four tools: read, write, edit, and bash. … These four tools are all you need for an effective coding agent. All the frontier models have been RL-trained up the wazoo, so they inherently understand what a coding agent is."
This is where Zechner and Armin Ronacher converge hardest. The model already knows Bash and a programming language fluently. So the best "tool API" isn't 30 rigid MCP schemas: it's a shell and the ability to write code.
"The interface to the MCP is now not just individual tools it has never seen — it's a programming language that it understands very well. … Once the script is written, I can execute it 100, 200, or even 300 times without requiring any further inference."
Ronacher's rule of thumb — "anything can be a tool: a shell script, an MCP server, a log file" and "I really only start using MCP if the alternative is too unreliable" — is exactly Pi's default posture: no MCP; build CLI tools with READMEs, or write an extension if you truly need it.
Smallness is a means. The end is an agent that builds more of itself. Because extensions are typed TS modules using a documented in-process API, the agent can author a new capability and /reload it in the same session — gated by the same beforeToolCall policy you injected.
"Pi's entire idea is that if you want the agent to do something that it doesn't do yet, you don't go and download an extension or a skill. You ask the agent to extend itself. It celebrates the idea of code writing and running code. … It makes you live that idea of using software that builds more software."
"Pi isn't a sealed product. If you need a command, tool, provider, workflow, or UI tweak, just ask Pi to build it. Have Pi manipulate itself in place, hit /reload, and keep going. … If pi doesn't fit your needs, I implore you to fork it. I truly mean it."
The analogy he uses: a hammer that reshapes itself for each job. The same harness becomes a bespoke harness — the agent modifies itself to fit the task, instead of you bending to the tool. registerTool() works at load and at runtime, so self-modification isn't a special mode; it's the same path the core already uses.
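A sketch of why "at load and at runtime" can be one path. `ExtensionHost`, `load()` and `reload()` are hypothetical stand-ins for Pi's loader and /reload; the only thing taken from the text is the factory shape (pi) => void calling registerTool().

```typescript
// Hypothetical sketch; ExtensionHost, load() and reload() are illustrative names.
interface Tool { name: string; execute: (input: string) => string }
interface ExtensionApi { registerTool: (tool: Tool) => void }
type ExtensionFactory = (pi: ExtensionApi) => void;

class ExtensionHost {
  private tools = new Map<string, Tool>();
  private factories: ExtensionFactory[] = [];

  // One API object serves startup loading and mid-session additions alike.
  readonly api: ExtensionApi = {
    registerTool: (tool) => { this.tools.set(tool.name, tool); },
  };

  load(factory: ExtensionFactory): void {
    this.factories.push(factory);
    factory(this.api);
  }

  // Stand-in for /reload: drop every tool and re-run every known factory.
  reload(): void {
    this.tools.clear();
    for (const factory of this.factories) factory(this.api);
  }

  toolNames(): string[] {
    return Array.from(this.tools.keys()).sort();
  }

  call(name: string, input: string): string {
    const tool = this.tools.get(name);
    if (!tool) throw new Error(`unknown tool: ${name}`);
    return tool.execute(input);
  }
}

const host = new ExtensionHost();

// At startup: an extension that ships with the setup.
host.load((pi) => pi.registerTool({ name: "greet", execute: (name) => `Hello, ${name}!` }));
const atStartup = host.toolNames();

// Mid-session: the agent has written a new extension; loading it uses the same call.
host.load((pi) => pi.registerTool({ name: "shout", execute: (text) => text.toUpperCase() }));
host.reload();
const afterReload = host.toolNames();
const shouted = host.call("shout", "it works");
```

There is no separate plugin mode to enter: the second load() is indistinguishable from the first, which is what lets the agent extend the harness it is running in.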
Permissions, previews, undo, compaction belong in hooks (beforeToolCall/afterToolCall), not branches inside the loop. One loop, many surfaces.
Wrap providers behind a registry with a simple and a full path. Keep a capability map. Don't get locked to a lowest-common-denominator SDK.
The model speaks Bash and Python fluently. A README-documented CLI tool is more composable, cheaper on context, and self-maintainable than a wall of MCP tools.
The phantom-result-type pattern beats stringly-typed shell hooks: transform / block / observe become statically checkable.
If new capability means "ask the agent to write a tool and reload," extensibility and self-improvement collapse into one mechanism.
"Context engineering is paramount." Split tool output into a model portion and a UI portion. If you can't see the context, you can't debug the agent.