codehelm · masterclass

01·A web server that spawns shells

You put a shell behind a URL.

codehelm runs a web server on your laptop that can open real terminals and run claude in any project. That power is the whole point — and it's also a gift to any page already open in your normal browser. A random tab can fire fetch and WebSocket at 127.0.0.1, and with DNS rebinding it can make its own domain resolve to your loopback and try again. Localhost is not a wall. It's an address every process on the machine can type.

The trap

The reflex is to shrug: it binds to 127.0.0.1, that's private, ship it. But a bind address says where the socket lives, not who's allowed to talk to it. Any process — including a tab you didn't open on purpose — can reach a loopback port. "It's local" is exactly the assumption a rebinding attack is built to exploit.

The decision

No single lock — a stack of independent ones: ephemeral port, a 32-byte token handed over once in the launcher URL and immediately swapped for an HttpOnly, SameSite=Strict cookie, a Host allowlist, a WS Origin check, CSRF double-submit, and a per-request CSP nonce.

Each control guards a different way in, and none of them trusts the others to hold. The random high port keeps you off a guessable address. The token never lands in history or logs, because the first thing /api/auth does is redirect the ?k= away and set a cookie instead. SameSite=Strict means that cookie won't ride along on a request a foreign page makes. If SameSite ever slipped, the CSRF double-submit still demands a header that echoes a cookie value, which a cross-origin script can't read and so can't copy. And the Host allowlist answers rebinding directly: an attacker can rebind their domain to 127.0.0.1 all they like, but the Host header still names that domain, so it's a 403. To get through, a page has to beat every layer at once, not just find the one you forgot.
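
Two of those layers are small enough to sketch. What follows is a minimal illustration, not the code in lib/security/host-check.ts: the function names are hypothetical, and it assumes the server only ever answers to 127.0.0.1 and localhost on its own port.

```typescript
// Hypothetical helpers illustrating a Host allowlist and a WS Origin check.
// Assumption: only 127.0.0.1 and localhost on the server's own port are valid.

/** Build the set of host:port values the server is willing to answer to. */
function allowedHosts(port: number): Set<string> {
  return new Set([`127.0.0.1:${port}`, `localhost:${port}`]);
}

/** True only when the Host header names the loopback listener itself. */
function isAllowedHost(hostHeader: string | undefined, port: number): boolean {
  if (!hostHeader) return false;
  return allowedHosts(port).has(hostHeader.trim().toLowerCase());
}

/** True only when a WebSocket Origin header is an http origin on an allowed host. */
function isAllowedOrigin(originHeader: string | undefined, port: number): boolean {
  if (!originHeader) return false;
  let origin: URL;
  try {
    origin = new URL(originHeader);
  } catch {
    return false; // malformed value, or the opaque origin "null"
  }
  return origin.protocol === "http:" && allowedHosts(port).has(origin.host);
}
```

A rebound request still carries the attacker's name in Host, so isAllowedHost("attacker.example:41873", 41873) is false even though the packet arrived on loopback.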

Road not taken

Why not put TLS on it and feel professional? Because HTTPS on 127.0.0.1 buys a self-signed-cert warning, Secure-cookie bookkeeping, and a maintenance tax — against no attacker it actually stops. The threat here isn't a wiretap on loopback; it's a hostile page in your own browser, and a certificate does nothing about that. mTLS is the same answer one size louder. Both were left out on purpose.

Evidence
docs/SECURITY.md
bin/codehelm
lib/security/token.ts
lib/security/host-check.ts
lib/security/csrf.ts
lib/security/csp.ts

02·The path that escapes its root

Every slug is a path you didn't write.

Project slugs, session ids, terminal working directories — they arrive as strings from the client and end up as filesystem paths under ~/.claude. Treat them as trusted and a crafted slug walks straight out of the directory you meant to confine it to, into anything your account can read.

The trap

So you check the prefix: does the resolved path start with the root? Root is /home/bartek/.claude, the request resolves under it, looks fine. Then someone asks for /home/bartek/.claudeEVIL/secrets. That also starts with /home/bartek/.claude. The prefix check waves it through, because string-prefix doesn't know where a directory boundary is.

The decision

Run fs.realpath to collapse every symlink first, then accept only when the resolved path equals the root exactly, or starts with the root plus the path separator.

The separator is the whole trick. Comparing against root + path.sep means .claudeEVIL no longer matches — the next character after the root has to be a slash, so a sibling directory that merely shares a name prefix is rejected. And running realpath before the check means a symlink can't quietly point out of bounds after you've validated the string; you compare the real location, not the hopeful one. It's fuzzed with a hundred traversal payloads, and every endpoint that touches a path has to go through it — not as a guideline, as the only door.
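
The check described above fits in a few lines. This is a sketch under assumptions, not the contents of lib/security/path-guard.ts: isInsideRoot and resolveInsideRoot are hypothetical names, and the root is assumed to be an ordinary directory rather than the filesystem root.

```typescript
import * as fs from "node:fs/promises";
import * as path from "node:path";

/** Pure boundary check on two already-resolved absolute paths. */
function isInsideRoot(realRoot: string, realTarget: string): boolean {
  // Exact match, or the root followed by a separator. A bare prefix is not enough.
  return realTarget === realRoot || realTarget.startsWith(realRoot + path.sep);
}

/**
 * Collapse symlinks first, then apply the boundary check.
 * Rejects if the candidate does not exist or resolves outside the root.
 */
async function resolveInsideRoot(root: string, candidate: string): Promise<string> {
  const realRoot = await fs.realpath(root);
  const realTarget = await fs.realpath(path.resolve(realRoot, candidate));
  if (!isInsideRoot(realRoot, realTarget)) {
    throw new Error(`path escapes root: ${candidate}`);
  }
  return realTarget;
}
```

One thing to keep in mind: fs.realpath rejects paths that don't exist yet, so an endpoint that creates files would need to resolve the parent directory instead.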

Evidence
lib/security/path-guard.ts
docs/SECURITY.md

03·The sandbox you don't build

The terminal runs anything. So sandbox it — right?

A tab in codehelm opens a real shell with your environment and your privileges. Read that sentence and the security reflex screams: wrap it. firejail, bubblewrap, a seccomp profile — put the PTY in a box so it can't touch the rest of the machine.

The trap

The box feels like security while buying none. A sandbox around the shell would break the very things a shell is for — your paths, your env, the tools on your PATH — and it would do that to stop an attacker who, by the time they're running commands in your PTY, already is you. They own the account. The shell can't grant them more than they already have.

The decision

Don't sandbox the PTY. Draw the boundary at getting a command to run at all — the token, the cookie, the Origin and CSRF checks — and once a command runs, let it run as the user, because that's all it ever could be.

Security is about where the boundary actually is, not where it feels reassuring. The real attacker in this threat model is a web page reaching for 127.0.0.1, and every layer of the auth stack is aimed at that page. A firejail wrapper aims at a different attacker — one who already has shell access — and against that attacker it's theater: they could just open their own terminal. So the sandbox would cost real ergonomics every day to defend against nobody. The honest move is to say so out loud in the threat model and skip it.

Road not taken

What about a host that's already compromised — a keylogger, a hijacked Chrome profile? Also out of scope, and for the same reason: a local tool can't fix a machine whose owner has already lost it. codehelm assumes a clean host and defends the one boundary it can actually hold.

Evidence
docs/SECURITY.md

04·One server, two protocols

Next can't hand you a raw socket.

The terminal needs a real, bidirectional byte stream — a WebSocket carrying PTY data both ways. The App Router gives you request/response route handlers; it has no seam for a raw upgrade. So the shell feature needs a transport Next doesn't expose.

The trap

The easy answer is a second server: stand up a ws server on its own port next to Next, point the browser at it, done. Except now you have two front doors. Two ports to pin to 127.0.0.1, two places to check the Origin, two auth paths that have to agree on the same cookie — and the day they drift, one of them is the hole.

The decision

One http.Server. Next's request handler takes the HTTP path; a custom upgrade router takes WebSockets, dispatching /api/ws/pty, /api/ws/watch, and Next's own HMR socket off the same listener.

Collapsing it to a single server means there's exactly one thing to secure. One bind to lock to loopback, one Origin check on upgrade, one cookie that both HTTP and WS read the same way. The surface you have to reason about doesn't double — a WebSocket request goes through the same handshake auth as everything else, because it's literally the same server deciding. Fewer doors is fewer doors you can forget to lock.
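
A rough sketch of what one listener with an upgrade router can look like. The names matchChannel and attachUpgradeRouter are hypothetical, the handlers and the isAuthorized callback stand in for the real channels and auth stack, and treating everything under /_next/ as Next's HMR socket is an assumption.

```typescript
import * as http from "node:http";
import type { Duplex } from "node:stream";

type Channel = "pty" | "watch" | "hmr";
type UpgradeHandler = (req: http.IncomingMessage, socket: Duplex, head: Buffer) => void;

/** Map an upgrade request URL to the channel that owns it, or null if none does. */
function matchChannel(url: string | undefined): Channel | null {
  let pathname: string;
  try {
    pathname = new URL(url ?? "/", "http://127.0.0.1").pathname;
  } catch {
    return null;
  }
  if (pathname === "/api/ws/pty") return "pty";
  if (pathname === "/api/ws/watch") return "watch";
  if (pathname.startsWith("/_next/")) return "hmr"; // assumption: Next's HMR path
  return null;
}

/** Attach a single upgrade listener that authorizes, then dispatches by path. */
function attachUpgradeRouter(
  server: http.Server,
  handlers: Record<Channel, UpgradeHandler>,
  isAuthorized: (req: http.IncomingMessage) => boolean,
): void {
  server.on("upgrade", (req, socket, head) => {
    const channel = matchChannel(req.url);
    if (channel === null) {
      socket.destroy();
      return;
    }
    // One gate for every WebSocket, decided by the same server as HTTP.
    if (!isAuthorized(req)) {
      socket.write("HTTP/1.1 403 Forbidden\r\n\r\n");
      socket.destroy();
      return;
    }
    handlers[channel](req, socket, head);
  });
}
```

The point of the shape is that there is no second place to forget the check: an unknown path or a failed gate ends at the same listener that serves everything else.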

Road not taken

A standalone ws process — or Socket.IO — would have been faster to wire up and is the default a lot of tutorials reach for. It was passed over because every byte of convenience there is paid back as a second auth surface, and the one asset worth protecting here is a process that can spawn shells.

Evidence
server.ts
lib/ws/server.ts
lib/ws/pty-channel.ts
docs/ARCHITECTURE.md

05·When the shell out-runs the browser

Someone types yes and walks away.

A real PTY can produce output faster than a browser can render it. yes, cat on a huge file, a chatty build — the shell will happily emit megabytes a second. If every byte it produces is shoved onto the socket the instant it appears, the slowest link in the chain buffers the difference, and that buffer is your memory.

The trap

Piping pty.onData straight to ws.send looks clean and works perfectly in the demo. Then a runaway command fills the server's send queue, or the browser's receive buffer, faster than it drains — and the unbounded middle grows until something falls over. The bug never shows up while you're typing ls; it shows up the one time output is infinite.

The decision

Flow control with a client ACK: the browser acknowledges every 64 kB it has actually consumed; the server pauses the PTY once 1 MB is in flight unacked and resumes when the client catches up — plus a hard cap of 16 concurrent PTYs and 10 spawns a minute.

The ACK turns an open firehose into a loop that can only get as far ahead as the reader allows. When the browser falls behind, the unacked bytes pile up to 1 MB, the server stops reading from the PTY, and the kernel's own backpressure does the rest: the PTY buffer fills and the noisy command blocks on its next write instead of the server's buffer growing. Memory stays bounded no matter what runs in the shell. The spawn cap and rate limit cover the other failure shape: not one loud command, but a buggy UI loop opening terminals until file descriptors run out.
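
The accounting behind that loop is small. Below is a minimal sketch, assuming the PTY object exposes pause() and resume() and that the client reports consumed bytes in each ACK. FlowController is a hypothetical name, and resuming as soon as the count drops under the cap is a simplification; a real channel might wait for a lower mark.

```typescript
const ACK_EVERY_BYTES = 64 * 1024; // client acknowledges each 64 kB consumed
const MAX_IN_FLIGHT_BYTES = 1024 * 1024; // server pauses at 1 MB unacked

interface PausableSource {
  pause(): void;
  resume(): void;
}

class FlowController {
  private inFlight = 0;
  private paused = false;
  private readonly source: PausableSource;
  private readonly send: (chunk: string) => void;

  constructor(source: PausableSource, send: (chunk: string) => void) {
    this.source = source;
    this.send = send;
  }

  /** Call for every chunk the PTY emits. */
  onData(chunk: string): void {
    this.send(chunk);
    this.inFlight += Buffer.byteLength(chunk);
    if (!this.paused && this.inFlight >= MAX_IN_FLIGHT_BYTES) {
      this.paused = true;
      this.source.pause(); // stop reading; the kernel now holds the producer back
    }
  }

  /** Call when the client reports how many bytes it has consumed. */
  onAck(bytes: number): void {
    this.inFlight = Math.max(0, this.inFlight - bytes);
    if (this.paused && this.inFlight < MAX_IN_FLIGHT_BYTES) {
      this.paused = false;
      this.source.resume();
    }
  }
}
```

Wiring it up would mean calling onData from the PTY's data callback and onAck from the WebSocket message handler; nothing is dropped, the producer just waits.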

Road not taken

Dropping or truncating output when the client is slow would keep memory flat too — and silently corrupt the one thing a terminal must never lie about: what actually printed. Pausing the producer keeps every byte, in order, just slower.

Evidence
lib/ws/pty-channel.ts
lib/pty/manager.ts
docs/SECURITY.md

06·Terminals that outlive the tab

You reload, and your shell is gone.

A terminal lives in a React component. The natural lifecycle ties the PTY to that component: mount spawns it, unmount kills it. Which means the moment you reload the page, switch projects, or close the Chromium window, every shell dies — and so does the long claude run you left working inside it.

The trap

Tying the PTY's life to the component's life is the obvious design, and it's the one that quietly loses your work. The terminal feels like the process, so it's tempting to let the UI own it. But the UI is the most disposable thing in the system — it re-renders, re-mounts, reloads. Anchor a half-hour agent run to a <div> and a stray refresh ends it.

The decision

The PTY lives on the server, not in the tab. On mount a terminal registers as a persistent PTY and writes the returned id into its pane; on reopen it re-attaches to that same process by id over the WebSocket. The roster lives in ~/.codehelm/persistent-tabs.json, and on restart the server respawns every persistent tab before the UI connects.

Flipping ownership makes the browser a viewport instead of a host. A reload doesn't kill anything — it reconnects to a process that never stopped. A project switch leaves your shells running. Even quitting and relaunching codehelm brings the tabs back, because their definitions outlived the process on disk. The component can come and go as often as React wants; the thing doing the work isn't in the component anymore.
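
To make the ownership flip concrete, here is a toy registry. Everything in it is hypothetical: the class name, the TabDefinition shape, and the spawn callback are illustrations, and the real roster format in ~/.codehelm/persistent-tabs.json may carry more fields.

```typescript
import { randomUUID } from "node:crypto";

/** Hypothetical on-disk shape of one persistent tab. */
interface TabDefinition {
  id: string;
  cwd: string;
}

class PersistentTabRegistry<P> {
  private readonly tabs = new Map<string, { definition: TabDefinition; process: P }>();
  private readonly spawn: (cwd: string) => P;

  constructor(spawn: (cwd: string) => P) {
    this.spawn = spawn;
  }

  /** First mount: spawn a process and hand back the id the pane should remember. */
  register(cwd: string): string {
    const definition: TabDefinition = { id: randomUUID(), cwd };
    this.tabs.set(definition.id, { definition, process: this.spawn(cwd) });
    return definition.id;
  }

  /** Reopen: return the process that is already running, never a new one. */
  attach(id: string): P | undefined {
    return this.tabs.get(id)?.process;
  }

  /** The definitions worth writing to disk. */
  roster(): TabDefinition[] {
    return Array.from(this.tabs.values(), (tab) => tab.definition);
  }

  /** Server restart: respawn every saved tab under its original id. */
  restore(saved: TabDefinition[]): void {
    for (const definition of saved) {
      this.tabs.set(definition.id, { definition, process: this.spawn(definition.cwd) });
    }
  }
}
```

The browser only ever holds an id. Whether it reloads once or ten times, attach returns the same process.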

Road not taken

Persisting the full split-layout — which pane sat where — was deliberately deferred, and it's the honest rough edge: each pane registers as its own server tab, so after a reload a three-way split comes back as three separate top-level tabs. The documented path forward is a server-side group_id so panes rehydrate under one parent — a small schema change that survives any browser loading the same ~/.codehelm. It's written down precisely because it isn't done yet.

Evidence
docs/PERSISTENT-TABS.md
lib/pty/persistent-tabs-store.ts
lib/pty/persistent-tabs-service.ts
app/(ui)/terminal/Terminal.tsx

07·Cron that types for you

A scheduled prompt fires while Claude is mid-sentence.

codehelm can schedule a prompt to land in a running claude tab — a daily research run, a nightly summary. But a terminal is a stream with no notion of "wait your turn." Fire your prompt at noon and Claude might be halfway through streaming an answer, or sitting in a fullscreen picker. Write into that and you don't send a prompt — you send keystrokes into the middle of something else.

The trap

The naive scheduler does one thing: when the timer ticks, pty.write the prompt. It works in testing, because in testing the tab is always idle. In real use it pastes "summarise today" into the seventh line of a half-finished response, or into a menu that reads it as navigation. The prompt is technically delivered and practically garbage.

The decision

Gate the write on a ready-check, serialise it with a lock, deliver it as a paste. Cron fires only when Claude is at the input prompt — a marker match, or the tab idle for more than three seconds. An in-memory mutex per tab stops two jobs colliding, and the prompt goes in via bracketed paste — ESC[200~ … ESC[201~ — so the terminal treats it as pasted text, not keys to interpret.

Each piece closes a specific failure. The ready-check is the difference between typing into a prompt and typing into a response; without it, timing is pure luck. Bracketed paste tells the terminal "this is a block of pasted text," so a multi-line prompt isn't re-read line by line as commands or shortcuts. And the mutex handles the case a timer makes possible, if rare: two jobs aimed at one tab in the same tick, which without a lock would interleave their bytes into nonsense. The result lands exactly as if you'd pasted it yourself. Then codehelm steps back and doesn't read the reply.
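
The three pieces can be sketched separately. This is an illustration under assumptions, not lib/cron/executor.ts: the function and class names are hypothetical, and matching the prompt marker with a plain substring test is a guess at how the ready-check might work.

```typescript
const PASTE_START = "\x1b[200~";
const PASTE_END = "\x1b[201~";

/** Wrap text so the terminal treats it as one pasted block. */
function bracketedPaste(text: string): string {
  // Strip any embedded end marker so the payload cannot close the paste early.
  const safe = text.split(PASTE_END).join("");
  return PASTE_START + safe + PASTE_END;
}

/** Ready when the prompt marker is visible, or the tab has been idle over 3 s. */
function isReady(recentOutput: string, idleMs: number, promptMarker: string): boolean {
  return recentOutput.includes(promptMarker) || idleMs > 3000;
}

/** In-memory mutex per tab: jobs for one tab run strictly one after another. */
class TabLocks {
  private readonly tails = new Map<string, Promise<void>>();

  run<T>(tabId: string, job: () => Promise<T>): Promise<T> {
    const previous = this.tails.get(tabId) ?? Promise.resolve();
    const result = previous.then(job);
    // Store a tail that never rejects, so one failed job cannot block the next.
    this.tails.set(tabId, result.then(() => undefined, () => undefined));
    return result;
  }
}
```

A delivery would then look roughly like locks.run(tabId, ...) wrapping a check of isReady followed by a single write of bracketedPaste(prompt).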

Road not taken

Capturing and parsing Claude's response — to know whether the job "worked" — was left out on purpose. It would mean scraping a streaming TUI and guessing where an answer starts and ends, a brittle parser chasing an interface that keeps changing. The job's contract stops at delivery: the prompt reached the prompt. What Claude does next is yours to read in the tab.

Evidence
docs/CRON-JOBS.md
lib/cron/executor.ts
lib/cron/tab-lock.ts
lib/pty/ready-check.ts

08·Opening a session that won't fit

The session file is bigger than the screen will ever show.

A Claude session is a JSONL file that grows without a ceiling — thousands of events, every tool call and result, some of them enormous. The viewer has to open one and feel instant, on a file that might be tens of megabytes of text you'll only ever look at a few hundred lines of.

The trap

Read the file, JSON.parse every line, map every event to a component. It's three lines and it's correct, and it locks the tab for over a second on a big session while it parses everything and mounts two thousand DOM nodes you can't see. The cost scales with the file, not the screen, so the longer the conversation, the worse the first impression.

The decision

Stream the parse and virtualise the render. The file arrives as a ReadableStream, parsed event-by-event as bytes land, fed into react-virtuoso, which only ever mounts the rows on screen. Shiki loads lazily, per language, the first time a code block of that language appears.

Decoupling work from file size is the whole move. Streaming means the first events render while the tail is still downloading — first byte on screen in under 50 ms instead of after a full parse. Virtualising means a 2000-message scroll holds above 30 fps, because the DOM carries a screenful, not a sessionful — scrolling swaps row contents instead of growing the tree. And lazy-loading Shiki per language means a session full of Python never pays to parse the TypeScript grammar it doesn't show. The viewer's cost tracks what you're looking at, not what the file weighs.
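
The streaming half comes down to splitting bytes into lines as they arrive. Here is a minimal sketch; JsonlLineSplitter is a hypothetical name, not the class in lib/jsonl/parser.ts, and it assumes every non-empty line is valid JSON, so a malformed line throws.

```typescript
class JsonlLineSplitter {
  private buffer = "";
  // A streaming decoder keeps a multi-byte character intact across chunk edges.
  private readonly decoder = new TextDecoder("utf-8");

  /** Feed one chunk of bytes; returns every event that chunk completed. */
  push(chunk: Uint8Array): unknown[] {
    this.buffer += this.decoder.decode(chunk, { stream: true });
    const lines = this.buffer.split("\n");
    this.buffer = lines.pop() ?? ""; // the last piece is an unfinished line
    return lines
      .filter((line) => line.trim() !== "")
      .map((line) => JSON.parse(line) as unknown);
  }

  /** Call once the stream ends to emit a final line with no trailing newline. */
  flush(): unknown[] {
    this.buffer += this.decoder.decode();
    const rest = this.buffer.trim();
    this.buffer = "";
    return rest === "" ? [] : [JSON.parse(rest) as unknown];
  }
}
```

In the browser, each value read from response.body.getReader() would go to push, and whatever comes back can be appended to the virtualised list right away, long before the tail of the file has downloaded.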

Evidence
lib/jsonl/parser.ts
docs/ARCHITECTURE.md
PERFORMANCE.md

09·A conversation that updates while you read it

Claude is writing to the file you're staring at.

The conversations codehelm shows aren't static — the claude CLI is appending to those JSONL files in real time, from outside the app entirely. Open a session that's actively running and the screen should grow with it, without you reaching for refresh. The data source is changing behind your back, on purpose.

The trap

The reflex for "keep it fresh" is to poll: re-read the directory every couple of seconds, diff, update. It works and it's wasteful — constant filesystem reads to catch a change that happens occasionally, and you're always either too slow (long interval) or too busy (short one). Polling spends the most effort exactly when nothing is happening.

The decision

Watch, don't poll. A single chokidar watcher sits on ~/.claude/projects/, debounced 200 ms per file; on a change it pushes a session-updated event over the WebSocket, and the open viewer attaches a streaming tail of just the new lines instead of reloading the whole file.

An OS-level watch costs nothing until something actually changes, then fires within a debounce window — the opposite spend profile from polling. The 200 ms debounce coalesces the burst of writes the CLI makes into one event instead of a storm. And pushing a targeted "this session changed" lets the client be surgical: invalidate one query, or — if you're looking right at it — tail the new bytes onto the list you're already reading, so the conversation extends in place instead of flickering through a full reload.
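
The per-file debounce is the piece that turns a burst of writes into one event. A small sketch of that idea on its own, with hypothetical names; in the real watcher a chokidar "change" handler would presumably be what calls touch with the file path.

```typescript
class PerKeyDebouncer {
  private readonly timers = new Map<string, ReturnType<typeof setTimeout>>();
  private readonly delayMs: number;
  private readonly fire: (key: string) => void;

  constructor(delayMs: number, fire: (key: string) => void) {
    this.delayMs = delayMs;
    this.fire = fire;
  }

  /** Record a change for one key; fire once after the key has been quiet. */
  touch(key: string): void {
    const existing = this.timers.get(key);
    if (existing !== undefined) clearTimeout(existing);
    const timer = setTimeout(() => {
      this.timers.delete(key);
      this.fire(key);
    }, this.delayMs);
    this.timers.set(key, timer);
  }
}
```

With a delay of 200, five writes to one session file inside the window produce a single session-updated push, while a write to a different file keeps its own independent timer.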

Road not taken

Polling on a timer was the alternative, and it's simpler to write. It was passed over because it inverts the cost: maximum work when idle, guaranteed lag when active. A watcher pays only for real change.

Evidence
lib/watcher/chokidar.ts
lib/ws/watch-channel.ts
docs/ARCHITECTURE.md

10·Twelve seconds to a quarter second

The app booted in dev mode every single time.

bin/codehelm is the front door — find a port, mint a token, start the server, open Chromium. For a long time it started Next in dev mode without anyone deciding to. First paint took about twelve seconds, because every route compiled on its first hit, React ran with development checks, and the bundle shipped unminified. The launcher worked, so nobody looked at the clock.

The trap

Shipping the dev server is the silent default — it runs, it's what you tested, and the cost hides until you measure it. The prod branch existed; it just spawned node server.js from the repo root, where no such file lives. So the broken fast path quietly fell through to the working slow one, and "it launches fine" covered for a 30× tax on every cold start.

The decision

Make the launcher detect a real build. If .next/BUILD_ID is present, run NODE_ENV=production tsx server.ts against the prebuilt tree; if it's missing, fall back to dev with a warning explaining how to pnpm build. After health-check, pre-warm /api/projects and /api/settings so the first click isn't the first compile.

The fix is mostly about refusing to guess. A build either exists on disk or it doesn't — .next/BUILD_ID is an honest marker, so the launcher picks prod when prod is actually available instead of hoping. Pointing the custom server at the prebuilt tree skips on-demand compilation entirely: first GET / drops from about 6.9 seconds to 234 milliseconds, /api/projects from 2.4 seconds to 56. Pre-warming the heavy routes after the health check spends idle startup time loading the PTY and JSONL modules, so they're warm before the window finishes painting. And dev stays one --dev flag away, because the goal was to stop defaulting to slow, not to forbid fast iteration.
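
The detection itself is a single existence check. A sketch under assumptions: bin/codehelm may not be written this way, pickLaunchMode and LaunchDecision are hypothetical names, and the warning text is invented for illustration.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

interface LaunchDecision {
  mode: "production" | "development";
  warning?: string;
}

/**
 * Pick prod only when a build marker actually exists on disk.
 * The exists parameter is injectable so the decision can be tested without a repo.
 */
function pickLaunchMode(
  repoRoot: string,
  forceDev: boolean,
  exists: (filePath: string) => boolean = fs.existsSync,
): LaunchDecision {
  if (forceDev) return { mode: "development" };
  const marker = path.join(repoRoot, ".next", "BUILD_ID");
  if (exists(marker)) return { mode: "production" };
  return {
    mode: "development",
    warning: "No .next/BUILD_ID found; falling back to dev. Run `pnpm build` for a fast start.",
  };
}
```

The launcher would then branch on mode: spawn the custom server with NODE_ENV=production in one case, print the warning and start dev in the other.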

Road not taken

Auto-rebuilding when source changes was tempting and rejected. Detecting "is this build stale" correctly means walking the whole dependency graph, and getting it subtly wrong means serving yesterday's code while swearing it's fresh. An explicit --build flag is a sentence of friction in exchange for never lying about what's running.

Evidence
PERFORMANCE.md
bin/codehelm
docs/ARCHITECTURE.md