Announcement · January 20, 2026

Introducing Raysurfer

Last week I left Modern Realty (YC S24) to found a different AI startup. To those who helped me build that company along the way: thank you so much.

I left to fix something that has been bugging me about how people currently run tool-using LLM agents.

Tool-using LLM agents today are typically architected as a loop: each step, the model reads very similar inputs and regenerates very similar tool calls, while dragging every tool input and output along for the entire duration of execution.
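The loop above can be sketched with stub functions (all names here are hypothetical stand-ins, not any real SDK's API). Note how the transcript the model must re-read grows with every tool call:

```python
# Minimal sketch of the conventional tool-calling loop, with a stub LLM
# and a stub tool. Every tool result is appended to the transcript, so
# the context re-sent to the model grows on every iteration.

def stub_llm(messages):
    """Stand-in for a model call: requests the next tool until 3 results exist."""
    results = [m for m in messages if m["role"] == "tool"]
    if len(results) < 3:
        return {"tool": "fetch", "args": {"step": len(results)}}
    return {"done": True}

def fetch(step):
    # A deliberately large tool output, like a real API response body.
    return f"payload-{step}" * 100

messages = [{"role": "user", "content": "do the task"}]
context_sizes = []
while True:
    action = stub_llm(messages)
    if action.get("done"):
        break
    out = fetch(**action["args"])
    messages.append({"role": "tool", "content": out})
    # The entire transcript, old outputs included, is resent each step.
    context_sizes.append(sum(len(str(m)) for m in messages))

print(context_sizes)  # strictly increasing: the model re-reads everything
```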

This is an issue for both speed and consistency.

Speed: Even with KV caching, time-to-first-token latency grows roughly linearly with input length, so much of a run is spent waiting for LLM outputs.

Consistency: The fraction of the context that is relevant to the next step shrinks with each step. This is context rot.

Opus 4.5 might run today at 10-30s per iteration.

But it doesn't need to be like this.

The success of Claude Code and the rise of claude-agent-sdk show us that the next wave of LLM agents can run code scripts that they create or find on the fly. Instead of running 10 tools one at a time, piping each output back through the model into the next call, an agent that knows a sequence of tools will need to be called serially can write a script that calls the first 4 APIs, attaching each output variable to the next API call, then assess where it's at, then write a script to run the remaining 6.
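The script-per-phase pattern can be sketched like this (the four API functions below are hypothetical stand-ins for real tools):

```python
# Hypothetical stand-ins for four APIs that would otherwise be four
# separate tool calls, each round-tripping its output through the model.

def search(query):
    return ["doc-1", "doc-2"]

def fetch(doc_id):
    return {"id": doc_id, "text": f"contents of {doc_id}"}

def summarize(texts):
    return " | ".join(t["text"] for t in texts)

def store(summary):
    return {"ok": True, "length": len(summary)}

# Instead of piping each output back through the LLM, the agent emits one
# script: intermediate values flow between calls as ordinary variables,
# and only the final assessment-worthy result is printed back into context.
docs = search("quarterly report")
texts = [fetch(d) for d in docs]
summary = summarize(texts)
result = store(summary)
print(result)  # the only thing that re-enters the model's context
```

Everything between `search` and `store` stays out of the model's context entirely; the agent only sees what the script chooses to print.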

The future of context management is print().

Presuming that this is the future, my next step is to build more tooling to further improve claude-agent-sdk.

Announcing: Raysurfer.

Raysurfer is a Stack Overflow for long-running LLM agents: it helps them cache and reuse verified code snippets as scripts from previous runs, yielding many-fold speedups and consistency improvements on similar runs.
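The snippet-cache idea can be illustrated with a toy sketch (all names here are hypothetical; this is not Raysurfer's actual API, just the general pattern of keying verified scripts by task so a similar run can skip regeneration):

```python
# Toy illustration of a snippet cache for agent runs. Verified scripts
# from earlier runs are keyed by a normalized task description, so a
# similarly phrased later run retrieves the script instead of
# regenerating it with the LLM.

import hashlib

class SnippetCache:
    def __init__(self):
        self._store = {}

    def _key(self, task):
        # Normalize so trivially different phrasings map to the same entry.
        return hashlib.sha256(task.strip().lower().encode()).hexdigest()

    def get(self, task):
        return self._store.get(self._key(task))

    def put(self, task, script):
        # In a real system the script would only be stored after it ran
        # successfully and its output was verified.
        self._store[self._key(task)] = script

cache = SnippetCache()
cache.put("Fetch open PRs and summarize", "script_v1.py contents ...")

# A later, similarly phrased run hits the cache instead of calling the LLM.
hit = cache.get("fetch open PRs and summarize")
print(hit is not None)  # True
```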

An early version is available at raysurfer.com.

If you'd like to chat about it, come talk to me in this Discord channel: