Infrastructure for Agents
Reliability is all we need.
Reliability is the name of the game for agents, and it's unlikely to be solved purely at the model layer for the foreseeable future. This is creating green shoots for infrastructure builders, with a few interesting trends starting to emerge:
1. Simulation as CI for agents:
a) The most valuable piece of data today is trajectory data, i.e., collections of task (P) -> {t1, t2, ..., tk} mappings (a minimal record sketch follows below). With more trajectory data, agents can be improved with techniques like reinforcement fine-tuning (RFT).
b) Since these trajectories can be quite specific to a company's underlying data (D), you need to be able to actually simulate the behavior of agents within your own environment rather than relying on third-party (3P) trajectory data.
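To ground this, here's a minimal sketch of what a trajectory record might look like; the field names and `ToolCall` shape are illustrative assumptions, not any standard schema:

```python
from dataclasses import dataclass, field

@dataclass
class ToolCall:
    """One step t_i in a trajectory: the tool invoked, its arguments, and the result."""
    tool: str                  # e.g. "crm.lookup_account" (illustrative name)
    args: dict
    result: str | None = None

@dataclass
class Trajectory:
    """A task (P) -> {t1, t2, ..., tk} mapping, plus a coarse outcome signal for RFT-style training."""
    task: str                               # the prompt / task P
    steps: list[ToolCall] = field(default_factory=list)
    success: bool = False                   # richer, multi-objective rewards are discussed below
```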
So, how might you do this?
- Maintain an agent and MCP registry for the enterprise, plus a staging environment. Bootstrap a metadata layer that contains the objective of each agent, the tools it has access to, the scope of each agent vis-à-vis each tool, etc. Your SDK may need to generate MCP servers on the fly for certain internal applications.
- Execute scenarios in staging for each agent by providing prompt/task variations, inspecting the tool calls produced, and evaluating performance against a multi-objective reward function (e.g., task success, minimization of tool invocations).
- A critical component is defining accurate, quantifiable reward functions for each agent; these unlock high-fidelity evals and close the loop for reliable CI.
- All of this needs to be productized: easy-to-adopt infrastructure that developers can extend, but with batteries included. You can start to see a new paradigm forming: not unit tests for code, but simulation harnesses for agents (a sketch follows this list).
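A minimal sketch of such a harness, reusing the `Trajectory` type from the sketch above; the `AgentSpec` fields, reward weights, and gating threshold are all illustrative assumptions:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class AgentSpec:
    """Registry metadata: the agent's objective, its tools, and per-tool scopes (hypothetical schema)."""
    name: str
    objective: str
    tools: list[str]
    scopes: dict[str, str]    # tool -> allowed scope, e.g. {"crm.lookup_account": "read-only"}

def reward(traj: Trajectory, task_score: float, w_calls: float = 0.05) -> float:
    """Multi-objective reward: task performance minus a small penalty per tool invocation."""
    return task_score - w_calls * len(traj.steps)

def run_ci(agent_run: Callable[[str], Trajectory],
           scenarios: list[str],
           score: Callable[[Trajectory], float],
           threshold: float = 0.8) -> bool:
    """Execute prompt/task variations in staging and gate on aggregate reward,
    the way a unit-test suite gates a code change."""
    rewards = [reward(t, score(t)) for t in (agent_run(s) for s in scenarios)]
    return sum(rewards) / len(rewards) >= threshold
```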
What happens when you get trajectory data?
2. Enterprises will move to "context lakes":
- An evolving, queryable memory layer that serves as a hub for agent trajectories, enriched by enterprise data stored in the data lake (e.g., Delta Lake or Snowflake). A potent mix of a knowledge base, a semantic cache, and an execution log.
- Extremely fast reads for inference-time retrieval that support high QPS.
- As mentioned in a prior post, the semantic cache (really interesting opportunity for startups) will cluster task–trajectory pairs (e.g., via k-means), enabling fast retrieval and “result fusing” during planning or tool selection.
Agents will dip into the context lake constantly. High QPS, low-latency context fetch will become as important as fast embedding search is today.
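Here's a minimal sketch of the semantic-cache piece, assuming a hypothetical `embed()` function that returns fixed-size vectors (and the `Trajectory` type from earlier); k-means via scikit-learn is one reasonable implementation choice, not the only one:

```python
import numpy as np
from sklearn.cluster import KMeans

class SemanticCache:
    """Clusters task embeddings so an agent can fetch similar past trajectories at inference time."""

    def __init__(self, tasks: list[str], trajectories: list[Trajectory], k: int = 32):
        self.vecs = np.array([embed(t) for t in tasks])   # embed() is assumed, not provided here
        self.trajs = trajectories
        self.km = KMeans(n_clusters=k, n_init="auto").fit(self.vecs)

    def fetch(self, task: str, top: int = 3) -> list[Trajectory]:
        """Retrieve the nearest task-trajectory pairs within the query's cluster,
        ready for result fusing during planning or tool selection."""
        q = np.asarray(embed(task))
        cluster = self.km.predict(q.reshape(1, -1))[0]
        idx = np.where(self.km.labels_ == cluster)[0]
        dists = np.linalg.norm(self.vecs[idx] - q, axis=1)
        return [self.trajs[i] for i in idx[np.argsort(dists)[:top]]]
```

Clustering up front means a query only scans one cluster rather than the whole lake, which is what makes high-QPS, low-latency context fetch plausible at scale.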
3. Agent authentication becomes a first-class concern:
- Traditional OAuth and API key models break down when agents act on behalf of users and themselves, across long-lived sessions.
- You need a framework for agent identity, delegation, and scoping: one that supports tool-level permissions, task-bound credentials, and delegation graphs (sketched below).
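A minimal sketch of what task-bound, delegated credentials could look like; `Credential`, its fields, and the `authorize` check are hypothetical constructs, not an existing standard:

```python
from dataclasses import dataclass, field
import time

@dataclass
class Credential:
    """A task-bound credential: scoped to specific tools, expiring with the task,
    and carrying its delegation chain for audit."""
    principal: str                      # the agent (or user) this was issued to
    tool_scopes: set[str]               # tool-level permissions, e.g. {"crm.read"}
    task_id: str                        # credential is valid only for this task
    expires_at: float
    delegated_from: list[str] = field(default_factory=list)  # path through the delegation graph

    def delegate(self, child: str, scopes: set[str]) -> "Credential":
        """Issue a narrower credential to a sub-agent: scopes may only shrink."""
        assert scopes <= self.tool_scopes, "delegation may not widen scope"
        return Credential(child, scopes, self.task_id, self.expires_at,
                          self.delegated_from + [self.principal])

def authorize(cred: Credential, tool: str, task_id: str) -> bool:
    """Gate each tool call on scope, task binding, and expiry."""
    return (tool in cred.tool_scopes
            and cred.task_id == task_id
            and time.time() < cred.expires_at)
```

The key design choice is that scopes only narrow as they flow down the delegation graph, so a compromised sub-agent can never do more than the task that spawned it.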
We’re entering an era where testing software means simulating behavior, querying software means retrieving context, and securing software means authenticating autonomous agents.
