As agentic products hit the market, sharing some early observations:
1. They're much harder to build than traditional software.
a) We're asking these products to solve dynamic search problems in enterprise environments, under real-world but hard-to-encode constraints. And they must do so fast - humans tolerate latency from other humans, but not from software.
b) In other words - given a task P, a corpus of raw data D, and a family of tools T = {t1, t2, ..., tn}, the agent must learn a policy that yields an accurate sequence of tool invocations over D that completes P.
This isn’t trivial. The system must reason over:
- What tools to use (retrieval, summarization, log search, graph traversal…)
- How to compose them into multi-hop workflows with intelligent backtracking.
- How to coordinate multiple agents that each have different permissions within T - and therefore different implicit context over D.
- When to stop - without hallucinating or looping. This is particularly hard when you don't have an explicit reward function upfront to optimize against. (A minimal sketch of this search loop follows the list.)
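To make this concrete, here's a minimal sketch of that loop - greedy tool selection with depth-first backtracking and a hard stop. Everything here (Tool, is_done, score) is a hypothetical stand-in; in a real agent the policy and the stop predicate would be an LLM's judgment calls, which is exactly why this is hard.

```python
from dataclasses import dataclass
from typing import Any, Callable

@dataclass
class Tool:
    name: str
    run: Callable[[Any], Any]  # invoke the tool against the current state

def solve(task, state, tools, is_done, score, depth=8, path=None):
    """Depth-first search over tool invocations with backtracking.

    task    - the task P
    state   - accumulated context derived from the corpus D
    tools   - the tool family T
    is_done - hypothetical stop predicate (the hard part in practice)
    score   - hypothetical policy ranking which tool to try next
    depth   - caps how long any tool chain can get, so we can't loop forever
    """
    path = path or []
    if is_done(task, state):
        return path                      # success: a sequence of tool calls
    if depth == 0:
        return None                      # chain too long: give up this branch
    # Try tools in policy order; backtrack when a branch dead-ends.
    for tool in sorted(tools, key=lambda t: score(task, state, t), reverse=True):
        result = tool.run(state)
        if result is None:
            continue                     # tool produced nothing useful; skip it
        found = solve(task, result, tools, is_done, score, depth - 1,
                      path + [tool.name])
        if found is not None:
            return found                 # propagate the successful path upward
    return None                          # all branches exhausted: backtrack
```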
2. To be viable in production, these agents must be fast.
- You need a semantic cache: one that persists a global map of P -> {t_a, t_b, ..., t_k} for every validated piece of work the system has performed (first sketch below). This avoids forcing the system to reason from first principles on every incremental task, especially tasks that fall within the distribution of work it has already solved. A semantic cache could even store successful sub-paths, but that gets complicated fast.
- You need a clever way of shrinking the search space D to something smaller without losing much signal (second sketch below). Especially useful when D is TB- or PB-scale.
- You need "async-await" semantics and durable execution for agents (third sketch below). I've heard OpenAI's recently launched Codex leverages Temporal.
- You need agents broadcasting work to each other, especially when some agents have different ACLs than others and context sharing is critical (fourth sketch below).
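First, the semantic cache - a minimal sketch assuming some hypothetical embed() function: store validated task -> tool-path mappings, and replay a cached plan when a new task lands close enough in embedding space.

```python
import numpy as np

class SemanticCache:
    """Sketch: map task embeddings to validated tool paths.

    `embed` is a hypothetical text-embedding function; entries are written
    only after the work has been validated.
    """
    def __init__(self, embed, threshold=0.9):
        self.embed = embed
        self.threshold = threshold   # min cosine similarity to count as a hit
        self.keys = []               # normalized task embeddings
        self.paths = []              # validated tool sequences, e.g. ["search", "summarize"]

    def put(self, task: str, tool_path: list[str]):
        v = self.embed(task)
        self.keys.append(v / np.linalg.norm(v))
        self.paths.append(tool_path)

    def get(self, task: str):
        if not self.keys:
            return None
        q = self.embed(task)
        q = q / np.linalg.norm(q)
        sims = np.array(self.keys) @ q   # cosine similarity vs. all cached tasks
        best = int(np.argmax(sims))
        # Replay the cached plan only when the new task is close enough
        # to something already solved and validated.
        return self.paths[best] if sims[best] >= self.threshold else None
```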
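Second, shrinking D - one common shape for this is a cheap metadata filter followed by embedding similarity. `embed`, `metadata_filter`, and `doc.text` are hypothetical stand-ins:

```python
import numpy as np

def shrink_corpus(query, docs, embed, metadata_filter, k=100):
    """Two-stage pruning of a large corpus D into a small working set.

    Stage 1: a cheap metadata filter (tenant, time range, data source) can
    cut D by orders of magnitude before anything touches a model.
    Stage 2: embedding similarity keeps only the top-k candidates.
    """
    candidates = [doc for doc in docs if metadata_filter(doc)]    # stage 1
    q = embed(query)
    q = q / np.linalg.norm(q)
    scored = sorted(candidates,
                    key=lambda doc: float(np.dot(embed(doc.text), q)),
                    reverse=True)
    return scored[:k]                                             # stage 2
```

In production, stage 2 would query a precomputed vector index rather than embedding documents at request time; the point is that the agent only ever reasons over the pruned working set.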
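Third, durable execution. I haven't verified the Codex detail, but here's roughly what it looks like with Temporal's Python SDK: each tool call becomes an activity that is retried and checkpointed, so a crashed worker resumes mid-plan instead of re-reasoning from scratch.

```python
from datetime import timedelta
from temporalio import activity, workflow

@activity.defn
async def run_tool(step: str) -> str:
    # Placeholder for a real tool call (retrieval, log search, ...).
    # Activities are retried and their results checkpointed by Temporal.
    return f"result of {step}"

@workflow.defn
class AgentTaskWorkflow:
    @workflow.run
    async def run(self, plan: list[str]) -> list[str]:
        results = []
        for step in plan:
            # A durable "await": if the worker dies here, the workflow
            # resumes from this step rather than starting over.
            results.append(await workflow.execute_activity(
                run_tool,
                step,
                start_to_close_timeout=timedelta(minutes=5),
            ))
        return results
```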
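Fourth, ACL-aware broadcast, sketched with an in-memory bus: findings carry the scope they were derived from, and each agent only receives what its own ACLs permit.

```python
from dataclasses import dataclass, field

@dataclass
class Agent:
    name: str
    acls: set[str]                          # data scopes this agent may read
    inbox: list = field(default_factory=list)

class Broadcast:
    """Findings are broadcast to all agents, but delivery is filtered by
    the scope each finding was derived from."""
    def __init__(self):
        self.agents: list[Agent] = []

    def register(self, agent: Agent):
        self.agents.append(agent)

    def publish(self, sender: Agent, scope: str, finding: str):
        for agent in self.agents:
            if agent is not sender and scope in agent.acls:
                agent.inbox.append((sender.name, scope, finding))
```

In production this would sit on a real message bus, but the invariant is the same: context flows as widely as permissions allow, and no wider.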
3. Pricing is tricky because not all units of work are equal.
Some tasks are low-value and low-risk. Others (e.g. “diagnose the root cause of a multi-region outage” or “triage a lateral movement alert”) are high-value, high-risk (a wrong conclusion is costly) and harder to get right.
Most usage-based pricing models treat these as equivalent and don't account for outcome value. Startups need to figure out how to price units of work - either case by case (tricky to communicate to the buyer) or on a basis that accurately mirrors the expected value delivered across scenarios.
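One illustrative shape for the second option - charging a fixed share of expected net value. All the parameters and numbers here are made up:

```python
def price_unit_of_work(p_success: float, value_if_right: float,
                       cost_if_wrong: float, take_rate: float = 0.2) -> float:
    """Charge a fixed share of the expected net value a task delivers,
    so high-stakes and low-stakes work price differently under one formula."""
    expected_value = p_success * value_if_right - (1 - p_success) * cost_if_wrong
    return max(0.0, take_rate * expected_value)

# A multi-region outage diagnosis vs. a routine log lookup:
price_unit_of_work(0.8, 50_000, 20_000)   # -> 7200.0
price_unit_of_work(0.95, 50, 5)           # -> ~9.45
```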