AI Copilots to Coworkers
AI copilots supercharge individuals. AI coworkers will supercharge teams.
Abstract
The next evolution of AI copilots will be AI coworkers that operate in the background and automate a subset of workflows end to end. Copilots produce drafts of work for humans to verify. Coworker products will reliably see tasks through to completion. Getting there will require pairing ever more capable models (reasoning engines) with high fidelity simulators that iteratively verify outputs against key constraints and communicate errors back to the model.
Introduction
It’s consensus at this point that AI copilots will be ubiquitous across job types and industries. If we look at this list of the highest paying jobs in the US, each of these professions will have some sort of AI copilot that augments their productivity and helps them get more done per unit time. Enterprises writ large will be forced to buy copilot products or risk running a less efficient workforce than their competitors.
This secular opportunity has given rise to a growing list of startups within various industries, each vying to be the “copilot for X”. Incumbent software vendors are also sprinting to upsell copilot capabilities within their products in order to capture this new class of budget.
AI Copilots Reduce Cognitive Tax
Copilots are nothing but vertical specific instantiations of large language models applied to a specific set of workflows within the enterprise. When they live within an existing software tool (example: GitHub Copilot / Codeium* within VSCode), copilots significantly reduce the activation energy required to complete a task.
Let’s take the canonical example of software development. GitHub Copilot was famously trained on large volumes of open source code and made available as an extension within VSCode and other popular IDEs. The reason it took off, and why it provided a window into what’s possible with copilot products, was that it automatically suggested complete snippets of code for developers.
All of a sudden, devs were enjoying a much smarter IDE with an ambient assistant that was always ready to suggest high quality lines of code. With the hit of a keystroke, entire functions and sometimes files could be auto-populated with code. Importantly, the cognitive tax on developers had been substantially reduced, since they could spend a lot more of their time validating vs. creating from scratch.
Solution generation is hard. Validating a proposed solution is much easier.
According to research published by GitHub, “developers who used GitHub Copilot completed the (average) task.. 55% faster than the developers who didn’t use GitHub Copilot.” As long as code underpins software, enterprises have no choice but to equip their developers with copilots. A new market has been created, it’ll be big, and it will support a large outcome for not just GitHub but also a venture backed startup.
Building a copilot startup: If you’re a founder building what can be classified as an AI copilot for a specific persona, it may be worth thinking through the following questions:
Am I targeting an existing, mission critical workflow that has a good chance to persist regardless of meaningful change in supporting systems and organizational strategy?
AND, is that workflow universal across all users / buyers of interest?
AND, is the person my copilot augments at low risk of losing their job because there is asymmetric demand relative to supply for their skills?
AND, are the benefits of augmentation so great that my copilot need not be entirely accurate?
There are nuances around how to distribute your copilot too. Feel free to reach out at aditya@kleinerperkins.com if you’d like to discuss.
The Gaps
All this being said, today’s copilots suffer from a few salient flaws.
They only provide bite sized “drafts” for a sliver of domain specific tasks. These “drafts” have no associated guarantees of correctness. GitHub Copilot has a 30% suggestion acceptance rate.
There’s no iterative loop that allows the copilot to autonomously “rethink” and “adjust”, preventing it from adding value beyond spontaneous, sometimes accurate suggestions.
The productivity gains from copilot products still merit their existence. Where it makes sense, they will exist in the foreground alongside knowledge workers. However, there is a gaping hole for a new class of AI native product that takes on complex tasks (sometimes menial / mundane ones) for knowledge teams and sees them through to completion.
AI Coworkers
This new class of AI native product can be thought of as “AI coworkers” because of their ability to accurately carry out complex tasks end to end. They will serve as extra headcount for teams and will supercharge their productivity. If copilots live in the foreground of applications exhibiting system 1 thinking, coworkers will operate as background processes chipping away at work delegated to them, exhibiting system 2 thinking.
Coworker products become really powerful in quantitative domains where there are provable guarantees of correctness. These include software engineering, dev ops, circuit design, bioinformatics, security and more. The unique aspect of these products is that they pair a performant model with a high fidelity simulator in an iterative loop, allowing the model to self adjust and eventually converge on an output that is verifiably correct.
The simulator is key. It needs to accurately represent the underlying environment the model is operating in. This could be in the form of a graph / tree, sandbox or system of equations. From there, it’s a question of how you guide the LLM to reason over the underlying structure, “gradient descending” to a verifiably correct output. Abstractly, you are quantifying the search space for an LLM and asking it to autonomously traverse that space, guided by a well crafted reward function.
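To make the loop concrete, here is a minimal sketch in Python. Everything in it is an illustrative assumption rather than a description of any real product: `simulate` stands in for a high fidelity simulator that checks a candidate against hard constraints, `propose` stands in for a model call that revises the candidate based on the reported errors, and the three constraints are invented for the example.

```python
from dataclasses import dataclass
from typing import Optional

Candidate = dict[str, int]


@dataclass
class Feedback:
    ok: bool
    errors: list[str]


def simulate(candidate: Candidate) -> Feedback:
    """Hypothetical simulator: verify the candidate against hard constraints."""
    errors = []
    if candidate["replicas"] < 2:
        errors.append("replicas: need at least 2")
    if candidate["memory_gb"] < 4:
        errors.append("memory_gb: need at least 4")
    if candidate["cpu"] > 8:
        errors.append("cpu: at most 8 allowed")
    return Feedback(ok=not errors, errors=errors)


def propose(candidate: Candidate, feedback: Optional[Feedback]) -> Candidate:
    """Stand-in for a model call: revise the candidate using the first error."""
    revised = dict(candidate)
    if feedback is None or feedback.ok:
        return revised
    field = feedback.errors[0].split(":")[0]
    # Nudge the offending field one unit toward the feasible region.
    revised[field] += -1 if field == "cpu" else 1
    return revised


def solve(initial: Candidate, max_iters: int = 20):
    """Propose, verify, and revise until the simulator accepts or budget runs out."""
    candidate = propose(initial, None)
    for step in range(max_iters):
        feedback = simulate(candidate)
        if feedback.ok:
            return candidate, step  # verified output, not just a draft
        candidate = propose(candidate, feedback)
    return None, max_iters  # no verified output within the budget


result, steps = solve({"replicas": 1, "memory_gb": 2, "cpu": 10})
print(result, steps)
```

The design point is that the loop only ever returns a candidate the simulator has accepted. When the budget runs out it returns nothing, which is the difference between a coworker that reports "not done" and a copilot that hands over an unverified draft.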
I look back at the work DeepMind did with AlphaZero and excitedly await similar ideas being brought to bear on unlocking this next class of AI native products. Here’s a great explainer of how it worked under the hood, but in short, DeepMind combined the power of a neural net with Monte Carlo Tree Search (MCTS) in a “self-play” manner so that computers could learn the hidden constraints and optimizations in chess.
These same ideas will be powerful in various domains with encodable constraints, which covers a lot of non-creative endeavors as software and mathematics continue to eat the world.
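For a feel of how the neural net and the tree search fit together, here is a simplified sketch of the selection step used inside the search, in the PUCT form described for AlphaGo Zero and AlphaZero. The `Edge` structure, the `c_puct` value and the action names are assumptions made for illustration, and the sketch omits expansion, network evaluation and self-play training.

```python
import math
from dataclasses import dataclass


@dataclass
class Edge:
    prior: float            # P(s, a): move probability from the policy network
    visits: int = 0         # N(s, a): how often search has tried this move
    value_sum: float = 0.0  # W(s, a): total value backed up through this move

    @property
    def q(self) -> float:
        """Mean value Q(s, a); zero for an unvisited move."""
        return self.value_sum / self.visits if self.visits else 0.0


def select_action(edges: dict[str, Edge], c_puct: float = 1.5) -> str:
    """Pick the move maximizing Q + c_puct * P * sqrt(total visits) / (1 + N)."""
    total = sum(e.visits for e in edges.values())

    def score(e: Edge) -> float:
        return e.q + c_puct * e.prior * math.sqrt(total) / (1 + e.visits)

    return max(edges, key=lambda a: score(edges[a]))


def backup(edge: Edge, value: float) -> None:
    """Record the outcome of one simulation that passed through this move."""
    edge.visits += 1
    edge.value_sum += value


edges = {
    "a": Edge(prior=0.5, visits=10, value_sum=5.0),
    "b": Edge(prior=0.5),
}
first = select_action(edges)   # unexplored move wins on the exploration bonus
backup(edges[first], -1.0)     # the simulation through it went badly
second = select_action(edges)  # search shifts back to the proven move
print(first, second)
```

The network's prior steers the search toward promising moves, while the visit counts make sure untested moves still get tried and bad ones get abandoned once evidence comes in.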
Early Use Cases
Unsurprisingly, software engineering is proving to be an early test bed for startups with this ambition. Companies like Cognition, Detail, Factory, Magic and others are building AI engineers that go beyond code suggestions and produce correct work end to end.
We’re also seeing the emergence of startups applying these ideas to EDA, attempting to coax LLMs to generate verifiably correct integrated circuits.
Automated incident response and separately “SOC automation” in security is a third area.
Computational biology - drug and protein design.
Digital assistants.
A note on GTM: For founders that have picked their domain of choice, I believe it’s important to constrain the initial positioning of your coworker product to a sliver of tasks that are valuable in aggregate and where you can guarantee correctness in the wild - even if the kernel of what you’ve built is flexible. A common trap we’ve seen with analogous products is that they position themselves as “horizontal” from the outset, resulting in users inevitably reaching edge cases (or unhappy paths) and churning due to lost trust.
Closing Thoughts
As we begin to transition models from quick, zero shot inferences to sophisticated reasoning over representations of real world systems, we will see the emergence of “AI coworkers” that supplement entire teams as extra headcount.
Startups building coworker products will pair powerful, domain specific reasoning systems under the hood with well thought out interfaces for teams to delegate to, interact with and steer the system. These products will command five to six figures annually per coworker sold, and eventually be packaged in tiers (intern, professional, etc.) with each tier unlocking more powerful reasoning.
If any of this resonates, do reach out to me at anaganath on Twitter or aditya@kleinerperkins.com. I’d love to hear from you!
Thank you Divya Gupta, Dan Robinson, Jim Fan, Varadh Jain, Alex Mckenzie and Emily Hanks for reading an early draft.