How we'll build with GenAI

I've been thinking a lot lately about the history of the screw.1 Although the screw itself is not particularly groundbreaking or complex, it was critical to the development of mass manufacturing.

During the Industrial Revolution, screws were standardized, so different manufacturers could produce parts knowing they would fit together, because everyone was using the same screws. To quote Gemini 1.5 Pro: "The mass production of screws during the Industrial Revolution was a transformative event. It wasn't simply about making more screws; it was about creating a system of standardized, interchangeable parts that underpinned modern manufacturing." This standardization enabled specialization, increased efficiency, lowered production costs, and ultimately paved the way for mass production in manufacturing.

Amid the current abundance of APIs, frameworks, modalities, plugins, SDKs, and protocols for generative AI models, I (and others in the industry) have been thinking along similar lines. Packages like aisuite provide a "simple, unified interface to multiple Generative AI providers." However, these libraries are built around, and largely limited to, dialogue use cases, and OpenAI's chat completions interface has already become the de facto interface for back-and-forth conversations between USERs and ASSISTANTs.
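For concreteness, here is roughly what that de facto chat interface looks like from Python, using the openai client (the model name and prompt are just illustrative):

```python
# A minimal sketch of the chat-completions-style request that has become the
# de facto standard for dialogue. The model name here is illustrative only.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Why were standardized screws important?"},
        # Prior ASSISTANT turns get appended here to continue the conversation.
    ],
)
print(response.choices[0].message.content)
```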

Projects like Anthropic's Model Context Protocol (MCP) take a lower-level approach—MCP is advertised as an "open standard that enables developers to build secure, two-way connections between their data sources and AI-powered tools." MCP is a more screw-like proposal: a more fundamental way of representing the content that passes between systems, one that is more robust to how GenAI models will work in the future.

We (my corner of DeepMind, and many collaborators across Google) have our own approach to this, which we released recently at github/google-deepmind/evergreen-spec. We foresee GenAI models increasingly becoming "content-to-content" machines, in which the user can specify an arbitrary mix of content types (PDFs, emails, images, live audio, code...) in the request, and will get an arbitrary mix of content types (text, actions, generated images, retrieved images...) in the response. They won't just power dialogue applications; they'll power all kinds of systems. We designed Evergreen to be a generic, screw-like protocol that can connect the parts within these systems, and these systems to each other.
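To make "content-to-content" concrete, here is a purely hypothetical sketch of such an exchange. The field names are invented for illustration and are not the Evergreen schema; see the spec repository for the real thing.

```python
# Hypothetical illustration of a "content-to-content" exchange. Field names
# are made up for this sketch and do NOT reflect the Evergreen spec.
request = {
    "contents": [
        {"type": "pdf",   "uri": "file:///reports/q3.pdf"},
        {"type": "image", "uri": "file:///charts/revenue.png"},
        {"type": "text",  "text": "Summarize the report and flag anomalies."},
    ]
}

response = {
    "contents": [
        {"type": "text",   "text": "Revenue grew 12%, but churn doubled in EMEA."},
        {"type": "image",  "uri": "gen://summary-chart-001"},  # generated chart
        {"type": "action", "name": "create_ticket", "args": {"team": "sales-ops"}},
    ]
}
```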

Building more systems that are fundamentally "fuzzy" and stochastic in nature will be a challenge. Most developers are used to building systems that turn highly structured inputs into highly structured outputs, often deterministically, and typically very quickly. By contrast, non-deterministic, GenAI-driven "content-to-content" systems will break a lot of our assumptions, and the tooling we've built on those assumptions.
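As one small example of how those assumptions break: callers and tests can no longer compare a component's output against an exact expected value. Below is a sketch of the kind of validate-and-retry wrapper that starts to appear instead; generate_summary stands in for any GenAI-backed call and is hypothetical.

```python
# Sketch of calling a stochastic, content-to-content component: rather than
# asserting an exact output, validate its structure and retry on failure.
import json

def call_with_validation(generate_summary, document, max_attempts=3):
    for _ in range(max_attempts):
        raw = generate_summary(document)      # non-deterministic, possibly slow
        try:
            parsed = json.loads(raw)          # the output may not even be valid JSON
        except json.JSONDecodeError:
            continue
        if {"title", "key_points"} <= parsed.keys():
            return parsed                     # structurally valid; accept it
    raise RuntimeError(f"No valid output after {max_attempts} attempts")
```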

Given the hype today around terms like "multi-agent" and "real-time AI," an approach that doesn't foreground the presence of AI in the application isn't going to spark as many headlines.2 But I believe one day "AI systems" will just be considered part of "application logic." The challenge now is to build not just the most powerful AI models, but also the tools needed to make them a fundamental part of the systems we build.


1. This first crossed my radar in Hard Fork's 100 Most Iconic Technologies, which I highly recommend.

2. Note that very few, if any, concepts in the Evergreen protocol are GenAI specific.
