Episode 3

29 minutes 53 seconds May 14th, 2024

Upleveling Human Work

AI agents are the next level of the human-computer relationship, with potential to leverage our professional abilities beyond imagination. How do you build reliable, trustworthy AI agents that can actually work?

David Luan

CEO, Adept

Seth Rosenberg:

I'm Seth Rosenberg, general partner at Greylock and host of Product-Led AI, which is a new series exploring the opportunities at the application layer of AI.

My guest today is David Luan, who is the CEO and Co-founder of Adept. The company is developing multimodal agents designed to work alongside humans in any profession.

David has been an early builder, researcher and pioneer in AI. He was among the first few dozen employees at OpenAI, where he led all of OpenAI's engineering. He then served as co-lead of Google Brain working on frontier large language models. He co-founded Adept in 2022, and the company has stood out for its human-centric approach to AI (and eventually AGI).

David, thanks for joining today. Very excited to dive into the nuances of building agents with Adept.

So maybe to kick it off, when was the moment you knew that you wanted to actually break out and start your own company?

David Luan:

Yeah, I mean I think the easiest way to answer this is to tell a very, very short history of the last – what? 20 years or so of AI.

So when I started getting going, I was actually initially drawn to robotics. I thought the idea that you could write these programs that made physical devices do smart things in the world was one of the coolest things possible. But way back during that period, the mid-2000s, nothing really worked, right? We were misidentifying horses as dogs, and the Tay chatbot came out and started insulting everybody on Twitter after 24 hours. It was all so early. And so I decided the right thing to do back then was to get into research.

And so after multiple twists and turns, because of previous work I had done leading research-oriented teams, the founding team at OpenAI brought me in to lead research and engineering there as the VP of Engineering. I did that for three years, and I saw this transition from the previous era of AI, of just you and a couple of pals trying research ideas independently and writing a paper, to this next world of giant scale-up projects, which we did successfully with the GPTs and, back then, the robot hand project and Dota and all this other stuff. And after that I ended up going to Google to lead Google's giant LLM training effort.

What was really interesting about that was seeing the first innings of true productization of some of these models. And it became really clear to me that, first off, the recipe for building general intelligence is increasingly clear. And actually, a critical part of that is having a product. You need a product because you need something that users interact with to teach the models to be smarter. And as a result, the dominant shape of an organization that can win at that doesn't look like what anybody had built before. It doesn't look like the standard thing of having researchers sitting in a corner of the building and pitching their ideas to engineers and product teams to try to get those things landed. It also doesn't look like two separate orgs with two separate roadmaps. What you really need to do is start with something.

You need to start with a product shape that gets you all the way to general intelligence, and you distill that down into what research problems, what engineering problems, and what product problems you need to solve to build that product shape: a full end-to-end combination of research and product. And that sort of org structure didn't exist. So on top of the actual technical bet we wanted to make, organizationally it was obvious we had to do it as a startup.

Seth Rosenberg:

Yeah, I totally agree. The last 10 years were about the foundational research and infrastructure, and the next 10 years will be about putting this into products that actually work.

So obviously one of the maybe naive narratives around AI, especially the type of AI that you're building, an agent that can act like a human in front of a computer, is the narrative around AI replacing jobs. I think you have a different take, both on your mission for Adept and on AI's impact on the world. So maybe spend a moment on that.

David Luan:

Yeah, so what's really interesting to me about this question is that it's fundamentally a mission and values question (and then there's also a little bit of pragmatism in there). But if you go look at how everyone has framed AI progress so far, if you go read the mission statements of the classic labs, it's always about replacement: about doing things that beat humans, that do a wide margin better than humans, and then creating lots of economic value and needing to figure out how that's going to be redistributed.

And there are also framings of AI as a race: a race to get to general intelligence as fast as possible, be better than humans, and then how many companies will potentially have a seat like that and be able to outcompete each other and prevent others and all this crazy stuff. That's just one formulation. And I think you can reject this framing entirely, not only out of a sense of mission of 'Hey, what if we don't want to be building systems that replace people?' but also out of a sense that it's potentially a misunderstanding of how technology even diffuses within society. When we go look at things like the calculator, or writing, or the computer, those are technologies we've built as humans over the last couple of millennia, and what has actually happened is that they've improved the cognitive skills of people. They've become cognitive technologies for people.

There are technologies we've built as humans over the last couple of millennia, and what has actually happened is that they've improved the cognitive skills of people

There's a really interesting study (that I should probably read again before I quote it as definitive), but I think it showed that for cultures that didn't have reading, humans' ability to extrapolate to new events and imagine things, as measured by tests, was a lot lower than that of cultures that had reading and writing. And so I think similarly, by having these AI systems that get smarter and smarter, that work for people instead of replacing them, we're basically building a new set of cognitive technologies for people that actually end up upleveling humans. And that's a world I'm much more excited about living in.

Seth Rosenberg:

Yeah, I totally agree. I think people also underestimate the demand side of the equation and focus on the supply side. You can automate certain tasks, but demand doesn't stay constant, right?

David Luan:

It's changing the profile of work. The real human things, at least the ones I'm excited to do, and that most people I know are really excited to do, are figuring out: what should we be doing, and why should we be doing it? Who do I coordinate with, and how do I deeply understand the people I'm working with and the people I'm selling to? Those are really human. And I think the dream is: how do you get people to focus on that and not on the tedium of, 'Ah, I have to go spend eight hours shuffling things around in my database,' or 'I have to go physically, manually create a part'? That execution is what we really want to be able to delegate. Right?

Seth Rosenberg:

Exactly. 

So tell us: what is Adept?

David Luan:

Yeah, that's a really good question. Adept is an AI agent that helps humans do anything they need to do on a computer. And so let's just break down each part of that. 

What is an AI agent? The agent term has become really diluted these days, because it's become the new hot thing to slap on a company. But the true definition of an agent actually comes from folks who have been following the reinforcement learning side of things. An agent is an intelligent system that figures out the correct N actions to take to help you achieve a goal. So if my goal is 'I want to move these N leads from one stage to another in Salesforce,' the agent figures out: okay, here are the N actions I need to take to do that in the most efficient way possible.

What an agent is not is a chatbot that you just talk to and that just talks back at you, right? It doesn't do things; it's not taking actions. And similarly, an agent is not a single-step API call, because that's the degenerate base case: it doesn't run a workflow. Agents run workflows for people.
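As a side note for readers, that definition, a system that figures out the N actions needed to reach a goal, can be sketched as a tiny loop. Every name below (Env, Agent, propose_action) is purely illustrative, not Adept's API or any real framework:

```python
# Toy sketch of the agent definition above: given a goal, the agent
# repeatedly proposes the next action until the goal is reached.
from dataclasses import dataclass, field

@dataclass
class Env:
    """Stand-in for a system like a CRM: leads sit in named stages."""
    stages: dict = field(default_factory=lambda: {
        "new": ["lead1", "lead2"], "qualified": []})

    def apply(self, action):
        src, dst, lead = action
        self.stages[src].remove(lead)
        self.stages[dst].append(lead)

class Agent:
    """Figures out the next action needed to make progress on the goal."""
    def propose_action(self, env, goal):
        src, dst = goal
        if env.stages[src]:                    # goal not yet reached
            return (src, dst, env.stages[src][0])
        return None                            # done

def run(agent, env, goal, max_steps=10):
    """The agent loop: act until the goal is reached or the step budget runs out."""
    actions = []
    for _ in range(max_steps):
        action = agent.propose_action(env, goal)
        if action is None:
            break
        env.apply(action)
        actions.append(action)
    return actions

env = Env()
taken = run(Agent(), env, goal=("new", "qualified"))
print(len(taken), env.stages["qualified"])  # 2 ['lead1', 'lead2']
```

A chatbot, by contrast, would only produce text about the move; the observe-act loop is what makes this an agent in the sense David describes.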

So what Adept is doing is making it possible for every knowledge worker to have this AI teammate that they can quickly show how to do tasks on their computer, and then ask the agent to go do them from there on out. It's part of that upleveling-of-human-work mission that we've always had. Today, our models do a lot of things that involve, for example, shuttling data from system A to system B, or helping people fill out forms for employee onboarding, or logistics and supply chain and all these very operational tasks. But every incremental amount of work we put into model intelligence gets our agent to help with higher- and higher-level things, ultimately to the point where this truly becomes a teammate, a collaborator that you can talk to and interact with and solve problems and brainstorm with, and that also helps you within the institution.

Seth Rosenberg:

So, it’s 2024: the year of agents. I think that's a good definition of what an agent is and what it is not.

So maybe break down for us the architecture of an agent in your mind, from the foundation model to the orchestration layer to the integration into enterprise data to the UI and workflow. What's your perspective on the correct architecture of an agent, and where do you think IP can be built?

David Luan:

Yeah, so I want to answer that for Adept, and then we can answer it for the general case. Adept looks pretty different from a lot of companies in the space, because we control both our foundation model for agents and the agent product that enterprises use. Our company is actually a vertical integration of those two things.

So what our agent stack looks like is: we have a base multimodal model that is specifically oriented around being really good for agentic use cases. It has capabilities that others who are just using GPT or Claude or anything like that can't get. An example is that we're extremely good at fine-grained understanding of user interfaces. Our ability to figure out what you interact with to get a task done is in the nineties, whereas when we benchmarked GPT and Gemini, they're somewhere between 2% and 15% accurate. So it's a giant gap. At the same time, on basic understanding of knowledge work, we specialize our models toward knowledge work, because that's what people use in the enterprise. We care a lot less about cat and dog photos and all the other stuff people put in these models these days. And on knowledge work tasks, even though our models are pretty small, and therefore very fast, they have higher accuracy than GPT and Claude 3 Opus and Gemini 1.5 Pro.

So that's what we start with: this base model that's really smart at being an agent and also very fast. And then, I think there's another thing that people in the space may not have fully realized: the dream for agents isn't a giant text box in the sky where you say, 'Hey, I want you to go do this business task for me, figure it out.'

What we learned is that the most important thing for building useful agents in the enterprise is the ability to be handed a standard set of operating procedures and be trusted to go do that. So we care a lot about the agent's ability to follow any constraint on instructions going forward. And so what our stack looks like is that next instruction-following layer, followed by a user interface that makes it possible for humans to easily have oversight of the agents. And the UX for these things, as we were chatting about beforehand, is not going to be that of a chatbot. This idea that humans want to specify everything down to a T in just words, we've actually found to be quite a big limiter on productivity.

Seth Rosenberg:

Yeah, that makes sense. 

So everyone's favorite debate in AI is: who's going to win? Is it going to be one large model to rule them all? Open-source fine-tuned models? Large models trained for specific use cases, like the agentic use case? Obviously you have a very opinionated point of view with Adept, but what's your take on how the space evolves, in terms of the number of large models available, what types of use cases each one focuses on, and in what cases it makes more sense for product builders to just use GPT-4 out of the box and focus on other areas, versus fine-tuning?

David Luan:

Yeah, so I think there's a tremendous amount of fog of war. I have a fairly strong opinion, but new information could change a lot of this. My view right now is that there's a crop of companies that are just in the business of training models: the GPT-training part of OpenAI (OpenAI itself isn't quite in this bucket, because they also have their own products), Anthropic, which is entirely around training models, Cohere, Mistral, et cetera. I think that space is really quickly commoditizing, because it's the same corpus of training data and the architectures are mostly similar.

There's no real long-term defensibility in ideas in that space, because they diffuse within a matter of months, and as a result it really just becomes a cost-of-capital game. And I think there will be N organizations that will be able to afford that. Open source is getting really good, really fast. Meta is obviously investing a lot in the space, really smartly for them, but that also just sets the floor: if you've got a model that isn't as good as the next Llama, you don't really exist as a business; you don't get to charge more than it costs you. And so I think companies in that space will have to look for alternate ways of making money besides just having better models.

But long-term, I think what's going to happen is that the people who figure out how to hook up very differentiated approaches for generating data for the most valuable tasks to base models that they also control will have tremendous leverage. And that's why I remain bullish on OpenAI. The fact that they have ChatGPT, and that they have a lot of developers actually using ChatGPT for code use cases, gives them a flywheel where they could overtake someone who's literally just downloading more stuff from the internet, whereas an API business doesn't give you an opportunity to have a data flywheel.

Seth Rosenberg:

Yeah, I think that's really smart. And so tell us a little bit about how you're designing the frontend of Adept, the agent, in order to maximize valuable data collection.

David Luan:

Yeah, so that's a really good question, because that's exactly our bet: by end-to-end optimizing our foundation model with the agent use case, we're collecting agent data that makes the whole thing smarter. The way I think about it, there are two parts to it.

One of them is: why does this data make a model smarter? When we go look at something like ChatGPT, there's been so much RLHF data, where you take human feedback on how well a model did something and use it to improve the model's behavior through a reinforcement learning loop, or something that looks like one. Out of the box, these models are trained to just clone human behavior, so they don't understand reward. You give them a reward signal by collecting lots of data about what good and bad look like, and you teach the model to follow it. That's RLHF.

With things like ChatGPT, a lot has been done on it in summarization, chat, all those other spaces. Our goal with Adept is to do that same loop, but with agents. And that's so important because, out of the box, these foundation models are really unreliable for agent tasks, and reliability is almost the only thing people care about. If you ask something to book a flight for you in the background, even for a consumer use case, which we don't cover, you expect it to do a good job, and not to have already swiped your credit card for something entirely different on the wrong date.
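For readers, the loop David describes can be illustrated with a deliberately tiny, bandit-style sketch: a policy that starts with no notion of reward gets nudged by human preference comparisons. This is an illustration of the concept only, not a real RLHF implementation (real systems train a separate reward model and optimize the policy with an algorithm like PPO), and all the names in it are made up:

```python
# Toy sketch of the RLHF idea: human preference comparisons supply a
# reward signal that shifts a cloned-behavior policy toward "good" outputs.
import math
import random

random.seed(0)
ACTIONS = ["helpful_answer", "rambling_answer"]

def human_preference(a, b):
    """Stand-in for human raters: they prefer the helpful answer."""
    return a if a == "helpful_answer" else b

def sample(scores, temperature=1.0):
    """Softmax sampling over preference scores."""
    weights = [math.exp(scores[a] / temperature) for a in ACTIONS]
    return random.choices(ACTIONS, weights=weights)[0]

# Start uniform: a pure behavior-cloning policy has no notion of reward.
scores = {a: 0.0 for a in ACTIONS}

# Collect pairwise comparisons and turn them into a reward signal.
for _ in range(200):
    a, b = sample(scores), sample(scores)
    if a == b:
        continue
    winner = human_preference(a, b)
    loser = b if winner == a else a
    scores[winner] += 0.1  # nudge the preferred behavior up
    scores[loser] -= 0.1   # and the dispreferred behavior down

# The updated policy now strongly favors the behavior humans preferred.
print(scores["helpful_answer"] > scores["rambling_answer"])  # True
```

The point of the sketch is the feedback loop itself: the quality of the resulting policy depends entirely on who is providing the comparisons, which is the thread the next question picks up.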

Seth Rosenberg:

And what are some interesting techniques for making RLHF actually productive? I've also heard stories of models out in the wild getting human feedback and parts of them actually getting worse, for example because humans are bad at probability or bad at math.

David Luan:

Yeah. So basically I think the answer (and this is actually a nice thing about our enterprise strategy) is that you want to be learning from the smartest knowledge workers in the world. Collecting more RLHF data on how to chit-chat about your day doesn't make your model smarter. But in a work setting, we're all paid every day to use intelligence as a way to get an advantage in business. And so the data that comes from people doing that is just inherently more valuable, and less prone to some of the problems you were talking about, than arbitrary data.

Seth Rosenberg:

That makes sense. And what's your take on how vertical-specific agents will become? Is Adept going to be my accountant, my lawyer, my analyst, my researcher, et cetera? Or where do you think the lines will be drawn?

David Luan:

I think my boring take on this is that within a couple of years, the whole AI concept is going to fade into the background, and we'll just consider all companies to be AI companies; it just won't even matter. It's going to be a lot of amazing enterprise SaaS businesses built on AI agent technology that have each found a niche and are beating each other on things like go-to-market and understanding the customer, and less on 'Is my agent 5% smarter?' I think there will be a lot of companies that cover head use cases like invoice processing or even customer support. And so there will be many great companies built that way.

My boring take on this is that within a couple of years, the whole AI concept is going to fade into the background. We'll just consider all companies as AI companies, and it's going to be a lot of amazing enterprise SaaS businesses built on AI agent tech that has each found a niche and are beating each other on things like GTM and understanding the customer.

But the reason we're really bullish on the path we're taking with Adept is that if you look under the hood a little bit, even some of the most obvious enterprise workflow use cases, ones that sound like they should be really common, are extremely customized to that business and its customers. And so if we were to apportion the overall pie of agent tasks to be done, maybe 10% of it or less is cookie-cutter, and 90% is 'I need to teach my specific workflow to the agent.' We want to eat that 90%.

Seth Rosenberg:

Yeah, I feel like you were several years ahead of where the market is. Everyone's been enamored, to your point, with chatbots and image generation and video generation, but it feels like the real value is going to come from actually executing work. And that's actually a different problem.

David Luan:

And our job is to execute custom work.

Seth Rosenberg:

Basically. And so on that point, again, being an independent thinker over here: a lot of people, including OpenAI, have had success by just launching products into the wild. There's this magic effect where, if you're first to create an agent that really works (and Adept is obviously one of the very few companies in that category), just releasing it into the wild can create a lot of Twitter buzz. You've taken a different approach to go-to-market: you've decided to go enterprise-first. Talk us through your thought process.

David Luan:

So we're an extremely quiet company. We've been really focused on building something that works well, that is reliable enough to be deployed, and then getting that into the hands of customers, rather than a bottom-up strategy combined with a lot of marketing. Some days I actually wonder whether we did the right thing, but what took us down the path of focusing on enterprise comes from a lesson we learned last year, which is that for agents, the only thing people care about is reliability.

When you talk to a chatbot and the chatbot says something dumb one out of three times, you don't care, because you either enjoyed the interaction or it helped stoke your thinking or something like that. But if you're trusting this thing to handle shuffling data around in Salesforce and it deletes a third of your records, you're never going to use it again; at minimum it's not useful to you, because you could have just done the work yourself. And so we realized that the only thing customers cared about was reliability. Because of that, we decided to focus on enterprise, because there's a lot of value there, and in those settings we can control for very high reliability and get stuff out in the market. Now, we could put out more toys; we put out a toy called Experiments, just for kicks, at the end of last year. But I feel like that's not the path to glory in this space.

Seth Rosenberg:

And in this stack of building Adept, you're solving a lot of hard problems: the foundation model that you're building yourself, the orchestration, the agentic use cases, the UI and workflow. How big a component are the actual enterprise integrations in that stack, in terms of how your engineering team spends its time and how difficult that problem is?

David Luan:

It's super difficult. I think the other thing, which most people won't say directly in AI, is that very little of it is glamorous. I mean, even on the model side, a lot of it is data, low-level systems issues, all that stuff. But on the customer side too, you've just got to do whatever it takes to get a deployment that is reliable enough for those folks to depend on at work. One of our use cases literally involves a physical truck being sent to a shipping container port, and if we screw up along the way, there's a truck being sent with no container on the other end. That's really bad. So right now we work with the customer to get the reliability, but a large chunk of our research roadmap is: how do we make it more and more out-of-the-box, 95% reliable from the get-go?

Seth Rosenberg:

And as you're building this business, how do you approach the potential need or pull to do custom integrations or service-oriented work to make it work for a certain company versus the long-term vision of a generalizable agent?

David Luan:

I think this starts from a belief we have (one that's actually informed by the machine learning side), which is that the best thing to do is figure out how to delegate generalization to the neural network. How can we make as many customer use cases as possible inform the ability of the base model, so that serving customer n+1 looks like an interpolation of two existing customers you already have? Because of that, our philosophy is that everything should be general, unless you have to ship the thing tomorrow, in which case we'll go build a specific thing there. But we'll add that customer's work, in a dummy-data fashion, to our evaluation set and say, 'All right, research team, how do we make sure the model can do well for that customer out of the box?' And then over time we rip out the custom bit.

Seth Rosenberg:

Yeah, super interesting. 

So maybe let's talk about the future for a second. Humans are going to focus on their customers and on deciding what needs to be built, not on actually building the thing. So in this world, what does software look like? There are systems of record, there are agents, there are legacy apps, maybe there are new types of apps. Walk us through what this future agent world looks like.

David Luan:

Yeah, that's a really astute question, and it's part of why we at Adept have really emphasized the importance of design. From day zero, we've hired a lot of creative-technologist types, in large part because we knew that the steady state of agents is going to look like a reinvention of how you use your computer, and how you figure out what that looks like has a tremendous influence on your product shape and also your actual modeling problems. So it's all one giant game of co-design.

My view, basically, is that computers have always been about giving leverage to people, and it started out with very little leverage. You had punch cards: you were literally punching in programs, and those programs didn't do very much. Then you had the command line, an interface abstraction that gave you way more leverage on your time and your ability to do things.

And then, when we realized we could give people even more affordances through graphical user interfaces, we transitioned to mostly doing that, with the exception of certain specialized tasks where you drop back into the command line. What that's really done is that for every unit of energy you spend as a human, you can do way more with a computer than you ever could before. I think what's really powerful about agents is that they're the obvious next step.

Beyond that, once agents exist that can control your machine, you only need to drop into the GUI to do things that, for some reason, the agent can't do for you, or that you want to supervise. To some extent, my analogy is the Windows 3.1 era: your computer boots up into DOS, you type "Windows," you hit enter, and then you get into the GUI. I think we're in the very early innings of that transition for agents right now. Ultimately, what it will likely look like is that you become a coordinator who interfaces with an agent, or a set of agents, on your machine, and you work with them in an almost generative-UI way: the agent should be able to generate the right affordances for you to best collaborate with it on any particular task, and everything else will be abstracted away.

Seth Rosenberg:

Yeah, I'm very excited for that future, and thank you for building it. One final question on that. In this world where we have several billion agents, maybe several agents per person, what are the missing infrastructure pieces for agents to operate? I'm thinking: how do agents pay each other? How do they pass a CAPTCHA test? How do they understand data privacy and what to share with others? What needs to be built to allow agents to operate in this world?

David Luan:

That's a really good question. I think the most interesting thing about agents, especially if you take the Adept formulation, where the agent uses your computer like a human (or can do that in addition to using APIs), is that you get access to the same rails underneath that we already have as people. So payments can be handled via regular old payment channels. We do have to solve credentials: how do you safely share creds with your agent, and have it be able to run either locally or in a VM or something like that? That's going to be tough, but you get to use everything else.

Just like how things are today, I think what's going to be really challenging is that the sheer amount of compute that's going to go into training and serving these agents is going to be colossal. A big shift is going to happen when you're literally getting hours of knowledge work that you would have had to do yourself done by your agent: your willingness to pay for ridiculous inference is enormous. And so I think, again, it all boils down to hardware. We're going to see people invest in ridiculous data centers, not just for running small edge models, but for running the smartest models possible on the planet, and I don't think we're anywhere close to saturating them.

Seth Rosenberg:

So I'm going to buy some more Nvidia stock.

David Luan:

Yeah, I'll probably do that.

Seth Rosenberg:

Okay. David, thank you so much for taking the time. Obviously we're lucky and privileged to be investors in Adept, and thank you for continuing to push this industry forward.

David Luan:

Thanks, Seth. This was awesome.