Episode 4

42 minutes, 12 seconds May 23rd, 2024

Rewiring How We Work

AI agents and copilots that work alongside us are already fundamentally changing the way some businesses are run, structured, and even built. Jared Spataro weighs in on the ways AI is rewiring longstanding norms in the way we interact with software, our co-workers, and the working world at large.

Jared Spataro

Microsoft

Seth Rosenberg:

This is Product-Led AI, I’m Seth Rosenberg.

I'm interviewing some of the most interesting leaders in AI today, and I'm extremely excited to have Jared here. Jared has, I think, one of the most interesting jobs in technology today. He's the corporate vice president of AI at Work at Microsoft – which includes small products like Windows, Office, Copilot, Teams – and Jared is really at the center of real AI enterprise adoption at the application layer.

So I'm extremely excited to speak with Jared about everything ranging from how they're building the product, to how people are thinking about buying it, to the underlying technology and what opportunities he sees for startups in the ecosystem.

So with that, welcome Jared.

Jared Spataro:

Thanks for having me, Seth.

Seth Rosenberg:

I would love to start with just telling everyone about your background, how you got into the role you're in today, and what you focus on at Microsoft.

Jared Spataro:

Sure thing. Thanks again for having me, Seth.

Background wise, I'm educated as a computer scientist. Started out coding, but fairly quickly in my career, I really realized that I was excited about not just coding products, making products, but also making products successful. So I switched over to the business side, got an MBA and kind of never looked back.

The technology background is still really useful, but I'm more focused now on [questions like] what markets, what products, how do we scale those products? What does adoption look like? Even some of the mechanics that people don't find interesting, like, well, how do you drive an enterprise sales force? How do you get to scale on new domains like AI?

So it's a fun job. It's a hard job. We're right in the middle of the storm right now, and I'm sure we're going to hit a lot of those things today.

Seth Rosenberg:

That's amazing. Give us a quick overview of what's under your purview at Microsoft in terms of the products that you lead.

Jared Spataro:

Well, I get to work on two businesses. I sit in the product marketing function here at Microsoft, which operates a little bit differently than what we oftentimes see in the industry. And of the two businesses I work on, we call one Modern Work, and it was really birthed out of the pandemic when we brought together the underlying foundation, as you say, of Windows. On top of that sit the Office applications: Word, Excel, PowerPoint, Outlook. Then Teams became such a runaway hit for us during the pandemic. So that was something that really allowed us to focus our energies, and most recently Copilot, which kind of wraps all of that up together with AI. So that's one side of what I do.

And then I have the real honor of also working on what we call our business applications business. This is the Dynamics portfolio. So Dynamics, CRM, ERP, customer service, and also our low-code platform that's known as the Power Platform. So when you bring those things together, what we're excited about is it's an unparalleled set of assets to look at the future and to think about how you might rewire the way a firm operates. And that's some of the kind of altitude that we're operating at right now, thinking, where does this all go with this technology?

Seth Rosenberg:

That's amazing. It seems like Copilot has become kind of the default word in the industry, and I believe you guys coined it. Walk us through the origin stories of Copilot at Microsoft and what it looks like today.

Jared Spataro:

Sure. It started out with GitHub Copilot, and I actually don't know who coined that name for GitHub, but we loved it the first time we heard it – this idea that it wasn't meant to replace you, but it was meant to kind of endow you with superpowers. We kind of really liked what Copilot could do. We felt like it didn't claim too much, it didn't say it was your super suit, but at the same time it gave you a sense that there was human agency that would be augmented in some good way.

So it came from GitHub, and GitHub did a great job, I think, of pioneering the overall pattern, the model that there's a human at the center. We call it human in the loop now. I think that's becoming fairly common. And then it would essentially act as a personal assistant in a particular domain.

The first time I saw the capabilities of what today we call Copilot for Microsoft 365, it was a set of essentially wireframe demos that my team had worked on together with a combination of engineering and design. And it was just features. It was just like, "Hey, we could do some cool AI stuff over here in Outlook. We could do some cool AI stuff over here in Word." And as we looked at it, we thought, dang, how's this going to work?

And it was through a series of meetings directly actually with Satya Nadella, our CEO, where he at one point turned to me and was like, "Jared, it feels like it's just scattershot. Can you guys bring this together?" And a really, really talented team kind of got together and over the course of just a couple of weeks came up with Copilot, which now seems so obvious in retrospect, but we thought what GitHub had done could be directly applied.

And then we started to work through essentially a design-led sprint, design leading with marketing and engineering playing the other two legs of the stool, to create the product. And we created the product at first (as you always do) through Figma and motion graphics, if you will. And then from there, as things started to gel, we actually created the product together. So it was one of the most fun things I've been able to do in my career, but also one of the most stressful.

Seth Rosenberg:

That's amazing. And now before we get into the strategy and the underlying technology behind Copilot, maybe walk us through what the product is today. Which applications is it deployed in, and what are its capabilities today?

Jared Spataro:

Sure thing. So I'll focus for a moment on Copilot for Microsoft 365, that's the one that has the broadest usage and kind of broadest awareness in the industry.

You can basically think of it as having two pieces of the value prop. There's the piece that is a Copilot, a personal assistant that's embedded into the applications that people use already. So here we're embedding it into Teams, where it actually really sings; into Outlook, which is the second scenario that people really latch onto; and into Word, Excel and PowerPoint. And we're able in that half of the value prop to meet people where they are. They're already doing work, and we can come in and help them do that work more efficiently.

Interestingly enough though, that's not the most transformative part of the product, the most transformative part of the product is essentially when you get to a chat interface that can reason across all of what we called the graph (or essentially the data) associated with your job.

So here we provide an interface, you can get to it at Copilot.microsoft.com, you just sign in with your organizational ID and you can ask questions of your email inbox, your documents, documents shared with you, your calendar, your Teams meetings, chats that you've had with people, just across a wide array of things. And we've extended it out so you get other data. And here you can do amazing things. You can say things like, "I'm getting ready for this meeting tomorrow. Can you pull together everything my team has put together and give me a sense for how you think I should approach the meeting with this customer?" And it will reason across the inputs, do a really nice job of putting it together.

And it's like, gosh, when you see that for the first time, it kind of knocks your socks off. There's no comparable. That's the thing I'd say Seth, is it's like we stepped into a moment when there is nothing that does...Outside of having a human assistant, there's nothing that does this work and that second part of the value prop, that's where we really started to capture people's imaginations.

Seth Rosenberg:

That's amazing. So I want to double click on that. It seems like there's this spectrum where on one end, a Copilot makes using an existing tool easier and faster. It's an autocomplete tool. And then on the other end of the spectrum, it's doing work autonomously for you, right? There's a lot of dimensions among that spectrum. I'm curious how Microsoft is positioned to go from automating the use of a tool to doing work for you.

Jared Spataro:

Yeah, great question.

Let me start by saying I learned something in the process that again, now is very intuitive, but I just wasn't smart enough. When I first saw Copilot as we brought it together in those design sprints, I was just enamored with that second half of the value prop. I was like, "Look, ma. Look what it can do." But as I tried to show that to the world, it was so new and it was so powerful. People had a really hard time grabbing it. The phrase that I coined was we had to meet people where they were and take them by the hand to help them understand the art of the possible.

So meeting people where they were started literally with where they were, which was in Outlook and in Teams, and we've published numbers. Teams, for instance, we've said we have more than 330 million monthly active users of that interface. So we were able to grab them there and say, we can help you not attend meetings where you can just query the meeting. We can help you in meetings, do some things that are useful, but then we gradually, through the product, are trying to work them over into these other dimensions as you indicate. In order to do that, reasoning over the data that Microsoft knows about (your inbox, documents and SharePoint) is useful. But immediately you find people saying, "Well, I have a task. I want you to, as an example, give me the last three months of sales data on this product line, plot a graph in PowerPoint and put it together so that I can speak to it with notes."

And it turns out Copilot can do all of that except it has to have access to the finance data. So in order for us to move toward the right-hand side of your spectrum where we're more process and automation oriented and often more outcome oriented, we've definitely had to really plug into other people's data. And we have the technical underpinnings to do that and we're excited about that. We have a product called Copilot Studio that essentially works through plugins, connectors, and extensions. So we've done a lot of work over the last, I don't know, six months or so.

But probably the most interesting thing for me, Seth, is the economics of it all. That's yet to play out. If you're SAP, if you're one of our other wonderful partners, they're like, "Hey, Jared, I love you, but this is my data. This matters a lot." And so that's leading to some wonderful strategy conversations about how do companies work together, how do providers work together for the benefit of the customer?

Seth Rosenberg:

Yeah. That's super interesting. There's another dimension of this spectrum from Copilot to agent, from automating the use of a tool to doing work autonomously for you. One is this kind of data integration piece. A second I'm curious to get your perspective on is how vertical-specific does it need to be versus horizontal? Obviously Microsoft is operating at such a scale that it's difficult to be super vertical-specific. So how do you think about which opportunities need to be very role-oriented, vertical-oriented versus horizontal?

Jared Spataro:

Ah, Seth, you're catching me – that's the question of the moment. But let me explain the dynamic of the market right now. On the left-hand side of your spectrum, where we're looking at helping you use a tool, we've been able to prove out, faster than I ever thought we would, both by our own studies and with customers, that you can truly save time.

We're at the point now where we've claimed, "Hey, you don't have to be an expert Copilot user. Good Copilot users can easily save 10 hours a month." I have customers that are coming in and saying, "10 is low. We actually found 13 or 15." So faster than I thought, people are getting to time savings. But the most fascinating thing that I should have again known in retrospect but I didn't, is they're saying, "But Jared, you give me 10 hours back for the average information worker. I don't know how to monetize that. It doesn't save me money, it doesn't make me any extra money. It improves their quality of life, but it's not on my OpEx budget. I don't know what to do with it."

So we have immediately found ourselves starting to push into more business outcomes in order to prove the value. And there's a click stop between broad horizontal productivity, which is the tasks of information workers searching, creating decks, doing email, and what you describe as kind of vertical applications. And that click stop I'm finding is functional applications, meaning what can I do for the legal function? What can I do for the sales function? What can I do for the support function? There, there's a lot of commonality across various industries, and we're finding a lot of success as we dig into those.

The number-one leading scenario for us actually happens to be in customer support. It is essentially deflection. So creating some interesting interfaces upfront to catch customers when they have questions and helping them not go to people, that's pretty well understood. And then using the power of generative AI, with the underlying GPT foundation model in our case, to increase throughput and efficiency of the customer support agents. And we've really seen good examples there. But that's the thing we're learning right now. I think ultimately you're right, we'll get to actual industry applications. Right now, we've got a lot to do, a lot of value to create within the functional space.

Seth Rosenberg:

That makes sense. And where do you think Microsoft's ambitions stop and leave room for startups?

Jared Spataro:

Oh, gosh, there's so much opportunity for startups.

So at the outset, you allowed me to lay out the portfolio that I'm responsible for, what a portfolio. It's this big broad set of things. What I find is that there is so much opportunity as I'm working with customers as you go into any particular scenario and really double down on that scenario.

So I'll give you an example. Even as I look at sales and productivity per head in sales, helping people to essentially increase that particular metric and focus on the daily activities of sales, we're very interested in that from a Microsoft perspective, no doubt about it. But when we see companies who decide that they're going to go in, understand the domain, bring the domain expertise, they will frankly always do it better than we will, and we see that. That, we feel like we understand well.

So our biggest strategic point is that we want to be a platform as much as possible so that people can build on us, use our very valuable data, and that we can be a part of the solution, a valuable part of the solution. But gosh, I could pick any of those functions. We know people who are going into specific things for attorneys, both corporate attorneys and outside. I look at them and as I meet with them, I say, "Man, more power to you guys. I hope I can help you out with the tooling, I can present to you. I hope that I can in some ways become a distribution channel. Again, if you think of 330 plus million people in a Teams interface every day, gosh, I hope I can help you there."

So we kind of think we win when people are taking advantage of those opportunities. But across every one of the functions in the industries, there's just a ton of opportunity and we would encourage it. We hope that people get in there and provide that value because it really will, I think, benefit the entire market.

Seth Rosenberg:

Yeah, that makes sense. And to pull on your point on distribution for a moment from the outside, I think it's easy to take for granted that when Microsoft launches a new product, you have such an amazing sales force that people will adopt these new products. But it seems like you have a pretty nuanced perspective around making the case for enterprises to adopt Copilot, and you made the one point around shifting from time-saving to real business outcomes. I'm curious to hear more about how you're positioning it in terms of deployment and go-to-market and what questions you're getting from enterprises.

Jared Spataro:

Yeah, boy, I'll lay out the surface area. You tell me where you want to go deep. We have set this up like many in the industry as not a part of the package that you would buy from us in a standard way. So in other words, you have to buy Copilot separately. Our standard SKU is $30 per user, per month. That's a price tag that has to justify its value. On the one hand, if you're looking at the fully loaded cost of an FTE, not that much. On the other hand, if you're looking at the tool set you already provide, you could argue, "Hey, I better get a lot of value for that." So when we think of how do we do this, how do we do distribution, how do we do adoption, et cetera, we're just finding first, you have to convince people to essentially do basically a land and expand motion.

They're going to start with a certain number of licenses. Then they're trying to prove out the value of those licenses. And frankly, even though it's the future’s tech, tomorrow's tech, it is yesterday's adoption methods. I mean, you kind of have to pound the ground and you have to be there helping people understand how they can use it. There has not been a super-duper new age way to do it. We've been very successful when we get in, prove out the value and help people to expand from an initial land, but it really has taken that at this point.

So it's been a very interesting thing. I do think that people often look at Microsoft and say, "Oh, you guys have this huge distribution advantage." Well, the barrier to trial is so low in the enterprise space these days, we find you have to literally battle it out for every DAU, every daily active user, with the product itself.

Several years ago, I think the pandemic really taught us that, if I'm honest with you. We recognized, wow, with a competitor like Zoom or a competitor like Slack, when users can choose, you better be good, really good, so that they choose you.

Seth Rosenberg:

That makes sense. And so on the land, what's been most effective so far in terms of either product or segmentation?

Jared Spataro:

On the land, I think we have had great success, interestingly enough, by creating some hype that people want to just start. We don't often get that lift, that tailwind as Microsoft. It is a tailwind that I think oftentimes (speaking to our audience today) that startups or folks who are doing truly like, wow, never-seen-it-before innovation get.

But when we introduced Copilot in March of last year, for whatever reason, we had enough lightning in a bottle that we were able to create some real sense of buzz and interest, and we've just been able to ride that, frankly, with the land. So it's not that we're able to go, and I would say right now with any particular function. IT tends to be the place that we do best, but IT is not the place that makes the best business case right now. So we can start there, but we often quickly are moving out into support or into something like HR or another function that has a specific process.

Seth Rosenberg:

Yeah. And when you look at engagement across the deployed organizations with Copilot across all the different applications as well as the more general assistant, what does engagement look like today?

Jared Spataro:

Copilot in Teams is absolutely the most engaged surface. It's the place where it sings the most, and technically that's because it's just good. We ground the LLM on a Teams meeting transcript, whether that's in the meeting or post meeting, and it's very accurate and people are often amazed, because we're right up the alley of what the LLM, what the GPT foundation model can do. So I attend fewer meetings these days for sure, because I can just ask, "Hey, what'd my boss say about this? Did he mention me at all? Did I get any action items?" It's impressive. So Teams is top.

Next one is Outlook. That's just simply because as people start to use that "summarize this long email thread" button, it becomes addictive. You don't want to do email without it. I use it to write emails a little bit less, because I can dash off emails pretty quickly.

But what I realized with Copilot is, wow, I spent a bunch of time in my inbox trying to absorb the information efficiently and Copilot helps me out. Then the next place ends up for us being chat essentially. So that chat interface I talked about, but it takes us some work to get people to see that use case. And then Word is right up there in the top three, four depending on what customer we're in because people do a lot of either content consumption where they can get summaries or content creation where they're using it to generate stuff. So those are the top ones. We're still working frankly on Excel and PowerPoint. The capabilities of those are coming along. They're a lot more sophisticated surfaces, command surfaces for us.

Seth Rosenberg:

Yeah, that's fascinating. I'm curious to double click on a few of these. So on Outlook, what do you think the end state is for AI enabled email? Does this just create an explosion of emails for us? Does it create a scenario where our bots are just communicating with each other? Where does this go?

Jared Spataro:

Yeah, that was always the worry. I craft an awesome Copilot written mail. You don't even read it because Copilot summarizes it for you. Then you make it longer, etc. In reality, I feel like as I am watching the project evolve, I am learning that email is an interesting medium because there's so much information assimilation that happens in that communication. There's a lot where people are not actually communicating as much as just trying to keep people in the know, in the loop.

So I find that I'm starting to do things like come in in the morning and say, "Hey, over the last 10 hours, what has happened across my email? Any meetings I chose not to go to, or was invited to but didn't attend, messages, etc. Get me up to speed essentially on communications that would've come my way." And that's starting to help me understand, maybe I can imagine a post-email world (a little bit), where it's not like email as a communication medium will stop, but I'll start using Copilot to kind of digest the information more than going directly to an inbox. So I think we'll see some of that happen. Certainly among the users I know who are using Copilot the most, those are some of the patterns that they are developing at this point. But people have been predicting the death of email for a long time. I'm not going to pile onto that.

Seth Rosenberg:

Yeah, that's super interesting, the scenarios where the AI assistant actually abstracts the underlying applications.

Jared Spataro:

Yes.

Seth Rosenberg:

And I'm curious to learn more about the types of usage you're seeing in AI Copilot in terms of what queries are people asking? What are the use cases in the general chat assistant?

Jared Spataro:

They start out as general information retrieval patterns. Essentially people start to say, "Well, can you help me find that email sent to me by Seth, around about two weeks ago, and here's kind of what it was talking about." And they are often amazed like, wow, that was great. I didn't have to essentially pick the three words that would differentiate that email from something else in search, kind of how we've been trained. So information retrieval tends to be the first place people go.

Then the next set, as people get a little bit more sophisticated, they realize they can do information retrieval and synthesis or manipulation. So a great prompt, a great query would sound something like, "Over the last week, take all the emails from my boss, put them in a table. At the top of the table, put the ones that were addressed just to me and that have actions. I want three columns in the table, the title of the email, a summary of the email, and a summary of any action for me to take."

And that's a pretty sophisticated query, but it's like magic. You run something like that and people all of a sudden are like, wow. So you move again from information retrieval to some sort of manipulation. And then right now the state of the art is those people who have moved from there, start to move to a little bit of personal automation, task automation. So it sounds something like this, "Hey, Seth wasn't able to attend our stand-up meeting on Friday. Will you grab the transcript from that, summarize it, pull out the action items specifically for him, put it into an email for me and prep it so that I just have to hit send?"

And that set of things is very doable. It can do that today. And when people realize, oh, it's not just retrieval, it's not just manipulation, I actually can get some tasks done, then it starts to sing for them. Then what we see happen, Seth that's so interesting for me is they actually start to change their daily workflow, how they're approaching their work. And that's a pretty exciting moment for us because man, changing people's habits, it was only the pandemic that actually had there be more chat users than email users. So you know something has to be big often to change people's habits.

Seth Rosenberg:

Yeah, that's fascinating. And so in the product, are you actually able to save these workflow automations?

Jared Spataro:

Not yet, but it is something that we're working on. In fact, it has been our users that have led us to this place where they're starting to say, "Wow, I want to do two things. I want to essentially save them. I also want to proactively automate them." So people are starting to come up with my morning prompts and they're like, "Jared, why can't I run my morning prompt automatically 15 minutes before I know I'm going to log in?" Which starts to get totally fascinating. You're really getting into personal automation in a way that I wouldn't have guessed a year ago.

Seth Rosenberg:

And eventually the AI can learn from the system what the best morning prompts are and actually tell you what to prompt, which is amazing.

Jared Spataro:

That's right. Yeah.

Seth Rosenberg:

So maybe moving on, I'm curious to learn a little bit about what's going on under the hood with Copilot. Which foundational models are you using? What does the technical architecture look like?

Jared Spataro:

Sure. In basic terms, I think of it this way. When I draw it out for customers, I say, "Hey, you have an end user who is consuming it through some sort of interface. Then you have an orchestration layer in the middle. That orchestration layer is incredibly important." I'll come back to that in a moment. And after it's done its machinations, its orchestration, it's going to shoot, in simplified terms, the prompt to the LLM, likely together with a context window, some additional information. It's a modified prompt too. There's no doubt about it. There's prompt engineering happening. The LLM will process it, reason over it and kind of shoot an answer back. In reality, the way it functions is a little bit more sophisticated in that there are multiple calls to the underlying LLM. Today, we use GPT-4 Turbo for our underlying LLM. We also though, interestingly enough, use other models for some of the orchestration work that we do.

So the pre-processing of intent, those types of things. The most interesting piece of work I think really happens right now in the orchestration layer, and it is in the orchestration layer that we're essentially dissecting the prompt for intent. And we're also doing things like sending off a RAG query, so a retrieval-augmented generation grounding query. Typically, right now, we're doing that through our underlying enterprise search. So we use the search engine to kind of go out, and frankly for everybody here who's working on this, you'll know that the LLMs are good enough that if you give them a context window with the information they would need to solve the problem, they come back with an impressive answer every time. However, if your search brings back a context window that doesn't have the right information, which is easier to do than it sounds, the poor LLM is left trying to figure out how in the world do I answer this question or solve this problem with this package of info that doesn't look like it's applicable.

So that orchestration layer and tuning the right search is more important and has been more difficult than we thought. In fact, a lot of what we've spent the last 12 months on has been refining what happens there. And then finally, one of the really important things we're doing is we're finding a way to crack open that orchestration layer and allow third parties (this is where the opportunity comes in for startups and our customers themselves) to actually catch certain intents and do specific things with them.

And so the easiest example I have for this that always kind of resonates is when you use Copilot and you're going to ask about something like numbers, literally official finance numbers, oftentimes Copilot can go look across your email where you have reports that are summarized or even decks or Excel spreadsheets that people have sent to you, you have access to.

But when it comes to something like that, that's so official, you have an official reply, you have an official single source of truth, and you want to kind of catch that intent and then send it to SAP or whatever your system of choice happens to be. So we have a way to go into that orchestration layer with a product called Copilot Studio and catch those intents, and then take those intents and tell Copilot what to do with them, send it to this system to get the right information. Sometimes there's a multistep process that's required. So there's so much richness that we've built up over the course of the last 12 months in that basic frame. We're just really excited because we're starting to see it really sing. It hangs together, and it's a system that again, creates opportunities we think for the ecosystem to go and build on.
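
The flow Jared describes here (parse the prompt for intent, let a registered handler catch certain intents, otherwise ground the prompt through enterprise search, then assemble the modified prompt) can be pictured with a small Python sketch. This is only an illustration of the pattern under stated assumptions, not how Copilot or Copilot Studio is implemented: every name in it (classify_intent, enterprise_search, catch_intent, build_prompt, the keyword list, the two-document corpus) is hypothetical, and a real system would likely use models rather than keyword matching.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Intent:
    name: str   # hypothetical label, e.g. "finance_numbers" or "general"
    query: str  # the user's original prompt


# Hypothetical stand-in for the user's mail and documents.
CORPUS = [
    "Email from Seth: notes on the customer meeting agenda.",
    "Deck: Q3 product roadmap review.",
]


def tokens(text: str) -> set:
    """Lowercase words with surrounding punctuation stripped."""
    return {w.strip(".,:;?!\"'") for w in text.lower().split()}


def classify_intent(prompt: str) -> Intent:
    """Toy intent parsing by keyword (an assumption for illustration)."""
    lowered = prompt.lower()
    if any(k in lowered for k in ("revenue", "sales data", "finance")):
        return Intent("finance_numbers", prompt)
    return Intent("general", prompt)


def enterprise_search(query: str) -> List[str]:
    """Toy grounding step: return documents sharing a longer word with the query."""
    wanted = {t for t in tokens(query) if len(t) > 3}
    return [doc for doc in CORPUS if wanted & tokens(doc)]


# Registry of handlers that "catch" an intent before default search runs.
HANDLERS: Dict[str, Callable[[str], List[str]]] = {}


def catch_intent(name: str):
    """Decorator that registers a handler for one intent name."""
    def register(fn: Callable[[str], List[str]]):
        HANDLERS[name] = fn
        return fn
    return register


@catch_intent("finance_numbers")
def finance_system(query: str) -> List[str]:
    """Hypothetical connector to an official system of record."""
    return ["[system of record] official figures would be fetched here"]


def build_prompt(prompt: str) -> str:
    """Route by intent, gather grounding, and assemble the modified prompt."""
    intent = classify_intent(prompt)
    retrieve = HANDLERS.get(intent.name, enterprise_search)
    context = retrieve(intent.query)
    grounding = "\n".join(f"- {c}" for c in context) or "- (no grounding found)"
    return f"Context:\n{grounding}\n\nUser request: {prompt}"
```

In this sketch, a question about revenue is caught and sent to the registered connector, while a request such as "Find the email from Seth about the customer meeting" falls through to search. If search returns nothing relevant, the model would receive an empty context, which is the failure mode Jared describes above.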

Seth Rosenberg:

Yeah, that's super interesting. You mentioned something about using a different LLM at the orchestration layer to parse through intent. Can you say more about that?

Jared Spataro:

Sure thing. We have our own LLMs that we've developed over the years here at Microsoft. We're continuing to develop them. Our latest one was called. It's actually an SLM. So it's a small language model, not a large language model. And as you can imagine, part of what we're doing at Microsoft is trying to figure out how to run these things very economically. So we don't have to make every call to GPT-4 Turbo. There's a lot of work we can send to cheaper, more efficient, sometimes faster models when we're trying to parse through intent or trying to get the right search terms that we would use to execute a query for the context window.

So there's a bunch of magic there. We don't go into the details of exactly how it works, but from a pattern perspective for our audience today, suffice it to say that one of the ways that we can optimize is by kind of chopping up the work that needs to be done in that orchestration layer and choosing the right LLM based on cost performance.
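As a rough illustration of "chopping up the work and choosing the right LLM based on cost performance," the sketch below picks the cheapest model that is capable enough for each orchestration step. Everything in it is assumed for the example: the model names, the costs, the capability scores, and the step names are made up, and the speaker explicitly does not describe how Microsoft's system actually does this.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class ModelOption:
    name: str             # hypothetical model identifier
    cost_per_call: float  # made-up relative cost
    capability: int       # made-up score; higher means more capable


MODELS: List[ModelOption] = [
    ModelOption("small-model", 0.001, 1),
    ModelOption("large-model", 0.03, 3),
]

# Assumed minimum capability each orchestration step needs.
REQUIRED: Dict[str, int] = {
    "parse_intent": 1,
    "build_search_terms": 1,
    "compose_answer": 3,
}


def pick_model(step: str) -> ModelOption:
    """Return the cheapest model whose capability meets the step's requirement."""
    needed = REQUIRED[step]
    candidates = [m for m in MODELS if m.capability >= needed]
    return min(candidates, key=lambda m: m.cost_per_call)
```

With these numbers, intent parsing and search-term generation go to the small model, and only the final answer is sent to the large one.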

Seth Rosenberg:

Yeah. That makes total sense. And I also want to double click on your comment around cracking open the orchestration layer for developers and for startups. It's an interesting layer to be building in because in some ways, I think for some developers, it feels like you're building on top of quicksand because the underlying models are developing at such a rapid pace and they're consuming some of the use cases that used to be at this orchestration layer or enabling layer. So I'm curious, if you were to break down the hardest problems and also the problems that you think are going to endure at that layer, what do you think are the most interesting opportunities for startups?

Jared Spataro:

Yeah. I'll just hit a couple in no particular order. One of the opportunities we see is an opportunity for new skills, we would call it. So the easiest way to illustrate this is that today ChatGPT and even Copilot front the DALL-E model. So you can do image generation, but if you know what's going on there, you're essentially catching the underlying prompt in ChatGPT, but you end up parsing it and throwing it over to DALL-E, which is going to produce the image. And that is a basic pattern, I would say, of having a front door through whatever orchestration layer you choose and pushing something off to a more specialized skill. There's a ton of opportunity for more specialized skills out there as they relate to business domains. You can think of specialized skills for finance. You can think of specialized skills as you go into things like sales and marketing.

You definitely can think about specialized skills associated with the science of product development in various manufacturing industries. Lots and lots of really good work to do there. Then as we look kind of a little bit deeper into some of the elements that the whole industry is... I wouldn't maybe say wrestling, but it's grappling with right now, it includes things like how good the underlying model is at reasoning, and then we would call it planning associated with that.

That is often now starting to happen with the reasoning below in the LLM. And the planning is starting to happen a little bit more in orchestration, where you essentially take a very sophisticated task and you ask a combination of the orchestration layer and the LLM to break it down into tasks and then to register those tasks in some sort of long-running format, perhaps much longer than a single interaction with a user.

And to allow the agent, if we'll call it that, to execute those steps, either sequentially or in some sort of recursive way so that it finally gets to the answer. And if I just make this simple, you can imagine when you ask maybe in a specific application, something like working with chemicals, you might have to do some math, you might have to actually do some chemical synthesis. You might have to look at proteins. There might be a bunch of different tasks that you would break down, and you have to make sure that there's no, what we call drift as you work through those tasks so that the actual agent itself isn't getting lost in the sauce, if you will. Tons of innovation required there in planning and execution of plans. And that leads to one of the last ones that's top of mind for us of what we call memory.

Today, there isn't really a good facility in the state of the art, or at least in the architecture, that allows these agents, whether they be Copilot or something from any other competitor out there, to have kind of a long-running memory about what it's executing right now, what it's learned through execution, what it knows about you as a user, what it's learned about the domain as it's solved problems. Almost everything that happens is entirely context-free beyond the context window. So memory and the way of creating, storing, and applying memory is a really interesting domain. And you put those things together, bam. You're going back to again, the premise for what we're talking about, tons of opportunity, I think for startups who want to innovate. There's just tons of opportunities that I see.
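To make the planning and memory ideas concrete, here is a minimal sketch of a plan that is registered in a long-running format (JSON here), executed one task at a time, and paired with a simple memory of what each step produced. It is one possible shape under stated assumptions, not an industry standard or any vendor's design: `Task`, `Plan`, `run_next`, and the dictionary used as memory are hypothetical names chosen for the example.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Callable, Dict, List


@dataclass
class Task:
    name: str
    done: bool = False
    result: str = ""


@dataclass
class Plan:
    goal: str
    tasks: List[Task] = field(default_factory=list)

    def to_json(self) -> str:
        """Serialize the plan so it can outlive a single user interaction."""
        return json.dumps(asdict(self))

    @staticmethod
    def from_json(raw: str) -> "Plan":
        data = json.loads(raw)
        return Plan(data["goal"], [Task(**t) for t in data["tasks"]])


def run_next(plan: Plan, tools: Dict[str, Callable[[], str]],
             memory: Dict[str, str]) -> bool:
    """Execute the first unfinished task; return False when nothing is left."""
    for task in plan.tasks:
        if not task.done:
            task.result = tools[task.name]()  # call the skill for this step
            task.done = True
            memory[task.name] = task.result   # remember what was learned
            return True
    return False
```

Because progress is stored on the plan itself, an agent can save it with `to_json`, stop, and later resume from `from_json` without redoing finished steps, which is one simple way to limit the drift described above.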

Seth Rosenberg:

Yeah, that's super interesting. And this goes back to that spectrum of the initial versions of these AI products, where you give them a prompt and you get a paragraph in return, all the way through to planning and reasoning, where you actually just give them a full task and they're able to actually execute work for you.

Jared Spataro:

That's right.

Seth Rosenberg:

These three things are interesting, which is reasoning, planning and memory. What do you think is the biggest technical barrier to get fully to these autonomous agents? And how far away do you think we are (or are we there already) in terms of autonomous agents that really work at the quality level of a human?

Jared Spataro:

Yeah. I'll take your second question. I don't think we're quite there yet, but we are seeing glimmers across the industry, certainly not just with Microsoft, of people being able to do this in well-scoped domains, in particular with coding. We've seen some of it out there, and I think our listeners will know some of those that have been cited. And so I would say that my guess would be by the end of this calendar year, so a couple of months from now, we're going to see some that are truly amazing, where we're going to feel like, wow, that rivals a human's ability in this domain (supply chain, coding, things like that) to take a problem, break it down into tasks, execute on those things, come up for air when it needs direction or it feels like it doesn't know what to do next, and then-

Seth Rosenberg:

Yeah, it's absolutely amazing. Yeah, the first time I used one of these agents that you're referring to, like Cognition, I guess I could say it more freely as one of the leading kinds of agents in the software development space. It's just crazy, right? Where just even the interaction model of giving an AI a task and then going for coffee and coming back 45 minutes later and they've made progress and they're asking you questions. It's really fascinating.

Jared Spataro:

You're right, there's innovation to be done on the interaction model itself. So I think that we're going to see that at some scale, not scale where everybody's using them, but at some scale beyond just the earliest adopters this calendar year. And then if I go to what are the technical hurdles? Well, there just hasn't been an industry adopted, recognized standard for things like how do you store what we would call a plan? How do you take that idea of reason to come up with a plan, then do plan and execution, keep that plan and learn on that plan, and essentially kind of use memory to both execute and get smarter over time. All those things I think right now are subject to innovation. I think we'll see spikes of innovation and then as we often do in this industry, we see kind of things settle into dominant design, and I bet you we'll see dominant design on that towards the end of this calendar year in a bunch of those places.

Seth Rosenberg:

And do you think that planning engines will be built by the large models, or do you think that's a distinct capability?

Jared Spataro:

I'll give you my personal opinion, right at the moment.

What I have seen is that we all kind of feel like the models do a fine job. I have the most experience with GPT-4 obviously, but we sure wish they were more powerful. So I think there's a little bit of a race between what you might think of as specialized systems and services that would do that versus the continued drops of those LLMs, like the models getting progressively stronger. And I think if we had everybody in the industry who is state-of-the-art at this moment, I bet you there would be a disagreement of some people saying, "Hey, give it time. The models themselves in a couple of months are going to start to get so good that you won't need a specialized service." And some people would say, "Hey, just like the brain has an executive function upfront on the prefrontal cortex that does this type of work, you should have that in your system too." So I don't know. I don't feel like I've seen enough evidence one way or the other.

Seth Rosenberg:

Yeah, that makes sense. This has been a fascinating discussion. Maybe to close it off, I'm curious if we fast-forward five years, what does the world look like? What's the most optimistic take in terms of how our lives change with AI?

Jared Spataro:

Well, I'm really focused on AI at work, so I'll kind of put us within that domain for a moment. The way I see it is building, working with, optimizing, overseeing agents... If we use the industry term, we would call them co-pilots, it will become a skill at least as important as working with people, as collaboration with people. I anticipate that you will see departments, teams, groups that are very naturally composed of some people and lots of co-pilots is the way that Microsoft would say it. And those co-pilots will perform various functions. They will often be very autonomous. Some of them will operate in near real time. I ask a question, you help me to get to the answer and a solution. Others will be, as you indicated, much more autonomous in going off to do the work and come back when they need help. But the interesting thing for me is I think that presents us, if I get very practical for a moment with two things to think about.

Number one, we have existing organizations today, firms today, that are built on what I would call kind of an organizational paradigm that comes out of the 1940s and '50s. It was built all around people. We've introduced information systems, but those have been largely to augment people; people pull on those information systems. So I think for existing organizations, there will be a fundamental, great rewiring is what I might call it. It will really be an AI-driven kind of business process re-engineering wave about, gosh, this is how finance was done in the olden days. How do we do finance today? And you can take every function.

But that begs another question that to me might be even more interesting, certainly for this audience, which is, well, wait a second, what about AI native firms? Does this mean that they can be built from the ground up just like cloud native firms were back in the day?

Starting with the new premise, do you really need a full HR department? Do you really need a full finance team? Do you really need a full sales team the way we used to think about it? And undoubtedly, there will be lots of room for people, so I don't think that five years out we won't need people, but I think that the structure of the firm, even certainly the workflows of the firm, even the culture of a firm will be very different. And it gives an opportunity, it lowers the barriers to entry for very innovative, inventive, driven people, I think, to go change the world. So I'm very excited about that. Because if we can unleash people, today, there's a huge barrier to entry to get to scale, and we call it the organization of the firm. If we can unleash that faster and at a lower threshold, I think we will have done something really great for the world.

Seth Rosenberg:

That's amazing.

Sorry, I can't leave it at that. So the fully autonomous firm – that is populated by an army of agents or co-pilots, what's the dimension of differentiation and competition in that world? Because is it recruiting the agents? Is it building the agents? Is it the creativity of how to use them? Is it data? How do people differentiate and win in that world?

Jared Spataro:

Well, I would love to talk with you more or some of our listeners more about it. We're working with economists. We're trying to envision that future. I think in some ways, Seth, it comes down to what's the dimension of differentiation for the sector, the industry in which you're working. I hope, I think that it will start to center more on that vector of differentiation than on some of the other things-

Seth Rosenberg:

The table stakes motions of building businesses.

Jared Spataro:

Exactly. So if we go into pharmaceuticals, we would say, well, one of the things that should have impressed all of us was how quickly the new drug development process happened for COVID. Wow, something happened there that was amazing. What if you could get rid of all of those advantages of scale that the big players had, and you just said, "No, it's all based on the IP of we have a problem, how could we sell it or solve it?" That, to me, is my dream. When I think of AI at work – and what I, Jared, am trying to do – I'm trying to get us closer to that.

And then in other parts of the economy, it would be different. If you look at automobiles, et cetera, we would say, okay, it seems like today when you look at electric cars, a lot of the innovation would be on things like batteries and things like charging times, et cetera, maybe even on distribution of charging stations. But I think we would get closer to that than the economies of scale that traditionally over the last 40, 50 plus years have kind of created these interesting structural barriers to innovation. And that is totally fascinating to me.

Seth Rosenberg:

I love that. Getting closer to the real work rather than the work of doing work.

Jared Spataro:

Yes.

Seth Rosenberg:

So Jared, this has been a fascinating discussion. Really appreciate you taking the time, and thanks for moving the industry forward and building and leading the way at Microsoft.

Jared Spataro:

My pleasure. Thanks for having me, Seth.