WEBVTT

1
00:00:00.800 --> 00:00:04.600
Ah, well, the traditional problems that you have. I think

2
00:00:04.600 --> 00:00:07.920
with every piece of software that you're trying to

3
00:00:07.920 --> 00:00:10.640
productionize, it's fairly easy to build something

4
00:00:12.080 --> 00:00:15.840
that works, that solves a problem. But as soon as you

5
00:00:15.919 --> 00:00:19.320
put it out into the world, you have a few things. First of all, you

6
00:00:19.320 --> 00:00:23.080
need to make sure it's secure. Second of all, you need to make sure that

7
00:00:23.080 --> 00:00:26.880
it scales. Because if by all means I hope for you

8
00:00:26.880 --> 00:00:30.560
as a startup it actually works and people enjoy using it,

9
00:00:30.640 --> 00:00:34.400
the next step really is how does it scale? How does it not fall

10
00:00:34.400 --> 00:00:38.200
apart under the load of people wanting to

11
00:00:38.200 --> 00:00:41.880
try? And the third thing

12
00:00:41.880 --> 00:00:45.680
really is observability, being able

13
00:00:45.760 --> 00:00:49.360
to really get telemetry, look

14
00:00:49.360 --> 00:00:52.250
into what's actually happening.

15
00:01:02.090 --> 00:01:05.730
YouTube blog covering the German startup scene with

16
00:01:05.730 --> 00:01:09.210
news, interviews and live events.

17
00:01:11.690 --> 00:01:15.370
AWS is proud to sponsor this week's episode of

18
00:01:15.370 --> 00:01:18.970
Startuprad.io. The AWS startup team comprises

19
00:01:19.360 --> 00:01:22.640
former founders and CTOs, venture capitalists,

20
00:01:22.880 --> 00:01:26.720
angel investors and mentors ready to help you prove

21
00:01:26.720 --> 00:01:30.240
what's possible. Since 2013, AWS

22
00:01:30.480 --> 00:01:34.200
has supported over 280,000 startups across the

23
00:01:34.200 --> 00:01:36.800
globe and provided US$7 billion

24
00:01:38.120 --> 00:01:41.920
in credits through the AWS Activate

25
00:01:41.920 --> 00:01:45.440
program. Big ideas feel at home at

26
00:01:45.440 --> 00:01:49.000
AWS. With access to cutting-edge technologies

27
00:01:49.160 --> 00:01:52.920
like generative AI, you can quickly turn those ideas into

28
00:01:52.920 --> 00:01:56.520
marketable products. Want your own AI

29
00:01:56.520 --> 00:02:00.200
powered assistant? Try Amazon Q. Want to

30
00:02:00.200 --> 00:02:03.480
build your own AI products privately?

31
00:02:03.480 --> 00:02:07.160
Customize leading foundation models on Amazon Bedrock.

32
00:02:07.240 --> 00:02:10.840
Want to reduce the cost of AI workloads?

33
00:02:11.000 --> 00:02:14.770
AWS Trainium is the silicon you're looking for.

34
00:02:15.330 --> 00:02:19.090
Whatever your ambitions, you've already had the idea.

35
00:02:19.570 --> 00:02:22.690
Now prove it's possible on AWS. Visit

36
00:02:23.170 --> 00:02:25.730
aws.amazon.com/

37
00:02:26.450 --> 00:02:29.970
startups to get started. So you built a chatbot.

38
00:02:29.970 --> 00:02:33.650
Cool. But now your team is stuck wondering how to connect it

39
00:02:33.650 --> 00:02:37.250
to real APIs, make it reliable or roll it out

40
00:02:37.250 --> 00:02:40.930
to a thousand users. That's where this episode comes in.

41
00:02:41.010 --> 00:02:44.850
AWS's Dennis Traub walks us through how

42
00:02:44.850 --> 00:02:48.370
to productionize AI securely at

43
00:02:48.370 --> 00:02:52.090
scale and without breaking your app. We

44
00:02:52.090 --> 00:02:55.770
dive into agentic workflows, MCP

45
00:02:55.930 --> 00:02:59.690
and how real startups go from MVP to market.

46
00:03:00.170 --> 00:03:03.770
Let's see in our episode today in cooperation with AWS.

47
00:03:04.170 --> 00:03:07.850
Dennis Traub is a developer advocate at AWS, focused

48
00:03:07.930 --> 00:03:11.660
on helping startups and enterprises to take their

49
00:03:11.660 --> 00:03:15.260
GenAI experiments into real-world deployment. With

50
00:03:15.260 --> 00:03:18.620
decades of experience in secure infrastructure and

51
00:03:18.620 --> 00:03:22.380
developer productivity, Dennis has helped teams across

52
00:03:22.460 --> 00:03:25.820
industries automate, integrate and scale their

53
00:03:26.060 --> 00:03:29.900
projects using modern cloud-native tools.

54
00:03:30.220 --> 00:03:33.700
Today we talk about Model Context Protocol, agent-based

55
00:03:33.700 --> 00:03:37.300
architecture and what it really takes

56
00:03:37.300 --> 00:03:40.980
to make AI work in production. Dennis, welcome back

57
00:03:40.980 --> 00:03:44.580
to Startuprad.io. Oh, thanks,

58
00:03:44.660 --> 00:03:48.460
thanks for having me again. Totally my pleasure. For everybody who's

59
00:03:48.460 --> 00:03:52.139
not aware of this, this is number two of a series of two

60
00:03:52.139 --> 00:03:55.900
interviews. But actually one of your colleagues would join us as well.

61
00:03:55.900 --> 00:03:59.740
So in total we have four interviews with you guys

62
00:03:59.740 --> 00:04:01.700
here. Dennis

63
00:04:02.670 --> 00:04:06.470
Productionizing AI: tell us about it.

64
00:04:06.470 --> 00:04:09.790
What's the hard truth? What's the biggest gap you see between

65
00:04:09.870 --> 00:04:12.910
GenAI prototypes and actual production systems?

66
00:04:15.310 --> 00:04:18.750
Well, the traditional problems that you have, I think with

67
00:04:18.910 --> 00:04:22.510
every piece of software that you're trying to productionize,

68
00:04:22.750 --> 00:04:26.190
it's fairly easy to build something that

69
00:04:26.190 --> 00:04:29.930
works, that solves a problem. But as soon as you put

70
00:04:29.930 --> 00:04:33.210
it out into the world, you have a few things. First of all you need

71
00:04:33.210 --> 00:04:37.050
to make sure it's secure. Second of all, you need to make sure that it

72
00:04:37.050 --> 00:04:40.890
scales. Because if by all means I hope for you as

73
00:04:40.890 --> 00:04:44.730
a startup it actually works and people enjoy using it, the

74
00:04:44.730 --> 00:04:48.130
next step really is how does it scale? How does it not fall

75
00:04:48.130 --> 00:04:51.770
apart under the load of people wanting

76
00:04:51.770 --> 00:04:55.350
to try? And the third

77
00:04:55.350 --> 00:04:59.190
thing really is observability, being

78
00:04:59.190 --> 00:05:02.670
able to really get telemetry,

79
00:05:02.910 --> 00:05:06.750
look into what's actually happening. Does it do what it's supposed to

80
00:05:06.750 --> 00:05:10.350
be doing? Am I running risks in

81
00:05:10.350 --> 00:05:13.950
terms of, for instance, running up a large bill

82
00:05:14.110 --> 00:05:17.710
because it does something that I didn't expect it to do?

83
00:05:19.550 --> 00:05:22.750
You may run into edge cases that you didn't have in your

84
00:05:22.830 --> 00:05:25.710
prototyping environment. So the traditional problems

85
00:05:26.710 --> 00:05:30.470
that you usually have when you're building something that was

86
00:05:30.470 --> 00:05:33.670
small and you put it into the market. The

87
00:05:34.550 --> 00:05:37.590
second thing that very often happens as well is this:

88
00:05:38.550 --> 00:05:42.390
most pieces of software are not an island.

89
00:05:42.470 --> 00:05:46.190
They have to connect to something. They have to connect to third party APIs or

90
00:05:46.190 --> 00:05:49.870
to your own APIs, or to customers' internal APIs

91
00:05:49.870 --> 00:05:53.720
and data sources. And that can be hard as well because

92
00:05:53.720 --> 00:05:57.400
how do you make sure that the agent then first of

93
00:05:57.400 --> 00:06:00.960
all securely connects to these APIs and

94
00:06:00.960 --> 00:06:04.680
second, doesn't mess with them, doesn't

95
00:06:04.680 --> 00:06:07.960
do anything that it's not supposed to be doing. That is something that you really

96
00:06:07.960 --> 00:06:11.760
need to look at. As soon as you run into a production situation, you

97
00:06:11.760 --> 00:06:14.800
will most likely have security, scalability

98
00:06:15.440 --> 00:06:18.400
and connectivity issues with

99
00:06:18.800 --> 00:06:22.310
internal and third-party tools. Talking a little bit here about Model

100
00:06:22.310 --> 00:06:26.110
Context Protocol, which is getting a lot of traction in the

101
00:06:26.110 --> 00:06:29.750
AI engineering world. What is it and why does

102
00:06:29.750 --> 00:06:33.510
it matter now? One of the big questions that everybody had in

103
00:06:33.510 --> 00:06:37.310
regards to LLMs, to language models when they came out a few years ago,

104
00:06:37.470 --> 00:06:39.790
really was their

105
00:06:41.230 --> 00:06:44.870
certain amount of unreliability, especially when it

106
00:06:44.870 --> 00:06:48.130
comes to them producing facts.

107
00:06:49.250 --> 00:06:52.850
These models have been trained on an incredible

108
00:06:52.850 --> 00:06:56.450
amount of data, but they don't really understand the data.

109
00:06:56.530 --> 00:06:58.530
So by the nature of how these models work,

110
00:07:00.210 --> 00:07:03.850
they don't really know whether what they're saying is

111
00:07:03.850 --> 00:07:07.650
actually true or not. They just look

112
00:07:08.130 --> 00:07:11.810
whether it matches a certain pattern that they have

113
00:07:11.810 --> 00:07:15.330
seen quite often. And that's a really big limitation because in many

114
00:07:15.330 --> 00:07:19.180
applications, in many use cases it's really important to work with

115
00:07:19.180 --> 00:07:22.940
factual data and not with assumptions or with

116
00:07:24.540 --> 00:07:27.340
made up information that

117
00:07:28.780 --> 00:07:32.420
doesn't actually represent the facts. That can be

118
00:07:32.420 --> 00:07:36.100
pretty dangerous. So most people in

119
00:07:36.100 --> 00:07:39.020
companies who've been working with LLMs, especially in their early days,

120
00:07:39.740 --> 00:07:42.460
ran into this problem. Well, it doesn't really work because

121
00:07:43.760 --> 00:07:46.880
it isn't really reliable in what it says, it makes up data,

122
00:07:47.360 --> 00:07:50.880
it confuses things and so forth. So

123
00:07:51.520 --> 00:07:55.080
there are basically two ways to solve this

124
00:07:55.080 --> 00:07:58.160
problem. One way is

125
00:07:59.520 --> 00:08:03.160
adding more information to the model, fine

126
00:08:03.160 --> 00:08:06.680
tuning the model, giving it access to a

127
00:08:06.680 --> 00:08:10.320
vector store so that it can do semantic search to retrieve

128
00:08:10.320 --> 00:08:13.920
actual data to basically either

129
00:08:13.920 --> 00:08:17.560
double check against actual data or just

130
00:08:17.560 --> 00:08:21.160
use this data as part of its process, which is called

131
00:08:21.160 --> 00:08:24.960
RAG, retrieval-augmented generation. And I

132
00:08:24.960 --> 00:08:27.640
don't want to go too deep into RAG and I have a very

133
00:08:28.360 --> 00:08:32.120
specific opinion of RAG, not a bad one, but what it actually

134
00:08:32.120 --> 00:08:35.240
is, I don't look at it from the perspective of a data scientist.

135
00:08:35.800 --> 00:08:39.559
Actually I think RAG is the

136
00:08:39.559 --> 00:08:43.399
second thing or belongs to the second thing that we can do which is

137
00:08:43.399 --> 00:08:46.999
basically connecting the AI to the real

138
00:08:46.999 --> 00:08:50.599
world at runtime. So

139
00:08:50.679 --> 00:08:54.439
at the moment in time when I actually use

140
00:08:54.519 --> 00:08:58.039
the model. Pre-training,

141
00:08:58.279 --> 00:09:01.919
fine-tuning and these things, they all happen before

142
00:09:01.919 --> 00:09:05.200
I deploy the model into production and everything

143
00:09:05.440 --> 00:09:09.280
after that happens when the model is in production, when my

144
00:09:09.280 --> 00:09:12.720
application runs. And RAG, tool use

145
00:09:13.440 --> 00:09:16.880
and other mechanisms basically solve this problem,

146
00:09:17.280 --> 00:09:21.000
connecting the AI to the real world. And that could be a database,

147
00:09:21.000 --> 00:09:24.600
that could be a vector store with a semantic search, but that could also be

148
00:09:24.600 --> 00:09:28.400
something where the AI can actually act like call an

149
00:09:28.400 --> 00:09:32.160
API to trigger a process or to send an email

150
00:09:32.400 --> 00:09:36.110
or do things like that. And the challenge

151
00:09:36.110 --> 00:09:39.790
was that every API works differently. Every

152
00:09:39.790 --> 00:09:43.590
API has a different authentication mechanism, every database has a slightly different

153
00:09:43.590 --> 00:09:47.430
SQL dialect and so forth. So all the tools

154
00:09:47.430 --> 00:09:50.790
that I wanted to use, starting from a simple

155
00:09:50.790 --> 00:09:54.350
calculator, all the way to a weather API or to

156
00:09:55.150 --> 00:09:58.990
a third-party SaaS provider's API or anything, they

157
00:09:58.990 --> 00:10:01.190
all have their proprietary

158
00:10:02.230 --> 00:10:05.950
API. So it was my task as a

159
00:10:05.950 --> 00:10:09.750
developer to basically manually

160
00:10:09.830 --> 00:10:13.510
code the connectivity, the connection to all of these APIs

161
00:10:13.590 --> 00:10:17.190
to be able to get the data or send a request or do whatever

162
00:10:17.190 --> 00:10:20.150
to get it into the model or to have the model act on my process.

163
00:10:21.110 --> 00:10:24.950
MCP, Model Context Protocol, is an open-source

164
00:10:25.750 --> 00:10:29.210
protocol that has been

165
00:10:29.210 --> 00:10:32.730
published and recommended by Anthropic. It's being widely adopted

166
00:10:32.730 --> 00:10:36.530
by Google, Amazon, Microsoft and many others.

167
00:10:36.850 --> 00:10:40.690
It looks like it's actually turning into an industry standard,

168
00:10:40.690 --> 00:10:43.890
which is a good thing because it simplifies

169
00:10:45.810 --> 00:10:49.410
that connection layer between the

170
00:10:49.410 --> 00:10:52.860
actual tool or API and

171
00:10:52.940 --> 00:10:56.380
your agent. It's a fairly

172
00:10:56.380 --> 00:11:00.220
simplified protocol that contains a few primitives like tools.

173
00:11:00.540 --> 00:11:03.860
So what can I actually do with that

174
00:11:03.860 --> 00:11:07.260
API as a model? That tool could be

175
00:11:08.140 --> 00:11:11.900
a browser engine, could be a runtime for me to run

176
00:11:12.700 --> 00:11:16.420
code in a sandbox environment, could be an API to a third

177
00:11:16.420 --> 00:11:18.060
party provider, could be

178
00:11:20.250 --> 00:11:23.850
a piece of code that I wrote myself, could be basically anything. And with MCP,

179
00:11:23.850 --> 00:11:27.690
you're able to just wrap this as a so-called server,

180
00:11:28.330 --> 00:11:32.050
it's an MCP server. And then you can use the client

181
00:11:32.050 --> 00:11:34.650
library of the SDK inside of your own application

182
00:11:36.490 --> 00:11:38.970
to just connect to the server and

183
00:11:42.250 --> 00:11:46.050
include it into your agent so that you don't have to rebuild

184
00:11:46.050 --> 00:11:49.670
it for every single API that you want to use. That's MCP.
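
NOTE
Editor's sketch, not part of the spoken interview. It illustrates the idea just described: very different capabilities wrapped behind one uniform "server" surface that an agent can list and call.
This does NOT use the real MCP SDK. ToolServer, register, list_tools and call are hypothetical names invented for illustration, and the two tools are canned stand-ins with no network access.
```python
from dataclasses import dataclass
from typing import Callable, Dict
@dataclass
class Tool:
    name: str
    description: str
    handler: Callable[..., str]
class ToolServer:
    """Hypothetical stand-in for an MCP-style server: one uniform surface for many tools."""
    def __init__(self) -> None:
        self._tools: Dict[str, Tool] = {}
    def register(self, name: str, description: str, handler: Callable[..., str]) -> None:
        # Each wrapped API, script or database query becomes one named tool.
        self._tools[name] = Tool(name, description, handler)
    def list_tools(self) -> Dict[str, str]:
        # What an agent would read to discover the available capabilities.
        return {tool.name: tool.description for tool in self._tools.values()}
    def call(self, name: str, **arguments: object) -> str:
        # Every tool is invoked the same way, whatever sits behind it.
        if name not in self._tools:
            raise KeyError(f"unknown tool: {name}")
        return self._tools[name].handler(**arguments)
server = ToolServer()
server.register("add", "Add two numbers", lambda a, b: str(a + b))
server.register("weather", "Canned weather lookup (no network)", lambda city: f"Sunny in {city}")
print(server.list_tools())
print(server.call("add", a=2, b=3))
```
The point of the design is that the calling side only learns list_tools and call once; adding another backend means registering one more tool, not writing another bespoke client.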

185
00:11:49.670 --> 00:11:53.270
It has a few more primitives like it's able to send

186
00:11:53.270 --> 00:11:56.750
notifications to the client, it's able to provide

187
00:11:56.830 --> 00:12:00.670
resources, static resources to the provider, and a few more.

188
00:12:00.990 --> 00:12:04.630
But the most important and most interesting use case for most

189
00:12:04.630 --> 00:12:08.030
AI applications really is the fact that it can expose

190
00:12:09.310 --> 00:12:12.750
its capabilities or the capabilities of the underlying

191
00:12:12.750 --> 00:12:16.550
API as tools that the LLM can

192
00:12:16.550 --> 00:12:20.150
understand and use. You said

193
00:12:20.150 --> 00:12:23.910
before that AI systems are only as powerful as their

194
00:12:23.910 --> 00:12:27.470
connections. What does it take to connect models to real world

195
00:12:27.470 --> 00:12:31.310
APIs or workflows? I'm not sure how to answer

196
00:12:31.310 --> 00:12:35.150
this question. It

197
00:12:35.150 --> 00:12:38.270
takes a few things on a few different levels. Well, first of all, you should

198
00:12:38.270 --> 00:12:41.670
be aware of what you want to connect to

199
00:12:41.990 --> 00:12:44.950
and also the implications of

200
00:12:45.450 --> 00:12:49.290
connecting to these things. For instance, if you connect

201
00:12:49.450 --> 00:12:53.210
to your email account, you effectively give the model access

202
00:12:53.290 --> 00:12:56.930
to all your data, possibly including PII, most

203
00:12:56.930 --> 00:13:00.450
likely including sensitive and personally identifiable

204
00:13:00.450 --> 00:13:03.690
information. That is something that you just need to be aware of.

205
00:13:04.170 --> 00:13:07.770
So once you connect your AI to this system and

206
00:13:07.850 --> 00:13:11.580
connect it to another system, you need to

207
00:13:11.580 --> 00:13:15.220
be aware of the fact that technically

208
00:13:15.540 --> 00:13:19.060
these two systems are connected through an intermediary,

209
00:13:19.700 --> 00:13:23.260
but they are effectively connected. That is something. So it

210
00:13:23.260 --> 00:13:26.100
takes thinking about

211
00:13:27.380 --> 00:13:31.180
what do I connect, and do I really want

212
00:13:31.180 --> 00:13:34.260
these things to be connected with each other?

213
00:13:34.820 --> 00:13:38.660
We're talking about boundaries, we're talking about service isolation,

214
00:13:39.380 --> 00:13:43.200
something that we've been talking about in software architecture for a very long time:

215
00:13:43.840 --> 00:13:47.520
service isolation, just to make sure that a

216
00:13:47.520 --> 00:13:51.240
service which could be an AI agent only does what it's

217
00:13:51.240 --> 00:13:54.080
supposed to do and doesn't accidentally

218
00:13:54.480 --> 00:13:56.880
expose information

219
00:13:59.120 --> 00:14:02.920
to something else, even though it shouldn't. And this is

220
00:14:02.920 --> 00:14:06.640
even more important with LLMs because LLMs are by

221
00:14:06.640 --> 00:14:10.160
nature not deterministic. You cannot just

222
00:14:10.240 --> 00:14:13.160
code the model so that it

223
00:14:14.270 --> 00:14:18.030
doesn't cross this boundary. You could try, but it's really

224
00:14:18.030 --> 00:14:21.790
hard and I wouldn't suggest you do that. So

225
00:14:22.030 --> 00:14:25.750
looking at this, let's say I

226
00:14:25.750 --> 00:14:29.510
have two systems and I need a connection between these two systems, one

227
00:14:29.510 --> 00:14:33.310
being my email, the other being maybe a chatbot

228
00:14:33.550 --> 00:14:36.830
that I provide to a customer as part of my website.

229
00:14:38.270 --> 00:14:42.030
I wouldn't want this to be one thing. I wouldn't want this to be one

230
00:14:42.030 --> 00:14:45.810
agent. I would want to separate these two, isolate these two. One could be

231
00:14:45.810 --> 00:14:49.410
an agent that talks to the customer through a

232
00:14:49.410 --> 00:14:52.810
chatbot interface and then

233
00:14:53.050 --> 00:14:56.810
classifies the use case: is this a complaint, or is this an

234
00:14:56.810 --> 00:15:00.250
order or is it a general inquiry? And then

235
00:15:01.930 --> 00:15:05.570
if it's a complaint, this

236
00:15:05.570 --> 00:15:09.290
agent may then send a message

237
00:15:09.290 --> 00:15:13.110
to a different agent that is responsible for

238
00:15:13.830 --> 00:15:17.670
processing customer complaints and does that. And

239
00:15:20.070 --> 00:15:22.390
they don't share data between each other.

240
00:15:24.150 --> 00:15:27.910
It's more of a handoff situation where the classifying agent

241
00:15:28.150 --> 00:15:31.990
says, well, I call the complaint agent,

242
00:15:32.230 --> 00:15:35.910
tell them there's a complaint from this customer, and this agent

243
00:15:35.910 --> 00:15:39.530
comes back with maybe, okay,

244
00:15:39.610 --> 00:15:42.250
thanks. Somebody's going to call you

245
00:15:43.450 --> 00:15:47.090
and tells this to the agent that's talking to the customer. And the

246
00:15:47.090 --> 00:15:50.850
agent tells the customer, well, okay, somebody's going to call

247
00:15:50.850 --> 00:15:54.610
you or could you give me your email address or phone number? Would

248
00:15:54.610 --> 00:15:57.370
you like to be called or would you like to get an email?

249
00:15:57.370 --> 00:16:00.810
How would you like us to contact you? So the

250
00:16:00.810 --> 00:16:04.410
negotiation with the customer and the negotiation with

251
00:16:04.410 --> 00:16:07.050
the CRM with access

252
00:16:08.410 --> 00:16:12.170
to sensitive information should be isolated. That is

253
00:16:12.170 --> 00:16:15.850
something that I would have in mind

254
00:16:16.090 --> 00:16:18.890
from the start if I wanted to do something like this.
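
NOTE
Editor's sketch, not part of the spoken interview. It illustrates the handoff just described: a customer-facing agent classifies the request and passes a small message to a separate complaint agent, which alone holds the sensitive CRM data and returns only a status.
All names (Handoff, classify, ComplaintAgent, FrontDeskAgent) and the CRM record are hypothetical. Keyword rules stand in for what would be an LLM classifier in a real system.
```python
from dataclasses import dataclass
@dataclass(frozen=True)
class Handoff:
    customer_id: str
    category: str
    summary: str
def classify(message: str) -> str:
    # Keyword rules stand in for an LLM classifier (assumption).
    text = message.lower()
    if "broken" in text or "complain" in text or "refund" in text:
        return "complaint"
    if "order" in text or "buy" in text:
        return "order"
    return "inquiry"
class ComplaintAgent:
    """Owns the sensitive CRM data and never returns it to the caller."""
    def __init__(self) -> None:
        self._crm = {"c-1": {"phone": "+49-000-0000", "tier": "gold"}}
    def handle(self, handoff: Handoff) -> str:
        known = handoff.customer_id in self._crm
        return "ticket opened, callback scheduled" if known else "ticket opened, contact details needed"
class FrontDeskAgent:
    """Talks to the customer and holds no CRM access of its own."""
    def __init__(self, complaints: ComplaintAgent) -> None:
        self._complaints = complaints
    def reply(self, customer_id: str, message: str) -> str:
        category = classify(message)
        if category == "complaint":
            # Only a short handoff message crosses the boundary, not the CRM record.
            status = self._complaints.handle(Handoff(customer_id, category, message[:80]))
            return f"Sorry about that. Status: {status}."
        return f"Routing your {category}."
desk = FrontDeskAgent(ComplaintAgent())
print(desk.reply("c-1", "My device arrived broken"))
```
Because the front desk only ever sees a status string, a prompt-injected or confused customer-facing model has nothing sensitive to leak.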

255
00:16:20.010 --> 00:16:22.970
I'm not sure. Did I answer your question? Well,

256
00:16:23.370 --> 00:16:26.930
partially. I

257
00:16:26.930 --> 00:16:30.650
totally do understand that we are very, very early in this whole world

258
00:16:31.050 --> 00:16:34.870
of AI agents and how the systems work. And

259
00:16:35.030 --> 00:16:38.870
currently, without global standards, I don't believe

260
00:16:39.270 --> 00:16:41.350
you can really completely

261
00:16:43.030 --> 00:16:46.630
answer that question. But let's talk about

262
00:16:46.870 --> 00:16:50.390
agentic workflows: how are they different from simple

263
00:16:50.549 --> 00:16:52.790
prompt chains or

264
00:16:53.750 --> 00:16:57.510
RAG pipelines. Sorry, RAG pipelines? Yeah.

265
00:16:59.430 --> 00:17:02.910
So RAG pipelines are a completely different beast. RAG

266
00:17:02.910 --> 00:17:06.560
pipelines themselves don't actually. Well, they actually involve

267
00:17:07.040 --> 00:17:10.120
AI in a certain way, but not in the way that we talk about it

268
00:17:10.120 --> 00:17:13.680
right now. RAG pipelines are more a

269
00:17:13.680 --> 00:17:17.400
data preparation. How do you prepare your

270
00:17:17.400 --> 00:17:20.560
data, your siloed information, your customer,

271
00:17:21.200 --> 00:17:25.000
your CRM, your product database, whatever you have? How do you prepare

272
00:17:25.000 --> 00:17:27.680
that so it can be used in a useful way

273
00:17:29.360 --> 00:17:32.540
by an AI? The other thing, prompt

274
00:17:33.420 --> 00:17:37.060
chains. Prompt chains can be part of an agent. So first of

275
00:17:37.060 --> 00:17:40.900
all, but very often you actually don't even need an

276
00:17:40.900 --> 00:17:44.380
agent. But a prompt chain could be enough. A prompt chain just being

277
00:17:44.620 --> 00:17:48.420
you have a fairly deterministic workflow where you send something to an

278
00:17:48.420 --> 00:17:51.860
LLM, you get a response, maybe you send a second prompt based on the

279
00:17:51.860 --> 00:17:54.540
response, and then at some point the thing is done.
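
NOTE
Editor's sketch, not part of the spoken interview. It illustrates a prompt chain as described: a fixed sequence where the second prompt is built from the first response, with no planning or tool choice by the model.
fake_llm is a deterministic stand-in for a real model call and the two prompt prefixes are invented for illustration; a real chain would call a model API at those two points.
```python
from typing import Callable
def fake_llm(prompt: str) -> str:
    # Deterministic stand-in for a real model call (assumption).
    if prompt.startswith("Summarize:"):
        body = prompt[len("Summarize:"):].strip()
        return body.split(".")[0] + "."
    if prompt.startswith("Rewrite as a headline:"):
        return prompt[len("Rewrite as a headline:"):].strip().upper()
    return ""
def run_chain(text: str, llm: Callable[[str], str]) -> str:
    # Step 1: first prompt. Step 2: second prompt built from the first response.
    summary = llm(f"Summarize: {text}")
    return llm(f"Rewrite as a headline: {summary}")
print(run_chain("Agents are optional. Chains are often enough.", fake_llm))
```
The control flow lives entirely in run_chain, which is what keeps the workflow predictable and easy to test.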

280
00:17:56.870 --> 00:18:00.590
The way I look at it is I distinguish between three types of

281
00:18:00.590 --> 00:18:04.390
agentic applications, or not agentic, three types of

282
00:18:04.390 --> 00:18:05.270
LLM or

283
00:18:07.750 --> 00:18:11.110
generative AI enhanced applications. The first being

284
00:18:11.590 --> 00:18:15.390
non-agentic. Those are all the use cases

285
00:18:15.390 --> 00:18:18.550
where you send something to an LLM and the LLM

286
00:18:18.710 --> 00:18:22.550
responds and that's it.

287
00:18:22.550 --> 00:18:26.390
Or a chatbot: you chat with the LLM, the

288
00:18:26.390 --> 00:18:29.710
LLM responds, and you have a back and forth between the person

289
00:18:30.670 --> 00:18:34.110
and the model. Those are

290
00:18:34.110 --> 00:18:37.710
non-agentic workflows. They can become quite

291
00:18:37.710 --> 00:18:41.270
complex and complicated, but they are mostly

292
00:18:41.270 --> 00:18:44.510
predetermined. Either they're a loop like in a chat,

293
00:18:44.830 --> 00:18:48.670
or there's a sequence of steps that needs to

294
00:18:48.670 --> 00:18:52.350
be done and this sequence is almost always

295
00:18:52.590 --> 00:18:56.430
the same or maybe has some decisions in between that you can

296
00:18:56.430 --> 00:18:59.070
do with traditional conditional

297
00:19:00.990 --> 00:19:04.790
steps. So the first of the three

298
00:19:04.790 --> 00:19:08.350
is non-agentic. The second is agentic

299
00:19:08.350 --> 00:19:10.350
AI. This is where

300
00:19:12.910 --> 00:19:16.760
the AI, the model, actually makes

301
00:19:16.760 --> 00:19:20.520
decisions, plans, does

302
00:19:20.520 --> 00:19:24.320
reasoning, understands that it doesn't

303
00:19:24.320 --> 00:19:27.880
have all the information it needs, asks you about

304
00:19:28.200 --> 00:19:31.919
this information, or reaches out to one of the tools

305
00:19:31.919 --> 00:19:35.760
via MCP, the tools that it has available to

306
00:19:35.760 --> 00:19:39.400
get the information it needs for this specific use case, and is

307
00:19:39.400 --> 00:19:43.100
able to adapt the workflow based on the

308
00:19:43.100 --> 00:19:46.860
interaction, based on the available information, based on its

309
00:19:46.860 --> 00:19:50.420
own reasoning. So very often an agentic system

310
00:19:51.940 --> 00:19:55.460
starts by analyzing the

311
00:19:55.460 --> 00:19:57.780
task, coming up with a plan,

312
00:19:59.060 --> 00:20:02.660
maybe even storing that plan somewhere in the file

313
00:20:02.660 --> 00:20:06.380
system as a checklist for itself, using a tool,

314
00:20:06.380 --> 00:20:10.060
again using MCP, for instance, that gives it access to

315
00:20:10.060 --> 00:20:13.880
a contained file system, a

316
00:20:13.880 --> 00:20:17.440
temporary directory, basically where it can store intermediate

317
00:20:17.440 --> 00:20:21.200
information. So it puts its checklist, its plan in there and

318
00:20:21.200 --> 00:20:25.040
then it does something and then it goes back to the checklist, checks this thing

319
00:20:25.040 --> 00:20:28.840
off and then it realizes, well, for this I need some more information from

320
00:20:28.840 --> 00:20:32.560
the customer, goes back, chats with the customer

321
00:20:32.560 --> 00:20:36.080
to receive this information and so forth. So an

322
00:20:36.080 --> 00:20:37.680
agent basically

323
00:20:40.820 --> 00:20:43.060
perceives, acts.

324
00:20:44.740 --> 00:20:48.340
So it gets information from

325
00:20:48.340 --> 00:20:51.860
either the user through the prompt or through a tool, through a

326
00:20:51.860 --> 00:20:53.620
database or something, gets information,

327
00:20:55.460 --> 00:20:58.500
decides based on this information and then acts

328
00:20:58.980 --> 00:21:02.500
within certain bounds. And these boundaries are basically

329
00:21:02.500 --> 00:21:05.990
defined by the use case and the capabilities of this agent.
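
NOTE
Editor's sketch, not part of the spoken interview. It illustrates the loop just described: keep a checklist, look at the next item, decide whether a tool can handle it or the user must be asked, act, check the item off, and stop at a hard step limit.
run_agent and the tool names are hypothetical. The decide step here is a plain dictionary lookup; in a real agent that decision would come from the model.
```python
from typing import Callable, Dict, List
def run_agent(plan: List[str], tools: Dict[str, Callable[[], str]], max_steps: int = 10) -> List[str]:
    checklist = list(plan)          # the plan, kept as a checklist
    log: List[str] = []
    for _ in range(max_steps):      # hard bound so the loop cannot run away
        if not checklist:
            break
        item = checklist[0]         # perceive: look at the next open item
        if item in tools:           # decide: is there a tool for this?
            log.append(f"{item}: {tools[item]()}")   # act: call the tool
        else:
            log.append(f"{item}: need more information from the user")
        checklist.pop(0)            # check the item off
    return log
tools = {"lookup_order": lambda: "order found"}
print(run_agent(["lookup_order", "confirm_address"], tools))
```
The max_steps limit and the fixed tool dictionary are the "boundaries" in this sketch: the agent can only do what its tools allow, and only for a bounded number of steps.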

330
00:21:06.150 --> 00:21:09.750
So non-agentic, agentic, and the third basically

331
00:21:09.750 --> 00:21:13.310
being multi agent systems. This is where multiple

332
00:21:13.310 --> 00:21:16.790
agents interact with each other

333
00:21:18.070 --> 00:21:21.429
to provide for an even more

334
00:21:21.429 --> 00:21:25.230
complex use case. This is something that

335
00:21:25.230 --> 00:21:28.550
I talked about before where you may have an agent

336
00:21:28.870 --> 00:21:32.630
that interacts with the customer through the website and is able to

337
00:21:33.780 --> 00:21:37.620
kick off different processes depending on the customer and what they want.

338
00:21:37.940 --> 00:21:41.700
And this agent then communicates with an agent that's responsible

339
00:21:41.700 --> 00:21:45.540
for complaints and another agent that's responsible for ingesting orders

340
00:21:46.740 --> 00:21:47.620
and so forth.

341
00:21:51.060 --> 00:21:54.660
So it can become infinitely complicated.

342
00:21:55.620 --> 00:21:59.060
I'm basically thinking about agents like I think about

343
00:21:59.060 --> 00:22:02.710
microservices. Actually I would say an agent is

344
00:22:02.710 --> 00:22:06.390
when the AI deviates from simple if-this-then-that rules.

345
00:22:06.870 --> 00:22:10.430
Would that be a good definition? Even

346
00:22:10.430 --> 00:22:13.830
complex if-this-then-that rules. There are

347
00:22:13.910 --> 00:22:16.790
fairly complex workflows that can be

348
00:22:16.790 --> 00:22:18.790
deterministically defined

349
00:22:20.310 --> 00:22:23.910
where the entire work process, no matter how

350
00:22:25.030 --> 00:22:27.600
complex the process is, is in itself

351
00:22:28.960 --> 00:22:31.840
deterministic and

352
00:22:32.800 --> 00:22:36.520
algorithmic. So you can do it with step functions or you can do

353
00:22:36.520 --> 00:22:40.280
it with an orchestration engine or something like that. In

354
00:22:40.280 --> 00:22:44.080
that case I wouldn't necessarily use AI, an AI agent,

355
00:22:44.240 --> 00:22:47.840
I might use AI, for instance, as a front end to that process

356
00:22:48.080 --> 00:22:51.680
that understands natural language. So

357
00:22:51.920 --> 00:22:55.520
if I want to have the

358
00:22:56.400 --> 00:23:00.000
possibility to use Slack, to do

359
00:23:00.400 --> 00:23:04.080
anything through Slack, I have an agent in Slack and I'm able to

360
00:23:04.080 --> 00:23:07.800
tell the agent, please do this for me. In the past

361
00:23:07.800 --> 00:23:11.520
I would have to use a very specific command format for Slack

362
00:23:11.520 --> 00:23:15.280
Ops. And now with LLMs I can use natural language

363
00:23:15.280 --> 00:23:18.680
and I just use natural language. Then I have the LLM as

364
00:23:18.680 --> 00:23:22.140
basically the client and the

365
00:23:22.140 --> 00:23:25.700
LLM is not an agent. It just takes the

366
00:23:25.700 --> 00:23:29.460
request and has a list of processes, classifies

367
00:23:29.460 --> 00:23:33.220
the request, extracts the relevant information and

368
00:23:33.220 --> 00:23:36.660
kicks off the processes. It becomes

369
00:23:36.740 --> 00:23:40.380
agentic as soon as, yeah, it becomes agentic as

370
00:23:40.380 --> 00:23:43.860
soon as this thing

371
00:23:44.820 --> 00:23:48.100
may have to make more involved decisions like

372
00:23:49.030 --> 00:23:52.350
maybe we need more information or maybe we have to call

373
00:23:52.350 --> 00:23:56.070
somebody. It becomes fairly complicated fairly quickly. So it's really hard

374
00:23:56.470 --> 00:23:59.790
to talk about it. But what I'm trying to say really is don't build an

375
00:23:59.790 --> 00:24:02.790
agent for everything. First of all, don't use AI

376
00:24:03.270 --> 00:24:05.190
if the problem can be solved without it

377
00:24:06.870 --> 00:24:10.590
in a fairly easy way. Second, don't build an agent if it can

378
00:24:10.590 --> 00:24:14.390
be done by a simple deterministic workflow,

379
00:24:14.390 --> 00:24:17.710
even if it involves an LLM. And don't

380
00:24:18.590 --> 00:24:22.230
build complicated. Well, let me say it this

381
00:24:22.230 --> 00:24:25.950
way. If the solution to your problem is more complicated

382
00:24:25.950 --> 00:24:27.870
than the problem you're trying to solve,

383
00:24:29.550 --> 00:24:33.230
you are probably doing it wrong,

384
00:24:33.950 --> 00:24:37.750
if that makes sense. I see,

385
00:24:37.750 --> 00:24:41.510
I see. What does a typical production AI stack look

386
00:24:41.510 --> 00:24:45.150
like in 2025, especially for startups that are scaling fast?

387
00:24:45.790 --> 00:24:49.230
Well, you need a few things. One is obviously model

388
00:24:49.230 --> 00:24:53.030
serving. So you need the model

389
00:24:53.030 --> 00:24:56.790
somewhere. Could be locally, could be something that you do yourself. As a

390
00:24:56.790 --> 00:25:00.430
startup, I wouldn't recommend doing that. I would really

391
00:25:00.430 --> 00:25:03.870
just suggest you use an existing model

392
00:25:03.870 --> 00:25:07.590
provider that provides the

393
00:25:07.590 --> 00:25:11.350
model through an API. Could be Amazon Bedrock, for instance. We have lots of

394
00:25:11.350 --> 00:25:15.070
different models and the list is growing. We have open-weights models like Llama,

395
00:25:15.070 --> 00:25:18.330
Mistral, DeepSeek, and others.

396
00:25:18.570 --> 00:25:22.370
We have commercial models like Claude, our

397
00:25:22.370 --> 00:25:24.810
own Nova model family. We have...

398
00:25:26.730 --> 00:25:30.410
Now I'm blanking out. We have Cohere. We have many different models

399
00:25:30.490 --> 00:25:33.930
that you can use for different use cases. We also have many

400
00:25:34.250 --> 00:25:37.730
general purpose language models. So you could use Amazon

401
00:25:37.730 --> 00:25:41.490
Bedrock to just talk to a model through an API, through

402
00:25:41.490 --> 00:25:45.240
a secure API. Well, it's secure, so you

403
00:25:45.240 --> 00:25:48.680
don't have to worry about what the model provider does with your data. We

404
00:25:48.680 --> 00:25:52.000
don't do anything with it. We don't even use it for model training.

405
00:25:52.640 --> 00:25:56.120
We just provide the model to you so that you can use it in a

406
00:25:56.120 --> 00:25:59.360
secure way, so that you can even build GDPR-compliant

407
00:25:59.920 --> 00:26:03.600
systems. And so that's model

408
00:26:03.600 --> 00:26:07.400
serving. You may need databases, either your own

409
00:26:07.400 --> 00:26:11.010
databases, maybe a vector store if you want to do some semantic search.

410
00:26:11.170 --> 00:26:15.010
But that's advanced, I wouldn't start with that. And you need

411
00:26:15.010 --> 00:26:18.770
something that orchestrates the process. So

412
00:26:18.770 --> 00:26:21.570
something that basically takes the input

413
00:26:22.050 --> 00:26:25.850
and calls the actual language model. Because the language model itself cannot do

414
00:26:25.850 --> 00:26:29.170
anything. It only takes text

415
00:26:30.050 --> 00:26:33.890
(depending on the modality; we're talking about language models right now). So it

416
00:26:33.890 --> 00:26:37.570
takes text and it produces something based on that text. It doesn't do anything.

417
00:26:37.890 --> 00:26:41.490
So the orchestration engine connects to

418
00:26:41.490 --> 00:26:45.330
MCP tools, to the front end, to whatever you

419
00:26:45.330 --> 00:26:48.890
want to do, and the LLM. And the orchestration framework

420
00:26:49.930 --> 00:26:52.970
could be one of the many open source frameworks that we have, like

421
00:26:53.210 --> 00:26:55.850
LangGraph is one, or

422
00:26:56.890 --> 00:27:00.610
LlamaIndex is another one. CrewAI is one. Strands Agents.

423
00:27:00.610 --> 00:27:03.290
This is one that we open sourced about two months ago,

424
00:27:04.410 --> 00:27:08.160
which is model agnostic, even provider agnostic. You can use

425
00:27:08.160 --> 00:27:11.880
Strands Agents with models from OpenAI

426
00:27:12.280 --> 00:27:16.000
or directly with Llama API or

427
00:27:16.000 --> 00:27:19.640
even with Ollama on your own machine. So you need an orchestration

428
00:27:20.120 --> 00:27:23.800
tool or engine, you need a

429
00:27:23.800 --> 00:27:27.520
model somewhere. You may need a database or some data

430
00:27:27.520 --> 00:27:30.760
for the model to work with. You might want to think about

431
00:27:31.400 --> 00:27:35.100
primary and secondary models. That's a bit more advanced as well. So the primary

432
00:27:35.100 --> 00:27:37.660
model is the general-purpose model

433
00:27:38.780 --> 00:27:42.620
that does the majority of the work. And then you may want to use

434
00:27:42.620 --> 00:27:46.220
secondary models for instance for very simple use cases, so you don't

435
00:27:46.220 --> 00:27:49.980
need to use the expensive ones. You can use very cost effective models

436
00:27:49.980 --> 00:27:53.780
for simple summarization tasks, while you may want to use a

437
00:27:53.780 --> 00:27:57.300
more expensive reasoning model for the overall orchestration for

438
00:27:57.300 --> 00:28:00.780
instance. Another thing that's part of the stack is evaluation and

439
00:28:00.780 --> 00:28:04.530
monitoring. And that is something where I really would say as

440
00:28:04.530 --> 00:28:07.130
a startup you should put that in place as early as possible.

441
00:28:08.170 --> 00:28:11.970
Monitoring: I think monitoring, observability, is self-explanatory. You

442
00:28:11.970 --> 00:28:15.530
should be able to see what's happening. And you should also very

443
00:28:15.530 --> 00:28:19.290
early implement cost monitoring. Because if something

444
00:28:19.290 --> 00:28:23.050
goes wrong, especially in a non-deterministic system

445
00:28:23.050 --> 00:28:25.450
like an agentic AI system,

446
00:28:28.750 --> 00:28:32.550
if it runs into a loop, it may run up

447
00:28:32.550 --> 00:28:36.150
a big context that it recursively

448
00:28:36.150 --> 00:28:39.750
sends to the LLM, and all of a sudden it becomes very expensive. You wouldn't

449
00:28:39.750 --> 00:28:43.509
want that. So please set up cost monitoring

450
00:28:43.509 --> 00:28:47.310
very early. And also I would recommend

451
00:28:48.110 --> 00:28:51.950
implementing an evaluation mechanism. Evaluation is

452
00:28:51.950 --> 00:28:54.360
basically testing, but for LLMs. So

453
00:28:57.080 --> 00:29:00.880
you take a specific model and you have a number of prompts for your

454
00:29:00.880 --> 00:29:04.640
system and some data that you get from your database or

455
00:29:04.640 --> 00:29:08.360
through RAG or through MCP, and you plug these things together and

456
00:29:08.360 --> 00:29:11.800
you test them in different scenarios, maybe with different user

457
00:29:11.800 --> 00:29:15.360
inputs to a point where in most cases you are

458
00:29:15.360 --> 00:29:19.160
satisfied with the response. So you reach

459
00:29:19.160 --> 00:29:22.970
a certain threshold of reliability of your

460
00:29:23.370 --> 00:29:26.090
system to do what it's supposed to do.

461
00:29:27.050 --> 00:29:29.450
All of a sudden a model provider

462
00:29:31.450 --> 00:29:34.810
deploys an update of their model, a new version

463
00:29:35.690 --> 00:29:39.450
that has been trained on different data or has been fine tuned in a different

464
00:29:39.450 --> 00:29:41.770
way, which could break your

465
00:29:42.970 --> 00:29:46.410
system apart because a variable, a very important

466
00:29:46.410 --> 00:29:48.650
variable has changed. Or

467
00:29:50.830 --> 00:29:54.510
maybe you change your prompts that

468
00:29:54.510 --> 00:29:58.070
you use as part of the pipeline, or your data changes, the

469
00:29:58.070 --> 00:30:01.630
structure of your data changes. This could all lead to

470
00:30:03.070 --> 00:30:06.910
the fact that your overall system degrades in terms of

471
00:30:06.910 --> 00:30:08.750
reliability when it comes to

472
00:30:11.310 --> 00:30:15.070
how good the results are. And you can solve that by

473
00:30:15.070 --> 00:30:18.870
implementing an evaluation pipeline so that whenever you change

474
00:30:18.870 --> 00:30:22.430
anything, you run a

475
00:30:22.430 --> 00:30:25.990
number of prompts or a number of use

476
00:30:25.990 --> 00:30:29.630
cases against the system to see

477
00:30:29.630 --> 00:30:33.349
if the reliability drops beneath your threshold. And

478
00:30:33.349 --> 00:30:36.830
if it does, the test fails and you have to go look at it.

479
00:30:36.990 --> 00:30:40.830
That's very important. If you implement something like this as early as possible,

480
00:30:41.070 --> 00:30:44.430
just like with testing in general, you will be able to

481
00:30:44.740 --> 00:30:48.260
iterate much quicker than

482
00:30:48.340 --> 00:30:51.900
if every time there's a new model update, or every time the data structure

483
00:30:51.900 --> 00:30:54.820
changes, or every time you update your own prompts,

484
00:30:56.580 --> 00:31:00.020
the system falls apart because you realize it doesn't

485
00:31:00.020 --> 00:31:03.700
reliably create the solutions or

486
00:31:03.700 --> 00:31:07.420
the responses that you were looking for. Apart from

487
00:31:07.420 --> 00:31:11.140
that, well, we were at the question, what's part of the stack? So model

488
00:31:11.460 --> 00:31:14.120
serving, data access and orchestration,

489
00:31:15.240 --> 00:31:18.760
then mostly a primary model to start with. And I would suggest

490
00:31:18.920 --> 00:31:22.200
just starting with a primary model, then evaluation and monitoring,

491
00:31:22.440 --> 00:31:25.960
then maybe a data pipeline if you actually want

492
00:31:26.040 --> 00:31:29.680
to use live data that changes over time. But again that's a fairly advanced

493
00:31:29.680 --> 00:31:33.520
topic. And obviously security and compliance. Whenever you

494
00:31:33.520 --> 00:31:37.000
use sensitive data, whenever you use proprietary information,

495
00:31:37.320 --> 00:31:41.070
make sure that you comply with your internal

496
00:31:41.470 --> 00:31:45.230
compliance frameworks, with your customers' compliance frameworks, with

497
00:31:45.230 --> 00:31:48.750
legal compliance frameworks. Make sure that you use proper

498
00:31:48.990 --> 00:31:52.790
authentication, make sure that your agent can only do what it's supposed to

499
00:31:52.790 --> 00:31:56.510
do, that your agent doesn't have access to your customer

500
00:31:56.510 --> 00:32:00.150
database, while also chatting on the Internet with random

501
00:32:00.150 --> 00:32:03.670
people and perhaps by

502
00:32:03.670 --> 00:32:07.020
accident giving them access to your customer database.

503
00:32:07.570 --> 00:32:11.170
That's important. Security and compliance, that's part of

504
00:32:11.250 --> 00:32:15.010
any production stack and should be

505
00:32:15.090 --> 00:32:18.810
because these are the basic things that you need to make

506
00:32:18.810 --> 00:32:22.370
sure that you don't run into problems, most likely

507
00:32:22.370 --> 00:32:26.090
sooner rather than later. How do you

508
00:32:26.090 --> 00:32:29.850
guys from AWS support this kind of production

509
00:32:29.850 --> 00:32:33.370
grade AI stack, from Bedrock to Step Functions to vector

510
00:32:33.370 --> 00:32:36.610
DBs? Well, first of all we have a number of services

511
00:32:37.810 --> 00:32:40.210
around the Bedrock family of services,

512
00:32:41.650 --> 00:32:45.290
and that is Amazon Bedrock itself, which

513
00:32:45.290 --> 00:32:48.770
is first of all model serving,

514
00:32:49.570 --> 00:32:53.410
where we provide secure and private

515
00:32:53.490 --> 00:32:56.610
access to models from different providers, including

516
00:32:57.330 --> 00:33:01.010
the current frontier models of most providers

517
00:33:01.250 --> 00:33:03.820
where you can just use models and be sure that

518
00:33:04.940 --> 00:33:08.580
your data is not being used for training or used for

519
00:33:08.580 --> 00:33:12.380
anything else. So we basically run these models

520
00:33:12.380 --> 00:33:15.700
inside of our own escrow accounts. They are air

521
00:33:15.700 --> 00:33:19.260
gapped. Everything you send to the model

522
00:33:19.420 --> 00:33:23.100
is not stored or reused for anything.

523
00:33:23.500 --> 00:33:27.260
It's just sent to the model. The model's basically brought to

524
00:33:27.260 --> 00:33:30.790
life, loaded into the GPU cluster. It

525
00:33:30.790 --> 00:33:34.430
runs and then it returns the response and then the

526
00:33:34.430 --> 00:33:38.270
model basically goes back down and all the data is gone

527
00:33:39.070 --> 00:33:42.910
apart from the actual model weights themselves because they

528
00:33:42.910 --> 00:33:46.709
need to be used for subsequent calls. So that is one thing.

529
00:33:46.709 --> 00:33:49.710
Amazon Bedrock, which provides access to models,

530
00:33:50.350 --> 00:33:53.470
including our own family of models, including

531
00:33:56.750 --> 00:34:00.330
the capability to actually fine tune certain models

532
00:34:00.970 --> 00:34:04.530
or distill models into

533
00:34:04.530 --> 00:34:08.130
smaller models. If you want to use smaller models, basically you want to use,

534
00:34:08.130 --> 00:34:11.930
let's say you want to use Llama 4, but you don't want to use

535
00:34:12.490 --> 00:34:16.090
the version that Meta provides. You want to distill this into a

536
00:34:16.090 --> 00:34:19.930
smaller model. You can do that with Bedrock. Again, very advanced features.

537
00:34:19.930 --> 00:34:23.490
I would not suggest starting with that. It's

538
00:34:23.490 --> 00:34:27.300
time intensive, it's costly, it's a use

539
00:34:27.300 --> 00:34:31.140
case for enterprises and it may be a use case for you

540
00:34:31.140 --> 00:34:34.700
once you are further ahead on the road in

541
00:34:34.700 --> 00:34:38.420
adoption. The second thing that we provide is Bedrock

542
00:34:38.420 --> 00:34:42.100
Guardrails, along with a few other capabilities like

543
00:34:42.260 --> 00:34:45.900
Bedrock Knowledge Bases. Guardrails is

544
00:34:45.900 --> 00:34:49.700
basically to mask sensitive data or

545
00:34:49.700 --> 00:34:53.540
to block requests that contain sensitive data or responses that contain

546
00:34:53.540 --> 00:34:57.320
sensitive data, or to block requests or responses

547
00:34:57.320 --> 00:35:01.080
that violate ethical codes that you've defined or something like

548
00:35:01.080 --> 00:35:04.880
that. And Knowledge Bases is basically

549
00:35:04.880 --> 00:35:08.720
direct access to vector stores. And we also provide

550
00:35:08.880 --> 00:35:12.400
obviously services for vector stores with OpenSearch

551
00:35:12.560 --> 00:35:16.000
or with Postgres on RDS. We've just

552
00:35:16.000 --> 00:35:19.720
released S3 Vectors, Amazon S3 Vectors.

553
00:35:19.720 --> 00:35:23.320
So you can even store your vectors on S3 as object

554
00:35:23.320 --> 00:35:27.130
storage and use that, which is extremely cost-effective

555
00:35:27.130 --> 00:35:30.730
if you compare it to traditional vector stores. Because traditional vector

556
00:35:30.730 --> 00:35:34.370
stores are basically database servers,

557
00:35:35.170 --> 00:35:38.730
and they have to run and they cost money. And S3

558
00:35:38.730 --> 00:35:42.530
Vectors stores your vectors on S3. So you only

559
00:35:42.530 --> 00:35:45.690
pay for storage, not for a machine that's running all the time. You pay for

560
00:35:45.690 --> 00:35:49.370
storage and then, of course, for access. So you

561
00:35:49.370 --> 00:35:52.610
can save up to 90% of the cost

562
00:35:53.010 --> 00:35:56.410
compared to database

563
00:35:57.450 --> 00:36:00.810
based vector stores. And then

564
00:36:01.530 --> 00:36:05.370
we have Bedrock Agents, which is an out-of-the-box system that provides

565
00:36:05.530 --> 00:36:09.210
agents in a fairly opinionated

566
00:36:09.210 --> 00:36:12.890
way. So you can just build an agent using Bedrock

567
00:36:12.890 --> 00:36:16.410
Agents. You don't have to do that much,

568
00:36:16.490 --> 00:36:20.010
but these agents are self contained inside of

569
00:36:20.010 --> 00:36:23.420
AWS. And the third thing, and that is something that's

570
00:36:23.820 --> 00:36:27.340
just been in preview for a few weeks at the time of

571
00:36:27.340 --> 00:36:30.860
recording, maybe GA when you listen to this episode. GA means

572
00:36:31.340 --> 00:36:34.540
generally available. It's in public preview right now. That's

573
00:36:34.540 --> 00:36:38.060
Bedrock, Amazon Bedrock

574
00:36:38.060 --> 00:36:41.700
AgentCore, and that's a family of services. That's very

575
00:36:41.700 --> 00:36:45.260
interesting because it, it gives you all the

576
00:36:45.260 --> 00:36:48.890
individual capabilities as building

577
00:36:48.890 --> 00:36:52.570
blocks that you can use. That is, it

578
00:36:52.570 --> 00:36:56.370
has access to Bedrock models, obviously, but you can

579
00:36:56.370 --> 00:36:59.770
also use it to use, sorry,

580
00:37:00.570 --> 00:37:04.330
to use models anywhere. So you can

581
00:37:04.330 --> 00:37:07.970
also use it with OpenAI, with Llama API,

582
00:37:07.970 --> 00:37:11.530
with your own Ollama or whatnot. And it's

583
00:37:11.850 --> 00:37:15.610
provider, no, it's framework agnostic. So you can deploy

584
00:37:15.850 --> 00:37:19.490
your CrewAI agents or your LangGraph agents.

585
00:37:19.730 --> 00:37:22.610
You don't have to do it the AWS way.

586
00:37:24.210 --> 00:37:27.250
The next capability is memory, because very often it's important

587
00:37:27.810 --> 00:37:31.530
to maintain information across sessions. So when

588
00:37:31.530 --> 00:37:35.250
I talk to the agent right now, I want it to retain information

589
00:37:35.330 --> 00:37:39.010
about previous conversations. And that's a capability. It's called

590
00:37:39.010 --> 00:37:42.730
AgentCore Memory. Again, AgentCore Memory can

591
00:37:42.730 --> 00:37:46.450
just be plugged into an agent that you run on AgentCore, but it can

592
00:37:46.450 --> 00:37:50.190
also just be plugged into an agent that you run somewhere else. Again,

593
00:37:50.510 --> 00:37:53.910
it's provider and framework agnostic. The third

594
00:37:53.910 --> 00:37:57.470
capability is Tools. So right now,

595
00:37:57.550 --> 00:38:00.830
as of now, we provide direct access to

596
00:38:02.190 --> 00:38:05.230
a code environment sandbox, which is completely isolated.

597
00:38:06.030 --> 00:38:09.630
So if your agent creates code, or if the

598
00:38:09.630 --> 00:38:13.150
user of your agent sends code, the

599
00:38:13.150 --> 00:38:16.880
agent can then just use one of these

600
00:38:16.880 --> 00:38:20.640
sandboxes to run that code in a completely secure

601
00:38:20.640 --> 00:38:24.360
and isolated environment. And right now it provides a

602
00:38:24.360 --> 00:38:27.640
Python runtime and TypeScript, most likely more in the future.

603
00:38:28.600 --> 00:38:32.040
It also provides access to a browser

604
00:38:32.280 --> 00:38:36.040
environment so that your agent can use the Internet again

605
00:38:36.920 --> 00:38:40.600
in an isolated environment. Another capability, of

606
00:38:40.600 --> 00:38:44.450
course, is security: Identity. You can do everything,

607
00:38:45.010 --> 00:38:48.610
of course, using IAM, Identity and Access Management, with

608
00:38:48.610 --> 00:38:52.290
AWS, but you can also use OAuth with

609
00:38:52.850 --> 00:38:56.450
any kind of OAuth provider or your corporate

610
00:38:56.450 --> 00:38:59.730
identity provider, your commercial identity provider that

611
00:39:00.130 --> 00:39:03.410
you're using anyway to make sure only the people who

612
00:39:03.730 --> 00:39:07.530
should access your agents are actually able to access

613
00:39:07.530 --> 00:39:11.100
your agents. And then we have, and I

614
00:39:11.100 --> 00:39:13.660
realize I've been talking a lot and it's a lot of stuff. I'm going to

615
00:39:13.660 --> 00:39:17.500
summarize that in a second. Then one or two more things: observability

616
00:39:17.580 --> 00:39:21.300
out of the box using OpenTelemetry, so you can use your

617
00:39:21.300 --> 00:39:24.220
existing observability stack if you want.

618
00:39:24.860 --> 00:39:27.260
And Gateway:

619
00:39:28.300 --> 00:39:31.660
AgentCore Gateway allows you to just wrap any

620
00:39:31.660 --> 00:39:35.500
API that you may already have and expose it as

621
00:39:35.500 --> 00:39:38.830
an MCP server, including discovery,

622
00:39:39.070 --> 00:39:41.470
including even the ability to

623
00:39:45.630 --> 00:39:49.310
sell your own agent or your own MCP server on the

624
00:39:49.310 --> 00:39:52.750
AWS Marketplace to other AWS customers if you want.

625
00:39:53.150 --> 00:39:56.950
So in summary, what AgentCore provides is all the building blocks that

626
00:39:56.950 --> 00:39:59.310
you might need to build an agent: memory,

627
00:40:00.590 --> 00:40:04.310
a runtime for the agents and the MCP servers, a gateway

628
00:40:04.310 --> 00:40:07.670
if you already have your API or your server and just want to wrap it

629
00:40:07.670 --> 00:40:11.290
as MCP. It has identity, it has

630
00:40:11.370 --> 00:40:14.410
observability, tools and memory.

631
00:40:14.970 --> 00:40:16.970
That's what we... It's a lot, I realize.

632
00:40:18.970 --> 00:40:22.690
Yeah, it is. I

633
00:40:22.690 --> 00:40:26.490
was wondering for our audience, have you built an AI prototype that

634
00:40:26.570 --> 00:40:30.210
almost made it into production? What blocked you?

635
00:40:30.210 --> 00:40:32.970
Tag us with your war story.

636
00:40:33.930 --> 00:40:36.650
We'll be back after a very, very short ad break.

637
00:40:42.470 --> 00:40:46.070
Dennis, some startups are using multiple

638
00:40:46.070 --> 00:40:49.830
foundation models at once. What's AWS's approach

639
00:40:49.910 --> 00:40:53.670
to multiple model orchestration and

640
00:40:53.670 --> 00:40:55.750
how do you manage that securely?

641
00:40:58.630 --> 00:41:02.230
Multi-model usage is one of the core premises

642
00:41:02.310 --> 00:41:04.950
because we say it doesn't make sense to use one model for everything,

643
00:41:06.720 --> 00:41:10.240
which is why we started Bedrock the way we did in the first place. It's

644
00:41:10.240 --> 00:41:14.000
not one model that you can use. You can use models from

645
00:41:14.000 --> 00:41:17.840
many different providers with many different capabilities. Because

646
00:41:18.560 --> 00:41:22.360
in many use cases you may want to use a general purpose large language

647
00:41:22.360 --> 00:41:25.960
model, but you also may want to use a model to create your embeddings or

648
00:41:25.960 --> 00:41:29.800
to create images. Could be from a completely different provider. Or you want to

649
00:41:29.800 --> 00:41:33.630
use a reasoning model for involved tasks and a

650
00:41:33.630 --> 00:41:37.310
much less expensive small model for basic

651
00:41:37.310 --> 00:41:40.310
tasks like summarization or classification.

652
00:41:41.030 --> 00:41:44.590
And these can be from different providers. Then there's models that are

653
00:41:44.590 --> 00:41:48.230
specialized in translation of languages.

654
00:41:48.230 --> 00:41:51.990
There's models that may be specialized in code, creating

655
00:41:51.990 --> 00:41:54.950
code. So

656
00:41:55.590 --> 00:41:58.950
Bedrock and AWS have always looked at it through the lens that different

657
00:41:58.950 --> 00:42:02.510
customers need different things, different models, and individual

658
00:42:02.510 --> 00:42:06.210
customers may need different models for different use cases

659
00:42:06.290 --> 00:42:09.930
or even inside the same use case. So I wouldn't

660
00:42:09.930 --> 00:42:12.770
start that way. If I were just building a prototype,

661
00:42:13.810 --> 00:42:17.330
I wouldn't start with multiple models, I would start with one to start

662
00:42:17.330 --> 00:42:20.770
understanding the moving parts and how it works and the limitations.

663
00:42:21.330 --> 00:42:24.770
But you can certainly use multiple models, and

664
00:42:25.170 --> 00:42:28.290
in any kind of production application I probably would

665
00:42:28.770 --> 00:42:32.070
because that helps me reduce cost,

666
00:42:33.030 --> 00:42:36.750
reduce latency, because the larger the model, the longer it

667
00:42:36.750 --> 00:42:40.150
takes for the model to respond. Take

668
00:42:40.790 --> 00:42:42.790
these things into account. So yes,

669
00:42:44.070 --> 00:42:47.910
multi-model, using different models, even from different providers,

670
00:42:48.070 --> 00:42:51.110
is certainly something that I would suggest looking into

671
00:42:52.550 --> 00:42:55.830
once you have your basic use case down

672
00:42:56.230 --> 00:42:59.950
and once you go into: well, how can I optimize cost? How can I

673
00:42:59.950 --> 00:43:03.760
optimize latency? Is there a specialized model that

674
00:43:03.760 --> 00:43:07.080
helps me with certain tasks inside the workflow?

675
00:43:07.640 --> 00:43:11.360
How do we make sure it's secure? Well, just like everything

676
00:43:11.360 --> 00:43:15.040
on AWS, everything goes through the

677
00:43:15.040 --> 00:43:18.360
AWS API. So every model invocation goes through the

678
00:43:18.440 --> 00:43:21.840
AWS API, which is protected through IAM, through

679
00:43:21.840 --> 00:43:25.240
Identity and Access Management. So you can have

680
00:43:25.400 --> 00:43:28.940
really fine-grained mechanisms to say

681
00:43:28.940 --> 00:43:32.540
who or which service or which third

682
00:43:32.540 --> 00:43:35.860
party or which agent is allowed, which process is

683
00:43:35.860 --> 00:43:39.380
allowed to interact with individual models,

684
00:43:39.620 --> 00:43:43.380
with individual data stores or tools that you provide.

685
00:43:45.140 --> 00:43:48.500
For our audience, I was wondering: what's one tool or

686
00:43:48.500 --> 00:43:51.860
pattern that helped you finally scale your project?

687
00:43:52.260 --> 00:43:55.780
Share it on Threads or LinkedIn and tag Startuprad.io.

688
00:43:56.460 --> 00:44:00.140
Dennis, what's your take on LLMOps or

689
00:44:00.140 --> 00:44:03.420
GenAIOps? Is it the same as

690
00:44:03.420 --> 00:44:06.300
traditional MLOps or something new?

691
00:44:07.260 --> 00:44:10.780
I'm getting into hot water when I start

692
00:44:10.860 --> 00:44:14.700
talking about that because it's

693
00:44:14.700 --> 00:44:18.420
not the same. And I'm not sure... What did

694
00:44:18.420 --> 00:44:21.420
you say? GenAIOps, LLMOps,

695
00:44:22.220 --> 00:44:25.840
MLOps. The definitions

696
00:44:25.840 --> 00:44:29.600
are in flux. Well, I'm pretty sure there is a definition for MLOps

697
00:44:30.480 --> 00:44:33.680
and there's probably also a definition for LLMOps.

698
00:44:34.800 --> 00:44:38.560
But the thing is that

699
00:44:39.119 --> 00:44:42.760
with the democratization of generative

700
00:44:42.760 --> 00:44:46.600
AI since the ChatGPT moment, effectively, when everybody wants to

701
00:44:46.600 --> 00:44:48.670
build on top of generative AI,

702
00:44:50.820 --> 00:44:54.180
there's a new kind of discipline

703
00:44:54.580 --> 00:44:58.100
emerging which sits at the intersection of

704
00:44:58.500 --> 00:45:00.980
software developers and

705
00:45:02.980 --> 00:45:06.100
data scientists and

706
00:45:06.500 --> 00:45:10.260
machine learning engineers. And that's what's

707
00:45:11.300 --> 00:45:15.100
emerging as AI engineers. That's the term

708
00:45:15.100 --> 00:45:18.740
that's being used increasingly for this, where you

709
00:45:18.740 --> 00:45:22.270
don't go build the models yourself,

710
00:45:23.070 --> 00:45:26.670
you don't even necessarily fine tune the models and then

711
00:45:26.670 --> 00:45:30.350
deploy the models somewhere and run them. You use

712
00:45:30.510 --> 00:45:34.030
models, you build applications that use

713
00:45:34.110 --> 00:45:37.870
these models and traditional approaches to build an

714
00:45:37.870 --> 00:45:41.390
application that has both the intelligence

715
00:45:41.710 --> 00:45:45.070
of a language model or any kind of generative AI model

716
00:45:45.340 --> 00:45:48.140
and the, the, the

717
00:45:48.780 --> 00:45:52.180
capabilities of the piece of software you build. So the AI

718
00:45:52.180 --> 00:45:55.820
engineer understands how

719
00:45:56.300 --> 00:46:00.140
LLMs work, understands the limitations, understands the difference

720
00:46:00.140 --> 00:46:03.580
between models, but the AI engineer usually

721
00:46:03.580 --> 00:46:07.420
doesn't deploy these models, doesn't build and train these models.

722
00:46:07.420 --> 00:46:11.020
That's what happens, that's what the ML engineer does.

723
00:46:13.060 --> 00:46:16.740
And when it comes to operations, I think it's very

724
00:46:16.740 --> 00:46:20.500
similar. LLMOps, or

725
00:46:20.660 --> 00:46:23.700
more specifically defined, probably,

726
00:46:23.940 --> 00:46:27.260
MLOps, is really the

727
00:46:27.260 --> 00:46:30.820
operational aspect of building and deploying and running

728
00:46:30.820 --> 00:46:34.660
models, training models and everything around

729
00:46:34.660 --> 00:46:38.380
that. And GenAIOps, or

730
00:46:38.380 --> 00:46:40.020
AIOps if you will,

731
00:46:42.280 --> 00:46:45.880
is DevOps. But

732
00:46:45.960 --> 00:46:49.560
now it includes AI

733
00:46:50.040 --> 00:46:53.560
as another very important component

734
00:46:53.720 --> 00:46:57.240
which requires its own

735
00:46:58.840 --> 00:47:02.680
capabilities like evaluation. So you test AI

736
00:47:02.680 --> 00:47:04.280
differently than you test

737
00:47:06.530 --> 00:47:10.210
a front end or than you do load tests

738
00:47:10.690 --> 00:47:14.330
on an environment. You have to approach it a

739
00:47:14.330 --> 00:47:18.130
slightly different way. It's the same thing in terms

740
00:47:18.210 --> 00:47:21.530
of what you have to do. You have to make sure it works. Every time

741
00:47:21.530 --> 00:47:25.170
you change something in your application, you have to make sure that it still works,

742
00:47:25.170 --> 00:47:29.010
that you don't have any regression, that you didn't introduce any bugs.

743
00:47:30.370 --> 00:47:34.050
Now there's a new class of regression, there's a new

744
00:47:34.050 --> 00:47:37.530
class of bugs, there's a new class

745
00:47:37.610 --> 00:47:41.410
of things that may introduce latency or that may

746
00:47:41.410 --> 00:47:45.210
introduce additional cost. And that class is based

747
00:47:45.210 --> 00:47:48.850
on the integration of AI. And in

748
00:47:48.850 --> 00:47:52.170
my opinion, the operational aspect of

749
00:47:53.210 --> 00:47:56.810
this becomes a native part of DevOps over

750
00:47:56.810 --> 00:48:00.580
time. There are a lot of tools right now, there will

751
00:48:00.580 --> 00:48:04.340
be more tools in the future, but it's very different

752
00:48:04.420 --> 00:48:07.780
from what the data scientist and the ML engineer do. Have you

753
00:48:08.180 --> 00:48:12.020
seen any counterintuitive success story

754
00:48:12.580 --> 00:48:16.340
where there was less tech and that actually led to better

755
00:48:16.340 --> 00:48:19.340
performance of the AI in production? The most

756
00:48:19.340 --> 00:48:22.860
counterintuitive thing is something that I see fairly often really

757
00:48:22.860 --> 00:48:25.860
is when you approach it saying, well,

758
00:48:26.500 --> 00:48:30.140
let's do this with AI and you realize actually we don't need AI for

759
00:48:30.140 --> 00:48:33.780
this, or let's build an AI agent because

760
00:48:33.780 --> 00:48:36.900
everybody's talking about agents right now, which is a good thing because

761
00:48:37.620 --> 00:48:41.340
it's an evolving space. But you realize actually I don't really need an

762
00:48:41.340 --> 00:48:44.740
agent because I can simply use an LLM

763
00:48:45.620 --> 00:48:49.220
for this. So the most counterintuitive

764
00:48:49.300 --> 00:48:53.060
is something that I have seen throughout my entire career in software

765
00:48:53.060 --> 00:48:56.740
engineering: less complex very often

766
00:48:57.540 --> 00:49:01.220
is more effective. So whenever you build something,

767
00:49:01.220 --> 00:49:04.620
I encourage you to try and to experiment with AI and AI

768
00:49:04.620 --> 00:49:07.540
agents, but I also

769
00:49:08.340 --> 00:49:12.140
encourage you to not try to just solve

770
00:49:12.140 --> 00:49:15.940
everything with AI. And that may be counterintuitive advice,

771
00:49:15.940 --> 00:49:18.890
but it has always been sound advice in my experience.

772
00:49:22.170 --> 00:49:25.770
Last question for us here in the second interview

773
00:49:26.010 --> 00:49:29.690
and thank you for sticking around with me because we're together here in a session

774
00:49:29.690 --> 00:49:33.330
for more than two hours now. Zoom out for us, Dennis.

775
00:49:33.330 --> 00:49:37.090
What's the future of AI architecture? In something like two to

776
00:49:37.090 --> 00:49:40.610
three years, do you see MCP and agentic frameworks

777
00:49:40.610 --> 00:49:44.330
becoming the new standard? I have no

778
00:49:44.330 --> 00:49:44.730
idea.

779
00:49:47.860 --> 00:49:51.700
Literally, I have no idea. If you look back through

780
00:49:51.700 --> 00:49:55.140
the last two to three years since the ChatGPT

781
00:49:55.620 --> 00:49:56.500
moment, basically

782
00:49:59.940 --> 00:50:02.340
everything has changed so dramatically.

783
00:50:03.620 --> 00:50:07.180
The technology, the infrastructure, the capabilities of the

784
00:50:07.180 --> 00:50:10.980
models themselves, the availability,

785
00:50:11.140 --> 00:50:14.100
the open source frameworks, the work that the community is doing,

786
00:50:15.100 --> 00:50:17.740
the many, many startups that are around.

787
00:50:18.780 --> 00:50:22.580
Certainly many still try to solve old problems with new tools. But there are

788
00:50:22.580 --> 00:50:26.100
also so many niches where something incredible is actually

789
00:50:26.100 --> 00:50:28.620
happening and there's so much innovation happening.

790
00:50:31.100 --> 00:50:34.340
I don't even know. I'm going to go on vacation a week from now for

791
00:50:34.340 --> 00:50:37.740
three weeks. I don't even know what the world will look like when I'm back.

792
00:50:39.020 --> 00:50:42.760
It's really hard. I think agents. Well, first

793
00:50:42.760 --> 00:50:46.200
of all, AI is not like a flu. It won't go away.

794
00:50:46.680 --> 00:50:50.440
It's going to stick around. Agentic

795
00:50:50.440 --> 00:50:54.200
AI, it's being hyped right now. But I also think

796
00:50:54.440 --> 00:50:58.200
it is a very important topic that either sticks around

797
00:50:58.200 --> 00:51:01.480
or evolves into something even more

798
00:51:03.480 --> 00:51:06.360
capable. The thing is,

799
00:51:08.530 --> 00:51:12.170
the thing is the best, the best time to get in, to get

800
00:51:12.170 --> 00:51:15.730
involved is now because it's never, it's never going to be as

801
00:51:15.730 --> 00:51:19.410
simple as it is today. And I realize it is really hard.

802
00:51:19.970 --> 00:51:23.810
I'm able, I'm, I'm lucky to be able to work with this stuff

803
00:51:23.890 --> 00:51:27.330
every day, all day long. And I'm still overwhelmed. I'm still

804
00:51:27.330 --> 00:51:30.890
overwhelmed. I've subscribed to so many newsletters and

805
00:51:30.890 --> 00:51:34.490
there's so much news and so many tools to look at and so many frameworks

806
00:51:34.490 --> 00:51:38.310
and so many. I don't know. I don't even know where to start until

807
00:51:38.310 --> 00:51:41.750
I realized most of these newsletters and most of the

808
00:51:41.750 --> 00:51:45.430
experts that are around all of a sudden, they just copy from each

809
00:51:45.430 --> 00:51:49.230
other. Many of them, not all of them, but many really just copy from each

810
00:51:49.230 --> 00:51:50.870
other. And I,

811
00:51:51.270 --> 00:51:55.110
I'm fairly convinced that many of them

812
00:51:55.110 --> 00:51:58.790
really are just AI tools creating content on the

813
00:51:58.790 --> 00:52:02.150
socials, in newsletters and so forth. So it's really hard

814
00:52:02.750 --> 00:52:04.270
to, it's really hard to

815
00:52:06.270 --> 00:52:09.230
distill the actual signal from the noise right now.

816
00:52:10.750 --> 00:52:14.470
But at the same time, it has never been as easy as today

817
00:52:14.470 --> 00:52:17.630
because it's only getting more complicated. It's only going to be more

818
00:52:18.750 --> 00:52:21.950
so. For you, really important is to get started now

819
00:52:22.670 --> 00:52:26.510
and at the same time try to understand the fundamentals, not necessarily

820
00:52:26.590 --> 00:52:30.270
the math behind these models. You don't need a PhD in science

821
00:52:30.880 --> 00:52:34.560
or in math or anything. I, I certainly don't. I'm a developer. I don't

822
00:52:34.560 --> 00:52:38.280
understand AI, to be, to be honest. But

823
00:52:38.280 --> 00:52:41.040
what I do understand very well by now is

824
00:52:41.920 --> 00:52:45.600
how can I use AI in software applications?

825
00:52:45.840 --> 00:52:49.520
What impact does it have on the capabilities of what I build, but also

826
00:52:49.520 --> 00:52:53.280
what impact does it have on the way I work? That's two different

827
00:52:53.360 --> 00:52:57.180
levels. And I'm, I'm able to do

828
00:52:57.180 --> 00:53:00.780
that because I did the work to at

829
00:53:00.780 --> 00:53:04.020
least understand the fundamentals of

830
00:53:04.580 --> 00:53:08.420
what these models actually are and how

831
00:53:08.580 --> 00:53:12.180
they work in terms of their capabilities. So

832
00:53:12.260 --> 00:53:14.180
why do they get things wrong?

833
00:53:16.260 --> 00:53:19.900
Why do they have what we call hallucinations? Why do they

834
00:53:19.900 --> 00:53:23.360
have a hard time doing basic

835
00:53:23.360 --> 00:53:26.960
math while being able to talk for

836
00:53:27.600 --> 00:53:31.200
hours? So these are the things. And I

837
00:53:31.200 --> 00:53:33.360
invite you to listen to

838
00:53:34.560 --> 00:53:38.159
Joe's podcast. I invite you to have a look at the stuff that we put

839
00:53:38.159 --> 00:53:42.000
out at AWS, to the things that I put out on the socials.

840
00:53:42.560 --> 00:53:45.520
Follow me on LinkedIn. It's just Dennis Troup. I think

841
00:53:45.920 --> 00:53:49.360
Joe's going to put my contacts into the details. Ask

842
00:53:49.360 --> 00:53:52.090
questions, talk to people. Figure out,

843
00:53:53.450 --> 00:53:56.730
figure out how this stuff works.

844
00:53:57.290 --> 00:54:00.570
Experiment, play around with it. Don't be stupid.

845
00:54:00.890 --> 00:54:04.410
Don't connect a random piece of AI to your email.

846
00:54:07.370 --> 00:54:10.970
Don't put something out on the Internet and then run up a bill because

847
00:54:10.970 --> 00:54:14.610
somebody DDoSes you. Experiment

848
00:54:14.610 --> 00:54:18.410
in an isolated environment, maybe inside of an AWS account or

849
00:54:18.410 --> 00:54:22.090
on your local machine where everything's isolated and protected.

850
00:54:22.250 --> 00:54:25.430
You don't have to worry about external

851
00:54:26.070 --> 00:54:29.750
influences and maybe threats. Experiment, play around with it

852
00:54:29.750 --> 00:54:33.590
and at the same time think about, think

853
00:54:33.590 --> 00:54:37.430
about the things that you might want to solve for

854
00:54:37.430 --> 00:54:41.190
yourself. Think about, think about the things that you

855
00:54:41.830 --> 00:54:45.190
need to do manually, manually every day

856
00:54:45.750 --> 00:54:49.550
because it was too hard to automate or it was too impossible. It was

857
00:54:49.550 --> 00:54:53.350
impossible to automate or it was too costly, or you just didn't get around

858
00:54:53.350 --> 00:54:56.930
to automating it. Maybe AI can help you

859
00:54:57.410 --> 00:55:00.610
with a small problem that you have every day,

860
00:55:01.250 --> 00:55:05.090
that you're trying to solve every day, and it bothers you

861
00:55:05.090 --> 00:55:08.850
and it's so annoying. That is what I did.

862
00:55:08.930 --> 00:55:12.730
That's how I got started. That's how I learned. I looked at what I'm

863
00:55:12.730 --> 00:55:15.970
doing every day and there's so much stuff that I never got around to doing.

864
00:55:15.970 --> 00:55:19.530
And it was. I was complaining about them all the time and it bothered me

865
00:55:19.530 --> 00:55:23.370
all the time. And all of a sudden I realized I can build a small

866
00:55:23.370 --> 00:55:27.130
agent that just does it for me. And it doesn't even need

867
00:55:27.130 --> 00:55:30.850
to connect to sensitive data or it doesn't even connect to

868
00:55:30.850 --> 00:55:33.930
the Internet or anything. It is just a small

869
00:55:34.650 --> 00:55:38.370
CLI tool that I run automatically

870
00:55:38.370 --> 00:55:41.290
every day and it takes care of some stuff for me.

871
00:55:42.170 --> 00:55:45.130
It pulls some statistics or

872
00:55:47.730 --> 00:55:51.370
looks if there were some new conversations on Slack that I need to know about.

873
00:55:52.330 --> 00:55:56.010
These are the small things that I built. And by building these small things,

874
00:55:56.010 --> 00:55:59.530
I, I learned how they work, I learned how they fail.

875
00:56:00.010 --> 00:56:03.730
I learned about all the things that can go wrong. And then I

876
00:56:03.730 --> 00:56:07.490
started being able to build larger things, to build more

877
00:56:07.490 --> 00:56:10.930
complex applications, to build actual agents, actual

878
00:56:10.930 --> 00:56:14.740
agentic systems that I now run for

879
00:56:14.740 --> 00:56:18.420
more and more things. I have to admit though, I'm not running anything in

880
00:56:18.420 --> 00:56:22.180
production because I'm not building production software anymore. I haven't

881
00:56:22.180 --> 00:56:25.900
for a few years, unfortunately. But the great thing is, in

882
00:56:25.900 --> 00:56:29.019
my role, I get to experiment with that stuff a lot.

883
00:56:30.700 --> 00:56:34.380
Dennis, awesome last words. Thank you very much

884
00:56:34.380 --> 00:56:37.740
for being such a good guest and telling us so much about

885
00:56:37.740 --> 00:56:40.460
AI and AWS, how they're working together.

886
00:56:41.570 --> 00:56:45.010
Thank you so much for having me. It was a great time. Thank you.

887
00:56:49.810 --> 00:56:53.330
That's all folks. Find more news, streams,

888
00:56:53.570 --> 00:56:54.610
events and

889
00:56:54.610 --> 00:56:59.010
interviews at www.startuprad.io.

890
00:56:59.410 --> 00:57:01.650
Remember, sharing is caring.
