
Simplify Your AI Agents with this Strategy

May 15, 2024
When creating APIs, we can use all the tools available. If you haven't run into this problem, you probably just don't see it yet, but the problem is tool scatter. I've suffered from this a few times. It happens when you start building a bunch of end-to-end solutions that you're handing off to people: you start with maybe RPA, then we've got a little bit of Python here, maybe we're jumping into Vercel with a little bit of AI on the back end, now we're pulling in LangChain, now LlamaIndex, now we're funneling it into Pinecone. You now have a bunch of different tools; they all have their own documentation, and they all have their own special-snowflake quirks you have to deal with. Maybe you're building a flow to respond to customer emails, maybe you're creating a simple flow that produces a newsletter for you; whatever it is, it has now become incredibly complex with all these point-to-point integrations. So how do you take all these point-to-point integrations and make them easy to access, and also something you can iterate on over time? It's really just putting them behind an API endpoint. And don't take my word for it: many of these companies have already done it.
They came to this conclusion a long time ago. If you're in the consumer layer, you won't really find these APIs anywhere. Even in the business layer they show up in a couple of places, but those companies tend to keep their APIs internal. Where I'm looking, all the emerging APIs are in the prosumer layer, and in that prosumer layer you get access to video generation, social media tooling, and plenty of image generation. So it's becoming normal for these companies to expose an API that end users can pick up, pulling things out here and adapting things there. If you look at the bigger picture, that's where the opportunity is.

I'm pulling up FirstMark's 2024 MAD landscape; they list so many different tools here, and I can bet that many of these have some sort of API interface available. So taking a complex workflow and putting it behind an API will not only increase your productivity but also let you share your work with a wide number of people. Because guess what's happening now: people are dealing with a lot of decision fatigue and a lot of broken processes, so they don't want another complex process thrown at them. They want something they can consume very quickly, like an API endpoint, and maybe they can plug it into their own workflows as well. Maybe you're building something with Zapier and it's become this huge web of things; you can take a lot of those steps and turn them into a single endpoint.
The example we're going to use today: I want to get AI news quickly, but I don't want to sit and watch all the AI news videos; there are almost too many, and I already spend a lot of time watching AI news. So I'm going to build myself an API endpoint to do that. It will be quite simple, and I'm going to take a definition-first approach. This is not unique to me; many companies do this with Smithy, so let's look at that quickly: you write a couple of lines of definition and you get back a service and an API.
Microsoft recently announced TypeSpec, which is a similar idea; you can see here they're creating a .tsp file that defines a couple of services, and then we get back an OpenAPI spec. Google came up with gRPC, and there's also CUE, which is actually off the radar (I don't think many people use it), but essentially it's a configuration language where you can generate schemas, you can generate protocol buffers, and you can even generate that OpenAPI specification like we mentioned before. So the idea is to take this approach yourself: instead of combining a bunch of tools, start thinking in terms of services, start thinking in terms of what API can package all of these workflows together. Once you have that tight feedback loop of defining some kind of use case, you nail it down, and then you have a strict iteration between defining that use case and developing it.
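As a small illustration of definition first in this video's own stack (Python/FastAPI) rather than Smithy or TypeSpec: define the request/response models and the route once, and the OpenAPI spec falls out of that single definition. The names here are illustrative, not from the video's code:

```python
import json

from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI(title="AI News Service")

class ChatRequest(BaseModel):
    question: str

class ChatResponse(BaseModel):
    answer: str

@app.post("/chat", response_model=ChatResponse)
def chat(req: ChatRequest) -> ChatResponse:
    # Placeholder logic; the real endpoint would call the RAG bot.
    return ChatResponse(answer=f"You asked: {req.question}")

# FastAPI derives the OpenAPI spec from the definitions above.
print(json.dumps(app.openapi(), indent=2))
```

The point is that the definition is the source of truth: the docs, the spec, and the running service all come from the same couple of lines.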
Sharing it will make your life a lot easier. So let's continue with this example of creating a YouTube tool that, what do you know, sends us AI news; I'm sure you've already created something much more complicated than this. We're going to map out the key features, avoid vendor-specific tools in our API design, and then make sure we understand our security posture. Let's start with the security posture: it's not that big of a deal here. I don't think anyone is going to hack my YouTube tool, so we're okay with using managed platforms, and we're okay with not having all the super-strict security controls for this one. Then we'll talk about the two different components.
The first is the actual data pipeline I want to set up, and it will be super simple: I just want to download the transcripts of the YouTube videos, format them, and save them to a disk somewhere I can access later. Then we'll set up a simple RAG bot, and this RAG bot will have its own in-memory vector store so that we don't have to talk to any third-party tools to store the data. Of course, the performance won't be as good in memory, but for this example we'll let that slide. Then we'll have a model provider we reach out to.
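As a rough sketch of that pipeline step: the `youtube-transcript-api` package is one common way to pull transcripts (the video doesn't name its exact library), and the paths here are illustrative:

```python
from pathlib import Path

from youtube_transcript_api import YouTubeTranscriptApi

def save_transcript(video_id: str, out_dir: Path) -> None:
    # Fetch the transcript as a list of {"text", "start", "duration"} chunks.
    chunks = YouTubeTranscriptApi.get_transcript(video_id)
    # Format: join the text chunks into one plain-text document.
    text = " ".join(chunk["text"] for chunk in chunks)
    # Save to disk so the RAG bot can index it later.
    out_dir.mkdir(parents=True, exist_ok=True)
    (out_dir / f"{video_id}.txt").write_text(text)

# Replace "VIDEO_ID" with a real YouTube video id.
save_transcript("VIDEO_ID", Path("/data/transcripts"))
```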
In this case it will be OpenAI; we'll have some kind of function that reaches the model and then comes back with the response. One more thing here: we're going to have an agent GPU container. I don't want to run this on my local desktop; I want this to run in the cloud somewhere, so when someone hits that endpoint I don't have to worry about spinning up a VPS or maintaining a server. It just spins up, and then it scales down when nobody is using it. So the last step here is to become GPU rich.
You want to understand the build effort involved with the APIs you're building, and you want to stay away from really heavy vendor-specific knowledge requirements unless you're intentionally using that platform. If I use AWS, that means I want strict security controls and high performance in certain areas, and managed services will do that for me, so we want to consider that. The same goes for whole-system maintenance: you can spin up a VPS and host your API there, but that may require you to manage a GPU on your own, or a group of GPUs, which is a completely different commitment compared to using a managed service platform that just handles the GPUs for you. The last consideration is adaptability, which is an indicator of how easy it is for you to change.
Can you change on the fly and get it up and running again, or will it take a session or two to get things working again when your requirements change quickly? A little visual on this: if you go the hardware-specific route (I'm not really a hardware guy; I don't have servers that I manage or anything like that), I think the vendor-specific knowledge will be almost zero, because you have your own server and can do what you want with it; there's no imposed opinion on how to do things. The maintenance effort, again, would be a little lower, say if it's a single GPU.
As long as you understand what you need to do to keep that GPU alive, you should be fine; if instead you're managing groups of GPUs, that's a different story. As for security controls, you can implement your own there, but you can only do so much if you're a single person, or a small team where security isn't your main job. The managed platform, by contrast, will be very low on build effort; the vendor-specific knowledge will be a bit higher, and the maintenance effort is near zero. What I really mean by that is I don't have to worry about the server; I don't have to think about it. I can just call it when I need it, and it ceases to exist when I don't need it anymore, which is great. On security controls, you won't have as much control, because the platform comes with its security built in.
Of course, there will be a couple of little things you'll need to do here and there, but overall you shouldn't have to worry too much about it. Then there's the full cloud solution: the cloud will require the most build effort and the most vendor-specific knowledge, and you'll also have to maintain it. If you create a bunch of EC2 instances, you have to maintain them, although if you go with more serverless, managed options, maintenance could go down. I put the security controls at the highest level here; it's really up to you how much you want to invest in that security part, and if you want to use all the managed security solutions, that's great. You can also build some of this stuff bespoke, from scratch, if you really want to; just keep in mind that you want to become GPU rich, so the GPU is no longer your bottleneck.
When you start thinking about API endpoints, you want something you can spin up quickly and scale down when you don't need it, or at least your own hardware with the right specs so you don't have to do model quantization and all that kind of stuff, because that will really slow you down. So let's jump into the code, where we're going to set up the tool we'll use to pull AI news and get a quick summary from a chatbot endpoint. Then we're going to take that endpoint and put it into an RPA tool, so you get that feeling of: okay, we took something really complicated, made it available through an endpoint, and now we can quickly use it in one of our flows.
Remember, we're taking a definition-first approach, and I chose this tooling because it embodies the idea. Modal was primarily developed for data teams, but it just so happens they really hit the nail on the head with the generative AI use case, powering things like fine-tuning, AI inference, and batch processing. They're used by a lot of people; the highlights here are Cursor, Substack, and Cognition, who came up with Devin, the AI software engineer, plus Suno, the music generator. One more callout is Scale, and Scale is used by OpenAI and almost all the big players in the space, so these are heavyweights; building on their platform is a safe bet.
They also take security very seriously: they have a really good set of security protocols that they follow (you can read about those in your spare time), and they maintain a couple of compliance certifications too. It's a perfect platform as a starting point before moving to something like AWS or one of the other cloud providers. Okay, that covers Modal; if you'd like more deep-dive tutorials and such on it, let me know, but I suspect you've probably already used it. We're also using FastAPI, and FastAPI helps you create APIs faster with Python.
You also get documentation out of the box. I won't cover FastAPI in depth; there's a lot of information available about it. I'm also assuming you're comfortable with Modal and have read some of its documentation, so I'm not going to cover exactly how Modal works, but I do want to take you through this definition-first approach. The first thing we're going to do is define our application here, then we're going to create our FastAPI container, and then we're going to define our storage, which works such that when we spin up the application, it creates that volume for you.
I passed this parameter here, create_if_missing=True, which means that if there's no volume it will create one, but if one already exists it won't overwrite it. Next up is containers, and this is where you don't have to deal with Docker: I can simply describe the container I want. What I'm doing here is defining a Linux container with Python installed, then running a pip install command that installs the libraries I need for this specific container. This will be the data image container, where we'll run our download of the YouTube transcripts and save them somewhere. Next is our agent image: here we're doing the same thing again, creating a Linux container and installing all the Python libraries we need for this project within this agent image specifically. Okay, so let's talk a little about the data pipeline. Inside the data pipeline, all we need is to use that container image (the data image), a shared location for the downloaded transcripts, and some kind of schedule to run on. Let's make sure this runs at 5:00 a.m. every day; it will then call our YouTube download adapter with this "AI news" search term and go do all the work on the back end.
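A condensed sketch of that Modal setup might look like the following; names and package choices are illustrative, and the decorator/constructor names match Modal's API around the time of this video (check Modal's docs for the current forms):

```python
import modal

app = modal.App("ai-news")

# Storage: create the volume the first time, never overwrite an existing one.
volume = modal.Volume.from_name("ai-news-transcripts", create_if_missing=True)

# Containers: plain Linux images plus the Python deps each stage needs.
data_image = modal.Image.debian_slim().pip_install("youtube-transcript-api")
agent_image = modal.Image.debian_slim().pip_install("llama-index", "fastapi", "openai")

# Data pipeline: refresh transcripts on a schedule.
@app.function(
    image=data_image,
    volumes={"/data": volume},
    schedule=modal.Cron("0 5 * * *"),  # 5:00 a.m. UTC, every day
)
def download_transcripts():
    # Placeholder for the use case: search YouTube for "AI news",
    # download the transcripts, and save them under /data.
    ...
```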
This is where I define the actual API endpoints, and we only have one here: the chatbot endpoint from before. We're going to have it talk to that RAG bot that does the querying. This RAG bot will be naive, with just an in-memory vector store, and from there it will be able to give us an answer based on those YouTube transcripts. What we have to do is define the ASGI app and then return it to give it a full definition here.
I want this to run on our agent image, mount the same volume, and be able to reach OpenAI, so we have to give it a secret. And then I have the GPU parameter here set to "any". This is a really cool feature of Modal; this is technically how you get GPU rich: all I need to do is declare which GPU I want here, and I get that GPU whenever it's available. There are a couple of scheduling things Modal handles that make this a lot easier than the alternative of running this specific code on a GPU in the cloud while running it here on a CPU, where I'd need to try to quantize the model.
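Continuing the Modal sketch from above, the endpoint wiring could look roughly like this; the secret name and the rag_bot module are hypothetical stand-ins for the video's own code:

```python
@app.function(
    image=agent_image,
    volumes={"/data": volume},
    secrets=[modal.Secret.from_name("openai-secret")],  # hypothetical secret name
    gpu="any",  # "GPU rich": take whichever GPU type is available
)
@modal.web_endpoint(method="POST")
def chat(question: str) -> dict:
    # Hypothetical module; the real project calls its RAG bot adapter here.
    from rag_bot import answer_question
    return {"answer": answer_question(question)}
```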
I can define in my code which parts need a GPU and leave it at that. Next, let's dig a little into each of these parts. If we dig into the data pipeline, we'll see that it actually calls this use case, and this use-case idea is something I've carried throughout my code: if you go to the YouTube transcripts use case, you get YouTube transcripts; that's essentially the use case. What it does is: one, create a writer instance; two, clear the cache, because I don't want old transcripts, only new transcripts; and then a little bit of work to search YouTube, download those transcripts, and save them to that location. Pretty simple.

If we go back to our main file, the next piece is the RAG bot. If we go into our RAG bot here, this is all an adapter that calls a use case, and inside the use case we have something called our lazy RAG agent; that's where a big part of the heavy lifting happens. We create an index store here and then return this chat engine, so LlamaIndex is doing the heavy lifting for us on this end. What I like about LlamaIndex is that it actually wraps the chat engine in an agent for us: if we go into the code here (let me make it easier to read), you'll see they have an AgentRunner, and they return an AgentRunner with this query engine tool, and that's what does the work for us on the back end. So I don't have to waste time writing the RAG plumbing from start to finish (chunk this, embed that); it's easy enough to just call it. Of course, if we wanted to optimize this process, we'd have to get down to the low level of RAG, but for this example I'm assuming you've already done RAG work and can optimize it if you want.

Okay, let's see how this runs. Let me open up my terminal here and actually deploy this. And there it is: it's been deployed, and we have an API endpoint available for us to hit. Just from that quick definition, a couple of functions here and there, I have my API endpoint ready to go.
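A minimal sketch of that in-memory RAG setup, using LlamaIndex's post-0.10 imports; the path is illustrative, and chat_mode="best" is one way to get the AgentRunner-plus-query-engine-tool behavior described above (assumes OPENAI_API_KEY is available, e.g. via the Modal secret):

```python
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex

# Load the transcripts the data pipeline saved to the shared volume.
documents = SimpleDirectoryReader("/data/transcripts").load_data()

# Build the in-memory vector store index: no third-party vector database.
index = VectorStoreIndex.from_documents(documents)

# "best" mode wraps the index in an agent holding a query engine tool,
# matching the AgentRunner behavior described above.
chat_engine = index.as_chat_engine(chat_mode="best")

print(chat_engine.chat("What is the latest AI news related to Microsoft?"))
```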
Of course, don't focus on the tool specifically; focus on the idea that I defined it and then got an API endpoint I can use downstream. If we jump to the dashboard, let me go to the application itself: you'll see I'm on version five, so I deployed this a couple of times; I had a few things that weren't working and got them fixed. Let's look at our data pipeline briefly so we can see the logs associated with it; I can do some basic observability stuff here. And if I really wanted to run this flow, you can see it's set to run tomorrow at 5:00 AM.
UTC time, so in 5 hours, but I can also kick off my data pipeline right now. Let's go ahead and do that, and you'll see it takes about 5 to 8 seconds for these things to start up, and then the actual duration is about 30 seconds. It's quite slow and not optimized; the idea is that you could optimize this if you wanted to, but it's a simple example, so I'm going to let it run this way and be okay with that. Let's let it run, and once it succeeds, we'll go ahead and interact with our API endpoint. There it is, successful, and you'll see that
about 2 minutes ago it ran the whole process that pulled the transcripts for us. We can see here that there are a couple of new stories that are probably popular right now (maybe they won't be by the time you watch this video): Microsoft's new model shows up here, and the music generators are the usual suspects; nothing too surprising. So we ran a query asking what the latest AI news related to Microsoft is, and if we look at the response from our API, we'll see it says Microsoft is preparing to launch its own new AI language model called MAI-1.
I don't know how to pronounce it; it's being developed internally. Which is great, so we've done a couple of things here: we grabbed some of the latest YouTube videos on this topic, saved the transcripts, added them to our own repository, created an in-memory vector index, and then had our chatbot reference that vector store index to return us an answer. It may be trivial for you to do this now, but it's essentially a complicated process, and instead of someone having to know all that stuff, they can just use an API endpoint.
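And that's the whole point for downstream consumers (an RPA tool, a script, a Zapier step): all they need is an HTTP call. The URL below is a made-up example of Modal's generated endpoint format:

```python
import requests

resp = requests.post(
    "https://example--chat.modal.run",  # hypothetical deployed endpoint URL
    params={"question": "What is the latest AI news related to Microsoft?"},
)
print(resp.json()["answer"])
```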
To drive this point home, I set up a quick workflow in one of these RPA tools as an example of how you can now start saving a lot of time and also share some of your work. I took the API endpoint I developed, dropped it into the tool, and now we're going to run it and take a quick look at the settings. The question I'm asking is: what is the latest AI news related to Microsoft? We should get back a response specific to the transcripts we downloaded from YouTube earlier, and we have an answer here: Microsoft is ready to launch its new AI language model called MAI-1 (I still don't know how to pronounce it), developed entirely in-house, led by Mustafa Suleyman, who previously worked at Google, and aiming for around 500 billion parameters. Which is great: now we have a way to get AI news without sitting on YouTube watching videos all day, and I could even extend this to spit out a summary into a Notion page or a Google Doc, something like that. So we can do a lot more with this than just a simple chatbot.
Overall, we've accomplished what we set out to do: we went from a bunch of scattered tools back to a simple workflow, defined our use cases, built what we needed, and then shared it with downstream consumers. We can put this in applications, we can put this behind other APIs, and we can put this in RPA tools. If you're using something like Vercel, for example: Vercel also has a Modal integration, which is probably one of the reasons I took this option, since it integrates tightly with Vercel. So if you're trying to go to Vercel, Modal makes that pretty easy, and it actually works with AWS too: you can add your AWS secrets and even pull in images from AWS specifically. It's pretty interesting how, if you make some decisions up front, you can save yourself a lot of time and headache by going with this define-first approach, which will serve you better than trying to navigate this marketplace of a bunch of different tools and gluing together things that weren't meant to work together.
I hope this was helpful to you. If you want to see something specific, let me know in the comments below and I'll see if we can prioritize it.
