
host ALL your AI locally

Jun 08, 2024
I built an AI server for my daughters. Well, first it was more for me. I wanted to run all my AI locally. And I'm not just talking about the command line with Ollama. No no no. We have a graphical user interface, a beautiful chat interface, and all the features of this thing. It has our chat histories, multiple models, we can even add Stable Diffusion. And I was able to add this to my Obsidian notes app and have my chat interface right there. I'm going to show you how to do this. Now, you don't need something crazy like Terry, that's what I called my AI server.
It can be something as simple as this, this laptop. In fact, I'll be demoing the entire setup on this laptop. So the computer you're using right now, the one you're watching this video on, probably works. And seriously, you're going to love this. It's customizable, it's blazing fast, much faster than anything else I've used. Isn't it amazing? And again, it's local, it's private. I control it, which is important because I'm handing it over to my daughters. I want them to be able to use AI to help with school, but I don't want them to cheat or do anything weird.
But since I have control, I can include special model files that restrict what they can do and what they can ask, and I'll show you how to do it. So, here we go. Prepare your coffee. We're about to dive in, but first let me introduce Terry. Now Terry has a lot of muscle, so for that matter, he needed something big. I bought the Lian Li O11 Dynamic EVO XL. It's a full tower E-ATX case, perfect for housing my ASUS ProArt X670E-Creator motherboard. This thing is also a beast. I'll put it in the description so you can see it.
Now I also gave Terry a big brain. He has the AMD Ryzen 9 7950X. That's 4.2 gigahertz and 16 cores. For memory, I went a little crazy. I have 128 gigabytes of the G.Skill Trident Z5 Neo, it's DDR5 6000, and it's too much for what I'm doing, I think. I have a Lian Li water cooler for the CPU. I'm not sure how you pronounce Lian Li, right? I don't know. Correct me in the comments. You always do. And then for the stuff the AI loves, I bought two RTX 4090s, they're the MSI Suprim and they're liquid cooled to fit on my motherboard. 24 gigabytes of memory each, which gives me a lot of power.
For storing my AI models, we have two Samsung 990 Pros, two terabytes each, which you can't see because they're behind things. And also a Corsair AX1600i 1600-watt power supply to power the entire build. Terry is ready. Now, I'm surprised to say that my system actually POSTed on the first try, which is amazing. But what is not surprising is the fact that Ubuntu did not install. I tried for hours, in fact, for a whole day, and I almost gave up and installed Windows, but I said, no, Chuck, you're installing Linux. So I tried something new, something I'd never messed with before.
It's called Pop!_OS from System76. This thing is amazing. It worked the first time. It even had a special image with built-in Nvidia drivers. I just installed it and it worked. So I took a sip of coffee, didn't question the magic, and moved on. Now if you want to build something similar, I have all the links below. But anyway, let's talk about how to build your own local AI server. First, what do you need? Really, all you'll need is a computer. That's all. It can be any computer running Windows, Mac or Linux. And if you have a GPU, you'll have a much better time.
Now, again, I have to emphasize this: you won't need something as beefy as Terry, but the more powerful your computer is, the better off you'll be. Don't come at me with a Chromebook, please. Now, step one: Ollama. This is the foundation of all our AI stuff and what we will use to run AI models. So we'll head over to ollama.com and click download, and they have a version for each operating system. I love that. Now, if you are on Mac, download it right now and launch it. If you're on Windows, they have a preview version, but I don't want you to do that.
Instead, I want you to try the Linux version. We can install it with one command. And yes, you can run Linux on Windows with WSL. Let's get it going real quick. The first thing I'll do is go to the start bar, search for terminal and launch my terminal. Now, this first part is just for Windows people, so Linux people, hang tight for a minute. We have to install WSL, or the Windows Subsystem for Linux. It's just a wsl --install command and that's it. Press Enter and it will start doing some things. When it's done, we'll set up a username and password.
By the way, I have a new keyboard. See that link below? It's my favorite keyboard in the entire world. Now some of you may have to restart. Alright. Simply pause the video and come back. Mine is ready to go. And we're rolling with Ubuntu 22.04, which still surprises me, that we're running Linux on Windows. That's just magic. Now, we're about to install Ollama, but before doing that, we need to follow some best practices, like updating our packages. So we'll do a sudo apt update and then a sudo apt upgrade -y to apply all those updates.
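For reference, a minimal sketch of this step's commands (run the first one in an admin PowerShell on Windows; Linux users only need the last line):

```bash
# Windows only: install the Windows Subsystem for Linux, then reboot if prompted
wsl --install

# Inside Ubuntu (or on any Linux box): apply updates before installing anything
sudo apt update && sudo apt upgrade -y
```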
And actually, while it's updating, can I tell you a little bit about our sponsor, ITPro by ACI Learning? Now in this video, we're going to do a lot of heavy Linux stuff. I'm going to guide you through it. I'm going to hold your hand, and you may not really understand what's going on. That's where ITPro comes into play. If you want to learn Linux or anything in IT, they are your go-to, that's what I use to learn new things. So if you want to learn Linux to get better at this stuff, or want to start making this hobby your career, learn some skills, get some certifications, get your A+, get your CCNA, get your AWS certifications, your Azure certifications and follow this crazy path of IT, it's amazing.
It's the reason I make this channel and make these videos. Check out ITPro. They have IT training that won't put you to sleep. They have labs, they have practice exams, and if you use my code NetworkChuck right now, you'll get 30% off forever. So learn some Linux, and thanks to ITPro for sponsoring this video and making things like this possible. And speaking of my updates, they're done. And by the way, I'll have a guide for all of this. Every step, every command, you can find them in the free NetworkChuck Academy membership. Click the link below to join and get other cool stuff too.
I can't wait to see you there. Now we can install Ollama with one command. And again, all the commands are below. You'll paste this in, a little curl command, magical little thing, and I love how easy it is. See, you just sit there and let it happen. Don't you feel like a wizard when you install things like this? And the fact that you're installing AI right now? Come on. I noticed one thing very quickly: Ollama automatically discovered that I have an Nvidia GPU, which is awesome, you're going to have a blast. If it didn't detect that and you do have a GPU, you may need to install some Nvidia CUDA drivers.
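The installer one-liner, for reference (check ollama.com for the current command before pasting):

```bash
# Ollama's official Linux install script
curl -fsSL https://ollama.com/install.sh | sh
```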
I'll put a link for that below, but not everyone will have to do it. And if you're using a Mac with an M1 through M3 chip, you'll have a good time too. It will use the built-in GPU. Now, at this point, our Mac users, our Linux users, and our Windows users all converge. We are on the same path. Welcome. We can hold hands and sing. That's getting weird. Anyway, we need to test a few things first to make sure Ollama is working. And for that we are going to open our web browser. I know it's a little strange, stay with me.
I'm going to launch Chrome here, and here's my address bar. I want to type localhost, which is looking here at my own computer, and port 11434, and press enter. And if you see this message right here, you're good to go. Port 11434 is what the Ollama API service runs on and is how our other tools will interact with it. It's so powerful. Just look at this. I'm very excited to show you this. Now before we continue, let's add an AI model to Ollama. And we can do that right now with ollama pull, and we'll download llama2, a very popular one.
Press enter and it will download. Now let's try it real quick. We'll do ollama run llama2. And if it's your first time doing this, it's kind of magic. We are about to interact with a ChatGPT-like AI right here, no internet required. It's all happening in that five gigabyte file. Tell me about the solar eclipse. Boom. And you can Ctrl+C to stop it. Now I want to show you this. I'm going to open a new window. Actually, this is an amazing command. With this second WSL window, I am connecting to the same instance.
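For copy-paste, the commands from this bit, plus the API check from a moment ago:

```bash
# Should print "Ollama is running" if the API is listening on port 11434
curl http://localhost:11434

# Download Llama 2 (~5 GB), then chat with it; type /bye to exit
ollama pull llama2
ollama run llama2
```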
Again, a new window. I'm going to type watch -n 0.5 nvidia-smi. This will watch my GPU performance right here in the terminal and keep updating it. So keep an eye on that right here. While I'm talking to llama2, give me a list of all of Adam Sandler's movies, and watch that GPU go. Ah, it's very funny. Now can I show you what Terry does? Real fast? I have to show you Terry. Terry has two GPUs here. They are right here, and Ollama can use them both at the same time.
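If you want the same GPU monitor, this is the command (Nvidia GPUs only; nvidia-smi ships with the Nvidia drivers):

```bash
# Redraw nvidia-smi output every half second to watch GPU utilization
watch -n 0.5 nvidia-smi
```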
Look at this. It's so good. All Samuel L. Jackson movies. And look at that. Isn't it amazing? And look how fast it was. That's ridiculous. This is just the beginning. Anyway, I had to show you Terry. We now have Ollama installed. That's just our base. Now I'm going to type /bye to end that session. The second step has to do with the web user interface. And this is amazing. It's called Open WebUI and it's actually one of many web UIs you can get for Ollama, but I think Open WebUI is the best.
Now Open WebUI will run inside a Docker container. So you'll need to have Docker installed, and we'll do that right now. So we'll just copy and paste the commands from NetworkChuck Academy. This is also available on the Docker website. The first step is to update our repositories and grab the Docker GPG key. And then with one command we will install Docker and all its goodies. Ready, set, go. Yes, let's do it. And now, with Docker installed, we will use it to deploy our Open WebUI container. It will be one command that you can simply copy and paste.
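If you don't have the guide handy, one common shortcut for the Docker install is Docker's convenience script; the full step-by-step apt repository method is on docs.docker.com:

```bash
# Quick Docker install via Docker's official convenience script
curl -fsSL https://get.docker.com | sh
```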
This docker run command will pull the Open WebUI image and run the container. It points at your local computer for the Ollama base URL, because it's going to integrate with Ollama, and it uses the host's network adapter to keep things nice and easy. Keep in mind that this will use port 8080 on whatever system you are running. And all we have to do is press Enter after adding a sudo at the beginning, run the docker command and let it do its job. Let's check it real quick. We'll do a quick sudo docker ps.
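Here's a sketch of that container command matching the description above. It follows the pattern the Open WebUI docs use for an Ollama instance on the same host; check their repo for the current image tag and flags:

```bash
# Run Open WebUI on the host network, pointed at the local Ollama API
sudo docker run -d --network=host \
  -v open-webui:/app/backend/data \
  -e OLLAMA_BASE_URL=http://127.0.0.1:11434 \
  --name open-webui --restart always \
  ghcr.io/open-webui/open-webui:main

# Confirm the container is up
sudo docker ps
```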
We can see that it is indeed running. And now let's log in. This is the exciting part. Okay, let's go to our web browser and just type localhost:8080, and wow, okay, it's really zoomed in. I'm not sure why; yours shouldn't do that. Now, the first time you run it, you're going to want to click sign up here at the bottom and just put your info in. This login information is only relevant to this instance, this local instance. We'll create the account, and we're logged in. Now, just so you know, the first account you sign up with will automatically become an administrator account.
So right now, as a first-time user, you get the power. But look at this. How amazing is this? Let's play with it. So the first thing we have to do is select the model. I'll click on that drop-down menu and we should have llama2. Awesome. And that's how we know our connection is working too. I'll go ahead and select that. And by the way, another way to check your connection is by going to the little icon here at the bottom left, clicking on Settings and then Connections. And you can see that our Ollama base URL is here.
If you ever have to change that for any reason. Now with Llama 2 selected, we can start chatting, and just like that, we have our own little ChatGPT which is completely local, and this thing is beautiful and extremely powerful. Now, the first thing is that we can download more models. We can go out to Ollama and see what they have available. Look at their models page to see the list of models. CodeGemma is great. Let's try that. So to add CodeGemma, our second model, we'll go back to our command line here and type ollama pull codegemma.
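That pull command, for copy-paste:

```bash
# Add CodeGemma as a second model
ollama pull codegemma
```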
Great, that's it. Once we've got that done, we can go up here and just change our model by clicking on the little drop-down icon at the top. Yes, there is codegemma. We can switch. And I've never actually done this before, so I have no idea what's going to happen. I want to click on my original Llama 2 model. You can actually add another model to this conversation. Now we have two here. What is going to happen? So CodeGemma is the one that responds first. Actually, I'm not sure what that does. Maybe you can try it and tell me.
I want to move on though. Now, some of the crazy stuff you can see here, it's almost more impressive than ChatGPT in some ways. You have plenty of options to edit your answers, copy them, like and dislike them to help it learn. You can also have it read things to you, continue a response, regenerate responses, or even just add things in your own voice. I can also go down here, and this is crazy, I can mention another model and it will respond. Think about it. Did you see that? I just had my other model
talk to my current one. That's just strange, right? Let's try to get them to have a conversation. They are going to have a conversation. What are they going to talk about? Let's bring back Llama 2 to ask the question. This is very funny. I love this so much. Okay, anyway, I could spend all day doing this. We can also upload files with this plus sign. This includes many things. Let's try, do I have any documents here? I'll just copy and paste the content of an article, save it, and that will be our file. Summarize this. You can see our GPU in use here.
I love that. Running locally. Cool. We can also add images for multimodal models. I'm not sure Llama 2 can do that. Let's try it real quick. So Llama 2 can't do it, but there is a multimodal model called LLaVA. Let's pull that down real quick with ollama pull llava, then go to our browser here. Once again, we'll refresh it and change our model to llava. Add the image. That's really scary. Here we go. That is very cool. Now, in a moment, I'll show you how we can generate images right here in this web interface using Stable Diffusion. But first let's play a little more.
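The pull command from a moment ago, for reference:

```bash
# LLaVA: a multimodal model that can describe images
ollama pull llava
```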
And actually, the first place I want to go is the admin panel. We have one user, you, the administrator, and if we click on the top right, we have the admin settings. This is where a ton of the power comes into play. We can restrict sign-ups. We can say enabled or disabled. Right now, by default it's enabled. That's perfect. And when someone tries to sign up, they will initially be a pending user until you approve them, let me show you. Now, real quick, if you want someone else to use this server on your laptop or computer or whatever, they can access it from anywhere as long as they have your IP address.
So let me sign up as a new user real quick just to show you. I'll open an incognito window, create an account and watch. It's saying: Hey, you have to wait. Your admin has to approve you. And if we go here and refresh our page on the dashboard, there's Bernard Hackwell. Well, we can say, you know what? You are a user, or click it again, you are an administrator. No, no, he's not. He is going to be a user. And if we check it again, boom, we have access. Now, the really cool thing is if I go to the admin settings and users, I can say, Hey, you know what?
Don't allow chat deletion, which is good if I'm trying to monitor what my daughters do in their chats. I can also whitelist models. So, you know what? You are only allowed to use Llama 2 and that's it. So when we go back to the Bernard Hackwell session here, he should only have access to Llama 2. That's pretty sick, and it gets even better when you can create your own restricted models. We'll head to the section called Modelfiles up here. And we'll click create a model file. You can also go to the community and see what people have created.
That is very cool. I'm going to show you what I've done for my daughter Chloe to stop her from cheating. I called her assistant Deborah. And here is the content. I'm going to paste it right now. The main thing is up here where it says FROM, and you choose your model. So FROM llama2. And then you have the system message, which will be enclosed in three double quotes. And I have all this stuff that says what it can and can't do, what Chloe is allowed to ask. And it ends here with three double quotes.
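To make that concrete, here's a minimal sketch of what a restricted Modelfile like Deborah's could look like. The system prompt wording is my illustration, not the exact text from the video:

```
# Hypothetical Modelfile for a guardrailed homework helper
FROM llama2

SYSTEM """
You are Deborah, an educational assistant. You may explain concepts and
guide research, but you must refuse to write essays, articles, or
homework answers on the student's behalf.
"""
```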
You can do a few more things. I'm just going to tag it as an educational assistant, save and create. Then I'll go through my settings one more time and make sure that for users, this model is whitelisted. I'll add one more: Deborah. Note that this is now an option. Now if Bernard tries to use Deborah and says, Deborah, write me an article on the Civil War, it immediately shuts him down, like, Hey, that's cheating. Now Llama 2, the model that we are using, is fine. There is a better one called Mixtral.
Let me, let me show you Terry. I'll use Deborah, or Deb, and say, write me an article about Benjamin Franklin. Notice it didn't write it for me, but it says it's going to guide me. And that's what I told it to do, to be a guide. I tried to pressure it and it said no. That's great. You can customize these prompts, put up some guardrails for people who don't need full access to that kind of thing. I think it's amazing. Now, Open WebUI has a few more features, but I want to move on to setting up Stable Diffusion.
This thing is so cool and powerful. Step three, Stable Diffusion. I didn't think generating images locally would be as fun or as powerful as ChatGPT, but man, it's crazy. You have to see it. Now we will install Stable Diffusion with a UI called AUTOMATIC1111. So let's do it. Now before installing it, we have some prerequisites, and one of them is an amazing tool I've been using a lot called pyenv, which helps us manage our Python versions and switch between them, which is normally a big hassle. Anyway, the first thing we need to do is make sure we have several prerequisites installed.
Go ahead and copy and paste this from NetworkChuck Academy. Let it do its thing for a while. And with the prerequisites installed, we'll copy and paste this command, a curl command that will install pyenv for us automatically. I love it. Run that. And then here it tells us that we need to add all of this, or just run this command to put it in our .bashrc file, so we can use the pyenv command. I'm just going to copy this, paste it, and then we'll source our .bashrc to update our terminal. And let's see if pyenv works; run pyenv to check that it's up and running.
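A sketch of this pyenv setup, assuming the official pyenv.run installer. The installer prints the exact lines it wants in your .bashrc; they are roughly these:

```bash
# Install pyenv itself (prerequisite build packages vary by distro)
curl https://pyenv.run | bash

# Add roughly these lines to ~/.bashrc (use the exact ones the installer prints)
export PYENV_ROOT="$HOME/.pyenv"
export PATH="$PYENV_ROOT/bin:$PATH"
eval "$(pyenv init -)"

# Reload the shell config and verify
source ~/.bashrc
pyenv --version
```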
Perfect. Now let's make sure we have a Python version installed that works for most of our stuff. We'll do pyenv install 3.10. This will, of course, install Python 3.10. Excellent, Python 3.10 installed. We'll make it our global Python by typing pyenv global 3.10. Perfect. And now we're going to install AUTOMATIC1111. The first thing we will do is create a new directory, mkdir to create the directory, we'll call it stablediff. And then we'll jump in there: cd stablediff. And then we will use this wget command to grab this bash script.
We'll type ls to make sure it's there. There it is. Let's go ahead and make that sucker executable by typing chmod +x webui.sh. It is now executable. Now we can run it: ./webui.sh. Ready, set, go. This is going to do many things. It will install everything it needs for the Stable Diffusion web UI. It will install PyTorch and download Stable Diffusion. It's awesome. Again, a short coffee break. Well, that took a minute, a long time. I hope you had a lot of coffee.
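Here's the whole sequence from this step in one place. I'm assuming webui.sh comes from the AUTOMATIC1111/stable-diffusion-webui GitHub repo; grab the URL from the guide if this one has moved:

```bash
# Python 3.10 for the Stable Diffusion web UI
pyenv install 3.10
pyenv global 3.10

# Fetch and run the AUTOMATIC1111 launcher script
mkdir stablediff && cd stablediff
wget https://raw.githubusercontent.com/AUTOMATIC1111/stable-diffusion-webui/master/webui.sh
chmod +x webui.sh
./webui.sh
```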
Now it may look like it's not ready, but it's actually running, and you'll see the URL pop up over here. It's a bit buried, but it runs on port 7860. Let's try it. And this is fun. Oh my God. localhost:7860, what you are seeing here is difficult to explain. Let me show you, and let's generate. Okay, that got confused. Let me take out the Oompa Loompa part. And this is not sped up. That's how fast this is. No, that's a little terrible. What do you say we make it look a little better? Okay, that's scary. But it's just one of the many things you can do with your own AI.
Now you can download other models. Let me show you what it looks like on Terry. My new editor, Mike, told me to do this. That's weird. Let's make it take more time. But look how fast this is. It's happening in real time as I'm speaking to you right now. If you've ever created images with GPT-4, it takes forever. But I love the fact that this runs on my own hardware; it's powerful stuff. Let me know in the comments below what your favorite image is, post it on Twitter and tag me. This is amazing.
Now, this is not going to be a deep dive into Stable Diffusion. I barely know what I'm doing. But let me show you very quickly how you can easily integrate AUTOMATIC1111 (did I type enough ones? I'm not sure) and have Stable Diffusion inside Open WebUI. So right here in Open WebUI, if we go down to our little settings here and go to Settings, you'll see an option for Images. Here we can put our AUTOMATIC1111 base URL, which will just be http://127.0.0.1:7860, which is the same as saying localhost port 7860. We'll hit the refresh option here to make sure it works. And actually no, it didn't. And here's why. There is one more thing you should know. Here we have the Stable Diffusion web UI running in our terminal. I'll hit Ctrl+C to stop it, because to make it work with Open WebUI, we need to relaunch it with two switches. So let's go ahead and run our script once again, ./webui.sh, and we will add --listen and --api. Once we see the URL appear...
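The relaunch command with both switches:

```bash
# --api exposes the REST API Open WebUI needs; --listen binds beyond localhost
./webui.sh --listen --api
```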
Okay, great, it's working. We can come back here and say: Why don't you try again, friend? Perfect. And then here we have experimental image generation. They are still testing it. We'll say go ahead and we'll say save. So if we go to any chat, let's make a new chat and we'll chat with llama2. I will say, describe a man in a dog suit. This will be our Stable Diffusion prompt. A little wordy for my taste. But then notice that we have a new icon. This is so cool. Boom. An image icon. And all we have to do is click on that to generate an image based on that message.
I clicked on it, it's doing it. And there it is, right inline. That's so cool. And that's really scary. I love this. It's so fun. This video is getting too long, but there are still two more things I want to show you. I'm going to do it real quick right now. The first one is simply magical. Check it out. There's another option here within Open WebUI, a little section called Documents. Here we can simply add a document. I'll add that one from before. Now it's available to us. And when we start a new chat, I'll talk to CodeGemma.
All I have to do is type a hash (#) and select the document, and say, give me five bullet points about this. Cool. Give me three social media posts. Whoa, go CodeGemma. Let me try again. What just happened? Okay, let's make a new chat. Ah, there we go. And I'm just scratching the surface. Now the second thing I want to show you, the last thing. I'm a big Obsidian nerd. It's my notes app. It's what I use for everything. This is pretty recent, I haven't made a video about it, but I plan to. But one of the cool things about this local, private note-taking app is that you can add your own local GPT to it, like the one we just deployed.
Look at this. I'm going to go to settings. I'll go to the community plugins. I'll look for one. I'm going to look for one called BMO Chatbot. I'm going to install that, enable it. And then I'm going to go to its settings. I'll find BMO Chatbot, and right here I can have an Ollama connection, which will connect to, let's say, Terry. So I'll point it at Terry and choose my model. I'll use Llama 2, why not? And now, here in my note, I can have a chatbot come up at the side and say, Hey, how's it going?
And I can do things like look at the help file and see what I can use here. Oh, turn the reference on. So I'll turn that on, and now it's going to reference the current note that I'm on. Tell me about the system message. Yes, there it is. And it's actually reading the note I'm on and telling me about it. So I have a chatbot right there, always available to ask questions about what I'm doing. And I can even go in here and highlight this, give it a little prompt, select generate, it's generating right now, and it just generates some things for me.
I'm going to undo that. Let me make another note. I'll say, tell me a story about a man dressed as a dog. I can quickly talk to my chatbot and start doing some things that are pretty crazy. And I think this, to me, is just a taste of how to run local AI privately in your home on your own hardware. This is actually very powerful and I can't wait to do more things with this. Now, I'd love to hear what you've done with your own projects. If you tried this, if you have it running in your lab, please let me know in the comments below.
Also, do you know of any other interesting projects I could try and make a video about? I'd love to hear about them. I think AI is the best, but privacy is also a big concern for me. So being able to run AI locally and play with it like this is the best thing ever. Anyway, that's all I have. If you want to continue the conversation and talk more about this, check out our Discord community. The best way to join is through our free NetworkChuck Academy membership. And if you want to join the paid version, we've got some extra stuff for you there too, which will help support what we do here.
But I would love to hang out with you and talk more. That is all that I have. I'll see you next time. YO.
