DIY AI Infrastructure: Build Your Own Privacy-Preserving AI at Home

Estimated read time: 1:20

    Summary

    In this enlightening video by IBM Technology, viewers learn how to host AI models at home, preserving privacy and gaining control over personal data. Robert Murray shares his DIY approach to building an AI infrastructure without high-end server farms, using Windows 11, WSL2, Docker, and a NAS system. The discussion covers system requirements, security measures like VPN and multi-factor authentication, and the benefits of using open-source models for transparency and adaptability.

      Highlights

      • Hosts Martin and Jeff delve into making AI more personalized and privacy-focused from home. πŸ€“
      • Robert Murray's DIY AI setup uses Windows 11, WSL2, and Docker to manage AI models like Llama and Granite at home. πŸš€
      • Open WebUI is used for creating a chat UI, making human-AI interaction easier from any device. πŸ–₯️
      • Security strategies include a home NAS system, VPN, and a multi-factor authentication setup. πŸ”’
      • The talk highlights the importance of open-source components for transparency and control in AI systems. 🌐

      Key Takeaways

      • AI at home is more accessible than ever with tools like Docker and NAS systems, allowing for DIY privacy-preserving solutions. πŸ”§
      • Robert Murray demonstrates how you can build your own AI infrastructure without needing high-end servers. πŸ–₯️
      • Open-source models like Llama and IBM's Granite provide many options while maintaining transparency and flexibility. πŸ“‚
      • Security is paramount: using private data stores, VPNs, and multi-factor authentication helps safeguard your AI activities. πŸ”’
      • Exploring open-source and home-hosted AI gives you better data control, avoiding the pitfalls of cloud-based models. 🌐

      Overview

      In the ever-evolving world of artificial intelligence, this video unveils the fascinating prospect of building your own AI infrastructure from the comfort of your home. IBM Technology introduces Robert Murray, an innovator who has diligently constructed a privacy-preserving AI system without resorting to extravagant hardware setups. His configuration employs Windows 11, WSL2, and Docker to run sophisticated models like Llama 3 and IBM's Granite.
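
      The video doesn't show the installation commands themselves, but on Windows 11 the base layers can be stood up roughly like this (a sketch assuming Docker Desktop's WSL2 backend; installing Docker Engine directly inside the Linux distribution works too):

          # From an administrator PowerShell prompt on Windows 11:
          wsl --install    # installs WSL2 plus a default Ubuntu distribution
          wsl -l -v        # confirm the distribution runs under WSL version 2
          # Then install Docker Desktop and enable its WSL2 integration.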

      Robert’s approach is refreshingly hands-on, using open-source AI models downloaded from Ollama.com. Docker containers not only simplify model management but also provide a UI via Open WebUI, so he can interact with the models from an ordinary web browser. The setup also offers a measure of personal data sovereignty: a VPN lets him reach his AI from his smartphone, preserving both access and flexibility.
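
      Here is roughly what that workflow looks like at a terminal. This is a sketch rather than Robert's exact commands; model tags change over time (check Ollama.com for current names), and the docker run flags follow Open WebUI's published quick-start, which may vary by version:

          # Inside the WSL2 distribution: install Ollama and pull some models.
          curl -fsSL https://ollama.com/install.sh | sh
          ollama pull llama3            # Meta's Llama 3
          ollama pull granite3-dense    # one IBM Granite tag; check ollama.com for current names
          ollama run llama3             # chat directly from the terminal

          # Add Open WebUI in a Docker container for a browser-based chat interface.
          docker run -d -p 3000:8080 \
            --add-host=host.docker.internal:host-gateway \
            -v open-webui:/app/backend/data \
            --name open-webui ghcr.io/open-webui/open-webui:main
          # Then browse to http://localhost:3000 and pick a model.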

      An important aspect discussed is the set of security measures needed to protect personal data from unauthorized access: running the AI on personal hardware, deploying a NAS system for secure document storage, and securing remote access with a VPN combined with multi-factor authentication. This video demonstrates that with a bit of DIY spirit and technical know-how, cutting-edge AI is well within the reach of home tinkerers and privacy enthusiasts alike.
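
      The video doesn't name the VPN software, but a WireGuard container is one common way to get this kind of remote access. A minimal sketch using the linuxserver.io image, where the domain, peer name, and config path are all placeholders:

          # VPN container so a phone can reach the home system from anywhere.
          docker run -d --name=wireguard \
            --cap-add=NET_ADMIN \
            -e SERVERURL=vpn.example.com \
            -e PEERS=phone \
            -p 51820:51820/udp \
            -v ~/wireguard-config:/config \
            lscr.io/linuxserver/wireguard
          # PEERS=phone generates a client config (and QR code) for the phone.
          # Pair remote access with multi-factor authentication, as the video advises.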

            Chapters

            • 00:00 - 00:30: Introduction to AI in Everyday Life. Discusses the pervasive presence of AI in modern life, focusing on its ability to understand and respond to natural language. A chatbot acting as a car-buying expert illustrates how AI can personalize experiences, compare the operating costs of gas, hybrid, and EV vehicles, and even find rebates that offset the purchase.
            • 00:30 - 01:00: DIY AI Project Introduction. Introduces DIY AI projects for tech enthusiasts who enjoy experimenting and building their own setups, highlighting Robert Murray, a colleague who hosts an AI instance in his home office without needing a large GPU server farm.
            • 01:00 - 01:30: Building AI Infrastructure at Home. Describes how Robert hosts AI models at home on his personal computer rather than relying on cloud-hosted models, running models like Llama 3 and IBM's Granite locally. Robert begins walking through his stack, starting with Windows 11 as the operating system.
            • 01:30 - 02:00: Technical Setup: OS, WSL2, Docker. Covers the base environment: Windows 11 as the operating system, WSL2 (Windows Subsystem for Linux 2) providing a Linux environment within Windows, and Docker as the containerization layer on top, combining the strengths of both Windows and Linux.
            • 02:00 - 02:30: Acquiring AI Models. Discusses where the AI models come from, with Ollama.com as the primary source, naming IBM's Granite and Llama specifically and noting the many other open-source models available.
            • 02:30 - 03:00: Command Line Operation and UI Integration. Shows that the models downloaded from Ollama can be run directly from the command line for quick access to Llama or Granite, then turns to the chat interfaces familiar from cloud-hosted AI and how a UI can be added using Docker containers.
            • 03:00 - 03:30: Remote Access with VPN. Introduces Open WebUI as an effective, easy-to-use interface for choosing models and sending requests through a browser, then raises the need for remote access when traveling, solved with an additional Docker container.
            • 03:30 - 04:30: Hardware Requirements and Performance. Describes the VPN container configured with a custom domain name, letting Robert reach his system from his phone or any internet connection.
            • 04:30 - 05:00: Self-hosted Data Handling. Covers the system requirements for hosting the server: at least 8 gigabytes of RAM is recommended (Robert uses 96 gigabytes in practice) and at least one terabyte of storage, since some models are quite large.
            • 05:00 - 06:00: Security and Privacy Measures. Discusses model parameter sizes: the Granite and Llama models in use range from 7 to 14 billion parameters, with models up to 70 billion parameters having been run, albeit slowly, and touches on GPUs as a system requirement. (A rough sizing rule of thumb follows this chapter list.)
            • 06:00 - 10:00: Conclusion and Open Discussion. Notes that more GPUs mean better performance, then turns to chatting with documents: cloud-based models require uploading sensitive documents to someone else's server, whereas Robert's self-contained solution avoids that concern.
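
            A rough sizing rule of thumb for those parameter counts (an estimate, not a figure from the video): required memory is roughly parameter count × bytes per weight. An 8-billion-parameter model needs about 16 GB at 16-bit precision, while the 4-bit quantized builds commonly distributed through Ollama need roughly 4 to 5 GB, so 7-14B models fit easily in 96 GB of RAM. A 70B model at 4 bits still wants around 40 GB plus working memory, which helps explain why it ran slowly without GPUs.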

            DIY AI Infrastructure: Build Your Own Privacy-Preserving AI at Home Transcription

            • 00:00 - 00:30 Martin, it seems like AI is everywhere these days. Finally, we have a computer that actually understands my language instead of me having to learn its language. A system that understands me. For instance, what if I'm looking to buy a new car and I need to do some research on the alternatives? Yeah, you could tell the chatbot to act as a car expert and then you can ask it, what would be the difference in cost to operate a gas powered car versus a hybrid car versus an EV car and then get guidance on the decision. And if it helped me find a rebate from the power company, it could pay for itself in just one instance,
            • 00:30 - 01:00 and if I enjoyed tinkering and DIY projects, wouldn't it be cool to learn how the technology works and host my very own instance of all of this? Yeah, very cool. And in fact, we have a colleague, Robert Murray, who has done just that with equipment in his own home office. Wait, you mean without a server farm of GPUs that dim the lights every time you ask it to do something? Absolutely. So let's bring him in to tell us how he did it.
            • 01:00 - 01:30 Today, requests to Generative AI typically connect to an AI model hosted somewhere on a cloud, but Robert here has built an infrastructure to host AI models like Llama 3 and IBM's Granite on his own personal infrastructure. So Robert, I want to understand how you did this. Absolutely. So let's start with this box, which represents your computer at home. So tell me sort of the stack that you built here. Sure. So I started with Windows 11.
            • 01:30 - 02:00 All right, so it's just a straight up. Because I have it. Yeah, ok. That was the reason, just because it's there. It's there. OK, so you've got Windows 11, and then what's on top of that? Well, I unleashed WSL2. Now you're gonna have to tell me what WSL2 does. It's basically Linux on Windows. I'm going to think that there's probably a virtualization layer coming. Yes, there definitely is, and that is Docker. Ok, Docker is running on top of all of this.
            • 02:00 - 02:30 Now, we need some AI models. So where did you get your AI models from? I pulled them down from Ollama.com. OK, so if we take a look at the AI models, what are some of the models that you actually took? Oh, so I started with Granite. Right, IBM's granite model, yeah. Llama, and there's so many other models that you can pull down. Yeah. They're there, Open source. A whole bunch of open source models.
            • 02:30 - 03:00 Okay, so we've got a Docker machine here with Windows 11, WSL2. You've downloaded these models from Ollama. Is this now the solution? Well, I actually can use this. I can run all this right from the command line. Wow. Okay. So you can open a terminal window and then start chatting with Llama or Granite. Yes. Very, very fast. But most of the AI models that are cloud hosted, you do that on a chat interface, a UI. So how are you able to add a UI to all of this? Docker containers.
            • 03:00 - 03:30 Ah, okay, all right. So let's put some Docker containers in. What did you have for the UI? I used Open WebUI. It's one of the many solutions that a person could use, but I found this to be extraordinarily helpful. Ok. It's easy to use. Yeah! So with Open WebUI, you can just open up a browser and then chat with the model, pick the model you want, and send requests to it. And there I was, and that's what I was working with for a long time right out of my home. But what if you're on the go? Well, that's where another container comes in.
            • 03:30 - 04:00 Okay, what have you got here? So it's a VPN container configured with my own domain. All right, so what can access this guy? This. Ah, ok, your phone. So now I am able to access my system from my phone or basically any internet connection. It's awesome. How very cool. All right, well, let's say that
            • 04:00 - 04:30 I wanted to actually replicate what you've done here and build it. I'm gonna ask you about this server itself. What are the system requirements? So let's start with RAM. How much RAM do I need for this? I would recommend at least 8 gigabytes. 8 gigabytes. That's not much. How much do you actually use? Well, I'm using 96. OK, slightly above the minimum requirement. Absolutely. All right, so that's RAM. What about storage? Storage, I would recommend having at least one terabyte. OK, because some of these models can get pretty big.
            • 04:30 - 05:00 Yes, they can. Now, these models come in different sizes. So what parameter count sizes are you using with Granite and Llama? I'm using anywhere between 7 and 14 billion parameters. 7 to 14 billion, okay. I have run up to 70. 70? How did that work out? Slow. I can imagine. OK. So the other thing that people often talk about in terms of system requirements is GPUs. So should I be using GPUs for this?
            • 05:00 - 05:30 Well, in my initial configuration I had no GPUs, but the more GPUs the better. The more, the better, right. So, we've got this self-contained solution now, and it's got me thinking that when I talk to a large language model, I often want to provide it documentation in order to chat with that document. Absolutely. Now, if I'm using a cloud-based model, I need to take my document and upload it to somebody else's server so that the AI model can see it. I take it that you have a better solution to that.
            • 05:30 - 06:00 I do. I use my own NAS system. Okay, so you have a NAS server setup. And from that NAS system, I pull in my documents, pull them into the open web UI, and chat away. And I'm doing it every single day. So Robert, the other thing I like about this architecture is at least to my mind, this looks like a really secure solution. Hold the phone there just a second, nice job AI guy, but let's really look at the security on this Robert. First of all, I think it is a good job here and I think you've put in some features that will help preserve security and privacy,
            • 06:00 - 06:30 but let's take a look at what some of those are, because what you don't want is "your data is our data." We want "your data is your data," not "your data is our business model." So how do we make sure that we're not falling into the same trap as a lot of those free chatbot apps you can download from the app store? Well, first off, I put it on my own hardware.
            • 06:30 - 07:00 Yeah, exactly. So I see that very clearly. It's on your hardware, so you control the infrastructure. You can decide when to turn the thing on and off. It's your data on your system. So that's the first point. Absolutely. Yeah, and then also it looks like you included a private data store. So now your information isn't training somebody else's model, and you're not pulling in information that might be poisoned or anything like that. You have some control over that as well. Yes, and interestingly enough, that's what actually got me started on this whole path.
            • 07:00 - 07:30 By having a NAS, I wanted my data to be my data. And data is the real core of an AI system anyway, so that makes a lot of sense. Also, I noticed some open source components. So you've got one right here, you've got open source models here as well. And that's a good idea, because instead of proprietary stuff, in these cases, at least we have an idea that the worldwide open source community has had a chance to look at this and vet it. Now granted, there's a lot of information to be vetted, so it's not trivial, no guarantees.
            • 07:30 - 08:00 Maybe it's a little more secure because more people have had a chance to look at what's actually happening under the covers. Agreed. And then also I notice you want to be able to access this from anywhere, which is one of the really cool aspects and we want to make sure that that access is also secured. So I see you put a VPN over here so that you can connect your phone in and do that securely. And how are you making sure everybody else in the world can't connect their phone in here as well?
            • 08:00 - 08:30 Multi-factor. Multi-factor authentication, and now we know it's really you, and we know the information is exchanged in a secure way. So a lot of features that you put in here, I think it's a nice job. Thank you. Yeah. And one other thing to think about: because we really don't know what all of these components do, it is still possible that one of these things could be phoning home and sending data to the mothership, even without your knowledge.
            • 08:30 - 09:00 So one of the things that might be useful is put a network tap on your home network, and then that way you could see if there are any outbound connections from this, because there shouldn't be based upon the way you've built that. Well, that's a really great idea, Jeff. I'm going to have to look into that. Okay, there you go with the improvements for version two. Hey, Jeff. Oh, hey, Martin. Nice to have you back. Yeah, it seems like Robert's really done some nice work with this, don't you think? For sure. It just goes to show that you can now run sophisticated AI models on a home computer to build a personal chatbot.
            • 09:00 - 09:30 Yeah. Something like that would have been science fiction just a few short years ago, but now it's available to anyone who really wants to spend the time to assemble it all. Right, and you learn so much more about a technology by really digging into it and getting your hands dirty with it. Yeah, and by the looks of your hands, you've been doing a lot of digging because those things are filthy, and the added bonus is that you end up with a better assurance that your data is your data because you have more control and you can ensure that privacy is protected in the process. Spoken like a true security guy that you are, Jeff.
            • 09:30 - 10:00 All right, so you've seen Robert's approach. So how would you, dear viewer, do anything differently to make the system even better? Let us know in the comments.
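
            The network tap Jeff suggests at 08:30 can be approximated in software. A minimal sketch with tcpdump, where the interface name, host address, and LAN range are placeholders for your own network:

                # Flag outbound traffic from the AI box that isn't destined for the local network.
                sudo tcpdump -i eth0 -n 'src host 192.168.1.50 and not dst net 192.168.0.0/16'

            Anything that shows up here would merit investigation, since a fully self-hosted stack shouldn't be phoning home.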