A lot of projects that revive old hardware feel like an excuse to tinker rather than something you’ll actually use. Turning old PCs into NAS boxes or DNS resolvers can be genuinely practical, but many other home lab projects are often abandoned by their creators, or were only spun up as an experiment in the first place.
I wouldn’t blame you for thinking that local AI in the form of an LLM could have the same fate written all over it, and I thought the same thing until I set one up for myself. Even with my relatively modest hardware, spinning up a local AI assistant was easier than I thought, and was genuinely useful.
Why set up a local LLM
Control, privacy, and customizability
The obvious appeal of a local LLM is privacy. When you run everything on your own machine, nothing you paste in leaves your network. That means logs, configuration files, error output, half-finished drafts, and essentially all other data you put in stays local. For anyone even slightly concerned about their footprint, that alone is compelling.
There’s also the cost angle. When using an old gaming rig you have lying around, the cost of setup is essentially zero. It might add a bit to your power bill, but there are ways to mitigate that, and all in all, it’s not as expensive as you might think, especially compared with a monthly subscription.
To me, the most interesting and compelling reason to set up a local LLM is the customizability. You dictate every part of the experience, down to the responses you get. You get to choose the model and all the parameters it uses. This can be quite overwhelming, but you don’t actually need to get into the nitty-gritty if you don’t want to.
Getting started
There are “soft” hardware and software requirements
In terms of a GPU, anything with more than 8 GB of VRAM is a good place to start. Older Pascal-era Nvidia cards are perfectly usable for this purpose. In my case, there's a dusty ASUS STRIX GTX 1070 in my repurposed gaming rig. CPU performance matters far less for raw inference (actually generating the words you see, or “tokens” as they’re known), but it does matter for larger prompts and context windows, especially if multiple people use the system at the same time. I have an i7-6700K in mine.
In regard to RAM, the same rule as for VRAM applies: more is better. I have 16 GB in my setup, which is workable for most use cases, especially when running one model at a time. For storage, I have a 500 GB NVMe drive that’s been diced up among my Proxmox VMs, and I’ve allocated 200 GB here. That’s overkill if you’re not running many models, but I plan to try many different ones and want the space to do so. I recommend allocating a similar amount of storage if you don’t already have a model in mind.
At the core, you need a local LLM runtime, some kind of model manager, and optionally a web interface to interact with the LLM from other devices. You’ll be best off running a server OS in a VM inside a hypervisor like Proxmox and hosting everything from there. This makes the most sense for an old gaming PC you plan on turning into a home lab.
Initial setup and choosing a model
Incredibly simple setup
To begin, I installed Ubuntu Server in a VM and configured GPU passthrough in Proxmox. This required installing the Nvidia driver manually inside the VM, adding the GPU as a PCI device in the VM settings, and enabling IOMMU in the BIOS, which is needed to pass PCI devices through a hypervisor. In my case, the setting was well hidden and marked as VT-d. On AMD systems, it may be marked as SVM + IOMMU.
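Beyond the BIOS toggle, IOMMU also needs to be enabled on the Proxmox host's kernel command line. Assuming a GRUB-based Proxmox install on an Intel board like mine, a typical sequence looks roughly like this:

```shell
# On the Proxmox host: add the Intel IOMMU flag to the kernel command line
# (use amd_iommu=on on AMD systems instead)
sudo sed -i 's/GRUB_CMDLINE_LINUX_DEFAULT="quiet"/GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on"/' /etc/default/grub
sudo update-grub
sudo reboot

# After rebooting, verify the IOMMU actually came up
dmesg | grep -e DMAR -e IOMMU
```

If the final grep shows lines like "DMAR: IOMMU enabled", the GPU can be passed through to the VM.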
After using the nvidia-smi command to confirm my GTX 1070 was being passed through correctly, I installed Ollama, which acts as my LLM engine.
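Inside the VM, both of those steps are quick; the second line is Ollama's official convenience install script from ollama.com:

```shell
# Confirm the passed-through GPU is visible inside the VM
nvidia-smi

# Install Ollama via the official install script
curl -fsSL https://ollama.com/install.sh | sh
```

If nvidia-smi errors out or doesn't list the card, fix the passthrough and driver install before going any further, or inference will silently fall back to the CPU.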
For a model, I decided on the latest Mistral model, which is relatively lightweight and performs well on older GPUs like mine. There are so many to try that I recommend giving the Ollama site a browse. Do note that any models marked as “cloud” send prompts off your device to a remote server and are not truly local.
To load my model, I simply typed the Ollama run command, followed by the model name:
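Assuming the standard Mistral tag from the Ollama registry, that looks like:

```shell
# Pulls the model on first run, then drops into an interactive chat session
# ("mistral" is the tag for the model I chose; substitute your own)
ollama run mistral
```

The first run downloads the model weights, so expect a wait depending on your connection; subsequent runs start almost immediately.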
I now have a working, local LLM inside of a Proxmox VM.
Setting up the Web UI
A docker container is all you need
To access this from a web UI, I recommend setting up Open WebUI in a Docker container. To do so, I used the following configuration:
```yaml
version: "3.9"
services:
  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    container_name: open-webui
    ports:
      - "3000:8080"
    environment:
      - OLLAMA_BASE_URL=http://host.docker.internal:11434
    extra_hosts:
      - "host.docker.internal:host-gateway"
    volumes:
      - open-webui-data:/app/backend/data
    restart: unless-stopped
volumes:
  open-webui-data:
```
You’ll also need to create an override file that binds Ollama to all interfaces, so that it’s reachable from your Docker container as well as other local devices. This is fine if you don’t plan to expose it to the internet, but it does make it available to anyone connected to your network. If you want to shield it from specific devices and services, you’ll have to set up those restrictions yourself.
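Since the official install script sets Ollama up as a systemd service, one way to do this is a drop-in override that sets the OLLAMA_HOST environment variable (the exact file path here is the conventional systemd drop-in location):

```shell
# Create a drop-in that binds Ollama to all interfaces (0.0.0.0)
sudo mkdir -p /etc/systemd/system/ollama.service.d
sudo tee /etc/systemd/system/ollama.service.d/override.conf <<'EOF'
[Service]
Environment="OLLAMA_HOST=0.0.0.0"
EOF

# Reload systemd and restart Ollama so the change takes effect
sudo systemctl daemon-reload
sudo systemctl restart ollama
```

By default Ollama listens only on 127.0.0.1:11434; this override is what lets the Open WebUI container and other LAN devices reach it.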
Once that’s done, spin up the Docker container, and you should be able to access your web UI at VM_IP:3000 in your web browser if everything is set up correctly.
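With the Compose file saved as docker-compose.yml in the current directory, bringing it up and sanity-checking looks something like:

```shell
# Start Open WebUI in the background
docker compose up -d

# Confirm the container is running and the 3000:8080 port mapping took
docker compose ps

# From the VM itself, check that the UI answers on port 3000
curl -I http://localhost:3000
```

The first visit to the web UI prompts you to create an admin account; after that, your Ollama models should appear in the model picker automatically.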
A safe place for your documents, notes, and data
It won’t be a full cloud-model replacement, but a local LLM is great for parsing through collections of data, simple one-off queries, or monotonous text-based tasks. In my case, I use it for a preliminary scan of error logs, pointing me in the right direction to troubleshoot the issue, and it’s far faster and more useful than scanning forums or combing through the log files myself.
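As a sketch of what that log triage can look like without even opening the web UI, Ollama’s REST API (on its default port, 11434) accepts a prompt directly, so you can pipe a log excerpt straight in. The log path here is just an example, and jq is used to build the JSON safely:

```shell
# Grab the last 50 lines of a log (example path) into a variable
LOG=$(tail -n 50 /var/log/syslog)

# Ask the local model for a first-pass read of the errors
curl -s http://localhost:11434/api/generate \
  -d "$(jq -n --arg log "$LOG" \
        '{model: "mistral", prompt: ("Point me at the errors in this log:\n" + $log), stream: false}')" \
  | jq -r '.response'
</imports>
```

Setting "stream": false returns one complete JSON object instead of a token stream, which keeps the jq filter at the end simple.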

