I’m going to tell you something that would have sounded absolutely insane five years ago: I’m running artificial intelligence on a computer the size of a lunch box, it works offline, my data never leaves my house, and it costs me nothing beyond the electricity to keep it running.
No monthly subscription. No API fees. No sending my private documents to some server farm in Virginia. Just me, a Mac Mini M1, and a free and open-source tool called Ollama that has quietly become one of the most important pieces of software I’ve used — and I say that as someone who has been reviewing software on this site since 2007.
If you’ve been curious about running AI locally but thought you needed a $5,000 GPU rig and a computer science degree, this post is for you. I’m going to walk you through exactly how I set up my local AI hub, what I use it for, and why I think every tech enthusiast should consider doing the same.
Why Local AI? Why Now?
Let me give you some context. Like most people, I’ve been using cloud-based AI tools — ChatGPT, Claude, Gemini — and they’re incredible. But there are situations where sending your data to a cloud service isn’t ideal.
When I’m working on business documents for my offline ventures, I don’t necessarily want those financial projections living on someone else’s server. When I’m brainstorming ideas for my apps, I’d rather keep those early concepts private. When I’m processing data for my web projects, I want the flexibility to run queries without worrying about rate limits, usage caps, or monthly bills that scale with every prompt.
The privacy argument alone is compelling, but it’s not the only reason. Local AI is also fast — there’s no network latency, no waiting for servers, no “we’re experiencing high demand” messages. It works offline, which means I can use it on a plane, in a coffee shop with terrible Wi-Fi, or during one of those delightful Philippine internet outages that build character.
And perhaps most importantly for a guy who has been writing about free and open-source software for nearly two decades: local AI puts the power back in your hands. You own the hardware, you own the model weights, and there are no terms of service to violate. That’s the open-source philosophy I’ve been preaching since my Linux days, applied to the most transformative technology of our generation.
What You Need (Less Than You Think)
Here’s my setup:
Hardware: Mac Mini M1 (8GB Unified Memory)
That’s it. That’s the hardware. No dedicated GPU. No server rack. No liquid cooling. An old Mac Mini M1 — the base model with just 8GB of RAM — that I bought a few years ago and that sits quietly on my living room table consuming roughly the same power as a light bulb.
Now, let me be upfront: 8GB is the bare minimum for local AI. It’s not ideal. After macOS takes its share of memory (roughly 3-4GB for the operating system and background processes), you’re left with about 4-5GB of usable space for AI models. That means the popular 7B and 8B parameter models that most guides recommend are either too tight to run comfortably or will cause constant memory pressure and slowdowns on my machine. I learned this the hard way after watching my Mac Mini struggle and swap memory like it was reliving its Intel days.
But here’s the thing — you don’t need the biggest models to get genuinely useful results. The smaller models, in the 1B to 3.8B parameter range, run beautifully on 8GB machines. They’re fast, responsive, and for many everyday tasks, surprisingly capable. Are they as good as GPT-4 or Claude? Not even close. But for quick drafts, summarization, code snippets, brainstorming, and general Q&A, they get the job done without sending a single byte of your data to the cloud.
The secret sauce that makes even my base model Mac Mini viable is Apple Silicon’s unified memory architecture. Unlike traditional PCs where the CPU and GPU have separate memory pools, the M1’s unified memory means the GPU can directly access whatever RAM is available for AI inference. Even with just 8GB, the M1’s efficiency means small models can generate tokens at 30-60+ tokens per second — fast enough that responses feel nearly instant.
Could you do this on a Windows PC or a Linux machine? Absolutely. If you have a desktop with an NVIDIA GPU (even a used RTX 3060 for around $150), you’d get excellent performance with even bigger models. But for Mac users with older Apple Silicon hardware gathering dust, Ollama gives that machine a second life.
Minimum specs to get started:
Any Apple Silicon Mac (M1 or newer) with 8GB of RAM can run small models (1B-3.8B parameters). Think of these as quick, lightweight assistants good for summarization, simple coding help, and general Q&A. With 16GB, things get significantly better — you can comfortably run 7B-8B models at good speed and even some 14B models. With 32GB or more, you’re in serious territory — running models that rival cloud-based services for many tasks.
On the PC side, 16GB of system RAM plus a GPU with at least 8GB of VRAM is the sweet spot. More VRAM means bigger, better models.
Installing Ollama: Easier Than Installing Most Apps
Ollama is the foundation of my local AI setup. It’s a free and open-source tool that handles downloading, managing, and running large language models with absurd simplicity. If you can type a command into a terminal, you can run local AI.
Step 1: Install Ollama
On Mac, you have two options. The easiest is to download the app directly from ollama.com: download the DMG, drag it to Applications, and launch. Done.
If you prefer Homebrew (and if you’re a developer, you probably do):
brew install ollama
On Linux:
curl -fsSL https://ollama.com/install.sh | sh
On Windows, simply download the installer from the Ollama website.
That’s the entire installation. No Python environment management. No dependency hell. No CUDA driver nightmares. It just works.
Step 2: Pull Your First Model
Open your terminal and type:
ollama pull llama3.2:3b
This downloads Meta’s Llama 3.2 3B model — one of the best small open-source language models available and the sweet spot for 8GB machines. It’s about 2GB on disk and runs comfortably without choking your system.
If you want something even lighter to start with:
ollama pull phi4-mini
Microsoft’s Phi-4 Mini (3.8B parameters) is another excellent choice for 8GB systems — strong instruction following and surprisingly good at code for its size.
Step 3: Start Chatting
ollama run llama3.2:3b
That’s it. You now have a local AI assistant running entirely on your machine. Ask it questions, have it summarize text, help with code, draft emails — whatever you need. Type your prompt, get a response. No account required. No internet required after the initial download.
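The interactive prompt isn’t the only way in. Ollama also exposes a local HTTP API on port 11434, which is what makes it scriptable. Here’s a minimal Python sketch, using only the standard library, that sends one prompt to the `/api/generate` endpoint (the model name is simply whichever one you pulled in Step 2):

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint

def build_payload(model: str, prompt: str) -> bytes:
    """Build the JSON body for a non-streaming /api/generate request."""
    return json.dumps({"model": model, "prompt": prompt, "stream": False}).encode("utf-8")

def ask(model: str, prompt: str) -> str:
    """Send one prompt to the local Ollama server and return the full response text."""
    req = urllib.request.Request(
        OLLAMA_URL,
        data=build_payload(model, prompt),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

# Example (requires Ollama to be running):
#   print(ask("llama3.2:3b", "In one sentence, what is a local language model?"))
```

Nothing in that round trip leaves your machine; the request goes to localhost and nowhere else.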
The first time I ran this and got a coherent, helpful response from a model running entirely on my Mac Mini, I had the same feeling I had back in 2007 when I first booted Ubuntu and realized an entire operating system could be free. That feeling of “wait, this actually works, and it’s free?” — that’s the open-source magic I’ve been chasing for nearly 20 years.
The Models I Actually Use on 8GB
Ollama gives you access to a growing library of models. Here are the ones that work well on my 8GB Mac Mini and what I use them for:
1. Llama 3.2 3B — My go-to daily driver. This is the model I reach for most often. For a 3B model, the quality is genuinely impressive — it handles summarization, drafting, general Q&A, and brainstorming surprisingly well. On my M1, it runs at roughly 30-50 tokens per second, which means responses feel nearly instant. It’s the perfect balance of quality and speed for an 8GB machine.
2. Phi-4 Mini (3.8B) — My coding companion. Microsoft’s Phi-4 Mini punches well above its weight for code generation and technical tasks. When I’m working on my iOS apps or web projects and need a quick SwiftUI snippet, JSON formatting help, or a debugging nudge, this model delivers at around 15-20 tokens per second. It won’t replace Claude for complex architecture decisions, but for quick code help during focused development sessions, it’s remarkably useful.
3. Gemma 2B — My speedster for trivial tasks. Google’s smallest Gemma model is ultra-lightweight and blazing fast. I use it for simple reformatting, quick translations, and tasks where I just need a fast answer and don’t care about nuance. Think of it as the Puppy Linux of language models — tiny, fast, and gets the basics done.
4. Llama 3.2 1B — My offline emergency model. At just around 1.3GB, this model loads almost instantly and runs so fast it feels like autocomplete. The quality is basic, but when I need something working on minimal resources or want to run alongside other applications without memory pressure, it’s there.
Here’s the honest truth about running local AI on 8GB: you’re operating within constraints. Multi-turn conversations get noticeably weaker after several back-and-forth exchanges because the limited memory means shorter context windows. Complex reasoning tasks will sometimes produce mediocre results. And you’ll occasionally notice responses that are clearly “smaller model quality” compared to what you get from cloud services.
But for single-turn tasks — summarize this, draft that, reformat this JSON, explain this concept, help me with this code snippet — these small models are fast, private, and genuinely useful. It’s like having a competent junior assistant who works for free and never sleeps.
To switch between models, I just run a different command. Different models for different jobs — just like how I used to keep different Linux distros for different purposes back in my distro-hopping days.
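For the curious, `ollama list` in the terminal shows everything you’ve pulled, and the same information is available from the local API’s `/api/tags` endpoint. A small sketch (the fetch assumes Ollama is running on its default port; the parsing helper is pure):

```python
import json
import urllib.request

def model_names(tags: dict) -> list:
    """Extract model names from the JSON body of an /api/tags response."""
    return [m["name"] for m in tags.get("models", [])]

def list_local_models(base_url: str = "http://localhost:11434") -> list:
    """Ask the local Ollama server which models are currently pulled."""
    with urllib.request.urlopen(f"{base_url}/api/tags") as resp:
        return model_names(json.loads(resp.read()))
```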
Adding a Proper Interface: Open WebUI
Running Ollama from the terminal is fine for quick tasks, but for extended sessions, it gets clunky. You lose chat history, you can’t easily compare models, and scrolling through terminal output isn’t exactly a delightful user experience.
Enter Open WebUI — a free, open-source web interface that connects to Ollama and gives you a ChatGPT-like experience running entirely on your local machine.
If you have Docker installed and Ollama already running on your Mac, the setup is one command (the `--add-host` flag lets the container reach the Ollama server on the host):

docker run -d -p 3000:8080 \
--add-host=host.docker.internal:host-gateway \
-v open-webui:/app/backend/data \
--name open-webui --restart always \
ghcr.io/open-webui/open-webui:main
Open your browser, go to `http://localhost:3000`, create an account (this is local — nobody else sees it), and you’re in. Every model you’ve pulled with Ollama automatically appears in the interface.
Open WebUI is where the magic really happens. You get persistent chat history so you can pick up conversations where you left off. You can switch models mid-conversation to compare outputs. There are system prompt templates, temperature controls, and per-chat configuration settings. You can upload documents and use RAG (Retrieval Augmented Generation) to ask questions about your own files — PDFs, text documents, code files. It even supports web search integration, image generation, and voice input.
The interface looks and feels remarkably similar to ChatGPT, except everything is running on your own hardware. No cloud. No subscription. No data leaving your network.
I access Open WebUI from my Apple devices like my MacBook, my iPhone, and my iPad — all pointing to the Mac Mini sitting quietly on my living room table. It’s like having a private ChatGPT server for my household.
My Actual Workflows
Let me get specific about how I use this setup in real life, because “run AI locally” sounds cool in theory but means nothing without practical application.
1. For my blog (this site). When I’m researching topics for TechSource, I’ll dump my raw notes into a chat, ask the local model to identify the most interesting angles, suggest outlines, or flag gaps in my research. The model doesn’t write the posts for me — my writing voice is my own — but it’s an incredibly useful brainstorming partner.
2. For my iOS apps. I use Phi-4 Mini for quick SwiftUI help, JSON formatting, and debugging. Having a coding assistant that responds in under a second with no internet dependency is genuinely useful during focused development sessions.
3. For my offline businesses. I process business documents, draft communications, and analyze data without any of that information touching a third-party server. This is the use case where local AI’s privacy advantage matters most.
4. For website automation. I’ve built an automated pipeline that scrapes information from various sources and publishes curated content to my niche site. Ollama plays a role in processing and formatting that data. Having this run locally means the pipeline works even if my internet connection is spotty.
5. For learning. I feed technical articles, documentation, and research papers into the RAG system and then have conversations with the content. It’s like having a study partner who has perfect recall of everything you’ve uploaded.
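To make workflow #3 concrete, here is a hedged sketch of what a private document summary looks like against Ollama’s `/api/chat` endpoint. The truncation limit, prompt wording, and default model are my own illustrative choices, not anything Ollama prescribes:

```python
import json
import urllib.request
from pathlib import Path

CHAT_URL = "http://localhost:11434/api/chat"  # Ollama's local chat endpoint

def truncate_for_context(text: str, max_chars: int = 4000) -> str:
    """Small models have small context windows; keep the input modest."""
    return text[:max_chars]

def summarize_document(path: str, model: str = "llama3.2:3b") -> str:
    """Summarize a local file without its contents ever leaving the machine."""
    text = truncate_for_context(Path(path).read_text(encoding="utf-8"))
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user",
                      "content": f"Summarize this in three bullet points:\n\n{text}"}],
        "stream": False,
    }).encode("utf-8")
    req = urllib.request.Request(CHAT_URL, data=body,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["message"]["content"]
```

Point it at a financial projection or an early app concept and the only thing that moves is bytes between your own processes.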
How Does Local AI on 8GB Compare to ChatGPT and Claude?
I’m going to be honest with you, because that’s what TechSource has always been about.
On an 8GB machine running 3B models, local AI handles roughly 60-70% of the simple tasks I’d otherwise use cloud AI for. Summarization, quick drafts, code snippets, reformatting, basic Q&A — the small models get these done fast and privately.
For the remaining 30-40% — complex multi-step reasoning, nuanced creative writing, deep code architecture analysis, long conversations that require extensive context, and tasks requiring broad world knowledge — cloud models like Claude and GPT-4 are in a completely different league. There’s no sugarcoating this. My 3B model running locally isn’t competing with a 400B+ parameter model running on a data center full of A100 GPUs. That would be like comparing my Raspberry Pi to a supercomputer.
But that’s not the point. My approach is hybrid: local for privacy-sensitive work, quick tasks, and offline use. Cloud for complex, high-stakes tasks where quality matters more than privacy. The two complement each other perfectly. And if I ever upgrade to a Mac with 16GB or more RAM, those 7B-8B models become available and the quality gap narrows significantly.
What This Costs
Let’s do the math, because this is one of my favorite parts.
My setup costs:
Mac Mini M1 8GB (already owned; it had been gathering dust in a drawer): $0 additional cost. If buying used today, base M1 Mac Minis go for roughly $250-350 on resale markets — they’ve depreciated significantly, which makes them incredible value for a dedicated local AI server.
Ollama: Free, open-source.
Open WebUI: Free, open-source.
All AI models: Free, open-source.
Electricity: My Mac Mini draws about 20-39 watts during AI inference. Running it 8 hours a day costs roughly $2-3 per month in electricity.
Total monthly cost: About $3.
For comparison, ChatGPT Plus is $20/month. Claude Pro is $20/month. Running API calls at scale can easily cost $50-100+ per month depending on usage.
Even with the limitations of 8GB, my local setup handles enough daily tasks to reduce my reliance on paid subscriptions. Over a year, that adds up to meaningful savings — while giving me unlimited usage, complete privacy, and offline capability.
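That electricity estimate is easy to sanity-check with back-of-the-envelope arithmetic. A quick Python sketch (the $0.30/kWh rate is an assumption; substitute your own local rate):

```python
def monthly_power_cost(watts: float, hours_per_day: float,
                       rate_per_kwh: float = 0.30) -> float:
    """Monthly electricity cost for a device drawing `watts` for `hours_per_day`."""
    kwh_per_month = watts / 1000 * hours_per_day * 30
    return round(kwh_per_month * rate_per_kwh, 2)

# ~30 W for 8 hours a day lands right in the $2-3/month range quoted above
print(monthly_power_cost(30, 8))  # prints 2.16
```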
Tips I’ve Learned the Hard Way
After months of running this setup daily on constrained hardware, here are some practical lessons:
1. RAM is king. No, seriously. On 8GB, every megabyte counts. Close unnecessary applications before running models. Safari with 20 tabs open and Xcode running simultaneously will leave almost nothing for Ollama. I’ve learned to treat my AI sessions like focused work blocks — close everything else, then chat.
2. Smaller models, faster results. Don’t try to squeeze a 7B model onto an 8GB machine. I tried. It technically loads, but the constant memory swapping makes it painfully slow and the system becomes unusable for anything else. Stick to 3B and under for a smooth experience. A fast 3B model that responds instantly is infinitely more useful than a struggling 7B model that takes 10 seconds per response while your fans sound like a jet engine.
3. The 60-70% rule. Your model file should be no more than 60-70% of your total available memory (after macOS takes its share). On 8GB, that means model files of about 2-3GB maximum. This leaves enough room for the operating system, the context window (KV cache), and Ollama’s overhead.
4. Set Ollama as a network service. By default, Ollama only accepts connections from the local machine. If you want other devices on your network to access it (like I do with my MacBook and iPad), set the environment variable `OLLAMA_HOST=0.0.0.0` to allow connections from your local network. Just don’t expose it to the internet without authentication.
5. Different models for different jobs. I keep three to four small models installed and use them contextually. Phi-4 Mini for code, Llama 3.2 3B for general tasks, and Gemma 2B for quick throwaway queries. Specialization matters, even at the small model tier.
6. Keep an eye on model updates. The open-source AI community moves incredibly fast. Small models are improving at a staggering rate — the best 3B model today is dramatically better than the best 3B model from even six months ago. Check Ollama’s library periodically for new models. Pulling an update is just `ollama pull model-name`.
7. Plan your upgrade path. If local AI clicks for you (and I think it will), the single best upgrade you can make is more RAM. A used Mac Mini M1 with 16GB runs 7B-8B models comfortably and the quality jump from 3B to 8B is enormous. Consider it the best investment in your local AI future.
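Tip #3’s 60-70% rule is simple enough to turn into a few lines of Python. A sketch (the 3.5GB macOS overhead and 65% budget fraction are my rough assumptions from the figures above):

```python
def max_model_file_gb(total_ram_gb: float, os_overhead_gb: float = 3.5,
                      budget_fraction: float = 0.65) -> float:
    """Rough ceiling for a model file: ~65% of the RAM left after the OS takes its cut."""
    usable = total_ram_gb - os_overhead_gb
    return round(usable * budget_fraction, 1)

print(max_model_file_gb(8))   # an 8GB Mac lands at roughly 2-3GB of model file
print(max_model_file_gb(16))  # a 16GB Mac comfortably fits 7B-8B quantized models
```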
The Bigger Picture: This Is the Open-Source Revolution, Again
I started this site in 2007 writing about Linux because I believed free and open-source software could change the world. It did — Linux now powers 100% of the world’s top 500 supercomputers, 77% of web servers, and roughly half of all cloud workloads.
Now I’m watching the same thing happen with AI. Open-source models like Llama, Mistral, Qwen, Phi, Gemma, and DeepSeek are making AI accessible to anyone with a decent computer. Tools like Ollama and Open WebUI are making it easy. The barriers are falling fast.
A few years ago, running a useful AI model required cloud infrastructure and enterprise budgets. Today, you can do it on an old Mac Mini with 8GB of RAM that costs less than a pair of sneakers on the secondhand market. That trajectory reminds me of the early days of Linux, when something that was once the domain of server rooms gradually became something anyone could run on their desktop.
The fact that I can run a functional AI assistant on the most basic Apple Silicon Mac — the cheapest, lowest-spec model they ever made with an M1 chip — tells you everything about where this technology is headed. If this is what’s possible on 8GB today, imagine what the next generation of small models will do on the same hardware a year from now.
If you’ve been reading TechSource since the Ubuntu days, you already understand why this matters. The same principles that made open-source software transformative — transparency, control, community, freedom — are now being applied to artificial intelligence. And just like with Linux, you don’t need anyone’s permission to get started.
Pull up a terminal. Install Ollama. Run your first model. Welcome to the revolution. It’s local, it’s private, it’s free, and it could talk to your Linux-powered robot soon :)
For those of you who are curious, below is a photo of my old Mac Mini (named Murdoc) lying on my living room table, looking like a metal brick that does nothing:
[Photo: Mac Mini (Murdoc)]
— Jun