Most AI tools quietly assume a few things:
- You’re always online
- Pricing stays reasonable
- Models behave the same tomorrow as they do today
- Someone else is responsible for making it all work
Most of the time, that’s fine. Cloud AI is genuinely incredible.
But if you use AI regularly — to think, write, organize, or clean things up — those assumptions start to matter more than you expect.
This post is about when running AI locally makes sense, when it doesn’t, and why having the option at all is more powerful than most tools admit.
What people usually mean by “using AI”
For many of us, “using AI” looks like:
- opening a chat window
- pasting some text
- asking a question
- copying the result somewhere else
That experience is fast, polished, and impressive. It’s also very ephemeral.
Once you start using AI as part of your everyday work — not just for one-off questions — you start to notice friction:
- repeated copy/paste
- prompts you wish you could reuse
- outputs that are almost right
- work that depends on a service you don’t control
None of this is necessarily a dealbreaker. But it does shape how reliable and comfortable AI feels over time. And as AI works its way into more of your workflows, that friction compounds, quietly eating time and dragging down quality.
What a local LLM actually is (plain English)
A local LLM is simply a large language model that runs on your own computer.
- No internet required
- No per-token billing
- No account needed once it’s installed
You download a model once, and it behaves like any other local tool. If you’ve ever installed a large app or game, the idea is similar — just applied to AI.
You’re not hosting anything.
You’re not managing servers.
You’re just running software.
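If you're curious what "just running software" looks like in practice, here's a minimal Python sketch that asks a locally running model to rewrite a sentence over Ollama's local HTTP endpoint (Ollama comes up again below). The model name, prompt, and payload details are assumptions for illustration; the request never leaves your machine.

```python
# Minimal sketch: ask a locally running model (served by Ollama) to rewrite a sentence.
# Assumes Ollama is installed and a model such as "llama3" has already been pulled.
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint

payload = {
    "model": "llama3",   # whichever model you downloaded
    "prompt": "Rewrite this more clearly: 'the meeting got moved to later because of stuff'",
    "stream": False,     # return one JSON object instead of streaming chunks
}

request = urllib.request.Request(
    OLLAMA_URL,
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)

with urllib.request.urlopen(request) as response:
    result = json.loads(response.read())

print(result["response"])  # the model's text, generated entirely on your machine
```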
Why local models are quietly great
Local models don’t replace cloud AI. They complement it.
Here’s where they shine.
Predictable cost
Once a local model is installed, using it is effectively free. There’s no meter running in the background, no surprise bill at the end of the month, and no need to think about whether another prompt is “worth it.”
That alone can change how freely you experiment.
Stability
The model you downloaded today is the same model tomorrow. It doesn’t silently update, change behavior, or shift guidelines unless you decide to swap it out.
That consistency matters more than people expect — especially for repeatable work.
Offline-friendly
Planes. Trains. Hotels. Bad Wi-Fi. Spotty connections.
Most AI tools simply stop working in these moments. Local models don’t. They keep going, quietly, like a calculator or a notebook.
A better fit for everyday AI
Local models are especially good at:
- rewriting
- cleanup
- extraction
- classification
- structured transformations
These are the kinds of tasks that come up constantly — and don’t always need the most advanced model available.
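As a concrete (and entirely hypothetical) example of the cleanup-and-classification bucket, here's a short Python sketch that uses a local model as a label-only classifier. The labels, prompt wording, and the small `ask_local()` helper are illustrative assumptions, not a prescribed setup.

```python
# Sketch: using a local model as a classifier.
# The labels and prompt are made up; ask_local() is the same raw HTTP call as the earlier sketch.
import json
import urllib.request

def ask_local(prompt: str, model: str = "llama3") -> str:
    """Send one prompt to the local Ollama server and return the generated text."""
    payload = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode("utf-8")
    req = urllib.request.Request(
        "http://localhost:11434/api/generate",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

LABELS = ["bug report", "feature request", "praise", "other"]

def classify(feedback: str) -> str:
    prompt = (
        f"Classify the following feedback as exactly one of {LABELS}. "
        f"Reply with the label only.\n\nFeedback: {feedback}"
    )
    return ask_local(prompt).strip().lower()

print(classify("The export button does nothing when I click it."))  # likely "bug report"
```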
The honest downsides of local LLMs
Local models aren’t magic, and they’re not for everything.
They’re slower
On most computers, local models stream text more slowly than cloud models. You’ll notice it. Responses come back at a readable pace, not instantly.
For many tasks, that’s perfectly fine. For others, it can feel sluggish.
They’re not as capable as top-tier cloud models
If you need deep reasoning, very long context windows, or cutting-edge performance, cloud models still win.
Local models are good — just not that good.
Hardware matters
Performance depends on:
- your CPU
- available memory
- whether you have a GPU
- the size of the model you’re running
There’s no way around that.
The tradeoff is simple: local models give you control and predictability, but they ask for patience.
Realistic performance expectations
Before looking at speeds, it helps to clarify what we’re measuring.
A token is roughly:
- 3–4 characters of English text, or
- about ¾ of a word
So when we talk about tokens per second, we’re really talking about how fast text appears on screen.
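A quick back-of-the-envelope in Python, using the approximations above, shows how tokens per second translates into text on screen. The 10 tokens/second and 300-token reply are just example numbers:

```python
# Rough conversion used throughout this post: 1 token ≈ 3–4 characters ≈ 0.75 words.
tokens_per_second = 10        # a typical 7–8B model on a laptop (see the table below)

chars_per_second = tokens_per_second * 3.5   # midpoint of 3–4 characters per token
words_per_second = tokens_per_second * 0.75

response_tokens = 300         # a few paragraphs of output
print(f"~{chars_per_second:.0f} chars/sec, ~{words_per_second:.1f} words/sec")
print(f"a {response_tokens}-token reply takes ~{response_tokens / tokens_per_second:.0f} seconds")
```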
Typical local model performance
Here’s what many people see on common machines when running 7–8B local models (a very practical size for everyday work):
| Machine | Tokens / second | Approx. characters / second | What it feels like |
|---|---|---|---|
| MacBook Air (M1 / M2) | 5–15 | ~15–60 chars/sec | Text appears steadily, line by line |
| MacBook Pro (M1 / M2 Pro / Max) | 10–25 | ~30–100 chars/sec | Smooth and readable as it streams |
| Mid-range Windows laptop (CPU-only) | 3–10 | ~10–40 chars/sec | Noticeably slower, but usable |
| Gaming PC / Alienware (dedicated GPU) | 25–60+ | ~75–240+ chars/sec | Feels fast; closer to cloud speed for many tasks |
Smaller models respond faster. Larger models respond more slowly. CPU speed, memory, and GPU availability all make a real difference.
How this compares to cloud models
For context, cloud-hosted models are typically much faster:
- Often 50–150+ tokens per second
- Text can appear almost instantly
- Ideal for interactive chat and long reasoning sessions
That speed difference is real — and expected.
Local models trade raw throughput for:
- zero per-token cost
- offline reliability
- predictable behavior
- no usage caps or throttling
Why this tradeoff is usually fine in spreadsheets
Most spreadsheet-based AI work isn’t conversational. You’re usually:
- running AI across many rows
- cleaning or transforming structured data
- waiting on results, not typing back and forth
In that context:
- a few extra seconds per column is rarely a problem
- predictability matters more than instant replies
- cost-free iteration changes how freely you experiment
Local models feel less like a chat partner and more like a quiet helper doing work in the background.
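To make the "many rows, no conversation" pattern concrete, here's a hedged Python sketch that cleans one column of a hypothetical CSV with a local model, one row at a time. The file name, column names, and helper are assumptions for illustration; the point is the shape of the work, not the exact code.

```python
# Sketch: batch work across rows. Reads a hypothetical feedback.csv, asks a local model
# to clean each row's comment, and writes the results to a new file.
import csv
import json
import urllib.request

def ask_local(prompt: str, model: str = "llama3") -> str:
    """Send one prompt to the local Ollama server and return the generated text."""
    payload = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode("utf-8")
    req = urllib.request.Request(
        "http://localhost:11434/api/generate",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

with open("feedback.csv", newline="") as f:
    rows = list(csv.DictReader(f))

for row in rows:
    # Each row is an independent, non-interactive task: a few seconds apiece is fine.
    row["cleaned"] = ask_local(
        f"Fix spelling and grammar, keep the meaning:\n{row['comment']}"
    ).strip()

with open("feedback_cleaned.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=rows[0].keys())
    writer.writeheader()
    writer.writerows(rows)
```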
“This sounds complicated”
For many people, the idea of running AI locally feels intimidating.
It sounds like:
- terminals
- configuration
- things breaking
- becoming your own IT department
That fear is understandable — and mostly outdated.
Tools like Ollama make downloading and running local models surprisingly simple. In practice, it’s closer to installing a package manager than setting up infrastructure.
Once the model is running, the hardest part is already done.
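If you want a quick way to confirm that the hard part really is done, here's an optional Python sanity check that asks the local Ollama server which models are already downloaded. It assumes Ollama's default local port and nothing else.

```python
# Quick sanity check: is Ollama running, and which models are already downloaded?
# Uses Ollama's local /api/tags endpoint; no account, no network beyond localhost.
import json
import urllib.request

try:
    with urllib.request.urlopen("http://localhost:11434/api/tags") as resp:
        models = json.loads(resp.read())["models"]
    print("Ollama is running. Local models:")
    for m in models:
        print(" -", m["name"])
except OSError:
    print("Ollama doesn't appear to be running; start it (or install it) first.")
```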
Where Bwocks fits into this
This is where local models usually fall apart: usability.
Running a local model by itself is one thing. Actually using it for real work is another.
Bwocks gives local models a place to live.
Inside Bwocks:
- local and cloud models work the same way
- you use them in spreadsheet columns
- you pass in context
- you clean up outputs
- you stay in one place
You don’t have to choose one model forever. You can mix and match:
- cloud when you want speed or reasoning
- local when you want control or offline reliability
The point isn’t ideology. It’s optionality.
A very quick setup overview
At a high level, running a local model in Bwocks looks like this:
- Install Ollama
- Download a model
- Open Bwocks → Settings → Enable local LLMs
- Enter the model name
- Use it like any other AI column
That’s it.
No scripting. No prompt glue. No extra tools.
When to use local vs cloud LLMs
A simple way to think about it:
Use cloud models when:
- speed matters most
- you need the strongest reasoning
- you’re doing complex, one-off work
Use local models when:
- you’re cleaning or transforming data
- you’re iterating a lot
- you want predictable costs
- you’re offline or semi-offline
Most people end up using both — often in the same project.
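If it helps to see that guidance written down as a rule of thumb, here's a tiny, purely illustrative Python sketch. The task attributes and thresholds are assumptions, not anything Bwocks enforces.

```python
# A rule-of-thumb router, not a real API: it just encodes the guidance above.
from dataclasses import dataclass

@dataclass
class Task:
    needs_deep_reasoning: bool = False
    offline: bool = False
    row_count: int = 1          # how many rows or cells this will run over

def pick_model(task: Task) -> str:
    if task.offline:
        return "local"          # cloud simply isn't an option
    if task.needs_deep_reasoning:
        return "cloud"          # strongest reasoning wins
    if task.row_count > 50:
        return "local"          # lots of cheap, repetitive calls
    return "cloud"              # one-off work: take the speed

print(pick_model(Task(row_count=500)))               # local
print(pick_model(Task(needs_deep_reasoning=True)))   # cloud
```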
Tools vs services
Cloud AI is a service.
Local AI feels like a tool.
Both are useful.
Bwocks is built around the idea that you shouldn’t be forced into one or the other. Sometimes you want peak performance. Sometimes you just want something that works, stays put, and doesn’t surprise you.
Having that choice turns AI from something you rent into something you actually own.