Most AI tools quietly assume a few things:
- You’re always online
- Pricing stays reasonable
- Models behave the same tomorrow as they do today
- Someone else is responsible for making it all work
Most of the time, that’s fine. Cloud AI is genuinely incredible.
But if you use AI regularly — to think, write, organize, or clean things up — those assumptions start to matter more than you expect.
This post is about when running AI locally makes sense, when it doesn’t, and why having the option at all is more powerful than most tools admit.
What people usually mean by “using AI”
For many of us, “using AI” looks like:
- opening a chat window
- pasting some text
- asking a question
- copying the result somewhere else
That experience is fast, polished, and impressive. It’s also very ephemeral.
Once you start using AI as part of your everyday work — not just for one-off questions — you start to notice friction:
- repeated copy/paste
- prompts you wish you could reuse
- outputs that are almost right
- work that depends on a service you don’t control
None of this is necessarily a dealbreaker. But it does shape how reliable and comfortable AI feels over time. And as AI works its way into more of your workflows, that friction compounds, quietly eating time and dragging down quality.
What a local LLM actually is (plain English)
A local LLM is simply a large language model that runs on your own computer.
- No internet required
- No per-token billing
- No account needed once it’s installed
You download a model once, and it behaves like any other local tool. If you’ve ever installed a large app or game, the idea is similar — just applied to AI.
You’re not hosting anything.
You’re not managing servers.
You’re just running software.
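If you're curious what "just running software" looks like in practice, here's a minimal Python sketch that asks a locally running model to rewrite a sentence over Ollama's local HTTP endpoint (Ollama comes up again below). The model name, prompt, and payload details are assumptions for illustration; the request never leaves your machine.

```python
# Minimal sketch: ask a locally running model (served by Ollama) to rewrite a sentence.
# Assumes Ollama is installed and a model such as "llama3" has already been pulled.
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint

payload = {
    "model": "llama3",   # whichever model you downloaded
    "prompt": "Rewrite this more clearly: 'the meeting got moved to later because of stuff'",
    "stream": False,     # return one JSON object instead of streaming chunks
}

request = urllib.request.Request(
    OLLAMA_URL,
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)

with urllib.request.urlopen(request) as response:
    result = json.loads(response.read())

print(result["response"])  # the model's text, generated entirely on your machine
```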
Why local models are quietly great
Local models don’t replace cloud AI. They complement it.
Here’s where they shine.
Predictable cost
Once a local model is installed, using it is effectively free. There’s no meter running in the background, no surprise bill at the end of the month, and no need to think about whether another prompt is “worth it.”
That alone can change how freely you experiment.
Stability
The model you downloaded today is the same model tomorrow. It doesn’t silently update, change behavior, or shift guidelines unless you decide to swap it out.
That consistency matters more than people expect — especially for repeatable work.
Offline-friendly
Planes. Trains. Hotels. Bad Wi-Fi. Spotty connections.
Most AI tools simply stop working in these moments. Local models don’t. They keep going, quietly, like a calculator or a notebook.
A better fit for everyday AI
Local models are especially good at:
- rewriting
- cleanup
- extraction
- classification
- structured transformations
These are the kinds of tasks that come up constantly — and don’t always need the most advanced model available.
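As a concrete (and entirely hypothetical) example of the cleanup-and-classification bucket, here's a short Python sketch that uses a local model as a label-only classifier. The labels, prompt wording, and the small `ask_local()` helper are illustrative assumptions, not a prescribed setup.

```python
# Sketch: using a local model as a classifier.
# The labels and prompt are made up; ask_local() is the same raw HTTP call as the earlier sketch.
import json
import urllib.request

def ask_local(prompt: str, model: str = "llama3") -> str:
    """Send one prompt to the local Ollama server and return the generated text."""
    payload = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode("utf-8")
    req = urllib.request.Request(
        "http://localhost:11434/api/generate",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

LABELS = ["bug report", "feature request", "praise", "other"]

def classify(feedback: str) -> str:
    prompt = (
        f"Classify the following feedback as exactly one of {LABELS}. "
        f"Reply with the label only.\n\nFeedback: {feedback}"
    )
    return ask_local(prompt).strip().lower()

print(classify("The export button does nothing when I click it."))  # likely "bug report"
```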
The honest downsides of local LLMs
Local models aren’t magic, and they’re not for everything.
They’re slower
On most computers, local models stream text more slowly than cloud models. You’ll notice it. Responses come back at a readable pace, not instantly.
For many tasks, that’s perfectly fine. For others, it can feel sluggish.
They’re not as capable as top-tier cloud models
If you need deep reasoning, very long context windows, or cutting-edge performance, cloud models still win.
Local models are good — just not that good.
Hardware matters
Performance depends on:
- your CPU
- available memory
- whether you have a GPU
- the size of the model you’re running
There’s no way around that.
The tradeoff is simple: local models give you control and predictability, but they ask for patience.
Realistic performance expectations
Before looking at speeds, it helps to clarify what we’re measuring.
A token is roughly:
- 3–4 characters of English text, or
- about ¾ of a word
So when we talk about tokens per second, we’re really talking about how fast text appears on screen.
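A quick back-of-the-envelope in Python, using the approximations above, shows how tokens per second translates into text on screen. The 10 tokens/second and 300-token reply are just example numbers:

```python
# Rough conversion used throughout this post: 1 token ≈ 3–4 characters ≈ 0.75 words.
tokens_per_second = 10        # a typical 7–8B model on a laptop (see the table below)

chars_per_second = tokens_per_second * 3.5   # midpoint of 3–4 characters per token
words_per_second = tokens_per_second * 0.75

response_tokens = 300         # a few paragraphs of output
print(f"~{chars_per_second:.0f} chars/sec, ~{words_per_second:.1f} words/sec")
print(f"a {response_tokens}-token reply takes ~{response_tokens / tokens_per_second:.0f} seconds")
```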
Typical local model performance
Here’s what many people see on common machines when running 7–8B local models (a very practical size for everyday work):
| Machine | Tokens / second | Approx. characters / second | What it feels like |
|---|---|---|---|
| MacBook Air (M1 / M2) | 5–15 | ~15–60 chars/sec | Text appears steadily, line by line |
| MacBook Pro (M1 / M2 Pro / Max) | 10–25 | ~30–100 chars/sec | Smooth and readable as it streams |
| Mid-range Windows laptop (CPU-only) | 3–10 | ~10–40 chars/sec | Noticeably slower, but usable |
| Gaming PC / Alienware (dedicated GPU) | 25–60+ | ~75–240+ chars/sec | Feels fast; closer to cloud speed for many tasks |
Smaller models respond faster. Larger models respond more slowly. CPU speed, memory, and GPU availability all make a real difference.
How this compares to cloud models
For context, cloud-hosted models are typically much faster:
- Often 50–150+ tokens per second
- Text can appear almost instantly
- Ideal for interactive chat and long reasoning sessions
That speed difference is real — and expected.
Local models trade raw throughput for:
- zero per-token cost
- offline reliability
- predictable behavior
- no usage caps or throttling
Why this tradeoff is usually fine in spreadsheets
Most spreadsheet-based AI work isn’t conversational. You’re usually:
- running AI across many rows
- cleaning or transforming structured data
- waiting on results, not typing back and forth
In that context:
- a few extra seconds per column is rarely a problem
- predictability matters more than instant replies
- cost-free iteration changes how freely you experiment
Local models feel less like a chat partner and more like a quiet helper doing work in the background.
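To make the "many rows, no conversation" pattern concrete, here's a hedged Python sketch that cleans one column of a hypothetical CSV with a local model, one row at a time. The file name, column names, and helper are assumptions for illustration; the point is the shape of the work, not the exact code.

```python
# Sketch: batch work across rows. Reads a hypothetical feedback.csv, asks a local model
# to clean each row's comment, and writes the results to a new file.
import csv
import json
import urllib.request

def ask_local(prompt: str, model: str = "llama3") -> str:
    """Send one prompt to the local Ollama server and return the generated text."""
    payload = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode("utf-8")
    req = urllib.request.Request(
        "http://localhost:11434/api/generate",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

with open("feedback.csv", newline="") as f:
    rows = list(csv.DictReader(f))

for row in rows:
    # Each row is an independent, non-interactive task: a few seconds apiece is fine.
    row["cleaned"] = ask_local(
        f"Fix spelling and grammar, keep the meaning:\n{row['comment']}"
    ).strip()

with open("feedback_cleaned.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=rows[0].keys())
    writer.writeheader()
    writer.writerows(rows)
```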
“This sounds complicated”
For many people, the idea of running AI locally feels intimidating.
It sounds like:
- terminals
- configuration
- things breaking
- becoming your own IT department
That fear is understandable — and mostly outdated.
Tools like Ollama make downloading and running local models surprisingly simple. In practice, it’s closer to installing a package manager than setting up infrastructure.
Once the model is running, the hardest part is already done.
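If you want a quick way to confirm that the hard part really is done, here's an optional Python sanity check that asks the local Ollama server which models are already downloaded. It assumes Ollama's default local port and nothing else.

```python
# Quick sanity check: is Ollama running, and which models are already downloaded?
# Uses Ollama's local /api/tags endpoint; no account, no network beyond localhost.
import json
import urllib.request

try:
    with urllib.request.urlopen("http://localhost:11434/api/tags") as resp:
        models = json.loads(resp.read())["models"]
    print("Ollama is running. Local models:")
    for m in models:
        print(" -", m["name"])
except OSError:
    print("Ollama doesn't appear to be running; start it (or install it) first.")
```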
Where Bwocks fits into this
This is where local models usually fall apart: usability.
Running a local model by itself is one thing. Actually using it for real work is another.
Bwocks gives local models a place to live.
Inside Bwocks:
- local and cloud models work the same way
- you use them in spreadsheet columns
- you pass in context
- you clean up outputs
- you stay in one place
You don’t have to choose one model forever. You can mix and match:
- cloud when you want speed or reasoning
- local when you want control or offline reliability
The point isn’t ideology. It’s optionality.
A very quick setup overview
At a high level, running a local model in Bwocks looks like this:
- Install Ollama
- Download a model
- Open Bwocks → Settings → Enable local LLMs
- Enter the model name
- Use it like any other AI column
That’s it.
No scripting. No prompt glue. No extra tools.
When to use local vs cloud LLMs
A simple way to think about it:
Use cloud models when:
- speed matters most
- you need the strongest reasoning
- you’re doing complex, one-off work
Use local models when:
- you’re cleaning or transforming data
- you’re iterating a lot
- you want predictable costs
- you’re offline or semi-offline
Most people end up using both — often in the same project.
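If it helps to see that guidance written down as a rule of thumb, here's a tiny, purely illustrative Python sketch. The task attributes and thresholds are assumptions, not anything Bwocks enforces.

```python
# A rule-of-thumb router, not a real API: it just encodes the guidance above.
from dataclasses import dataclass

@dataclass
class Task:
    needs_deep_reasoning: bool = False
    offline: bool = False
    row_count: int = 1          # how many rows or cells this will run over

def pick_model(task: Task) -> str:
    if task.offline:
        return "local"          # cloud simply isn't an option
    if task.needs_deep_reasoning:
        return "cloud"          # strongest reasoning wins
    if task.row_count > 50:
        return "local"          # lots of cheap, repetitive calls
    return "cloud"              # one-off work: take the speed

print(pick_model(Task(row_count=500)))               # local
print(pick_model(Task(needs_deep_reasoning=True)))   # cloud
```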
Tools vs services
Cloud AI is a service.
Local AI feels like a tool.
Both are useful.
Bwocks is built around the idea that you shouldn’t be forced into one or the other. Sometimes you want peak performance. Sometimes you just want something that works, stays put, and doesn’t surprise you.
Having that choice turns AI from something you rent into something you actually own.