Most AI tools quietly assume a few things:

  • You’re always online
  • Pricing stays reasonable
  • Models behave the same tomorrow as they do today
  • Someone else is responsible for making it all work

Most of the time, that’s fine. Cloud AI is genuinely incredible.

But if you use AI regularly — to think, write, organize, or clean things up — those assumptions start to matter more than you expect.

This post is about when running AI locally makes sense, when it doesn’t, and why having the option at all is more powerful than most tools admit.

What people usually mean by “using AI”

For many of us, “using AI” looks like:

  • opening a chat window
  • pasting some text
  • asking a question
  • copying the result somewhere else

That experience is fast, polished, and impressive. It’s also very ephemeral.

Once you start using AI as part of your everyday work — not just for one-off questions — you start to notice friction:

  • repeated copy/paste
  • prompts you wish you could reuse
  • outputs that are almost right
  • work that depends on a service you don’t control

None of this is necessarily a dealbreaker. But it does shape how reliable and comfortable AI feels over time. And as AI works its way into more of your workflows and tasks, that friction compounds into a real drain on both time and quality.

What a local LLM actually is (plain English)

A local LLM is simply a large language model that runs on your own computer.

  • No internet required
  • No per-token billing
  • No account needed once it’s installed

You download a model once, and it behaves like any other local tool. If you’ve ever installed a large app or game, the idea is similar — just applied to AI.

You’re not hosting anything.

You’re not managing servers.

You’re just running software.

Why local models are quietly great

Local models don’t replace cloud AI. They complement it.

Here’s where they shine.

Predictable cost

Once a local model is installed, using it is effectively free. There’s no meter running in the background, no surprise bill at the end of the month, and no need to think about whether another prompt is “worth it.”

That alone can change how freely you experiment.

Stability

The model you downloaded today is the same model tomorrow. It doesn’t silently update, change behavior, or shift guidelines unless you decide to swap it out.

That consistency matters more than people expect — especially for repeatable work.

Offline-friendly

Planes. Trains. Hotels. Bad Wi-Fi. Spotty connections.

Most AI tools simply stop working in these moments. Local models don’t. They keep going, quietly, like a calculator or a notebook.

A better fit for everyday AI

Local models are especially good at:

  • rewriting
  • cleanup
  • extraction
  • classification
  • structured transformations

These are the kinds of tasks that come up constantly — and don’t always need the most advanced model available.

The honest downsides of local LLMs

Local models aren’t magic, and they’re not for everything.

They’re slower

On most computers, local models stream text more slowly than cloud models. You’ll notice it. Responses come back at a readable pace, not instantly.

For many tasks, that’s perfectly fine. For others, it can feel sluggish.

They’re not as capable as top-tier cloud models

If you need deep reasoning, very long context windows, or cutting-edge performance, cloud models still win.

Local models are good — just not that good.

Hardware matters

Performance depends on:

  • your CPU
  • available memory
  • whether you have a GPU
  • the size of the model you’re running

There’s no way around that.

The tradeoff is simple: local models give you control and predictability, but they ask for patience.

Realistic performance expectations

Before looking at speeds, it helps to clarify what we’re measuring.

A token is roughly:

  • 3–4 characters of English text, or
  • about ¾ of a word

So when we talk about tokens per second, we’re really talking about how fast text appears on screen.
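
To make that concrete, here's a tiny back-of-the-envelope converter. It's only a sketch: the 3–4 characters and ¾-of-a-word figures above are rules of thumb for English text, not exact values.

```python
# Rough conversion from tokens/second to more familiar units.
# The 4 chars/token and 0.75 words/token figures are English-text
# rules of thumb, not exact values.
CHARS_PER_TOKEN = 4
WORDS_PER_TOKEN = 0.75

def streaming_feel(tokens_per_second: float) -> str:
    chars_per_sec = tokens_per_second * CHARS_PER_TOKEN
    words_per_min = tokens_per_second * WORDS_PER_TOKEN * 60
    return (f"{tokens_per_second:g} tok/s ≈ {chars_per_sec:.0f} chars/s "
            f"≈ {words_per_min:.0f} words/min")

print(streaming_feel(10))    # a typical laptop pace
print(streaming_feel(100))   # a typical cloud pace
```

At 10 tokens per second, text shows up at roughly 40 characters a second, which is why it reads as steady rather than instant.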

Typical local model performance

Here’s what many people see on common machines when running 7–8B local models (a very practical size for everyday work):

  • MacBook Air (M1 / M2): 5–15 tokens/sec (~15–60 chars/sec). Text appears steadily, line by line.
  • MacBook Pro (M1 / M2 Pro / Max): 10–25 tokens/sec (~30–100 chars/sec). Smooth and readable as it streams.
  • Mid-range Windows laptop (CPU-only): 3–10 tokens/sec (~10–40 chars/sec). Noticeably slower, but usable.
  • Gaming PC / Alienware (dedicated GPU): 25–60+ tokens/sec (~75–240+ chars/sec). Feels fast; closer to cloud speed for many tasks.

Smaller models respond faster. Larger models respond more slowly. CPU speed, memory, and GPU availability all make a real difference.

How this compares to cloud models

For context, cloud-hosted models are typically much faster:

  • Often 50–150+ tokens per second
  • Text can appear almost instantly
  • Ideal for interactive chat and long reasoning sessions

That speed difference is real — and expected.

Local models trade raw throughput for:

  • zero per-token cost
  • offline reliability
  • predictable behavior
  • no usage caps or throttling

Why this tradeoff is usually fine in spreadsheets

Most spreadsheet-based AI work isn’t conversational. You’re usually:

  • running AI across many rows
  • cleaning or transforming structured data
  • waiting on results, not typing back and forth

In that context:

  • a few extra seconds per column is rarely a problem
  • predictability matters more than instant replies
  • cost-free iteration changes how freely you experiment

Local models feel less like a chat partner and more like a quiet helper doing work in the background.
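
As a rough illustration of why, here's a back-of-the-envelope estimate. The numbers are made up but plausible: 200 rows, around 60 output tokens per cell, and the speeds discussed above.

```python
# Back-of-the-envelope estimate: how long does it take to fill one AI
# column? All numbers are illustrative, not benchmarks.
ROWS = 200              # cells to fill in one AI column
TOKENS_PER_CELL = 60    # short cleanup / rewrite outputs

def column_minutes(tokens_per_second: float) -> float:
    total_tokens = ROWS * TOKENS_PER_CELL
    return total_tokens / tokens_per_second / 60

print(f"Local, 10 tok/s:  {column_minutes(10):.0f} min")   # ~20 minutes
print(f"Cloud, 100 tok/s: {column_minutes(100):.0f} min")  # ~2 minutes
```

Twenty minutes of unattended background work costs you very little attention, which is why the slower streaming rarely hurts in this setting.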

“This sounds complicated”

For many people, the idea of running AI locally feels intimidating.

It sounds like:

  • terminals
  • configuration
  • things breaking
  • becoming your own IT department

That fear is understandable — and mostly outdated.

Tools like Ollama make downloading and running local models surprisingly simple. In practice, it’s closer to installing a package manager than setting up infrastructure.

Once the model is running, the hardest part is already done.
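
To show how ordinary it is once the model is running, here's a minimal sketch that sends a single prompt to Ollama's default local API. It assumes the standard endpoint at http://localhost:11434 and a model you've already pulled; swap "llama3.1" for whatever you actually downloaded.

```python
# Minimal sketch: one prompt against a locally running Ollama instance.
# Assumes Ollama is installed and "llama3.1" (or any model you pulled)
# is available; nothing here leaves your machine.
import requests

response = requests.post(
    "http://localhost:11434/api/generate",  # Ollama's default local endpoint
    json={
        "model": "llama3.1",
        "prompt": "Rewrite in a friendlier tone: Payment overdue. Pay now.",
        "stream": False,                    # return one complete response
    },
    timeout=120,
)
print(response.json()["response"])
```

You don't need to write anything like this yourself; the point is simply that a local model is an ordinary program on your machine answering ordinary requests, not infrastructure you have to operate.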

Where Bwocks fits into this

This is where local models usually fall apart: usability.

Running a local model by itself is one thing. Actually using it for real work is another.

Bwocks gives local models a place to live.

Inside Bwocks:

  • local and cloud models work the same way
  • you use them in spreadsheet columns
  • you pass in context
  • you clean up outputs
  • you stay in one place

You don’t have to choose one model forever. You can mix and match:

  • cloud when you want speed or reasoning
  • local when you want control or offline reliability

The point isn’t ideology. It’s optionality.

A very quick setup overview

At a high level, running a local model in Bwocks looks like this:

  1. Install Ollama
  2. Download a model
  3. Open Bwocks → Settings → Enable local LLMs
  4. Enter the model name
  5. Use it like any other AI column

That’s it.

No scripting. No prompt glue. No extra tools.
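
If you're ever unsure what to type for the model name in step 4, you can ask Ollama what it has installed. This is purely optional, a small sketch that assumes Ollama's default local API; none of the steps above require it.

```python
# Optional sanity check: list the models Ollama has installed locally,
# so you know the exact name to enter in Bwocks. Assumes Ollama's
# default local API at http://localhost:11434.
import requests

installed = requests.get("http://localhost:11434/api/tags", timeout=10).json()
for model in installed.get("models", []):
    print(model["name"])   # e.g. "llama3.1:latest"
```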

When to use local vs cloud LLMs

A simple way to think about it:

Use cloud models when:

  • speed matters most
  • you need the strongest reasoning
  • you’re doing complex, one-off work

Use local models when:

  • you’re cleaning or transforming data
  • you’re iterating a lot
  • you want predictable costs
  • you’re offline or semi-offline

Most people end up using both — often in the same project.

Tools vs services

Cloud AI is a service.

Local AI feels like a tool.

Both are useful.

Bwocks is built around the idea that you shouldn’t be forced into one or the other. Sometimes you want peak performance. Sometimes you just want something that works, stays put, and doesn’t surprise you.

Having that choice turns AI from something you rent into something you actually own.