One of the most powerful features of Bwocks is that it can run AI locally on your computer.
No internet required.
No per-prompt billing.
No surprise usage limits.
If that sounds intimidating — don’t worry. You don’t need to be a developer, and you don’t need to understand how models work internally. This guide walks through everything step by step.
Why run AI locally at all?
Cloud AI is fast and convenient. But it also comes with tradeoffs you don’t always notice at first:
- You’re dependent on pricing changes
- Usage limits can kick in at inconvenient times
- Model behavior can change without warning
- You need a stable internet connection
Local models flip that around.
What local models give you
- Predictable cost – free once downloaded
- Offline use – planes, trains, bad Wi-Fi, no problem
- Consistency – the model you download stays the same
- Freedom to experiment – no “is this prompt worth it?” anxiety
Local models aren’t a replacement for cloud AI. They’re a second gear — especially good for cleanup, rewriting, extraction, classification, and structured work inside spreadsheets.
What you’ll install (just one thing)
To run local models, you’ll install a small helper app called Ollama.
Ollama handles:
- downloading models
- running them in the background
- making them available to apps like Bwocks
You don’t need to configure servers or write code.
Step 1: Download Ollama
Go to: 👉 https://ollama.com
Download the version for your operating system:
- macOS: download the .dmg
- Windows: download the installer
- Linux: follow the instructions on the site
Install it like any other app.
Once installed:
- On macOS: open Ollama from Applications
- On Windows: launch Ollama from the Start menu
Ollama should now be running quietly in the background.
Step 2: Open your terminal (don’t worry)
You’ll use the terminal once to download a model.
On macOS
- Open Spotlight (⌘ + Space)
- Type Terminal
- Press Enter
On Windows
- Press Start
- Search for Command Prompt or PowerShell
- Open it
That’s it — you’re in.
Step 3: Download a model
In the terminal, type:
ollama pull gemma3:4b
(gemma3:4b here is just an example. To use a different model, swap its name into that part of the command. We’ll get into choosing a model a bit farther down the page.)
Press Enter.
This will:
- download the model
- store it locally on your computer
- make it available instantly once finished
Depending on your internet speed, this can take a few minutes.
💡 You only do this once per model.
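If you want to double-check which models are on disk, `ollama list` does it in the terminal. The same information is also available from Ollama’s local API, which is what apps query behind the scenes. Here’s a small sketch in Python, assuming Ollama’s documented `/api/tags` endpoint and its default port, 11434:

```python
import json
from urllib.request import urlopen
from urllib.error import URLError

def installed_models(host: str = "http://localhost:11434") -> list:
    """Names of locally downloaded models, or [] if Ollama isn't reachable."""
    try:
        with urlopen(f"{host}/api/tags", timeout=2) as resp:
            return [m["name"] for m in json.loads(resp.read())["models"]]
    except (URLError, OSError):
        return []

print(installed_models())  # e.g. ['gemma3:4b'] once the pull finishes
```

If the list comes back empty, either the download hasn’t finished or Ollama isn’t running yet (next step).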
Step 4: Make sure Ollama is running
Ollama needs to be running in the background.
- On macOS: make sure the Ollama app is open
- On Windows: make sure Ollama is running in the system tray
If it’s running, you’re good.
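If you’d rather confirm this programmatically (entirely optional), Ollama answers plain HTTP requests on port 11434 by default. A minimal check, sketched in Python, assuming that default port:

```python
from urllib.request import urlopen
from urllib.error import URLError

def is_ollama_running(host: str = "http://localhost:11434",
                      timeout: float = 2.0) -> bool:
    """Return True if the local Ollama server answers on its default port."""
    try:
        with urlopen(host, timeout=timeout) as resp:
            return resp.status == 200
    except (URLError, OSError):
        return False

print("Ollama running:", is_ollama_running())
```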
Step 5: Enable local models in Bwocks
Now switch to Bwocks.
- Open Settings
- Enable Local LLMs
- Enter the model name exactly as downloaded, for example: gemma3:4b
- Save
That’s it.
From now on, you can select that local model in:
- AI Columns
- AI Cleanup
- Image or text generation (depending on the model)
Local and cloud models work the same way inside Bwocks.
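Under the hood, apps like Bwocks talk to Ollama through its local REST API. Here’s a rough sketch of what a single generation request looks like, based on Ollama’s documented `/api/generate` endpoint (the prompt text is made up for illustration):

```python
import json
from urllib.request import Request, urlopen

def build_generate_request(model: str, prompt: str) -> dict:
    # Minimal payload for Ollama's /api/generate endpoint.
    # "stream": False asks for one complete JSON response instead of chunks.
    return {"model": model, "prompt": prompt, "stream": False}

def generate(model: str, prompt: str,
             host: str = "http://localhost:11434") -> str:
    payload = json.dumps(build_generate_request(model, prompt)).encode()
    req = Request(f"{host}/api/generate", data=payload,
                  headers={"Content-Type": "application/json"})
    with urlopen(req) as resp:
        return json.loads(resp.read())["response"]

# Requires Ollama running with the model already pulled:
# print(generate("gemma3:4b", "Summarize: local models run offline."))
```

You never need to write this yourself — Bwocks does it for you — but it shows why “make sure Ollama is running” matters: there has to be a server for the app to talk to.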
What performance should you expect?
Local models are slower than cloud AI — that’s normal. To understand what that means, here’s a quick explanation.
A token is roughly:
- 3–4 characters of English text, or
- about ¾ of a word
So “tokens per second” is basically “how fast text appears on screen.”
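That rule of thumb lets you estimate token counts from text length. A quick back-of-the-envelope sketch (the 4-characters-per-token ratio is an approximation, not an exact figure):

```python
def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Very rough token estimate: ~3-4 English characters per token."""
    return round(len(text) / chars_per_token)

sentence = "Local models trade speed for predictability."
print(estimate_tokens(sentence))  # 44 characters -> ~11 tokens
```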
Typical local model performance (7–8B models)
| Machine | Tokens / second | Approx. characters / second | What it feels like |
|---|---|---|---|
| MacBook Air (M1 / M2) | 5–15 | ~15–60 chars/sec | Text appears steadily, line by line |
| MacBook Pro (M1 / M2 Pro / Max) | 10–25 | ~30–100 chars/sec | Smooth and readable |
| Mid-range Windows laptop (CPU-only) | 3–10 | ~10–40 chars/sec | Slower, but usable |
| Gaming PC / Alienware (dedicated GPU) | 25–60+ | ~75–240+ chars/sec | Feels fast, close to cloud |
For comparison, cloud models often run at 50–150+ tokens per second.
Local models trade speed for:
- zero per-token cost
- offline reliability
- predictable behavior
In spreadsheets — where you’re often running AI across many rows — this tradeoff is usually worth it.
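To see what the tradeoff means for a spreadsheet job, here’s a quick total-time estimate across many rows. The row count, output length, and speeds below are illustrative assumptions drawn from the ranges in the table above:

```python
def batch_minutes(rows: int, tokens_per_row: int,
                  tokens_per_sec: float) -> float:
    """Minutes to generate `tokens_per_row` output tokens for every row."""
    return rows * tokens_per_row / tokens_per_sec / 60

# 500 rows, ~50 output tokens each:
print(f"Local  (10 tok/s): {batch_minutes(500, 50, 10):.1f} min")   # ~41.7 min
print(f"Cloud (100 tok/s): {batch_minutes(500, 50, 100):.1f} min")  # ~4.2 min
```

Even at the slower speed, a batch job you can leave running for free often beats metering every row through a paid API.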
Choosing the right model (simple guidance)
You don’t need to try everything. Start small.
A great default: gemma3:4b
This is a fantastic all-around model:
- fast on most machines
- surprisingly creative
- excellent at cleanup, rewriting, extraction, and classification
- runs very well on gaming laptops and modern Macs
This is the model many people stick with for day-to-day work.
Other common choices (optional)
- Mistral 7B – solid instruction following, lightweight
- Llama 3 (larger variants) – better reasoning, slower
- Code-focused models – useful if you’re generating or analyzing code
- Vision models – for image + text workflows
Rule of thumb:
- Smaller models = faster, cheaper, more predictable
- Larger models = slower, smarter, heavier
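A rough way to reason about “heavier”: a quantized model needs approximately its parameter count times bytes per parameter in memory. The figures below are ballpark assumptions (4-bit quantization at roughly 0.5 bytes per parameter, plus some overhead), not exact sizes:

```python
def approx_model_gb(params_billions: float, bytes_per_param: float = 0.5,
                    overhead: float = 1.2) -> float:
    """Ballpark RAM for a quantized model: params x bytes, plus ~20% overhead."""
    return params_billions * bytes_per_param * overhead

print(f"4B model: ~{approx_model_gb(4):.1f} GB")  # ~2.4 GB
print(f"7B model: ~{approx_model_gb(7):.1f} GB")  # ~4.2 GB
```

If a model fits comfortably in your machine’s free RAM, it will usually run; if it doesn’t, it will be painfully slow or fail to load.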
You can always download more later.
When local models shine (and when they don’t)
Local models are especially good for:
- AI Cleanup
- text normalization
- extraction into columns
- rewriting and summarizing
- structured, repeatable tasks
Cloud models are better when:
- speed matters most
- you need deep reasoning
- you’re doing complex, one-off work
Bwocks lets you use both, side by side.
Final thoughts
Running AI locally used to feel like something only developers did.
That’s no longer true.
With tools like Ollama and Bwocks:
- setup takes minutes
- usage feels familiar
- and the benefits compound quickly
You don’t have to switch everything to local models. But having the option — especially for cleanup, iteration, and offline work — changes how freely you can think and experiment.
And that’s the real win.