One of the most powerful features of Bwocks is that it can run AI locally on your computer.
No internet required.
No per-prompt billing.
No surprise usage limits.
If that sounds intimidating — don’t worry. You don’t need to be a developer, and you don’t need to understand how models work internally. This guide walks through everything step by step.
Why run AI locally at all?
Cloud AI is fast and convenient. But it also comes with tradeoffs you don’t always notice at first:
- You’re dependent on pricing changes
- Usage limits can kick in at inconvenient times
- Model behavior can change without warning
- You need a stable internet connection
Local models flip that around.
What local models give you
- Predictable cost – free once downloaded
- Offline use – planes, trains, bad Wi-Fi, no problem
- Consistency – the model you download stays the same
- Freedom to experiment – no “is this prompt worth it?” anxiety
Local models aren’t a replacement for cloud AI. They’re a second gear — especially good for cleanup, rewriting, extraction, classification, and structured work inside spreadsheets.
What you’ll install (just one thing)
To run local models, you’ll install a small helper app called Ollama.
Ollama handles:
- downloading models
- running them in the background
- making them available to apps like Bwocks
You don’t need to configure servers or write code.
Step 1: Download Ollama
Go to: 👉 https://ollama.com
Download the version for your operating system:
- macOS: download the .dmg
- Windows: download the installer
- Linux: follow the instructions on the site
Install it like any other app.
Once installed:
- On macOS: open Ollama from Applications
- On Windows: launch Ollama from the Start menu
Ollama should now be running quietly in the background.
Step 2: Open your terminal (don’t worry)
You’ll use the terminal once to download a model.
On macOS
- Open Spotlight (⌘ + Space)
- Type Terminal
- Press Enter
On Windows
- Press Start
- Search for Command Prompt or PowerShell
- Open it
That’s it — you’re in.
Step 3: Download a model
In the terminal, type:
ollama pull gemma3:4b
(gemma3:4b here is just an example. To use a different model, swap its name into that part of the command. We’ll get into choosing a model a bit farther down the page.)
Press Enter.
This will:
- download the model
- store it locally on your computer
- make it available instantly once finished
Depending on your internet speed, this can take a few minutes.
💡 You only do this once per model.
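If you want to double-check which models are on disk, `ollama list` does it in the terminal. The same information is also available from Ollama’s local API, which is what apps query behind the scenes. Here’s a small sketch in Python, assuming Ollama’s documented `/api/tags` endpoint and its default port, 11434:

```python
import json
from urllib.request import urlopen
from urllib.error import URLError

def installed_models(host: str = "http://localhost:11434") -> list:
    """Names of locally downloaded models, or [] if Ollama isn't reachable."""
    try:
        with urlopen(f"{host}/api/tags", timeout=2) as resp:
            return [m["name"] for m in json.loads(resp.read())["models"]]
    except (URLError, OSError):
        return []

print(installed_models())  # e.g. ['gemma3:4b'] once the pull finishes
```

If the list comes back empty, either the download hasn’t finished or Ollama isn’t running yet (next step).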
Step 4: Make sure Ollama is running
Ollama needs to be running in the background.
- On macOS: make sure the Ollama app is open
- On Windows: make sure Ollama is running in the system tray
If it’s running, you’re good.
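If you’d rather confirm this programmatically (entirely optional), Ollama answers plain HTTP requests on port 11434 by default. A minimal check, sketched in Python, assuming that default port:

```python
from urllib.request import urlopen
from urllib.error import URLError

def is_ollama_running(host: str = "http://localhost:11434",
                      timeout: float = 2.0) -> bool:
    """Return True if the local Ollama server answers on its default port."""
    try:
        with urlopen(host, timeout=timeout) as resp:
            return resp.status == 200
    except (URLError, OSError):
        return False

print("Ollama running:", is_ollama_running())
```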
Step 5: Enable local models in Bwocks
Now switch to Bwocks.
- Open Settings
- Enable Local LLMs
- Enter the model name exactly as downloaded, for example: gemma3:4b
- Save
That’s it.
From now on, you can select that local model in:
- AI Columns
- AI Cleanup
- Image or text generation (depending on the model)
Local and cloud models work the same way inside Bwocks.
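Under the hood, apps like Bwocks talk to Ollama through its local REST API. Here’s a rough sketch of what a single generation request looks like, based on Ollama’s documented `/api/generate` endpoint (the prompt text is made up for illustration):

```python
import json
from urllib.request import Request, urlopen

def build_generate_request(model: str, prompt: str) -> dict:
    # Minimal payload for Ollama's /api/generate endpoint.
    # "stream": False asks for one complete JSON response instead of chunks.
    return {"model": model, "prompt": prompt, "stream": False}

def generate(model: str, prompt: str,
             host: str = "http://localhost:11434") -> str:
    payload = json.dumps(build_generate_request(model, prompt)).encode()
    req = Request(f"{host}/api/generate", data=payload,
                  headers={"Content-Type": "application/json"})
    with urlopen(req) as resp:
        return json.loads(resp.read())["response"]

# Requires Ollama running with the model already pulled:
# print(generate("gemma3:4b", "Summarize: local models run offline."))
```

You never need to write this yourself — Bwocks does it for you — but it shows why “make sure Ollama is running” matters: there has to be a server for the app to talk to.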
What performance should you expect?
Local models are slower than cloud AI — that’s normal. To understand what that means, here’s a quick explanation.
A token is roughly:
- 3–4 characters of English text, or
- about ¾ of a word
So “tokens per second” is basically “how fast text appears on screen.”
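That rule of thumb lets you estimate token counts from text length. A quick back-of-the-envelope sketch (the 4-characters-per-token ratio is an approximation, not an exact figure):

```python
def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Very rough token estimate: ~3-4 English characters per token."""
    return round(len(text) / chars_per_token)

sentence = "Local models trade speed for predictability."
print(estimate_tokens(sentence))  # 44 characters -> ~11 tokens
```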
Typical local model performance (7–8B models)
| Machine | Tokens / second | Approx. characters / second | What it feels like |
|---|---|---|---|
| MacBook Air (M1 / M2) | 5–15 | ~15–60 chars/sec | Text appears steadily, line by line |
| MacBook Pro (M1 / M2 Pro / Max) | 10–25 | ~30–100 chars/sec | Smooth and readable |
| Mid-range Windows laptop (CPU-only) | 3–10 | ~10–40 chars/sec | Slower, but usable |
| Gaming PC / Alienware (dedicated GPU) | 25–60+ | ~75–240+ chars/sec | Feels fast, close to cloud |
For comparison, cloud models often run at 50–150+ tokens per second.
Local models trade speed for:
- zero per-token cost
- offline reliability
- predictable behavior
In spreadsheets — where you’re often running AI across many rows — this tradeoff is usually worth it.
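To see what the tradeoff means for a spreadsheet job, here’s a quick total-time estimate across many rows. The row count, output length, and speeds below are illustrative assumptions drawn from the ranges in the table above:

```python
def batch_minutes(rows: int, tokens_per_row: int,
                  tokens_per_sec: float) -> float:
    """Minutes to generate `tokens_per_row` output tokens for every row."""
    return rows * tokens_per_row / tokens_per_sec / 60

# 500 rows, ~50 output tokens each:
print(f"Local  (10 tok/s): {batch_minutes(500, 50, 10):.1f} min")   # ~41.7 min
print(f"Cloud (100 tok/s): {batch_minutes(500, 50, 100):.1f} min")  # ~4.2 min
```

Even at the slower speed, a batch job you can leave running for free often beats metering every row through a paid API.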
Choosing the right model (simple guidance)
You don’t need to try everything. Start small.
A great default: gemma3:4b
This is a fantastic all-around model:
- fast on most machines
- surprisingly creative
- excellent at cleanup, rewriting, extraction, and classification
- runs very well on gaming laptops and modern Macs
This is the model many people stick with for day-to-day work.
Other common choices (optional)
- Mistral 7B – solid instruction following, lightweight
- Llama 3 (larger variants) – better reasoning, slower
- Code-focused models – useful if you’re generating or analyzing code
- Vision models – for image + text workflows
Rule of thumb:
- Smaller models = faster, cheaper, more predictable
- Larger models = slower, smarter, heavier
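A rough way to reason about “heavier”: a quantized model needs approximately its parameter count times bytes per parameter in memory. The figures below are ballpark assumptions (4-bit quantization at roughly 0.5 bytes per parameter, plus some overhead), not exact sizes:

```python
def approx_model_gb(params_billions: float, bytes_per_param: float = 0.5,
                    overhead: float = 1.2) -> float:
    """Ballpark RAM for a quantized model: params x bytes, plus ~20% overhead."""
    return params_billions * bytes_per_param * overhead

print(f"4B model: ~{approx_model_gb(4):.1f} GB")  # ~2.4 GB
print(f"7B model: ~{approx_model_gb(7):.1f} GB")  # ~4.2 GB
```

If a model fits comfortably in your machine’s free RAM, it will usually run; if it doesn’t, it will be painfully slow or fail to load.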
You can always download more later.
When local models shine (and when they don’t)
Local models are especially good for:
- AI Cleanup
- text normalization
- extraction into columns
- rewriting and summarizing
- structured, repeatable tasks
Cloud models are better when:
- speed matters most
- you need deep reasoning
- you’re doing complex, one-off work
Bwocks lets you use both, side by side.
Final thoughts
Running AI locally used to feel like something only developers did.
That’s no longer true.
With tools like Ollama and Bwocks:
- setup takes minutes
- usage feels familiar
- and the benefits compound quickly
You don’t have to switch everything to local models. But having the option — especially for cleanup, iteration, and offline work — changes how freely you can think and experiment.
And that’s the real win.