Recommended Local Models

Run AI skills entirely on your Mac. No API keys, no cloud, no cost. Ever.

VoxChimp keeps your voice on-device whichever provider you choose; local models keep your Agent Mode text on-device too.

This is an advanced feature for power users comfortable with installing local software. VoxChimp works great out of the box with cloud providers too.

What are AI skills?

VoxChimp's AI skills let you do more than just dictate. Speak a command and the AI writes SOAP notes, drafts emails, summarises text, translates, rewrites for tone, and more. Skills need an AI model to work. You can use a paid cloud provider, or run one locally for free.

Total privacy

With a local model, your voice commands and text never leave your Mac. No data sent to OpenAI, Google, or anyone else. Perfect for medical notes, legal work, or anything confidential. See the full data flow.

Zero ongoing cost

Cloud providers charge per token. The more you use skills, the more you pay. Local models are completely free. Use them as much as you want, forever. No API keys, no billing surprises.

Works offline

No Wi-Fi? No problem. Local models run entirely on your hardware. Use AI skills on a plane, in a rural clinic, or anywhere without an internet connection.

Cloud providers still have their place. Services like Claude, ChatGPT, and Gemini offer the highest quality output and require no local setup, just an API key. Local models are the best choice when privacy, cost, or offline access matter most to you.

Pick your model

These models work great with VoxChimp via LM Studio or Ollama. All free, all local.

Model	Size	Min RAM	Best For	Tool Calling	Arena Elo
Gemma 4 31B (new)	31B	24 GB	Reasoning + agent tasks	Excellent	1452
Gemma 4 26B MoE (new)	26B	16 GB	Efficient reasoning (fewer active params)	Excellent	1441
Qwen 3.5 27B	27B	16 GB	Complex reasoning + search	Excellent	1450
Qwen 2.5 7B Instruct	7B	8 GB	General + search	Excellent	-
Gemma 3 12B	12B	16 GB	Fast reasoning	Good	-
Llama 3.1 8B Instruct	8B	16 GB	General purpose	Good	-
Phi-4 Mini 3.8B	3.8B	8 GB	Lightweight tasks	Good	-
Nemotron 3 Nano 4B	4B	8 GB	Lightweight + search	Good	-

Gemma 4 by Google DeepMind

Open-weight models built for reasoning and agent tasks, not just chat. The 31B model ranks #3 on Arena AI, outperforming models up to 20x larger. The 26B MoE variant ranks #6 and uses fewer active parameters per step, making it more efficient on consumer hardware.

VoxChimp uses smart pre-search for local models, so web search works even when a model has no native tool calling. The "Tool Calling" column reflects each model's general capability, not whether search works in VoxChimp.
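To illustrate the general idea behind pre-search, here is a minimal sketch: the app runs the search first and places the results in the prompt, so the model only has to read them rather than call a tool. This is not VoxChimp's actual implementation; the function name `build_prompt` and the prompt wording are hypothetical.

```python
def build_prompt(question: str, snippets: list[str]) -> str:
    """Put search results in front of the question so the model needs no tool call.

    Hypothetical helper for illustration only; not VoxChimp's real code.
    """
    if not snippets:
        # Nothing was found, so the model just answers from its own knowledge.
        return question
    # Number each result so the model can refer back to it.
    sources = "\n".join(f"[{i}] {text}" for i, text in enumerate(snippets, start=1))
    return f"Use these search results to answer.\n{sources}\n\nQuestion: {question}"
```

Because the search happens before the model is called, this pattern works the same for any model, whatever its rating in the "Tool Calling" column.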

Set up local models

Step-by-step setup guides for each runner now live in the help centre.

Browse the full local-models section in the help centre.

Performance tips

Get the most out of your local setup.

Pick the right quant

Q4_K_M is the sweet spot for most users: good quality in less RAM. Q8_0 gives near-full quality at roughly twice the size of Q4_K_M. If RAM is tight, go for Q4_K_S.
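A rough way to check whether a quant will fit is to multiply the parameter count by the bits stored per weight. The sketch below does that arithmetic. The bits-per-weight figures are approximate community values for GGUF quants, assumed here for illustration, and real file sizes vary by model.

```python
# Approximate bits per weight for common GGUF quantisation levels.
# Assumed ballpark figures, not exact values for every model.
BITS_PER_WEIGHT = {
    "Q4_K_S": 4.6,
    "Q4_K_M": 4.85,
    "Q8_0": 8.5,
}


def estimate_weights_gb(params_billion: float, quant: str) -> float:
    """Estimate the size of the model weights in GB (10^9 bytes)."""
    bits = BITS_PER_WEIGHT[quant]
    # Billions of parameters times bytes per parameter gives gigabytes.
    return params_billion * bits / 8


if __name__ == "__main__":
    for quant in BITS_PER_WEIGHT:
        print(f"7B at {quant}: ~{estimate_weights_gb(7, quant):.1f} GB")
```

This covers the weights only. The context window, the runner, and macOS itself all need memory on top, which is why the Min RAM column sits well above the raw weight size.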

GPU offloading

Apple Silicon Macs use Metal automatically in both Ollama and LM Studio. The more layers that run on the GPU, the faster the inference. On M-series chips, expect a 2-3x speedup over CPU-only.

Benchmark your setup

In Ollama, run a model with the --verbose flag and check the eval rate printed after each response. In LM Studio, the UI shows tok/s in real time. Aim for 10+ tok/s for comfortable use; 20+ feels instant.
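If you would rather script the measurement, Ollama's local REST API reports the same numbers. The sketch below assumes Ollama is running on its default port (11434) and relies on the `eval_count` and `eval_duration` fields (duration in nanoseconds) that a non-streaming `/api/generate` response includes. The helper names and the test prompt are our own.

```python
import json
import urllib.request

# Ollama's default local endpoint; change it if you run Ollama elsewhere.
OLLAMA_URL = "http://localhost:11434/api/generate"


def tokens_per_second(response: dict) -> float:
    """Generation speed from eval_count and eval_duration (nanoseconds)."""
    return response["eval_count"] / response["eval_duration"] * 1e9


def comfort_level(tok_s: float) -> str:
    """Map a speed onto the rule of thumb above (10+ comfortable, 20+ instant)."""
    if tok_s >= 20:
        return "feels instant"
    if tok_s >= 10:
        return "comfortable"
    return "slow"


def benchmark(model: str, prompt: str = "Summarise the benefits of local AI.") -> float:
    """Send one non-streaming request to a running Ollama server and return tok/s."""
    payload = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    request = urllib.request.Request(
        OLLAMA_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as reply:
        return tokens_per_second(json.load(reply))
```

With a model already pulled, `comfort_level(benchmark("qwen2.5:7b"))` gives a quick verdict; swap in whichever model tag you have installed. The first run includes model load time, but `eval_duration` measures generation only, so the figure stays comparable between runs.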

Ready to go local?

Download VoxChimp and connect your favourite model in minutes.

Download for macOS