
Recommended Local Models

Run AI skills entirely on your Mac. No API keys, no cloud, no cost. Ever.

This is an advanced feature for power users comfortable with installing local software. VoxChimp works great out of the box with cloud providers too.

What are AI skills?

VoxChimp's AI skills let you do more than just dictate. Speak a command and the AI writes SOAP notes, drafts emails, summarises text, translates, rewrites for tone, and more. Skills need an AI model to work. You can use a paid cloud provider, or run one locally for free.

Total privacy

With a local model, your voice commands and text never leave your Mac. No data sent to OpenAI, Google, or anyone else. Perfect for medical notes, legal work, or anything confidential.

Zero ongoing cost

Cloud providers charge per token. The more you use skills, the more you pay. Local models are completely free. Use them as much as you want, forever. No API keys, no billing surprises.

Works offline

No Wi-Fi? No problem. Local models run entirely on your hardware. Use AI skills on a plane, in a rural clinic, or anywhere without an internet connection.

Cloud providers still have their place. Services like Claude, GPT-4, and Gemini offer the highest quality output and require no local setup, just an API key. Local models are the best choice when privacy, cost, or offline access matter most to you.

Pick your model

These models work great with VoxChimp via LM Studio or Ollama. All free, all local.

| Model | Size | Min RAM | Best For | Tool Calling | Arena Elo |
|---|---|---|---|---|---|
| Gemma 4 31B (New) | 31B | 24 GB | Reasoning + agent tasks | Excellent | 1452 |
| Gemma 4 26B MoE (New) | 26B | 16 GB | Efficient reasoning (fewer active params) | Excellent | 1441 |
| Qwen 3.5 27B | 27B | 16 GB | Complex reasoning + search | Excellent | 1450 |
| Qwen 2.5 7B Instruct | 7B | 8 GB | General + search | Excellent | - |
| Gemma 3 12B | 12B | 16 GB | Fast reasoning | Good | - |
| Llama 3.1 8B Instruct | 8B | 16 GB | General purpose | Good | - |
| Phi-4 Mini 3.8B | 3.8B | 8 GB | Lightweight tasks | Good | - |
| Nemotron 3 Nano 4B | 4B | 8 GB | Lightweight + search | Good | - |
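
Not sure which rows apply to your Mac? Here is a minimal sketch that filters the table by installed RAM. The names `LocalModel` and `models_that_fit` are made up for this example and are not part of VoxChimp; the figures are copied from the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LocalModel:
    name: str
    min_ram_gb: int
    tool_calling: str


# Rows copied from the table: model name, minimum RAM, tool-calling rating.
MODELS = [
    LocalModel("Gemma 4 31B", 24, "Excellent"),
    LocalModel("Gemma 4 26B MoE", 16, "Excellent"),
    LocalModel("Qwen 3.5 27B", 16, "Excellent"),
    LocalModel("Qwen 2.5 7B Instruct", 8, "Excellent"),
    LocalModel("Gemma 3 12B", 16, "Good"),
    LocalModel("Llama 3.1 8B Instruct", 16, "Good"),
    LocalModel("Phi-4 Mini 3.8B", 8, "Good"),
    LocalModel("Nemotron 3 Nano 4B", 8, "Good"),
]


def models_that_fit(ram_gb: int) -> list[str]:
    """Return the names of models whose minimum RAM is within ram_gb."""
    return [m.name for m in MODELS if m.min_ram_gb <= ram_gb]


if __name__ == "__main__":
    for name in models_that_fit(16):
        print(name)
```

Treat the minimum as a floor rather than a target: other apps on your Mac need memory too.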

Gemma 4 by Google DeepMind

Open-weight models built for reasoning and agent tasks, not just chat. The 31B model ranks #3 on Arena AI, outperforming models up to 20x larger. The 26B MoE variant ranks #6 and uses fewer active parameters per step, making it more efficient on consumer hardware.

VoxChimp uses smart pre-search for local models, so web search works even without native tool calling. The "Tool Calling" column reflects each model's general capability.
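
This page does not describe how VoxChimp's pre-search works internally. The sketch below shows one way the general idea could work: decide up front whether a command needs fresh information, run the search yourself, and hand the results to the model as plain text, so the model never has to call a tool. Every name here (`SEARCH_HINTS`, `needs_search`, `build_prompt`, and the `search` callable) is hypothetical.

```python
from typing import Callable

# Hypothetical trigger words; a real app would likely use a smarter check.
SEARCH_HINTS = ("latest", "today", "current", "news", "price", "weather")


def needs_search(command: str) -> bool:
    """Guess whether a spoken command needs up-to-date web information."""
    words = [w.strip(".,!?") for w in command.lower().split()]
    return any(hint in words for hint in SEARCH_HINTS)


def build_prompt(command: str, search: Callable[[str], list[str]]) -> str:
    """Run the search before the model sees anything and inline the results."""
    if not needs_search(command):
        return command
    snippets = search(command)
    context = "\n".join(f"- {snippet}" for snippet in snippets)
    return f"Web results:\n{context}\n\nUsing the results above, respond to: {command}"


if __name__ == "__main__":
    # Stand-in search function so the example runs without a network.
    def fake_search(query: str) -> list[str]:
        return ["Sunny, 21 C in the afternoon"]

    print(build_prompt("What is the weather today?", fake_search))
```

Because the search happens before the prompt is built, even a model with only "Good" tool calling receives the same search results as one rated "Excellent".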

Set up local models

Step-by-step setup guides for each runner now live in the help centre.

Performance tips

Get the most out of your local setup.

Pick the right quant

Q4_K_M is the sweet spot for most users: good quality in a modest amount of RAM. Q8_0 gives near-full quality at roughly twice the size. If RAM is tight, go Q4_K_S.
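
To see why the quant matters, here is a back-of-the-envelope estimate of weight size: parameters times bits per weight, divided by 8 bits per byte. The bits-per-weight figures are approximate assumptions for illustration; real files vary by model, and the runner needs extra memory on top for context.

```python
# Approximate bits per weight for each quant level. These are assumed,
# rounded figures for illustration, not exact values for any given model.
BITS_PER_WEIGHT = {"Q4_K_S": 4.6, "Q4_K_M": 4.8, "Q8_0": 8.5}


def estimated_size_gb(params_billions: float, quant: str) -> float:
    """Rough weight size in GB: parameters x bits per weight / 8."""
    return params_billions * BITS_PER_WEIGHT[quant] / 8


if __name__ == "__main__":
    for quant in ("Q4_K_S", "Q4_K_M", "Q8_0"):
        print(f"7B at {quant}: about {estimated_size_gb(7, quant):.1f} GB")
```

Under these assumptions a 7B model is a little over 4 GB at Q4_K_M and a little over 7 GB at Q8_0.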

GPU offloading

Apple Silicon Macs use Metal automatically in both Ollama and LM Studio. The more layers offloaded to the GPU, the faster inference runs. On M-series chips, expect roughly a 2-3x speedup over CPU-only.

Benchmark your setup

In Ollama, run a model with the --verbose flag (ollama run <model> --verbose) and check the eval rate printed after each response. In LM Studio, the UI shows tok/s in real time. Aim for 10+ tok/s for comfortable use; 20+ feels instant.
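
If you would rather measure from a script, here is a minimal sketch against Ollama's local HTTP API. It assumes Ollama is running on its default address (http://localhost:11434) and relies on the `eval_count` and `eval_duration` fields of a non-streaming `/api/generate` response. The helper names and the example model tag are illustrative.

```python
import json
import urllib.request


def tokens_per_second(response: dict) -> float:
    """Generation speed from an Ollama response (eval_duration is in nanoseconds)."""
    return response["eval_count"] / (response["eval_duration"] / 1e9)


def rate_label(tok_per_s: float) -> str:
    """Map a speed onto the rule-of-thumb thresholds from this page."""
    if tok_per_s >= 20:
        return "feels instant"
    if tok_per_s >= 10:
        return "comfortable"
    return "below the comfortable range"


def benchmark(model: str, prompt: str = "Summarise why the sky is blue.",
              host: str = "http://localhost:11434") -> float:
    """Send one non-streaming request to a running Ollama server and time it."""
    payload = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    request = urllib.request.Request(
        f"{host}/api/generate",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as reply:
        return tokens_per_second(json.load(reply))


if __name__ == "__main__":
    # Example model tag; swap in whichever model you have pulled.
    speed = benchmark("qwen2.5:7b")
    print(f"{speed:.1f} tok/s ({rate_label(speed)})")
```

Run it a couple of times: the first request also loads the model into memory, so later runs give a better picture of everyday speed.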

Ready to go local?

Download VoxChimp and connect your favourite model in minutes.

Download for macOS