Recommended Local Models
Run AI skills entirely on your Mac. No API keys, no cloud, no cost. Ever.
This is an advanced feature for power users comfortable with installing local software. VoxChimp works great out of the box with cloud providers too.
What are AI skills?
VoxChimp's AI skills let you do more than just dictate. Speak a command and the AI writes SOAP notes, drafts emails, summarises text, translates, rewrites for tone, and more. Skills need an AI model to work. You can use a paid cloud provider, or run one locally for free.
Total privacy
With a local model, your voice commands and text never leave your Mac. No data sent to OpenAI, Google, or anyone else. Perfect for medical notes, legal work, or anything confidential.
Zero ongoing cost
Cloud providers charge per token. The more you use skills, the more you pay. Local models are completely free. Use them as much as you want, forever. No API keys, no billing surprises.
Works offline
No Wi-Fi? No problem. Local models run entirely on your hardware. Use AI skills on a plane, in a rural clinic, or anywhere without an internet connection.
Cloud providers still have their place. Services like Claude, GPT-4, and Gemini offer the highest-quality output and need no local setup, just an API key. Local models are the best choice when privacy, cost, or offline access matter most to you.
Pick your model
These models work great with VoxChimp via LM Studio or Ollama. All free, all local.
Gemma 4 by Google DeepMind
Open-weight models built for reasoning and agent tasks, not just chat. The 31B model ranks #3 on Arena AI, outperforming models up to 20x larger. The 26B MoE (mixture-of-experts) variant ranks #6 and activates only a subset of its parameters for each token, making it more efficient on consumer hardware.
VoxChimp uses smart pre-search for local models, so web search works even without native tool calling. The "Tool Calling" column reflects each model's general capability.
Set up local models
Step-by-step setup guides for each runner now live in the help centre.
Performance tips
Get the most out of your local setup.
Pick the right quant
Q4_K_M is the sweet spot for most users: good quality in far less RAM than Q8_0 or full-precision weights. Q8_0 gives near-full quality at roughly twice the size of Q4_K_M. If RAM is tight, step down to Q4_K_S.
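To see what that trade-off means in gigabytes, here is a rough Python sketch. The bits-per-weight figures are approximate assumptions for llama.cpp's GGUF quant types (real files vary by model), and estimate_weights_gb and pick_quant are hypothetical helpers, not part of VoxChimp, Ollama, or LM Studio. Leave some RAM free on top of the weights for macOS, your other apps, and the model's context cache.

```python
from typing import Optional

# Approximate average bits per weight for common llama.cpp GGUF quants.
# Assumed ballpark figures: real files vary by architecture and metadata.
APPROX_BITS_PER_WEIGHT = {
    "Q8_0": 8.5,
    "Q4_K_M": 4.85,
    "Q4_K_S": 4.6,
}

# Hypothetical preference order, highest quality first.
QUALITY_ORDER = ["Q8_0", "Q4_K_M", "Q4_K_S"]


def estimate_weights_gb(params_billion: float, quant: str) -> float:
    """Rough size of the model weights in GB (10^9 bytes)."""
    # billions of parameters * bits per weight / 8 bits per byte = GB
    return params_billion * APPROX_BITS_PER_WEIGHT[quant] / 8


def pick_quant(params_billion: float, ram_budget_gb: float) -> Optional[str]:
    """Return the highest-quality quant whose weights fit the budget, or None."""
    for quant in QUALITY_ORDER:
        if estimate_weights_gb(params_billion, quant) <= ram_budget_gb:
            return quant
    return None
```

For an 8B model, Q8_0 works out to about 8.5 GB and Q4_K_M to just under 5 GB, so pick_quant(8, 6) settles on Q4_K_M, and a 4.7 GB budget falls back to Q4_K_S.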
GPU offloading
Apple Silicon Macs use Metal automatically in both Ollama and LM Studio. The more model layers you offload to the GPU, the faster inference runs. On M-series chips, expect a 2-3x speedup over CPU-only inference.
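If you want to check that speedup yourself in Ollama, one approach is to send the same prompt twice: once with Ollama's num_gpu option (the number of layers sent to the GPU) set to 0 for a CPU-only run, and once with num_gpu left unset so Ollama decides the offload. This Python sketch only builds the two request bodies for Ollama's /api/generate endpoint and computes the ratio; build_generate_request and speedup are hypothetical helpers, and "your-model" is a placeholder for a model you have pulled.

```python
import json
from typing import Optional


def build_generate_request(
    model: str, prompt: str, gpu_layers: Optional[int] = None
) -> dict:
    """Build a non-streaming body for Ollama's POST /api/generate.

    gpu_layers=None leaves offloading to Ollama; gpu_layers=0 asks for
    no GPU layers, i.e. a CPU-only run.
    """
    body = {"model": model, "prompt": prompt, "stream": False}
    if gpu_layers is not None:
        # num_gpu: how many model layers Ollama sends to the GPU.
        body["options"] = {"num_gpu": gpu_layers}
    return body


def speedup(gpu_tokens_per_s: float, cpu_tokens_per_s: float) -> float:
    """How many times faster the GPU run generated tokens."""
    return gpu_tokens_per_s / cpu_tokens_per_s


# "your-model" is a placeholder; use the name of a model you have pulled.
cpu_run = build_generate_request("your-model", "Say hello.", gpu_layers=0)
gpu_run = build_generate_request("your-model", "Say hello.")
print(json.dumps(cpu_run))
print(json.dumps(gpu_run))
```

POST each body to http://localhost:11434/api/generate (Ollama's default address), note the tok/s for each run, and pass the two rates to speedup. A result between 2 and 3 would match the range quoted above.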
Benchmark your setup
In Ollama, run a model with the --verbose flag (ollama run <model> --verbose) and check the eval rate printed after each reply. In LM Studio, the UI shows tok/s in real time. Aim for 10+ tok/s for comfortable use; 20+ feels instant.
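You can also measure this from a script. Each non-streaming response from Ollama's /api/generate endpoint includes eval_count (tokens generated) and eval_duration (in nanoseconds), so tok/s is eval_count divided by eval_duration, times 10^9. The Python sketch below assumes Ollama is running at its default address, http://localhost:11434. tokens_per_second, rate_speed, and benchmark are hypothetical helpers, and the "sluggish" label for anything under 10 tok/s is an illustrative assumption.

```python
import json
import urllib.request

# Ollama's default local endpoint (assumes a standard install).
OLLAMA_GENERATE_URL = "http://localhost:11434/api/generate"


def tokens_per_second(response: dict) -> float:
    """eval_count = tokens generated; eval_duration is in nanoseconds."""
    return response["eval_count"] / response["eval_duration"] * 1e9


def rate_speed(tps: float) -> str:
    """Map a generation speed onto the rough thresholds above."""
    if tps >= 20:
        return "feels instant"
    if tps >= 10:
        return "comfortable"
    return "sluggish"


def benchmark(model: str, prompt: str = "Write one sentence about the sea.") -> float:
    """Send one non-streaming request to local Ollama and return tok/s."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    request = urllib.request.Request(
        OLLAMA_GENERATE_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as resp:
        return tokens_per_second(json.load(resp))
```

Call rate_speed(benchmark("your-model")) with the name of a model you have pulled. Nothing in the sketch touches the network until benchmark is called.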
Ready to go local?
Download VoxChimp and connect your favourite model in minutes.
Download for macOS