TL;DR: Runbrew install ollama, thenollama run qwen3.5:4b. That's a capable local LLM on any Apple Silicon Mac with 8GB+ RAM in about 5 minutes. 16GB Macs should step up to Qwen3.5 9B; 24GB+ unlocks Qwen3.5 35B-A3B.
This is the hands-on CLI walkthrough: every terminal command from a clean Mac to your first running model, plus the exact model picks for your RAM. Prefer a structured overview with troubleshooting cards? See the companion Ollama setup guide.
Get started with local AI in minutes
What Do You Need Before Installing?
- Mac with Apple Silicon (M1, M2, M3, or M4)
- macOS 12.3+
- At least 8GB RAM (16GB recommended)
Step 1: Install Ollama
The easiest way is via Homebrew:
brew install ollama
Or download the app directly from ollama.com.
Step 2: Verify Installation
ollama --version
You should see the installed version (e.g., 0.3.0).
Step 3: Download and Run a Model
Let's start with a lightweight yet capable model:
ollama run qwen3.5:4b
The first run downloads the model (~3GB). After that, it launches instantly.
Step 4: Interact with the Model
Once launched, you have an interactive prompt:
>>> Explain programming in 3 simple sentences
Press Ctrl+D to exit.
Which Model to Choose?
For Beginners (8GB RAM)
| Model | Command | Best For |
|---|---|---|
| Qwen3.5 4B | ollama run qwen3.5:4b | General chat, multimodal |
| Qwen3.5 2B | ollama run qwen3.5:2b | Very light tasks |
| Gemma 4 E2B | ollama run gemma4:e2b | Basic tasks, fastest |
For Better Quality (16GB+ RAM)
| Model | Command | Best For |
|---|---|---|
| Qwen3.5 9B | ollama run qwen3.5:9b | All-round use, vision |
| Gemma 4 E4B | ollama run gemma4:e4b | General use, light RAM |
For Power Users (24GB+ RAM)
| Model | Command | Best For |
|---|---|---|
| Qwen3.5 35B-A3B | ollama run qwen3.5:35b-a3b | Complex tasks, near-frontier |
| Qwen3.6 27B | ollama run qwen3.6:27b | Coding |
Which Commands Should You Know?
List installed models
ollama list
Remove a model
ollama rm qwen3.5:4b
Run without interactive mode
ollama run qwen3.5:4b "Explain photosynthesis"
API Server
ollama serve
Ollama starts a local API server at http://localhost:11434.
Integration with modelfit.io
Use modelfit.io to:
- Find the best model for your exact configuration
- Compare speeds (tokens/second)
- See RAM usage for each model
What If Something Goes Wrong?
"Error: model not found"
Make sure you've downloaded the model:
ollama pull qwen3.5:4b
Extreme slowness
First tokens can be slow. Generation improves after a few seconds.
Mac heating up
This is normal. LLMs stress the GPU. Use a cooling stand for long sessions.
Next Steps
- Explore available models
- Try different quantizations (Q4, Q8)
- Integrate Ollama into your apps via API
Frequently Asked Questions
How much RAM do I need to run Ollama on Mac?
A minimum of 8GB RAM runs small models like Qwen3.5 4B. 16GB is recommended for quality models like Qwen3.5 9B. 24GB+ opens access to powerful models like Qwen3.5-35B-A3B. Check your Mac's capabilities on our device pages.
Is Ollama free to use?
Yes. Ollama is completely free and open-source. All models run locally on your hardware with zero API costs. You can download and run unlimited models with no subscription or usage limits.
Which Ollama model should I start with?
For 8GB Macs, start with ollama run qwen3.5:4b (fast, ~3GB download). For 16GB+ Macs, ollama run qwen3.5:9b offers much better quality. See our benchmark page for quality comparisons across models.
Does Ollama work on Intel Macs?
Ollama works on Intel Macs but performance is significantly lower without Apple Silicon's unified memory and Neural Engine. For Intel Macs, stick to small models (3B-7B). Apple Silicon Macs (M1 or newer) are strongly recommended.
How do I run Ollama as a background API server?
Run ollama serve to start a local API server at http://localhost:11434. This enables integration with tools like Open WebUI, Continue.dev, and other applications. The API follows the OpenAI-compatible format.
---
Related Model Families:- Llama Models: Most popular choice for beginners
- Qwen Models: Best quality-per-size ratio
- Phi Models: Tiny models for low-RAM devices
Where to Buy for Local AI
best configsRuns 30B models with headroom; active cooling sustains long inference without throttling.
Check price on AmazonMax headroomLoads 70B models locally, the most capable AI laptop config.
Check price on AmazonPrefer to buy direct? Buy from Apple (same price, no affiliate link).
Archive your model library off the internal drive. Quantized models run 5 to 40GB each, so 2TB holds dozens with room to spare.
Check price on Amazon40Gbps external storage fast enough to run models from. Pair it with an M.2 drive for a portable model vault.
Check price on AmazonThe fanless MacBook Air heat-soaks on long inference runs. An aluminum riser lifts the chassis so it sheds heat better off the desk.
Check price on AmazonMore ports for the external drives, displays and peripherals around a local-AI workstation.
Check price on AmazonModelFit may earn a commission on purchases through these links, at no extra cost to you.
Want a Model Bigger Than This Mac Runs? Rent a Cloud GPU
by the hour70B+ and frontier open-weight models that won't fit in unified memory run great on an hourly rented GPU, same open weights, same Ollama workflow, no subscription.
ModelFit may earn a commission on sign-ups made through these links, at no extra cost to you.
The weekly local-AI refresh
New open-weight models, real Apple Silicon benchmarks, and the one model worth running on your Mac this week. Free, one email a week, unsubscribe anytime.
By subscribing you agree to our Privacy Policy and to receive the weekly email. Unsubscribe anytime.
Have questions? Reach out on X/Twitter