2026-02-24
How to Install Ollama on Mac — Apple Silicon Guide (2026)
Want to run language models locally on your Mac? This guide shows you how to install Ollama and launch your first model in 5 minutes.
Get started with local AI in minutes
Requirements
- Mac with Apple Silicon (M1, M2, M3, or M4)
- macOS 12.3+
- At least 8GB RAM (16GB recommended)
Step 1: Install Ollama
The easiest way is via Homebrew:
brew install ollama
Or download the app directly from ollama.com.
Step 2: Verify Installation
ollama --version
You should see the installed version (e.g., 0.3.0).
Step 3: Download and Run a Model
Let's start with a lightweight yet capable model:
ollama run llama3.2:3b
The first run downloads the model (about 2GB); subsequent runs load it from disk in seconds.
Step 4: Interact with the Model
Once launched, you have an interactive prompt:
>>> Explain programming in 3 simple sentences
Type /bye or press Ctrl+D to exit.
Which Model to Choose?
For Beginners (8GB RAM)
| Model | Strengths | Best For |
|---|---|---|
| Llama 3.2 3B | Fast | General chat |
| Gemma 2 2B | Very light | Basic tasks |
For Better Quality (16GB+ RAM)
| Model | Strengths | Best For |
|---|---|---|
| Llama 3.1 8B | Excellent quality/speed | All-round use |
| Qwen2.5 7B | Top coding | Programming |
| Mistral 7B | Good balance | General use |
For Power Users (24GB+ RAM)
| Model | Strengths | Best For |
|---|---|---|
| Qwen3.5 35B-A3B | Near-frontier | Complex tasks |
| Llama 3.1 70B | High quality | Research |
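The three tiers above boil down to a simple RAM check. As a rough sketch (the thresholds and labels mirror the tables in this guide and are not official Ollama guidance):

```python
def suggest_tier(ram_gb: int) -> str:
    """Map installed RAM (GB) to the model tiers used in the tables above."""
    if ram_gb >= 24:
        return "power user"      # e.g. Llama 3.1 70B
    if ram_gb >= 16:
        return "better quality"  # e.g. Llama 3.1 8B
    return "beginner"            # e.g. Llama 3.2 3B

print(suggest_tier(16))  # prints "better quality"
```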
Useful Commands
List installed models
ollama list
Remove a model
ollama rm llama3.2:3b
Run without interactive mode
ollama run llama3.2:3b "Explain photosynthesis"
API Server
ollama serve
Ollama starts a local API server at http://localhost:11434. Note that the macOS app runs this server in the background automatically; with a Homebrew install, run ollama serve yourself (or start it as a service with brew services start ollama).
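With the server running, any HTTP client can talk to it. Here is a minimal sketch using only the Python standard library; the /api/generate endpoint and its model/prompt/stream fields are part of Ollama's REST API, while the helper names are ours:

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"

def build_generate_request(model: str, prompt: str) -> bytes:
    """Encode a non-streaming request body for Ollama's /api/generate endpoint."""
    return json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()

def generate(model: str, prompt: str) -> str:
    """Send a prompt to a locally running `ollama serve` and return the reply text."""
    req = urllib.request.Request(
        OLLAMA_URL,
        data=build_generate_request(model, prompt),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

# generate("llama3.2:3b", "Why is the sky blue?")  # requires a running server
```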
Integration with modelfit.io
Use modelfit.io to:
- Find the best model for your exact configuration
- Compare speeds (tokens/second)
- See RAM usage for each model
Common Issues
"Error: model not found"
Make sure you've downloaded the model:
ollama pull llama3.2:3b
Extreme slowness
The first tokens can be slow while the model loads into memory; generation speeds up after a few seconds.
Mac heating up
This is normal — LLMs stress the GPU. Use a cooling stand for long sessions.
Next Steps
- Explore available models
- Try different quantizations (Q4, Q8)
- Integrate Ollama into your apps via API
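The quantization bullet above is easier to reason about with a back-of-the-envelope sizing rule: each weight takes roughly bits/8 bytes, plus some runtime overhead. The formula and the 20% overhead factor below are rough assumptions for illustration, not Ollama's numbers:

```python
def estimate_ram_gb(params_billion: float, bits_per_weight: int = 4,
                    overhead: float = 1.2) -> float:
    """Rough RAM needed to load a model: weights at the given quantization,
    plus ~20% overhead for the KV cache and runtime (assumed)."""
    weight_gb = params_billion * bits_per_weight / 8  # 1B params at 8 bits ~= 1 GB
    return weight_gb * overhead

# An 8B model at Q4 needs roughly 5 GB; at Q8 roughly 10 GB.
print(round(estimate_ram_gb(8, 4), 1), round(estimate_ram_gb(8, 8), 1))  # prints "4.8 9.6"
```

This is why Q4 models fit comfortably on a 16GB Mac while Q8 variants of the same model start to squeeze it.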
Frequently Asked Questions
How much RAM do I need to run Ollama on Mac?
A minimum of 8GB RAM runs small models (3B parameters). 16GB is recommended for quality 7-8B models like Llama 3.1 8B. 24GB+ opens access to powerful models like Qwen3.5-35B-A3B. Check your Mac's capabilities on our device pages.
Is Ollama free to use?
Yes. Ollama is completely free and open-source. All models run locally on your hardware with zero API costs. You can download and run unlimited models with no subscription or usage limits.
Which Ollama model should I start with?
For 8GB Macs, start with ollama run llama3.2:3b (fast, 2GB download). For 16GB+ Macs, ollama run llama3.1:8b offers much better quality. See our benchmark page for quality comparisons across models.
Does Ollama work on Intel Macs?
Ollama works on Intel Macs, but performance is significantly lower: inference falls back to the CPU rather than benefiting from Apple Silicon's unified memory and Metal GPU acceleration. On Intel Macs, stick to small models (3B-7B). Apple Silicon Macs (M1 or newer) are strongly recommended.
How do I run Ollama as a background API server?
Run ollama serve to start a local API server at http://localhost:11434. This enables integration with tools like Open WebUI, Continue.dev, and other applications. Alongside its native REST API, Ollama also exposes an OpenAI-compatible endpoint under /v1.
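That OpenAI-compatible endpoint lives at http://localhost:11434/v1. A stdlib-only sketch of a chat call against it (the endpoint path and request shape follow the OpenAI chat completions format; the helper names are ours):

```python
import json
import urllib.request

def build_chat_body(model: str, content: str) -> dict:
    """Build an OpenAI-style chat completions request body."""
    return {"model": model, "messages": [{"role": "user", "content": content}]}

def chat(model: str, content: str, base: str = "http://localhost:11434/v1") -> str:
    """Call Ollama's OpenAI-compatible chat endpoint and return the reply text."""
    req = urllib.request.Request(
        f"{base}/chat/completions",
        data=json.dumps(build_chat_body(model, content)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["choices"][0]["message"]["content"]

# chat("llama3.2:3b", "Say hello")  # requires `ollama serve` running
```

Because the format matches OpenAI's, most OpenAI client libraries can also point at this base URL with a dummy API key.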
---
Guide updated February 24, 2026. For personalized recommendations, visit modelfit.io. Have questions? Reach out on X/Twitter.