How to Install Ollama on Mac (Apple Silicon, 2026): M1-M5

TL;DR: Run brew install ollama, then ollama run qwen3.5:4b. That's a capable local LLM on any Apple Silicon Mac with 8GB+ RAM in about 5 minutes. 16GB Macs should step up to Qwen3.5 9B; 24GB+ unlocks Qwen3.5 35B-A3B.

This is the hands-on CLI walkthrough: every terminal command from a clean Mac to your first running model, plus the exact model picks for your RAM. Prefer a structured overview with troubleshooting cards? See the companion Ollama setup guide.

What Do You Need Before Installing?

Mac with Apple Silicon (M1, M2, M3, M4, or M5)
macOS 12.3+
At least 8GB RAM (16GB recommended)
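The checklist above can be verified from the terminal. The following is a minimal sketch, not an official Ollama check: the helper names is_apple_silicon and bytes_to_gib are made up for illustration, and it relies on the standard macOS tools uname, sw_vers, and sysctl. A terminal running under Rosetta may report x86_64 even on Apple Silicon.

```shell
#!/bin/sh
# Sketch: check the prerequisites listed above on the current machine.
# is_apple_silicon and bytes_to_gib are illustrative helper names.

# Succeeds if the given machine string (from `uname -m`) is Apple Silicon.
is_apple_silicon() {
  [ "$1" = "arm64" ]
}

# Converts a byte count (as printed by `sysctl -n hw.memsize`) to whole GiB.
bytes_to_gib() {
  echo $(( $1 / 1073741824 ))
}

# Only query macOS-specific values when actually running on macOS.
if [ "$(uname -s)" = "Darwin" ]; then
  if is_apple_silicon "$(uname -m)"; then
    echo "Chip: Apple Silicon"
  else
    echo "Chip: not Apple Silicon ($(uname -m))"
  fi
  echo "macOS: $(sw_vers -productVersion)"
  echo "RAM: $(bytes_to_gib "$(sysctl -n hw.memsize)") GB"
fi
```

Compare the printed values against the list above before moving on to Step 1.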

Step 1: Install Ollama

The easiest way is via Homebrew:

brew install ollama

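Or download the app directly from ollama.com. The app starts the Ollama server for you; with the Homebrew install, start it yourself with brew services start ollama (or run ollama serve in a separate terminal) before Step 3.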

Step 2: Verify Installation

ollama --version

You should see the installed version (e.g., 0.3.0).

Step 3: Download and Run a Model

Let's start with a lightweight yet capable model:

ollama run qwen3.5:4b

The first run downloads the model (~3GB). After that, it starts from the local copy, so there is no download and the prompt appears within a few seconds.

Step 4: Interact with the Model

Once launched, you have an interactive prompt:

>>> Explain programming in 3 simple sentences

Press Ctrl+D or type /bye to exit.

Which Model to Choose?

For Beginners (8GB RAM)

Model	Command	Best For
Qwen3.5 4B	`ollama run qwen3.5:4b`	General chat, multimodal
Qwen3.5 2B	`ollama run qwen3.5:2b`	Very light tasks
Gemma 4 E2B	`ollama run gemma4:e2b`	Basic tasks, fastest

For Better Quality (16GB+ RAM)

Model	Command	Best For
Qwen3.5 9B	`ollama run qwen3.5:9b`	All-round use, vision
Gemma 4 E4B	`ollama run gemma4:e4b`	General use, light RAM

For Power Users (24GB+ RAM)

Model	Command	Best For
Qwen3.5 35B-A3B	`ollama run qwen3.5:35b-a3b`	Complex tasks, near-frontier
Qwen3.6 27B	`ollama run qwen3.6:27b`	Coding

Which Commands Should You Know?

List installed models

ollama list

Remove a model

ollama rm qwen3.5:4b

Run without interactive mode

ollama run qwen3.5:4b "Explain photosynthesis"

API Server

ollama serve

Ollama starts a local API server at http://localhost:11434.
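Once the server is running, any HTTP client can talk to it. Here is a minimal sketch using curl against Ollama's /api/generate endpoint. The helper build_generate_payload is made up for this example and does no JSON escaping, so it assumes the model name and prompt contain no double quotes or backslashes.

```shell
#!/bin/sh
# Sketch: send one prompt to a local Ollama server with curl.

# Builds the JSON body for /api/generate.
# Assumption: $1 (model) and $2 (prompt) need no JSON escaping.
build_generate_payload() {
  printf '{"model": "%s", "prompt": "%s", "stream": false}' "$1" "$2"
}

OLLAMA_URL="http://localhost:11434"

# Send the request only if the server answers on its root URL.
if curl -s --max-time 2 "$OLLAMA_URL" >/dev/null 2>&1; then
  curl -s "$OLLAMA_URL/api/generate" \
    -d "$(build_generate_payload "qwen3.5:4b" "Explain photosynthesis")"
  echo
else
  echo "Ollama server not reachable at $OLLAMA_URL; start it with: ollama serve"
fi
```

With "stream": false the server returns a single JSON object whose response field holds the generated text; leaving it out streams the answer as a series of JSON lines instead.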

Integration with modelfit.io

Use modelfit.io to:

Find the best model for your exact configuration
Compare speeds (tokens/second)
See RAM usage for each model

What If Something Goes Wrong?

"Error: model not found"

Make sure you've downloaded the model:

ollama pull qwen3.5:4b
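To avoid pulling a model you already have, you can check the output of ollama list first. This is a sketch under the assumption that the first column of ollama list holds the model tag and the first line is a header; has_model is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: pull a model only if `ollama list` does not already show it.

# Succeeds if the tag in $1 appears in the first column of the
# `ollama list` output read from stdin (line 1 is the header row).
has_model() {
  awk -v tag="$1" 'NR > 1 && $1 == tag { found = 1 } END { exit found ? 0 : 1 }'
}

MODEL="qwen3.5:4b"

if command -v ollama >/dev/null 2>&1; then
  if ollama list | has_model "$MODEL"; then
    echo "$MODEL is already installed"
  else
    ollama pull "$MODEL"
  fi
else
  echo "ollama is not on PATH; install it first"
fi
```

The match is exact, so a misspelled tag is reported as missing rather than silently matched to a similar model.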

Extreme slowness

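The first response is usually the slowest because the model has to load into memory before it can generate anything; later prompts in the same session are faster. If every response stays slow, the model is probably too large for your RAM: run ollama ps to see whether it is running fully on the GPU, or drop to a smaller model from the tables above.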
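To put a number on "slow", you can measure generation speed. Running ollama run with the --verbose flag prints timing statistics after each answer, and the final /api/generate response includes eval_count (tokens generated) and eval_duration (nanoseconds). The sketch below turns those two values into tokens per second; tokens_per_second is an illustrative helper, and the sample numbers are invented.

```shell
#!/bin/sh
# Sketch: compute tokens/second from the eval_count and eval_duration
# (nanoseconds) fields of an /api/generate response.
# tokens_per_second is an illustrative helper, not part of Ollama.
tokens_per_second() {
  awk -v n="$1" -v ns="$2" 'BEGIN { printf "%.1f\n", n / (ns / 1e9) }'
}

# Invented sample values: 120 tokens generated in 4 seconds.
tokens_per_second 120 4000000000   # prints 30.0
```

Comparing this figure between two models on the same prompt gives a fairer picture than timing the first response, which also includes model loading.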

Mac heating up

This is normal. LLMs stress the GPU. Use a cooling stand for long sessions.

Next Steps

Explore available models
Try different quantizations (Q4, Q8)
Integrate Ollama into your apps via API

Related: Check our detailed Ollama setup guide for advanced configuration, see the best LLMs for MacBook, or compare MacBook Air vs Pro for LLMs.

Frequently Asked Questions

How much RAM do I need to run Ollama on Mac?

A minimum of 8GB RAM runs small models like Qwen3.5 4B. 16GB is recommended for higher-quality models like Qwen3.5 9B. 24GB+ opens access to larger models like Qwen3.5 35B-A3B. Check your Mac's capabilities on our device pages.

Is Ollama free to use?

Yes. Ollama is completely free and open-source. All models run locally on your hardware with zero API costs. You can download and run unlimited models with no subscription or usage limits.

Which Ollama model should I start with?

For 8GB Macs, start with ollama run qwen3.5:4b (fast, ~3GB download). For 16GB+ Macs, ollama run qwen3.5:9b offers much better quality. See our benchmark page for quality comparisons across models.

Does Ollama work on Intel Macs?

Ollama works on Intel Macs, but performance is significantly lower without Apple Silicon's unified memory and Metal GPU acceleration (Ollama runs models on the GPU, not the Neural Engine). For Intel Macs, stick to small models (3B-7B). Apple Silicon Macs (M1 or newer) are strongly recommended.

How do I run Ollama as a background API server?

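Run ollama serve to start a local API server at http://localhost:11434. That command stays in the foreground; to keep the server running in the background, use the desktop app, which starts it automatically, or brew services start ollama if you installed with Homebrew. This enables integration with tools like Open WebUI, Continue.dev, and other applications. Ollama exposes its own REST API under /api (for example /api/generate and /api/chat) as well as OpenAI-compatible endpoints under /v1.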

---

Related Model Families:

Llama Models: Most popular choice for beginners
Qwen Models: Best quality-per-size ratio
Phi Models: Tiny models for low-RAM devices

Guide updated June 6, 2026 with current model picks. For personalized recommendations, visit modelfit.io.
