By ModelFit Team · 2026-02-24

How to Install Ollama on Mac (Apple Silicon, 2026): M1-M5

TL;DR: Run brew install ollama, then ollama run qwen3.5:4b. That's a capable local LLM on any Apple Silicon Mac with 8GB+ RAM in about 5 minutes. 16GB Macs should step up to Qwen3.5 9B; 24GB+ unlocks Qwen3.5 35B-A3B.

This is the hands-on CLI walkthrough: every terminal command from a clean Mac to your first running model, plus the exact model picks for your RAM. Prefer a structured overview with troubleshooting cards? See the companion Ollama setup guide.

Ollama Setup on Mac Get started with local AI in minutes

What Do You Need Before Installing?

  • Mac with Apple Silicon (M1, M2, M3, or M4)
  • macOS 12.3+
  • At least 8GB RAM (16GB recommended)

Step 1: Install Ollama

The easiest way is via Homebrew:

brew install ollama

Or download the app directly from ollama.com.

Step 2: Verify Installation

ollama --version

You should see the installed version (e.g., 0.3.0).

Step 3: Download and Run a Model

Let's start with a lightweight yet capable model:

ollama run qwen3.5:4b

The first run downloads the model (~3GB). After that, it launches instantly.

Step 4: Interact with the Model

Once launched, you have an interactive prompt:

>>> Explain programming in 3 simple sentences

Press Ctrl+D to exit.

Which Model to Choose?

For Beginners (8GB RAM)

ModelCommandBest For
Qwen3.5 4Bollama run qwen3.5:4bGeneral chat, multimodal
Qwen3.5 2Bollama run qwen3.5:2bVery light tasks
Gemma 4 E2Bollama run gemma4:e2bBasic tasks, fastest

For Better Quality (16GB+ RAM)

ModelCommandBest For
Qwen3.5 9Bollama run qwen3.5:9bAll-round use, vision
Gemma 4 E4Bollama run gemma4:e4bGeneral use, light RAM

For Power Users (24GB+ RAM)

ModelCommandBest For
Qwen3.5 35B-A3Bollama run qwen3.5:35b-a3bComplex tasks, near-frontier
Qwen3.6 27Bollama run qwen3.6:27bCoding
Different models for different needs and hardware

Which Commands Should You Know?

List installed models

ollama list

Remove a model

ollama rm qwen3.5:4b

Run without interactive mode

ollama run qwen3.5:4b "Explain photosynthesis"

API Server

ollama serve

Ollama starts a local API server at http://localhost:11434.

Integration with modelfit.io

Use modelfit.io to:

  • Find the best model for your exact configuration
  • Compare speeds (tokens/second)
  • See RAM usage for each model

What If Something Goes Wrong?

"Error: model not found"

Make sure you've downloaded the model:

ollama pull qwen3.5:4b

Extreme slowness

First tokens can be slow. Generation improves after a few seconds.

Mac heating up

This is normal. LLMs stress the GPU. Use a cooling stand for long sessions.

Next Steps

  • Explore available models
  • Try different quantizations (Q4, Q8)
  • Integrate Ollama into your apps via API
Related: Check our detailed Ollama setup guide for advanced configuration, see the best LLMs for MacBook, or compare MacBook Air vs Pro for LLMs.

Frequently Asked Questions

How much RAM do I need to run Ollama on Mac?

A minimum of 8GB RAM runs small models like Qwen3.5 4B. 16GB is recommended for quality models like Qwen3.5 9B. 24GB+ opens access to powerful models like Qwen3.5-35B-A3B. Check your Mac's capabilities on our device pages.

Is Ollama free to use?

Yes. Ollama is completely free and open-source. All models run locally on your hardware with zero API costs. You can download and run unlimited models with no subscription or usage limits.

Which Ollama model should I start with?

For 8GB Macs, start with ollama run qwen3.5:4b (fast, ~3GB download). For 16GB+ Macs, ollama run qwen3.5:9b offers much better quality. See our benchmark page for quality comparisons across models.

Does Ollama work on Intel Macs?

Ollama works on Intel Macs but performance is significantly lower without Apple Silicon's unified memory and Neural Engine. For Intel Macs, stick to small models (3B-7B). Apple Silicon Macs (M1 or newer) are strongly recommended.

How do I run Ollama as a background API server?

Run ollama serve to start a local API server at http://localhost:11434. This enables integration with tools like Open WebUI, Continue.dev, and other applications. The API follows the OpenAI-compatible format.

---

Related Model Families: Guide updated June 6, 2026 with current model picks. For personalized recommendations, visit modelfit.io.

Where to Buy for Local AI

best configs

Prefer to buy direct? Buy from Apple (same price, no affiliate link).

ModelFit may earn a commission on purchases through these links, at no extra cost to you.

Want a Model Bigger Than This Mac Runs? Rent a Cloud GPU

by the hour

70B+ and frontier open-weight models that won't fit in unified memory run great on an hourly rented GPU, same open weights, same Ollama workflow, no subscription.

RunPodHourly GPU pods (RTX 4090 to H100) with one-click Ollama/vLLM templates.Rent
Vast.aiMarketplace of rented GPUs, usually the cheapest per-hour prices.Rent

ModelFit may earn a commission on sign-ups made through these links, at no extra cost to you.

See how this changes your recommendation
Run the wizard

The weekly local-AI refresh

New open-weight models, real Apple Silicon benchmarks, and the one model worth running on your Mac this week. Free, one email a week, unsubscribe anytime.

By subscribing you agree to our Privacy Policy and to receive the weekly email. Unsubscribe anytime.

Have questions? Reach out on X/Twitter