2026-02-24

How to Install Ollama on Mac — Apple Silicon Guide (2026)

Want to run language models locally on your Mac? This guide shows you how to install Ollama and launch your first model in 5 minutes.


Requirements

  • Mac with Apple Silicon (M1, M2, M3, or M4)
  • macOS 12.3+
  • At least 8GB RAM (16GB recommended)

Step 1: Install Ollama

The easiest way is via Homebrew:

brew install ollama

Or download the app directly from ollama.com.

Step 2: Verify Installation

ollama --version

You should see the installed version (e.g., 0.3.0).

Step 3: Download and Run a Model

Let's start with a lightweight yet capable model:

ollama run llama3.2:3b

The first run downloads the model (about 2 GB). After that, it loads from disk and starts in seconds.

Step 4: Interact with the Model

Once launched, you have an interactive prompt:

>>> Explain programming in 3 simple sentences

Press Ctrl+D to exit.

Which Model to Choose?

For Beginners (8GB RAM)

Model          Strengths      Best For
Llama 3.2 3B   Fast           General chat
Gemma 2 2B     Very light     Basic tasks

For Better Quality (16GB+ RAM)

Model          Strengths                  Best For
Llama 3.1 8B   Excellent quality/speed    All-round use
Qwen2.5 7B     Top coding                 Programming
Mistral 7B     Good balance               General use

For Power Users (24GB+ RAM)

Model              Strengths        Best For
Qwen3.5 35B-A3B    Near-frontier    Complex tasks
Llama 3.1 70B      High quality     Research

Different models for different needs and hardware.

Useful Commands

List installed models

ollama list

Remove a model

ollama rm llama3.2:3b

Run without interactive mode

ollama run llama3.2:3b "Explain photosynthesis"
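One-shot mode is also handy for scripting. A minimal Python sketch that builds the same command for subprocess (the helper name is ours; actually running it requires Ollama to be installed and on your PATH):

```python
import subprocess

def one_shot_command(model: str, prompt: str) -> list[str]:
    """Build the argv for a non-interactive, one-shot ollama run."""
    return ["ollama", "run", model, prompt]

cmd = one_shot_command("llama3.2:3b", "Explain photosynthesis")
print(cmd)
# With Ollama installed, execute it and capture the answer:
# result = subprocess.run(cmd, capture_output=True, text=True)
# print(result.stdout)
```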

API Server

ollama serve

Ollama starts a local API server at http://localhost:11434.
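With the server running, you can call the REST API directly. A minimal sketch using only the Python standard library (the request-building helper is ours, and it assumes the default port):

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"

def build_request(model: str, prompt: str) -> urllib.request.Request:
    """Build a non-streaming generate request for the local Ollama server."""
    payload = json.dumps({"model": model, "prompt": prompt, "stream": False})
    return urllib.request.Request(
        OLLAMA_URL,
        data=payload.encode(),
        headers={"Content-Type": "application/json"},
    )

req = build_request("llama3.2:3b", "Why is the sky blue?")
# With `ollama serve` running, send it and read the answer:
# with urllib.request.urlopen(req) as resp:
#     print(json.loads(resp.read())["response"])
```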

Integration with modelfit.io

Use modelfit.io to:

  • Find the best model for your exact configuration
  • Compare speeds (tokens/second)
  • See RAM usage for each model

Common Issues

"Error: model not found"

Make sure you've downloaded the model:

ollama pull llama3.2:3b

Extreme slowness

The first tokens can be slow while the model loads into memory. Generation speeds up once the model is warm.

Mac heating up

This is normal — LLMs stress the GPU. Use a cooling stand for long sessions.

Next Steps

  • Explore available models
  • Try different quantizations (Q4, Q8)
  • Integrate Ollama into your apps via API
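To see why quantization matters for the RAM tiers above: weight memory scales with parameter count times bits per weight. A back-of-the-envelope sketch (the 1.5 GB overhead figure for the KV cache and runtime is our assumption, not a measurement):

```python
def approx_ram_gb(params_billion: float, quant_bits: int, overhead_gb: float = 1.5) -> float:
    """Rough rule of thumb: weights take params * bits/8 GB,
    plus some overhead for the KV cache and runtime."""
    weight_gb = params_billion * quant_bits / 8
    return round(weight_gb + overhead_gb, 1)

for model, params in [("llama3.2:3b", 3), ("llama3.1:8b", 8), ("llama3.1:70b", 70)]:
    print(model, "Q4:", approx_ram_gb(params, 4), "GB   Q8:", approx_ram_gb(params, 8), "GB")
```

By this estimate, an 8B model at Q4 needs roughly 5.5 GB, which is why 16 GB Macs handle the 7-8B tier comfortably.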

Related: Check our detailed Ollama setup guide for advanced configuration, see the best LLMs for MacBook, or compare MacBook Air vs Pro for LLMs.

Frequently Asked Questions

How much RAM do I need to run Ollama on Mac?

A minimum of 8GB RAM runs small models (3B parameters). 16GB is recommended for quality 7-8B models like Llama 3.1 8B. 24GB+ opens access to powerful models like Qwen3.5-35B-A3B. Check your Mac's capabilities on our device pages.

Is Ollama free to use?

Yes. Ollama is completely free and open-source. All models run locally on your hardware with zero API costs. You can download and run unlimited models with no subscription or usage limits.

Which Ollama model should I start with?

For 8GB Macs, start with ollama run llama3.2:3b (fast, 2GB download). For 16GB+ Macs, ollama run llama3.1:8b offers much better quality. See our benchmark page for quality comparisons across models.

Does Ollama work on Intel Macs?

Ollama works on Intel Macs, but performance is significantly lower without Apple Silicon's unified memory and Metal GPU acceleration. For Intel Macs, stick to small models (3B-7B). Apple Silicon Macs (M1 or newer) are strongly recommended.

How do I run Ollama as a background API server?

Run ollama serve to start a local API server at http://localhost:11434. This enables integration with tools like Open WebUI, Continue.dev, and other applications. Ollama also exposes an OpenAI-compatible endpoint under /v1.

---

Guide updated February 24, 2026. For personalized recommendations, visit modelfit.io.

Have questions? Reach out on X/Twitter