What is Ollama?
Ollama is a free, open-source tool that makes running large language models (LLMs) locally incredibly simple. Think of it as the “Docker for AI models” — it handles all the complexity of downloading, configuring, and running AI models on your own hardware.
- Private: Your data never leaves your device
- Fast: Native GPU acceleration on Apple Silicon
- Free: Open-source with no usage limits
With Ollama, you can run popular models like Llama, Mistral, Qwen, and many others directly on your Mac — no internet connection required after initial download, no API keys, and no subscription fees.
System Requirements
Minimum Requirements
- macOS 11 Big Sur or later
- Apple Silicon Mac (M1 or newer)
- 8GB RAM (16GB recommended)
- 10GB free storage
Recommended for Best Experience
- macOS 14 Sonoma or later
- M2, M3, or M4 Mac
- 16GB+ RAM
- 50GB+ free storage
Note: Ollama primarily targets Apple Silicon; Intel Macs are supported experimentally, with significantly reduced performance. For the best experience, we strongly recommend an M-series Mac.
Installation Steps
Download Ollama
Visit the official Ollama website and download the macOS version, or use the command line:
curl -fsSL https://ollama.com/install.sh | sh
The install script downloads Ollama and installs it to /usr/local/bin.
Verify Installation
Open a new Terminal window and verify Ollama is installed correctly:
ollama --version
You should see the version number, like ollama version 0.3.0
Start Ollama Service
Ollama runs as a background service. Start it with:
ollama serve
Keep this terminal window open, or run it in the background. On macOS, Ollama typically auto-starts after installation.
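Before moving on, you can confirm the server is listening; a running instance answers plain HTTP on port 11434. A small sketch that reports either outcome (it assumes curl is available, and is safe to run whether or not the server is up):

```shell
# Quick health check: a running Ollama server answers HTTP on port 11434.
if curl -fsS --max-time 2 http://localhost:11434/ >/dev/null 2>&1; then
  status="reachable"
else
  status="not reachable"
fi
echo "Ollama server: $status"
```

If the server reports not reachable, run ollama serve (or launch the Ollama app) and try again.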
Downloading Your First Model
Ollama makes downloading models as simple as running a single command. Let's start with Llama 3.2, a compact, efficient model from Meta that runs well on most Macs.
Download Llama 3.2 (3B)
ollama pull llama3.2:3b
This downloads the 3 billion parameter version (~2GB). Progress will be displayed during download.
Other great starter models to try:
- ollama pull qwen2.5:7b: Excellent 7B model for coding and chat (4.5GB)
- ollama pull mistral:7b: Popular, well-tested model (4.1GB)
- ollama pull gemma2:2b: Google's efficient small model (1.6GB)

You can browse all available models at ollama.com/library.
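If you plan to pull all four starter models, a quick awk sum of the approximate sizes listed above shows the total disk space involved:

```shell
# Approximate download sizes (GB) of the starter models listed above
sizes="llama3.2:3b 2.0
qwen2.5:7b 4.5
mistral:7b 4.1
gemma2:2b 1.6"

# Add them up to see the combined disk footprint
total=$(printf '%s\n' "$sizes" | awk '{ sum += $2 } END { printf "total: %.1f GB", sum }')
echo "$total"
```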
Running the Model
Interactive Chat Mode
Start an interactive chat session with your downloaded model:
ollama run llama3.2:3b
You'll see a prompt where you can type messages. Press Ctrl+D or type /bye to exit.
Single Prompt Mode
Send a single prompt and get a response:
ollama run llama3.2:3b "Explain quantum computing"
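Single-prompt mode composes nicely with shell substitution, for example feeding a file's contents into the prompt. A minimal sketch (notes.txt is a stand-in file created here, and the last line prints the command rather than running it, so no model is needed):

```shell
# Stand-in input file for illustration
printf 'Ollama runs large language models locally.\n' > notes.txt

# Build the prompt from the file's contents
prompt="Summarize the following: $(cat notes.txt)"

# Print the command that would run (drop the leading echo to actually run it)
echo ollama run llama3.2:3b "$prompt"
```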
Using the API
Ollama provides a local API for building applications:
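By default the generate endpoint streams its reply as newline-delimited JSON, one fragment per line with the text in a response field. A hedged sketch of stitching those fragments back together, using a hand-written sample stream rather than real model output:

```shell
# Illustrative sample of the newline-delimited JSON that /api/generate streams
# back (not real model output)
stream='{"model":"llama3.2:3b","response":"Hello","done":false}
{"model":"llama3.2:3b","response":" there!","done":true}'

# Pull out each "response" fragment and concatenate them into the full reply
reply=$(printf '%s\n' "$stream" | sed -n 's/.*"response":"\([^"]*\)".*/\1/p' | tr -d '\n')
echo "$reply"
```

For a quick test, the request itself is a single curl call: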
curl http://localhost:11434/api/generate -d '{"model": "llama3.2:3b", "prompt": "Hello!"}'

Essential Commands
| Command | Description |
|---|---|
| ollama list | Show all downloaded models |
| ollama pull <model> | Download a model |
| ollama run <model> | Run a model (downloads if needed) |
| ollama rm <model> | Remove a model to free space |
| ollama cp <src> <dst> | Copy a model |
| ollama show <model> | Display model information |
| ollama ps | Show running models |
Troubleshooting
Model downloads are slow
Downloads happen directly from model hosts. Try using a VPN if your connection is slow, or download during off-peak hours. You can also resume interrupted downloads by running the pull command again.
Out of memory errors
Your Mac doesn't have enough RAM for the model you're trying to run. Try a smaller model (use 3B instead of 7B) or close other applications to free up memory. Models require roughly 1GB RAM per billion parameters.
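The rule of thumb above can be turned into a one-line estimator before you pull a model (a rough sketch only; quantization and context size shift the real number):

```shell
# Rough sizing: ~1 GB of RAM per billion parameters (rule of thumb from above)
estimate_ram() {
  awk -v b="$1" 'BEGIN { printf "%.1f GB", b * 1.0 }'
}

echo "3B model: $(estimate_ram 3)"
echo "7B model: $(estimate_ram 7)"
```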
Ollama command not found
The installation directory might not be in your PATH. Add /usr/local/bin to your PATH or restart your terminal. You can also try reinstalling using the official installer from ollama.com.
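A quick way to test the PATH fix in your current shell (the check reports either outcome, so it is safe to paste as-is; add the export line to ~/.zshrc to make it permanent):

```shell
# Put /usr/local/bin on PATH for this shell session
export PATH="/usr/local/bin:$PATH"

# Report whether the shell can now locate the ollama binary
if command -v ollama >/dev/null 2>&1; then found="yes"; else found="no"; fi
echo "ollama on PATH: $found"
```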
Connection refused errors
The Ollama service isn't running. Start it with ollama serve in a separate terminal window, or check if it's running with ollama ps.
Next Steps
Congratulations! You now have local AI running on your Mac. Here are some ways to expand your setup: