AI Model Families for Local Inference

Browse open-weight model families you can run locally with Ollama. Each family page shows all variants, RAM requirements, device compatibility, and performance expectations.
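Assuming Ollama is already installed, trying any of these families locally takes two commands. The model tag below is illustrative; check the Ollama library for the exact tag of the variant you want:

```shell
# Download a model's weights to the local cache, then start an
# interactive chat session. "llama3.2" is an example tag; every
# family listed here has equivalents in the Ollama library.
ollama pull llama3.2
ollama run llama3.2
```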

Qwen

Alibaba Cloud

19 models | 0.5B – 235B

Qwen is Alibaba Cloud's open-weight model family with the widest range of sizes, from 0.5B to 235B parameters. Known for strong multilingual performance and coding ability.

- Widest size range (0.5B to 235B)
- Strong multilingual and coding performance

Llama

Meta

7 models | 1B – 405B

Llama is Meta's open-weight model family and the most popular choice for local AI. Known for strong general reasoning and a massive community ecosystem.

- Most popular open-weight model family
- Strong general reasoning and instruction following

DeepSeek

DeepSeek AI

4 models | 7B – 671B

DeepSeek specializes in reasoning and coding models. DeepSeek R1 introduced chain-of-thought reasoning that rivals proprietary models, while V3 is a massive MoE model.

- Best-in-class reasoning with R1 models
- Strong coding performance

Mistral

Mistral AI

5 models | 7B – 46.7B

Mistral AI's models are known for efficiency and strong performance relative to their size. Mistral 7B was a breakthrough that proved small models could compete with much larger ones.

- Excellent performance-per-parameter ratio
- Sliding window attention for efficiency

Gemma

Google DeepMind

7 models | 1B – 27B

Gemma is Google DeepMind's lightweight open model family. Known for excellent quality at small sizes and strong safety tuning.

- Excellent quality at small sizes (1B – 9B)
- Strong safety and instruction tuning

Phi

Microsoft

4 models | 3.8B – 14B

Phi is Microsoft's small-but-mighty model family. Phi-4 Mini punches far above its weight at just 3.8B parameters, rivaling 7B models from other families.

- Best quality-per-parameter in small sizes
- Strong reasoning for its size

LFM2

Liquid AI

2 models | 8B – 24B

LFM2 by Liquid AI uses a novel liquid neural network architecture. These models are designed for agentic workflows and tool use with efficient inference.

- Novel liquid neural network architecture
- Optimized for agentic tool use

SmolLM

Hugging Face

1 model | 0.36B

SmolLM is Hugging Face's ultra-tiny model designed for the most constrained devices. At just 360M parameters, it runs on virtually anything.

- Ultra-tiny at 360M parameters
- Runs on any device with 1GB free RAM
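The RAM requirements shown on each family page follow a common back-of-envelope rule: bytes per weight times parameter count, plus runtime overhead for the KV cache and the inference runtime. A minimal sketch of that estimate (the 4-bit default and ~20% overhead are assumptions for illustration, not modelfit.io's exact methodology):

```python
def estimate_ram_gb(params_billion: float,
                    bits_per_weight: int = 4,
                    overhead: float = 1.2) -> float:
    """Rough RAM needed to run a model locally.

    Weights occupy params * bits/8 bytes; the overhead factor
    (assumed ~20% here) covers KV cache and runtime buffers.
    """
    weight_bytes = params_billion * 1e9 * bits_per_weight / 8
    return round(weight_bytes * overhead / 1e9, 1)

# A 7B model at 4-bit quantization needs roughly 4.2 GB,
# while SmolLM's 0.36B fits comfortably under 1 GB:
print(estimate_ram_gb(7))     # -> 4.2
print(estimate_ram_gb(0.36))  # -> 0.2
```

Raising `bits_per_weight` to 16 (unquantized fp16) roughly quadruples the 4-bit figure, which is why quantization matters so much for fitting larger models on consumer hardware.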
