AI Model Families for Local Inference

Browse open-weight model families you can run locally with Ollama. Each family page shows all variants, RAM requirements, device compatibility, and performance expectations.

Alibaba Cloud

Qwen

Qwen is Alibaba Cloud's open-weight model family with the widest range of sizes, from 0.5B to 235B parameters. It is known for strong multilingual performance and coding ability.

20 models · 0.5B–235B

Widest size range (0.5B to 235B)
Strong multilingual and coding performance

Meta

Llama

Llama is Meta's open-weight model family and the most popular choice for local AI. It is known for strong general reasoning and a massive community ecosystem.

9 models · 1B–405B

DeepSeek AI

DeepSeek

DeepSeek specializes in reasoning and coding models. DeepSeek R1 introduced chain-of-thought reasoning that rivals proprietary models, while V3 is a massive mixture-of-experts (MoE) model.

4 models · 7B–671B

Best-in-class reasoning with R1 models
Strong coding performance

Mistral AI

Mistral

Mistral AI's models are known for efficiency and strong performance relative to their size. Mistral 7B was a breakthrough that proved small models could compete with much larger ones.

5 models · 7B–46.7B

Excellent performance-per-parameter ratio
Sliding-window attention for efficiency

Google DeepMind

Gemma

Gemma is Google DeepMind's lightweight open model family. It is known for excellent quality at small sizes and strong safety tuning.

12 models · 1B–31B

Excellent quality at small sizes (1B–9B)
Strong safety and instruction tuning

Microsoft

Phi

Phi is Microsoft's small-but-mighty model family, built on the idea that careful training data beats raw parameter count. Phi-4 Mini packs strong reasoning into just 3.8B parameters, while Phi-4 14B competes with much larger models on quality. Both run locally with Ollama and pair naturally with low-RAM Apple Silicon Macs.

4 models · 3.8B–14B

Best quality-per-gigabyte at small sizes
Phi-4 Mini 3.8B loads in just 3.2GB (7GB min RAM)

Liquid AI

LFM2

LFM2 is Liquid AI's efficiency-focused model family, built on a hybrid architecture rather than a standard dense transformer. Its flagship, LFM2 24B-A2B, is a sparse mixture-of-experts model that activates only 2B of its 24B parameters per token. That design makes it fast on consumer hardware and well suited to agent workflows, tool calling, and privacy-sensitive local setups.

2 models · 8.3B–24B

Hybrid MoE design: 24B total parameters, only 2B active per token
Loads in about 14GB, fits any 16GB Mac
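
The LFM2 24B-A2B figures above can be turned into two quick ratios. The sketch below is only back-of-the-envelope arithmetic on the numbers quoted on this page (24B total, 2B active, about 14GB loaded); the function names are illustrative, and it assumes 1GB = 10^9 bytes and that the load size is dominated by the weights.

```python
def active_fraction(total_params_b: float, active_params_b: float) -> float:
    """Share of the parameters that are used for each token."""
    return active_params_b / total_params_b


def implied_bits_per_weight(load_size_gb: float, total_params_b: float) -> float:
    """Average bits stored per parameter, assuming the load size is all weights.

    GB (1e9 bytes) and B (1e9 parameters) cancel, so only the factor of
    8 bits per byte remains.
    """
    return load_size_gb * 8 / total_params_b


# Figures quoted above for LFM2 24B-A2B.
frac = active_fraction(total_params_b=24, active_params_b=2)
bits = implied_bits_per_weight(load_size_gb=14, total_params_b=24)

print(f"active per token: {frac:.1%}")          # active per token: 8.3%
print(f"implied bits per weight: {bits:.2f}")   # implied bits per weight: 4.67
```

Read this way, memory use tracks the 24B total while per-token compute tracks the 2B active share, which is consistent with the page's claim that the model is fast yet still needs a 16GB machine.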

Hugging Face

SmolLM

SmolLM is Hugging Face's ultra-tiny model family for the most constrained devices. SmolLM2 360M loads in about 0.5GB and runs on anything with 1GB of RAM, from old Macs and iPhones to embedded boards. It is the smallest model in our database and the fastest by speed score.

1 model · 0.36B

Tiny 360M-parameter model from Hugging Face
Loads in about 0.5GB, needs just 1GB of RAM
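
To see how the RAM figures above translate into a device check, here is a minimal sketch that filters a hand-built table by available memory. The load sizes and minimum RAM values are the ones quoted on this page; the 16GB minimum for LFM2 24B-A2B is an assumption inferred from "fits any 16GB Mac", and `ModelEntry`, `CATALOG`, and `runnable_on` are hypothetical names, not part of Ollama or any real API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelEntry:
    name: str
    load_gb: float      # approximate size once loaded
    min_ram_gb: float   # minimum system RAM quoted on this page


# Only the models whose figures appear on this page.
CATALOG = [
    ModelEntry("SmolLM2 360M", load_gb=0.5, min_ram_gb=1.0),
    ModelEntry("Phi-4 Mini 3.8B", load_gb=3.2, min_ram_gb=7.0),
    # Assumption: "fits any 16GB Mac" is read as a 16GB minimum.
    ModelEntry("LFM2 24B-A2B", load_gb=14.0, min_ram_gb=16.0),
]


def runnable_on(ram_gb: float, catalog: list[ModelEntry] = CATALOG) -> list[str]:
    """Return the names of models whose minimum RAM fits within ram_gb."""
    return [m.name for m in catalog if m.min_ram_gb <= ram_gb]


print(runnable_on(8))   # ['SmolLM2 360M', 'Phi-4 Mini 3.8B']
print(runnable_on(16))  # ['SmolLM2 360M', 'Phi-4 Mini 3.8B', 'LFM2 24B-A2B']
```

The check compares against minimum RAM rather than load size because the operating system and other applications also need memory; the gap between 3.2GB loaded and 7GB minimum for Phi-4 Mini shows how large that margin can be.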
