AI Model Families for Local Inference

Browse open-weight model families you can run locally with Ollama. Each family page shows all variants, RAM requirements, device compatibility, and performance expectations.

Alibaba Cloud
Qwen

Qwen is Alibaba Cloud's open-weight model family. It has the widest range of sizes listed here, from 0.5B to 235B parameters, and is known for strong multilingual performance and coding ability.

20 models · 0.5B–235B
Widest size range (0.5B to 235B)
Strong multilingual and coding performance
Meta
Llama

Llama is Meta's open-weight model family and the most popular choice for local AI. It is known for strong general reasoning and a massive community ecosystem.

9 models · 1B–405B
Most popular open-weight model family
Strong general reasoning and instruction following
DeepSeek AI
DeepSeek

DeepSeek specializes in reasoning and coding models. DeepSeek R1 introduced chain-of-thought reasoning that rivals proprietary models, while V3 is a 671B-parameter mixture-of-experts (MoE) model.

4 models · 7B–671B
Best-in-class reasoning with R1 models
Strong coding performance
Mistral AI
Mistral

Mistral AI's models are known for efficiency and strong performance relative to their size. Mistral 7B was a breakthrough that proved small models could compete with much larger ones.

5 models · 7B–46.7B
Excellent performance-per-parameter ratio
Sliding window attention for efficiency
Google DeepMind
Gemma

Gemma is Google DeepMind's lightweight open model family. It is known for excellent quality at small sizes and strong safety tuning.

12 models · 1B–31B
Excellent quality at small sizes (1B–9B)
Strong safety and instruction tuning
Microsoft
Phi

Phi is Microsoft's small-but-mighty model family, built on the idea that careful training data beats raw parameter count. Phi-4 Mini packs strong reasoning into just 3.8B parameters, while Phi-4 14B competes with much larger models on quality. Both run locally with Ollama and pair naturally with low-RAM Apple Silicon Macs.

4 models · 3.8B–14B
Best quality-per-gigabyte at small sizes
Phi-4 Mini 3.8B loads in just 3.2GB (7GB min RAM)
Liquid AI
LFM2

LFM2 is Liquid AI's efficiency-focused model family, built on a hybrid architecture rather than a standard dense transformer. Its flagship, LFM2 24B-A2B, is a sparse mixture-of-experts model that activates only 2B of its 24B parameters per token. That design makes it fast on consumer hardware and well suited to agent workflows, tool calling, and privacy-sensitive local setups.

2 models · 8.3B–24B
Hybrid MoE design: 24B total parameters, only 2B active per token
Loads in about 14GB, fits any 16GB Mac
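
In a sparse mixture-of-experts model, all of the weights still have to be loaded into memory, while only the active parameters take part in computing each token. The sketch below illustrates that split for a 24B-total, 2B-active model. The helper names and the 4.5 bits-per-weight figure are illustrative assumptions, not values taken from this page or from Ollama, and the estimate covers weights only (no context cache or runtime overhead).

```python
def weight_footprint_gb(total_params_b: float, bits_per_weight: float) -> float:
    """Approximate size of the weights alone, in GB (1 GB = 1e9 bytes).

    total_params_b is the parameter count in billions.
    """
    return total_params_b * bits_per_weight / 8


def active_fraction(active_params_b: float, total_params_b: float) -> float:
    """Share of the parameters used for each generated token."""
    return active_params_b / total_params_b


if __name__ == "__main__":
    total_b, active_b = 24.0, 2.0   # figures quoted for LFM2 24B-A2B
    assumed_bits = 4.5              # assumption: a typical 4-bit-class quantization

    print(f"weights: ~{weight_footprint_gb(total_b, assumed_bits):.1f} GB")
    print(f"active per token: {active_fraction(active_b, total_b):.1%}")
```

Under these assumptions the weights come to roughly 13.5 GB, in the same range as the "about 14GB" load size quoted above, while only about 8% of the parameters are used per token.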
Hugging Face
SmolLM

SmolLM is Hugging Face's ultra-tiny model family for the most constrained devices. SmolLM2 360M loads in about 0.5GB and runs on anything with 1GB of RAM, from old Macs and iPhones to embedded boards. It is the smallest model in our database and the fastest by speed score.

1 model · 0.36B
Tiny 360M-parameter model from Hugging Face
Loads in about 0.5GB, needs just 1GB of RAM

Explore by Hardware