How Much RAM Do You Need for a Local LLM?

At Q4_K_M, a local LLM needs ~0.6 GB of memory per billion parameters. 8 GB runs up to ~7B, 16 GB up to ~14B, 24 GB up to ~27B, 32 GB up to ~32B, and 64 GB up to ~70B. The full model-size-to-memory matrix is below; figures are ModelFit estimates, not measured benchmarks.

By ModelFit Team · Updated 2026-06-15

Quick answer

A local LLM needs roughly 0.6 GB of RAM per billion parameters at Q4_K_M quantization. Reserve ~30% of total memory as headroom for the OS, app windows, and context, which leaves ~70% as the model budget. So an 8 GB device runs up to ~7B models, 16 GB up to ~14B, 24 GB up to ~27B, 32 GB up to ~32B, and 64 GB up to ~70B. ModelFit tracks 59 local models across 17 families against these tiers.

Free to cite with attribution to ModelFit (modelfit.io). Sizing is a first-party estimate from quantization math, not a measured benchmark.

8 GB: ~8.3B max
16 GB: ~14B max
24 GB: ~27B max
32 GB: ~35B max

Model Size to RAM Matrix (Q4 and Q8)

How much memory each model size loads at Q4_K_M (the common default) and Q8 (higher quality), plus the smallest unified-memory tier that fits it with headroom. The same per-parameter math applies to GPU VRAM.

Parameters  Q4_K_M Size  Q8 Size     Min Unified RAM (Q4)
1B          ~0.6 GB      ~1.1 GB     8 GB
3B          ~1.8 GB      ~3.2 GB     8 GB
7B          ~4.2 GB      ~7.4 GB     8 GB
8B          ~4.8 GB      ~8.5 GB     8 GB
9B          ~5.4 GB      ~9.5 GB     8 GB
13B         ~7.8 GB      ~13.8 GB    16 GB
14B         ~8.4 GB      ~14.8 GB    16 GB
27B         ~16.2 GB     ~28.6 GB    24 GB
32B         ~19.2 GB     ~33.9 GB    32 GB
70B         ~42 GB       ~74.2 GB    64 GB
122B        ~73.2 GB     ~129.3 GB   128 GB

Estimates from quantization math (~0.6 GB/B at Q4, ~1.06 GB/B at Q8) and ModelFit's 70% memory budget. Real usage varies with context length and the model's architecture.
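The matrix rows can be reproduced from the stated constants. The following is a minimal sketch, assuming the ~0.6 GB/B and ~1.06 GB/B rates and the 70% budget are applied exactly as described; the function name matrix_row and the tier list are illustrative, not part of ModelFit.

```python
# Constants taken from the article's rule of thumb; names are illustrative.
GB_PER_B_Q4 = 0.6        # ~GB per billion parameters at Q4_K_M
GB_PER_B_Q8 = 1.06       # ~GB per billion parameters at Q8
BUDGET_FRACTION = 0.70   # share of unified memory assumed available to the model
TIERS_GB = [8, 16, 24, 32, 48, 64, 128]


def matrix_row(params_b):
    """Return (q4_gb, q8_gb, min_tier_gb) for a model with params_b billion parameters."""
    q4_gb = round(params_b * GB_PER_B_Q4, 1)
    q8_gb = round(params_b * GB_PER_B_Q8, 1)
    # Smallest tier whose 70% budget holds the Q4 load; None if no tier is large enough.
    min_tier = next((t for t in TIERS_GB if q4_gb <= t * BUDGET_FRACTION), None)
    return q4_gb, q8_gb, min_tier


if __name__ == "__main__":
    for size in (7, 14, 27, 70):
        print(f"{size}B ->", matrix_row(size))
```

These are the same first-order estimates as the table; context length and architecture are not modeled.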

What Each RAM Tier Runs in ModelFit's Catalog

Drawn straight from ModelFit's 59-model local catalog: the largest model each tier fits, how many models qualify, and the highest-quality pick. See the full breakdown on the hardware stats page.

Device RAM  Model Budget  Max Params  Models That Fit  Top-Quality Pick
8 GB        ~5.6 GB       ~8.3B       23               DeepSeek-R1 Distill Qwen 7B
16 GB       ~11.2 GB      ~14B        36               Qwen2.5 Coder 14B
24 GB       ~16.8 GB      ~27B        41               Qwen2.5 Coder 14B
32 GB       ~22.4 GB      ~35B        49               Qwen3 30B
48 GB       ~33.6 GB      ~46.7B      50               Mixtral 8x7B Instruct
64 GB       ~44.8 GB      ~70B        53               Llama 3.1 70B Instruct
128 GB      ~89.6 GB      ~122B       55               Llama 3.1 70B Instruct
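The "Models That Fit" column amounts to filtering a catalog by each tier's budget. Here is a rough sketch of that filter, assuming a 70% budget and ~0.6 GB/B at Q4. The five-entry catalog is a hypothetical stand-in with nominal parameter counts, not ModelFit's actual 59-model list, and the quality ranking behind the top pick is not reproduced.

```python
GB_PER_B_Q4 = 0.6        # ~GB per billion parameters at Q4_K_M (article's rule of thumb)
BUDGET_FRACTION = 0.70   # model budget as a share of device RAM

# Hypothetical mini-catalog: (name, nominal billions of parameters).
CATALOG = [
    ("DeepSeek-R1 Distill Qwen 7B", 7.0),
    ("Qwen2.5 Coder 14B", 14.0),
    ("Qwen3 30B", 30.0),
    ("Mixtral 8x7B Instruct", 46.7),
    ("Llama 3.1 70B Instruct", 70.0),
]


def models_that_fit(ram_gb, catalog=CATALOG):
    """Return names of catalog models whose estimated Q4 load fits the tier's budget."""
    budget_gb = ram_gb * BUDGET_FRACTION
    return [name for name, params_b in catalog
            if params_b * GB_PER_B_Q4 <= budget_gb]


if __name__ == "__main__":
    for ram in (8, 16, 24, 32, 48, 64, 128):
        fits = models_that_fit(ram)
        print(f"{ram} GB: {len(fits)} fit, largest = {fits[-1] if fits else None}")
```

With this toy catalog the counts will not match the table, since only five models are listed.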

Can 8 GB Run a Local LLM?

Yes. An 8 GB device runs 3B-7B models comfortably (up to ~8.3B; 23 models fit). The catch is headroom: a 7B model loads ~4.2 GB, leaving under 4 GB for macOS, your browser, and the context cache. Close other apps during inference, or pick a 3B-4B model for a smoother experience. The base MacBook Air, Mac Mini, and most iPhones live here; see MacBook Air picks.

What Fits in 16 GB?

16 GB is the sweet spot for local AI: it comfortably runs 7B-9B models (up to ~14B; 36 models fit) with room for context and other apps. A 14B model fits but leaves little headroom on a 16 GB machine. For the jump to 14B-27B models, step up to 24-32 GB — compare the tiers on the 16 GB vs 32 GB breakdown.

How to Size a Model to Your RAM

  1. Estimate the load. Parameter count × ~0.6 GB for Q4_K_M. A 7B model ≈ 4.2 GB; a 14B ≈ 8.4 GB.
  2. Add headroom. Reserve ~30% of your memory for macOS, apps, and the KV-cache.
  3. Match your tier. Use the matrix above, or run the ModelFit wizard for your exact chip and RAM.
  4. Pull it. Install Ollama (setup guide) and run the one-command pull for your pick.
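Steps 1 to 3 above can be sketched as a small check. This assumes the ~0.6 GB/B Q4_K_M rate and the ~30% headroom reserve from the list; check_fit and its field names are hypothetical, and step 4 (the actual download) is left out.

```python
def check_fit(params_b, ram_gb, gb_per_b=0.6, headroom=0.30):
    """Estimate the load, reserve headroom, and compare against the remaining budget."""
    load_gb = params_b * gb_per_b             # step 1: estimated Q4_K_M load
    budget_gb = ram_gb * (1.0 - headroom)     # step 2: memory left after the reserve
    return {
        "load_gb": round(load_gb, 1),
        "budget_gb": round(budget_gb, 1),
        "slack_gb": round(budget_gb - load_gb, 1),
        "fits": load_gb <= budget_gb,         # step 3: does the model fit this tier?
    }


if __name__ == "__main__":
    print(check_fit(7, 8))    # 7B model on an 8 GB device
    print(check_fit(14, 16))  # 14B model on a 16 GB device
    print(check_fit(14, 8))   # 14B model on an 8 GB device
```

A small positive slack_gb means the model loads but leaves little room for longer contexts.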

Frequently Asked Questions

How much RAM do I need to run a local LLM?

At Q4_K_M quantization a local LLM needs roughly 0.6 GB of memory per billion parameters, plus headroom for the OS and context. In practice, 8 GB runs up to ~7B models, 16 GB up to ~14B, 24 GB up to ~27B, 32 GB up to ~32B, and 64 GB up to ~70B. ModelFit tracks 59 local models across these tiers.

Can 8 GB of RAM run a local LLM?

Yes. An 8 GB device runs local models up to ~8.3B parameters at Q4 — 23 of ModelFit's 59 local models fit its ~5.6 GB budget. The top-quality fit is DeepSeek-R1 Distill Qwen 7B. Stick to 3B-7B models and close other apps for the smoothest experience.

What size LLM fits in 16 GB of RAM?

A 16 GB device runs local models up to ~14B parameters at Q4, with 36 of ModelFit's 59 local models fitting its ~11.2 GB budget. A strong pick is Qwen2.5 Coder 14B (~11 GB loaded), though it leaves little headroom. 16 GB is the sweet spot for 7B-9B models.

Does quantization change how much RAM I need?

Yes. Q4_K_M (the common default) needs ~0.6 GB per billion parameters. Q8 raises that to ~1.06 GB/B (about 1.8x) for higher quality, and full FP16 needs ~2 GB/B. Lower-bit quantization saves memory at a small quality cost; Q4_K_M is the best balance for most local setups.
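To see how the three rates compare for one model, here is a short sketch using the per-parameter figures quoted in this answer; the helper name load_by_quant is illustrative.

```python
# Approximate GB per billion parameters, as quoted in the answer above.
GB_PER_B = {"Q4_K_M": 0.6, "Q8": 1.06, "FP16": 2.0}


def load_by_quant(params_b):
    """Return the estimated load in GB at each quantization level."""
    return {quant: round(params_b * rate, 1) for quant, rate in GB_PER_B.items()}


if __name__ == "__main__":
    for size in (7, 14):
        print(f"{size}B:", load_by_quant(size))
```

Under these estimates a 7B model goes from about 4.2 GB at Q4_K_M to about 14 GB at FP16, which is why the same model can fit one tier at Q4 and need a larger one unquantized.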

Is unified memory (Mac) different from VRAM (GPU) for LLMs?

For sizing, the per-parameter rule is the same. The difference is the pool: Apple Silicon shares all unified memory between CPU and GPU, so a 32 GB Mac can dedicate ~22 GB to a model. A 12 GB GPU is hard-capped at 12 GB of VRAM — exceed it and the model spills to system RAM and slows dramatically.

Size It for Your Hardware

Want the exact model for your RAM?
The wizard sizes a pick to your chip and memory in seconds.