How Much RAM Do You Need for a Local LLM?
A local LLM needs roughly 0.6 GB of RAM per billion parameters at Q4_K_M quantization. Keep about 30% of device memory free for the OS, open apps, and context, so the model itself gets roughly 70% of RAM. On that budget an 8 GB device runs up to ~7B models, 16 GB up to ~14B, 24 GB up to ~27B, 32 GB up to ~32B, and 64 GB up to ~70B. ModelFit tracks 59 local models across 17 families against these tiers.
Free to cite with attribution to ModelFit (modelfit.io). Sizing is a first-party estimate from quantization math, not a measured benchmark.
Model Size to RAM Matrix (Q4 and Q8)
How much memory each model size loads at Q4_K_M (the common default) and Q8 (higher quality), plus the smallest unified-memory tier that fits it with headroom. The same per-parameter math applies to GPU VRAM.
| Parameters | Q4_K_M Size | Q8 Size | Min Unified RAM (Q4) |
|---|---|---|---|
| 1B | ~0.6 GB | ~1.1 GB | 8 GB |
| 3B | ~1.8 GB | ~3.2 GB | 8 GB |
| 7B | ~4.2 GB | ~7.4 GB | 8 GB |
| 8B | ~4.8 GB | ~8.5 GB | 8 GB |
| 9B | ~5.4 GB | ~9.5 GB | 8 GB |
| 13B | ~7.8 GB | ~13.8 GB | 16 GB |
| 14B | ~8.4 GB | ~14.8 GB | 16 GB |
| 27B | ~16.2 GB | ~28.6 GB | 24 GB |
| 32B | ~19.2 GB | ~33.9 GB | 32 GB |
| 70B | ~42 GB | ~74.2 GB | 64 GB |
| 122B | ~73.2 GB | ~129.3 GB | 128 GB |
Estimates from quantization math (~0.6 GB/B at Q4, ~1.06 GB/B at Q8) and ModelFit's 70% memory budget. Real usage varies with context length and the model's architecture.
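The matrix follows directly from the constants in the note above (~0.6 GB/B at Q4_K_M, ~1.06 GB/B at Q8, a 70% memory budget). The sketch below is an illustration, not ModelFit's own code: it assumes weights-only sizing with no KV cache, uses the RAM tiers listed on this page, and treats a model as fitting a tier when its Q4_K_M size is at most 70% of that tier's RAM. Running it reproduces every row of the table.

```python
# Sketch: rebuild the size-to-RAM matrix from the page's rules of thumb.
# Assumptions (not ModelFit's actual code): weights-only sizing, no KV cache,
# and a model fits a tier when its Q4_K_M size <= 70% of that tier's RAM.

Q4_GB_PER_B = 0.6       # ~GB per billion parameters at Q4_K_M
Q8_GB_PER_B = 1.06      # ~GB per billion parameters at Q8
BUDGET_FRACTION = 0.70  # share of device RAM given to the model
RAM_TIERS_GB = (8, 16, 24, 32, 48, 64, 128)


def q4_size_gb(params_b: float) -> float:
    """Approximate Q4_K_M load size in GB for params_b billion parameters."""
    return params_b * Q4_GB_PER_B


def q8_size_gb(params_b: float) -> float:
    """Approximate Q8 load size in GB."""
    return params_b * Q8_GB_PER_B


def min_ram_tier_gb(params_b: float, tiers=RAM_TIERS_GB):
    """Smallest RAM tier whose 70% budget holds the Q4_K_M weights, or None."""
    need = q4_size_gb(params_b)
    for tier in sorted(tiers):
        if need <= tier * BUDGET_FRACTION:
            return tier
    return None


for params in (1, 3, 7, 8, 9, 13, 14, 27, 32, 70, 122):
    print(f"{params:>4}B  Q4 ~{q4_size_gb(params):5.1f} GB  "
          f"Q8 ~{q8_size_gb(params):5.1f} GB  min tier {min_ram_tier_gb(params)} GB")
```

Anything that does not fit even the 128 GB tier returns None rather than a tier.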
What Each RAM Tier Runs in ModelFit's Catalog
Drawn straight from ModelFit's 59-model local catalog: the largest model each tier fits, how many models qualify, and the highest-quality pick. See the full breakdown on the hardware stats page.
| Device RAM | Model Budget | Largest Model That Fits | Models That Fit | Top-Quality Pick |
|---|---|---|---|---|
| 8 GB | ~5.6 GB | ~8.3B | 23 | DeepSeek-R1 Distill Qwen 7B |
| 16 GB | ~11.2 GB | ~14B | 36 | Qwen2.5 Coder 14B |
| 24 GB | ~16.8 GB | ~27B | 41 | Qwen2.5 Coder 14B |
| 32 GB | ~22.4 GB | ~35B | 49 | Qwen3 30B |
| 48 GB | ~33.6 GB | ~46.7B | 50 | Mixtral 8x7B Instruct |
| 64 GB | ~44.8 GB | ~70B | 53 | Llama 3.1 70B Instruct |
| 128 GB | ~89.6 GB | ~122B | 55 | Llama 3.1 70B Instruct |
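The Model Budget column is 70% of device RAM. Dividing that budget by ~0.6 GB/B gives a theoretical Q4_K_M parameter ceiling. As the description above the table says, the third column instead lists the largest model each tier fits in ModelFit's catalog, so the arithmetic ceiling runs higher. A minimal sketch, assuming weights-only sizing:

```python
# Sketch: per-tier model budget and the arithmetic Q4_K_M parameter ceiling.
# Assumes weights-only sizing at ~0.6 GB per billion parameters and a 70% budget.

BUDGET_FRACTION = 0.70
Q4_GB_PER_B = 0.6


def model_budget_gb(ram_gb: float) -> float:
    """Memory left for the model after reserving ~30% for the OS and apps."""
    return ram_gb * BUDGET_FRACTION


def q4_param_ceiling_b(ram_gb: float) -> float:
    """Largest parameter count (billions) whose Q4_K_M weights fit the budget."""
    return model_budget_gb(ram_gb) / Q4_GB_PER_B


for ram in (8, 16, 24, 32, 48, 64, 128):
    print(f"{ram:>3} GB  budget ~{model_budget_gb(ram):5.1f} GB  "
          f"ceiling ~{q4_param_ceiling_b(ram):5.1f}B")
```

On 8 GB the ceiling works out to ~9.3B, above the ~8.3B catalog figure, which is consistent with the 9B row of the matrix landing in the 8 GB tier.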
Can 8 GB Run a Local LLM?
Yes. An 8 GB device runs 3B-7B models comfortably (up to ~8.3B; 23 models fit). The catch is headroom: a 7B model loads ~4.2 GB, leaving under 4 GB for macOS and your browser. Close other apps during inference, or pick a 3B-4B model for a smoother experience. 8 GB MacBook Air and Mac mini configurations and 8 GB iPhones live here; see the MacBook Air picks.
What Fits in 16 GB?
16 GB is the sweet spot for local AI: it comfortably runs 7B-9B models (up to ~14B; 36 models fit) with room for context and other apps. A 14B model fits but leaves little headroom on a 16 GB machine. For the jump to 14B-27B models, step up to 24-32 GB — compare the tiers on the 16 GB vs 32 GB breakdown.
How to Size a Model to Your RAM
- Estimate the load. Parameter count × ~0.6 GB for Q4_K_M. A 7B model ≈ 4.2 GB; a 14B ≈ 8.4 GB.
- Add headroom. Reserve ~30% of your memory for macOS, apps, and the KV cache.
- Match your tier. Use the matrix above, or run the ModelFit wizard for your exact chip and RAM.
- Pull it. Install Ollama (see the setup guide) and pull your pick with a single ollama pull command.
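Step 2's headroom has to absorb the KV cache, which grows linearly with context length. The sketch below uses the standard transformer estimate (one key and one value vector per layer, per KV head, per token). It assumes an unquantized FP16 cache, and the example shape (32 layers, 8 KV heads, head dimension 128) is Llama 3.1 8B's published configuration, used here only for illustration.

```python
# Sketch: estimate KV-cache memory for a given context length.
# Assumes an FP16 cache (2 bytes per element) with no cache quantization.


def kv_cache_gib(n_layers: int, n_kv_heads: int, head_dim: int,
                 context_tokens: int, bytes_per_elem: int = 2) -> float:
    """KV-cache size in GiB: keys + values per layer, KV head, and token."""
    total_bytes = 2 * n_layers * n_kv_heads * head_dim * context_tokens * bytes_per_elem
    return total_bytes / 1024**3


# Illustrative shape: Llama 3.1 8B (32 layers, 8 KV heads, head_dim 128).
for ctx in (4096, 8192, 32768, 131072):
    print(f"{ctx:>6} tokens: ~{kv_cache_gib(32, 8, 128, ctx):.2f} GiB")
```

At 8K tokens this adds about 1 GiB on top of the ~4.8 GB of Q4_K_M weights for an 8B model; at the full 128K context it would need ~16 GiB, which is why long contexts eat into the 30% reserve quickly.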
Frequently Asked Questions
How much RAM do I need to run a local LLM?
At Q4_K_M quantization a local LLM needs roughly 0.6 GB of memory per billion parameters, plus headroom for the OS and context. In practice, 8 GB runs up to ~7B models, 16 GB up to ~14B, 24 GB up to ~27B, 32 GB up to ~32B, and 64 GB up to ~70B. ModelFit tracks 59 local models across these tiers.
Can 8 GB of RAM run a local LLM?
Yes. An 8 GB device runs local models up to ~8.3B parameters at Q4 — 23 of ModelFit's 59 local models fit its ~5.6 GB budget. The top-quality fit is DeepSeek-R1 Distill Qwen 7B. Stick to 3B-7B models and close other apps for the smoothest experience.
What size LLM fits in 16 GB of RAM?
A 16 GB device runs local models up to ~14B parameters at Q4, with 36 of ModelFit's 59 local models fitting its ~11.2 GB budget. A strong pick is Qwen2.5 Coder 14B: ~8.4 GB of Q4_K_M weights by the rule above and ~11 GB once loaded, which nearly fills the budget. 16 GB is the sweet spot for 7B-9B models.
Does quantization change how much RAM I need?
Yes. Q4_K_M (the common default) needs ~0.6 GB per billion parameters. Q8 raises that to ~1.06 GB/B (about 1.8× as much) for higher quality, and full FP16 needs ~2 GB/B. Lower-bit quantization saves memory at a small quality cost; Q4_K_M is the best balance for most local setups.
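Each of those ratios is just bits per weight divided by 8, since a billion weights at b bits take b/8 GB. Converting the page's figures back gives about 4.8 bits per weight for Q4_K_M, about 8.5 for Q8, and 16 for FP16. The fraction above 4 and 8 bits is likely per-block scale data stored alongside the quantized weights; treat that as an assumption about the format details, not something this page measures. A small sketch of the conversion:

```python
# Sketch: convert between bits per weight and GB per billion parameters.
# 1e9 weights * bits / 8 bits-per-byte = (bits / 8) * 1e9 bytes = bits / 8 GB.


def gb_per_billion(bits_per_weight: float) -> float:
    """GB needed per billion parameters at a given bits-per-weight."""
    return bits_per_weight / 8


def bits_per_weight(gb_per_b: float) -> float:
    """Effective bits per weight implied by a GB-per-billion figure."""
    return gb_per_b * 8


for label, gb in (("Q4_K_M", 0.6), ("Q8", 1.06), ("FP16", 2.0)):
    print(f"{label:<7} ~{gb} GB/B = ~{bits_per_weight(gb):.2f} bits per weight")

print(f"Q8 vs Q4_K_M: ~{1.06 / 0.6:.2f}x the memory")
```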
Is unified memory (Mac) different from VRAM (GPU) for LLMs?
For sizing, the per-parameter rule is the same. The difference is the pool: Apple Silicon shares all unified memory between CPU and GPU, so a 32 GB Mac can dedicate ~22 GB to a model. A 12 GB GPU is hard-capped at 12 GB of VRAM — exceed it and the model spills to system RAM and slows dramatically.
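The difference in memory pools can be sketched as two small functions. Assumptions: the Mac side uses this page's 70% budget; the GPU side lets the model use all of the VRAM minus an optional reserve (reserve_gb is a hypothetical knob, since drivers and runtimes keep some VRAM for themselves); and any excess is simply counted as spill to system RAM rather than modeling how a particular runtime splits layers.

```python
# Sketch: unified memory vs. a discrete GPU's fixed VRAM pool.
# Assumptions: 70% of unified memory goes to the model on a Mac; on a GPU the
# model may use all VRAM minus an optional, hypothetical reserve, and anything
# beyond that is counted as spilling to (much slower) system RAM.


def unified_budget_gb(unified_ram_gb: float, fraction: float = 0.70) -> float:
    """Share of Apple Silicon unified memory available to the model."""
    return unified_ram_gb * fraction


def gpu_split_gb(model_gb: float, vram_gb: float, reserve_gb: float = 0.0):
    """Return (GB held in VRAM, GB spilled to system RAM) for a discrete GPU."""
    usable = max(vram_gb - reserve_gb, 0.0)
    on_gpu = min(model_gb, usable)
    return on_gpu, model_gb - on_gpu


model_27b_q4 = 27 * 0.6  # ~16.2 GB at Q4_K_M
held, spill = gpu_split_gb(model_27b_q4, 12)
print(f"32 GB Mac budget: ~{unified_budget_gb(32):.1f} GB")
print(f"12 GB GPU: {held:.1f} GB in VRAM, {spill:.1f} GB spilled to system RAM")
```

A 27B model at Q4_K_M (~16.2 GB) fits the ~22.4 GB budget of a 32 GB Mac, but on a 12 GB card about 4.2 GB would spill to system RAM, the slow path described above.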
Size It for Your Hardware
- MacBook Air: ranked picks for fanless Apple Silicon
- MacBook Pro (16-128 GB): 14B-70B models with active cooling
- Mac Studio (64-512 GB): 70B+ workstation-class local AI
- NVIDIA GPUs (8-32 GB VRAM): VRAM-to-model fit, RTX 3060 to 5090
- Best LLM for MacBook: 15 models ranked by RAM tier
- Hardware Stats & Dataset: citable RAM-tier facts + open dataset