Local LLM Hardware Stats
How much memory each model tier needs, and what fits your device. Derived from ModelFit's 59 local models. Updated 2026-06-15.
How much RAM do you need to run a local LLM? An 8GB device runs small 3-4B models, a 16GB device comfortably runs ~9B models, 32GB unlocks 14B, and 64GB+ runs 30-70B models. At Q4_K_M a model needs roughly 0.6 GB per billion parameters, and ModelFit budgets ~70% of unified memory for the model. ModelFit tracks 59 local models across 17 families.
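The rule of thumb above comes down to two multiplications. Here is a minimal Python sketch, assuming the ~0.6 GB per billion parameters and ~70% budget figures quoted on this page. The function and constant names are illustrative and not part of ModelFit, and the result is a rough check rather than a reproduction of ModelFit's fit data.

```python
# Rule-of-thumb constants taken from this page (Q4_K_M assumed).
GB_PER_BILLION_PARAMS = 0.6    # approximate weight footprint at Q4_K_M
MEMORY_BUDGET_FRACTION = 0.70  # share of unified memory budgeted for the model


def model_budget_gb(total_memory_gb: float) -> float:
    """Memory budgeted for the model on a device with the given total memory."""
    return total_memory_gb * MEMORY_BUDGET_FRACTION


def estimated_size_gb(params_billions: float) -> float:
    """Approximate in-memory size of a Q4_K_M model with this many parameters."""
    return params_billions * GB_PER_BILLION_PARAMS


def fits(params_billions: float, total_memory_gb: float) -> bool:
    """True if the estimated model size is within the device's model budget."""
    return estimated_size_gb(params_billions) <= model_budget_gb(total_memory_gb)


if __name__ == "__main__":
    # Hypothetical spot checks: (parameters in billions, device memory in GB).
    for params, memory in [(7, 8), (14, 16), (70, 32), (70, 64)]:
        verdict = "fits" if fits(params, memory) else "does not fit"
        print(
            f"{params}B on {memory} GB: ~{estimated_size_gb(params):.1f} GB "
            f"needed, ~{model_budget_gb(memory):.1f} GB budget, {verdict}"
        )
```

The estimate covers model weights only. Long contexts enlarge the KV-cache, so treat a narrow pass as a maybe.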
Model size by memory budget
| Memory (RAM / VRAM) | Model budget (~70%) | Runs up to | Models that fit | Strong pick |
|---|---|---|---|---|
| 8 GB | ~5.6 GB | ~8.3B params | 23 / 59 | DeepSeek-R1 Distill Qwen 7B |
| 16 GB | ~11.2 GB | ~14B params | 36 / 59 | Qwen2.5 Coder 14B |
| 24 GB | ~16.8 GB | ~27B params | 41 / 59 | Qwen2.5 Coder 14B |
| 32 GB | ~22.4 GB | ~35B params | 49 / 59 | Qwen3 30B |
| 48 GB | ~33.6 GB | ~46.7B params | 50 / 59 | Mixtral 8x7B Instruct |
| 64 GB | ~44.8 GB | ~70B params | 53 / 59 | Llama 3.1 70B Instruct |
| 128 GB | ~89.6 GB | ~122B params | 55 / 59 | Llama 3.1 70B Instruct |
Assumes Q4_K_M quantization. Speed (tok/s) and fit figures are estimates from ModelFit's dataset, not measured benchmarks.
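The table can also be read in reverse: given a model size, which memory tier is the smallest that holds it? The sketch below, in Python, applies the page's ~0.6 GB per billion parameters and ~70% budget figures to the seven tiers listed above. The helper name `smallest_tier_gb` is hypothetical, and the answer is an estimate, not ModelFit's own fit calculation.

```python
from typing import Optional

# Memory tiers from the table above, in GB.
TIERS_GB = (8, 16, 24, 32, 48, 64, 128)

GB_PER_BILLION_PARAMS = 0.6    # Q4_K_M rule of thumb quoted on this page
MEMORY_BUDGET_FRACTION = 0.70  # share of memory budgeted for the model


def smallest_tier_gb(params_billions: float) -> Optional[int]:
    """Return the smallest listed tier whose ~70% budget holds the model.

    Returns None when even the largest listed tier is too small.
    """
    needed_gb = params_billions * GB_PER_BILLION_PARAMS
    for tier in TIERS_GB:  # tiers are sorted ascending, so first match is smallest
        if tier * MEMORY_BUDGET_FRACTION >= needed_gb:
            return tier
    return None


if __name__ == "__main__":
    # Parameter counts in billions, chosen for illustration.
    for params in (7, 14, 30, 46.7, 70):
        print(f"{params}B -> {smallest_tier_gb(params)} GB tier")
```

With these inputs the estimate lands on the same tiers where the table first lists 7B, 14B, 30B, 46.7B, and 70B models as strong picks.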
Key facts
- ModelFit tracks 90 AI models across 17 families; 59 run locally via Ollama on Apple Silicon or NVIDIA GPUs (ModelFit, 2026).
- At Q4_K_M quantization, a local LLM needs roughly 0.6 GB of memory per billion parameters (ModelFit, 2026).
- ModelFit sizes recommendations to ~70% of a device’s unified memory, leaving headroom for the OS, context, and KV-cache (ModelFit, 2026).
- An 8GB device comfortably runs local models up to ~8.3B parameters at Q4; 23 of ModelFit’s 59 local models fit (ModelFit, 2026).
- A 16GB device comfortably runs local models up to ~14B parameters at Q4; 36 of ModelFit’s 59 local models fit (ModelFit, 2026).
- A 24GB device comfortably runs local models up to ~27B parameters at Q4; 41 of ModelFit’s 59 local models fit (ModelFit, 2026).
- A 32GB device comfortably runs local models up to ~35B parameters at Q4; 49 of ModelFit’s 59 local models fit (ModelFit, 2026).
- A 48GB device comfortably runs local models up to ~46.7B parameters at Q4; 50 of ModelFit’s 59 local models fit (ModelFit, 2026).
- A 64GB device comfortably runs local models up to ~70B parameters at Q4; 53 of ModelFit’s 59 local models fit (ModelFit, 2026).
- A 128GB device comfortably runs local models up to ~122B parameters at Q4; 55 of ModelFit’s 59 local models fit (ModelFit, 2026).
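The fit counts above are easier to compare as shares of the 59-model catalog and as gains per memory step. A short Python sketch using only the counts listed on this page follows; the names are illustrative.

```python
# Fit counts per memory tier (GB -> models that fit), copied from this page.
LOCAL_MODELS = 59
FITS_BY_MEMORY_GB = {8: 23, 16: 36, 24: 41, 32: 49, 48: 50, 64: 53, 128: 55}


def share_that_fits(memory_gb: int) -> float:
    """Fraction of the 59 local models that fit at the given tier."""
    return FITS_BY_MEMORY_GB[memory_gb] / LOCAL_MODELS


def newly_unlocked() -> dict:
    """Additional models that fit at each tier compared with the tier below it."""
    tiers = sorted(FITS_BY_MEMORY_GB)
    return {
        upper: FITS_BY_MEMORY_GB[upper] - FITS_BY_MEMORY_GB[lower]
        for lower, upper in zip(tiers, tiers[1:])
    }


if __name__ == "__main__":
    gains = newly_unlocked()
    for tier in sorted(FITS_BY_MEMORY_GB):
        gain = gains.get(tier)  # the smallest tier has no tier below it
        suffix = f", +{gain} vs. previous tier" if gain is not None else ""
        print(f"{tier} GB: {share_that_fits(tier):.0%} of local models{suffix}")
```

Reading the counts this way shows the step from 8GB to 16GB adds the most models (13), while the step from 32GB to 48GB adds one.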
Frequently asked questions
How much RAM do I need to run a local LLM?
8GB runs small 3-4B models, 16GB comfortably runs ~9B models (the sweet spot for most laptops), 32GB unlocks 14B, and 64GB or more runs 30-70B models. These figures assume a Q4 model needs roughly 0.6 GB per billion parameters and that ~70% of unified memory is budgeted for the model, with the rest left for the OS, context, and KV-cache.
What size LLM can I run on 16GB of RAM?
On 16GB you can comfortably run local models up to ~14B parameters at Q4. 36 of ModelFit's 59 local models fit, and Qwen2.5 Coder 14B is a strong pick.
Can 8GB of RAM run a local LLM?
Yes. 23 of ModelFit's 59 local models fit in 8GB, from small 3-4B models up to ~8.3B parameters at Q4. Expect tight headroom; close other apps for the best speed.
Cite this page
Free to reuse with attribution (CC BY 4.0). Full machine-readable data: the compatibility dataset and a JSON export.
ModelFit — Local LLM Hardware Stats (2026). https://modelfit.io/stats/ (accessed 2026-06-15).