Best NVIDIA GPUs for Local AI

Find the right GPU for running AI models locally with Ollama. From budget 12GB cards to the 32GB RTX 5090, compare speeds, VRAM, and model compatibility.

Estimated Speed

Qwen3 8B Q4_K @ 16K context

[Bar chart. Cards shown, in order: RTX 5090, RTX 4080 SUPER, RTX 4070 Ti SUPER, RTX 5070, RTX 4070 SUPER. Values shown: 87, 79, 72, and 59 t/s.]

Budget ($)

RTX 4060: up to 8B parameter models
RTX 3060: up to 9B parameter models
RTX 4060 Ti: up to 14B parameter models
RTX 5060 Ti: up to 14B parameter models

Mid-Range ($$)

RTX 4070: up to 9B parameter models
RTX 4070 SUPER: up to 9B parameter models
RTX 5070: up to 9B parameter models
RTX 5070 Ti: up to 14B parameter models
RTX 4070 Ti SUPER: up to 14B parameter models

High-End ($$$)

RTX 4080 SUPER: up to 14B parameter models
RTX 5080: up to 14B parameter models
RTX 3090: up to 32B parameter models
RTX 4090: up to 32B parameter models

Ultra ($$$$)

RTX 5090: up to 70B parameter models
RTX PRO 6000: up to 120B parameter models

Running Two or Three Cards? Pool the VRAM


Pooling works: Ollama and llama.cpp split a model's layers across cards automatically. With the default layer split, the cards take turns rather than working in parallel, so throughput stays near single-card speed and a mixed pair runs at the slower card's pace. Two identical cards using llama.cpp's row (tensor) split can decode faster than one card, because both memory buses read weights in parallel, though transfer overhead keeps the gain below 2x. Mixing NVIDIA and AMD cards in one rig requires the Vulkan build. What pooling reliably buys you is fit, not speed; the per-card speed estimates on the GPU pages are single-card only. Long context also eats into the budget through the KV cache, so leave headroom.
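To get a feel for how much headroom long context needs, here is a rough sketch of a pooled-fit check. It uses the standard KV-cache size formula (keys plus values, per layer, per KV head, per token). The function names, the 1 GiB per-card reserve, the 19 GiB weight size, and the model dimensions (36 layers, 8 KV heads, head size 128) are illustrative assumptions, not figures from ModelFit, Ollama, or llama.cpp. Summing usable VRAM is also optimistic, since whole layers are placed on each card.

```python
GIB = 1024 ** 3


def kv_cache_bytes(n_layers, n_kv_heads, head_dim, context_tokens, bytes_per_value=2):
    """Approximate KV-cache size: one K and one V tensor per layer per token."""
    return 2 * n_layers * n_kv_heads * head_dim * context_tokens * bytes_per_value


def fits_in_pool(card_vram_gib, weights_gib, kv_gib, reserve_per_card_gib=1.0):
    """Optimistic fit check: weights + KV cache vs. summed usable VRAM.

    reserve_per_card_gib is an assumed allowance for driver and runtime
    overhead on each card; tune it for your own setup.
    """
    usable = sum(vram - reserve_per_card_gib for vram in card_vram_gib)
    return weights_gib + kv_gib <= usable


# Hypothetical model dimensions at 16K context with 16-bit cache values.
kv_gib = kv_cache_bytes(n_layers=36, n_kv_heads=8, head_dim=128,
                        context_tokens=16_384) / GIB
print(f"KV cache: {kv_gib:.2f} GiB")  # KV cache: 2.25 GiB

# Hypothetical 19 GiB of quantized weights on one vs. two 16 GiB cards.
print(fits_in_pool([16], weights_gib=19.0, kv_gib=kv_gib))      # False
print(fits_in_pool([16, 16], weights_gib=19.0, kv_gib=kv_gib))  # True
```

Doubling the context doubles the KV-cache term, so a model that barely fits at 16K may spill at 32K even though the weights have not changed.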

VRAM Guide: What Models Can You Run?

VRAM	Max Model Size	Example Models
12 GB	Up to 9B (Q4)	Qwen2.5 7B, Llama 3.1 8B, Mistral 7B
16 GB	Up to 14B-27B (Q4)	Qwen2.5 14B, DeepSeek-R1 14B
24 GB	Up to 32B (Q4)	Qwen2.5 32B, DeepSeek-R1 32B
32 GB	Up to 70B (Q4)	Llama 3.1 70B, Qwen2.5 72B
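The table can be read as a simple lookup: find the largest VRAM tier that does not exceed what you have. A minimal sketch of that reading, with the tiers copied from the table above; the names `VRAM_GUIDE` and `guide_row` are hypothetical and not part of any tool:

```python
# (VRAM in GB, max model size) rows copied from the guide table.
VRAM_GUIDE = [
    (12, "Up to 9B (Q4)"),
    (16, "Up to 14B-27B (Q4)"),
    (24, "Up to 32B (Q4)"),
    (32, "Up to 70B (Q4)"),
]


def guide_row(vram_gb):
    """Return the highest table row whose VRAM tier fits, or None if below 12 GB."""
    best = None
    for tier_gb, label in VRAM_GUIDE:
        if vram_gb >= tier_gb:
            best = (tier_gb, label)
    return best


print(guide_row(16))  # (16, 'Up to 14B-27B (Q4)')
print(guide_row(20))  # (16, 'Up to 14B-27B (Q4)')
print(guide_row(8))   # None
```

A 20 GB card falls back to the 16 GB row because the table lists no tier in between; treat the result as a starting point and leave room for context.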


Have an Apple Silicon Mac Instead?

ModelFit also supports MacBook Air, MacBook Pro, Mac Studio, Mac Mini, and iPhone.
