Best NVIDIA GPUs for Local AI
Find the right GPU for running AI models locally with Ollama. From budget 12 GB cards to the 32 GB RTX 5090, compare speeds, VRAM, and model compatibility.
Budget ($)
Mid-Range ($$)
| GPU | Memory | Speed | Architecture | Price | Model Support |
|---|---|---|---|---|---|
| RTX 4070 | 12 GB GDDR6X | 52 tok/s | Ada Lovelace | $579 | Up to 9B parameter models |
| RTX 4070 SUPER | 12 GB GDDR6X | 56 tok/s | Ada Lovelace | $759 | Up to 9B parameter models |
| RTX 5070 | 12 GB GDDR7 | 59 tok/s | Blackwell | $579 | Up to 9B parameter models |
| RTX 5070 Ti | 16 GB GDDR7 | 87 tok/s | Blackwell | $749 | Up to 14B parameter models |
| RTX 4070 Ti SUPER | 16 GB GDDR6X | 72 tok/s | Ada Lovelace | $1,148 | Up to 14B parameter models |
High-End ($$$)
| GPU | Memory | Speed | Architecture | Price | Model Support |
|---|---|---|---|---|---|
| RTX 4080 SUPER | 16 GB GDDR6X | 79 tok/s | Ada Lovelace | $1,597 | Up to 14B parameter models |
| RTX 5080 | 16 GB GDDR7 | 94 tok/s | Blackwell | $999 | Up to 14B parameter models |
| RTX 3090 | 24 GB GDDR6X | 87 tok/s | Ampere | $900* | Up to 32B parameter models |
| RTX 4090 | 24 GB GDDR6X | 104 tok/s | Ada Lovelace | $2,574 | Up to 32B parameter models |
Ultra ($$$$)
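The page does not say how its tok/s figures were measured. As a rough way to check generation speed on your own card, the sketch below assumes a local Ollama server on its default port and reads the `eval_count` and `eval_duration` fields (the latter in nanoseconds) that Ollama returns for a non-streaming `/api/generate` request. The helper names `tokens_per_second` and `measure` are illustrative, not part of Ollama or ModelFit.

```python
import json
import urllib.request

# Assumption: Ollama is running locally on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"


def tokens_per_second(response: dict) -> float:
    """Compute generation speed from a non-streaming /api/generate response."""
    eval_count = response["eval_count"]            # number of tokens generated
    eval_duration_ns = response["eval_duration"]   # generation time in nanoseconds
    if eval_duration_ns <= 0:
        raise ValueError("eval_duration must be positive")
    return eval_count / (eval_duration_ns / 1e9)


def measure(model: str, prompt: str) -> float:
    """Send one prompt to the local Ollama server and return tokens per second."""
    payload = json.dumps(
        {"model": model, "prompt": prompt, "stream": False}
    ).encode("utf-8")
    request = urllib.request.Request(
        OLLAMA_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as reply:
        return tokens_per_second(json.load(reply))
```

With a model already pulled, a call such as `measure("qwen2.5:7b", "Explain VRAM in one paragraph.")` returns a single-run figure. Results will likely vary with the model, quantization, prompt length, and context size, so a single run may not match the numbers listed for each card.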
VRAM Guide: What Models Can You Run?
| VRAM | Max Model Size | Example Models |
|---|---|---|
| 12 GB | Up to 9B (Q4) | Qwen2.5 7B, Llama 3.1 8B, Mistral 7B |
| 16 GB | 14B to 27B (Q4) | Qwen2.5 14B, DeepSeek-R1 14B |
| 24 GB | Up to 32B (Q4) | Qwen2.5 32B, DeepSeek-R1 32B |
| 32 GB | Up to 70B (Q4) | Llama 3.1 70B, Qwen2.5 72B |
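The guide can also be read as a simple lookup. The sketch below encodes the tiers from the table as (VRAM in GB, largest Q4 model in billions of parameters) pairs. For the 16 GB tier it assumes the lower 14B figure, which matches the GPU listings on this page. The names `VRAM_GUIDE_Q4`, `max_model_size_b`, and `min_vram_for` are hypothetical helpers, not part of ModelFit or Ollama.

```python
from typing import Optional

# Tiers taken from the VRAM guide: (VRAM in GB, max Q4 model size in billions).
# Assumption: the 16 GB tier uses the conservative 14B figure.
VRAM_GUIDE_Q4 = [(12, 9), (16, 14), (24, 32), (32, 70)]


def max_model_size_b(vram_gb: float) -> Optional[int]:
    """Largest model size (in billions) the guide lists for this much VRAM."""
    best = None
    for tier_vram, max_params_b in VRAM_GUIDE_Q4:
        if vram_gb >= tier_vram:
            best = max_params_b
    return best  # None if the card is below the smallest listed tier


def min_vram_for(params_b: float) -> Optional[int]:
    """Smallest VRAM tier (in GB) whose listed limit covers the model size."""
    for tier_vram, max_params_b in VRAM_GUIDE_Q4:
        if params_b <= max_params_b:
            return tier_vram
    return None  # larger than anything in the guide
```

For example, `min_vram_for(14)` returns the 16 GB tier and `max_model_size_b(24)` returns 32. The lookup only mirrors the table; it does not account for context length or other runtime memory use.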
Have an Apple Silicon Mac Instead?
ModelFit also supports MacBook Air, MacBook Pro, Mac Studio, Mac Mini, and iPhone.