Which AI models run on an NVIDIA RTX 5090?

With 32 GB of VRAM, here are the popular models you can run locally (4,096-token context, ~32.0 GB system RAM assumed), ranked by popularity.
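The context length matters because the runtime keeps a key/value (KV) cache next to the model weights, and that cache grows linearly with the number of tokens. The sketch below uses the common estimate. It assumes an unquantized FP16 cache and Llama 3.1 8B's published shape (32 layers, 8 KV heads, head dimension 128). This page does not state how it accounts for the cache, so treat the result as illustrative.

```python
def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   n_ctx: int, bytes_per_elem: int = 2) -> int:
    """Approximate KV-cache size for a single sequence.

    The leading factor 2 covers one key tensor and one value tensor per layer.
    bytes_per_elem=2 assumes an FP16 cache (no cache quantization).
    """
    return 2 * n_layers * n_kv_heads * head_dim * n_ctx * bytes_per_elem


# Llama 3.1 8B: 32 layers, 8 KV heads (grouped-query attention), head_dim 128.
size = kv_cache_bytes(n_layers=32, n_kv_heads=8, head_dim=128, n_ctx=4096)
print(f"{size / 1e9:.2f} GB")  # -> 0.54 GB
```

Doubling the context to 8,192 tokens doubles this figure. A longer context can therefore push a model that barely fits over the 32 GB limit.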

VRAM: 32 GB
Vendor: NVIDIA
Fits in VRAM: 38 models
Assumed system RAM: 32.0 GB

The NVIDIA RTX 5090 comes with 32 GB of VRAM. Among the popular GGUF models we track, it can run 38 of them entirely in VRAM, including Llama-3.2-1B-Instruct-Q8_0-GGUF, Qwen3-4B-GGUF, and gpt-oss-20b-GGUF.

Larger models such as gpt-oss-120b-GGUF still run on an NVIDIA RTX 5090, but only by offloading part of the model to system RAM, which sharply lowers speed. Models that exceed combined VRAM and system RAM are not listed.
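The listing rule can be sketched as a small classifier. It is a minimal sketch that assumes a plain comparison of each model's estimated memory against 32 GB of VRAM and the page's assumed 32 GB of system RAM. The real tool may reserve headroom for the driver and display, and it does not publish its exact rule.

```python
from typing import Optional

VRAM_GB = 32.0        # NVIDIA RTX 5090
SYSTEM_RAM_GB = 32.0  # the page's assumed system RAM


def verdict(memory_gb: float, vram_gb: float = VRAM_GB,
            ram_gb: float = SYSTEM_RAM_GB) -> Optional[str]:
    """Classify a model by its estimated memory footprint.

    Assumption: no headroom is reserved. Returns None for models
    that exceed VRAM plus system RAM (these are not listed).
    """
    if memory_gb <= vram_gb:
        return "Fits in VRAM"
    if memory_gb <= vram_gb + ram_gb:
        return "Offload"
    return None


for name, mem in [("gpt-oss-20b-GGUF", 13.83),
                  ("Qwen3-Coder-Next-GGUF", 31.63),
                  ("gpt-oss-120b-GGUF", 61.96)]:
    print(f"{name}: {verdict(mem)}")
```

With no headroom, Qwen3-Coder-Next-GGUF at 31.63 GB counts as fitting. In practice the desktop and driver also use some VRAM, so a model that close to 32 GB may still need a small amount of offloading.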


Model	Params	Quant.	Quality	Memory	Est. speed	Verdict
hugging-quants/Llama-3.2-1B-Instruct-Q8_0-GGUF	1.24B	Q8_0	Excellent	2.57 GB	325.1 t/s	Fits in VRAM
Qwen/Qwen3-4B-GGUF	4.02B	Q8_0	Excellent	5.75 GB	100.3 t/s	Fits in VRAM
unsloth/gpt-oss-20b-GGUF	20.91B	F16	Very good	13.83 GB	31.1 t/s	Fits in VRAM
janhq/Jan-v3.5-4B-gguf	4.41B	GGUF	Excellent	10.04 GB	48.6 t/s	Fits in VRAM
bartowski/gemma-2-2b-it-GGUF	2.61B	F32	Excellent	11.33 GB	41.0 t/s	Fits in VRAM
MaziyarPanahi/Qwen3-0.6B-GGUF	0.75B	GGUF	Excellent	2.62 GB	284.6 t/s	Fits in VRAM
unsloth/Qwen3-Coder-Next-GGUF	79.67B	IQ3_XXS	Low	31.63 GB	15.1 t/s	Fits in VRAM
MaziyarPanahi/Qwen3-14B-GGUF	14.77B	GGUF	Excellent	30.17 GB	14.5 t/s	Fits in VRAM
MaziyarPanahi/Qwen3-8B-GGUF	8.19B	GGUF	Excellent	17.44 GB	26.2 t/s	Fits in VRAM
MaziyarPanahi/Qwen3-32B-GGUF	32.76B	Q6_K	Excellent	28.6 GB	16.0 t/s	Fits in VRAM
MaziyarPanahi/Qwen3-1.7B-GGUF	2.03B	GGUF	Excellent	5.28 GB	105.5 t/s	Fits in VRAM
MaziyarPanahi/Qwen3-30B-A3B-GGUF	30.53B	Q6_K	Excellent	26.84 GB	17.1 t/s	Fits in VRAM
bartowski/Meta-Llama-3.1-8B-Instruct-GGUF	8.03B	Q8_0	Excellent	10.12 GB	50.3 t/s	Fits in VRAM
Qwen/Qwen2.5-Coder-32B-Instruct-GGUF	32.76B	Q2_K	Very good	26.5 GB	17.4 t/s	Fits in VRAM
Qwen/Qwen2.5-1.5B-Instruct-GGUF	1.78B	GGUF	Excellent	4.76 GB	120.6 t/s	Fits in VRAM
unsloth/Qwen3-Coder-30B-A3B-Instruct-GGUF	30.53B	Q6_K_XL	Excellent	28.0 GB	16.3 t/s	Fits in VRAM
MaziyarPanahi/Phi-3.5-mini-instruct-GGUF	3.82B	Q8_0	Excellent	5.53 GB	105.8 t/s	Fits in VRAM
Qwen/Qwen2.5-3B-Instruct-GGUF	3.4B	GGUF	Excellent	8.02 GB	63.2 t/s	Fits in VRAM
bartowski/Llama-3.2-3B-Instruct-GGUF	3.21B	F16	Excellent	7.66 GB	66.8 t/s	Fits in VRAM
Qwen/Qwen2.5-0.5B-Instruct-GGUF	0.63B	GGUF	Excellent	2.36 GB	339.1 t/s	Fits in VRAM
MaziyarPanahi/Qwen3-4B-Instruct-2507-GGUF	4.02B	GGUF	Excellent	9.27 GB	53.3 t/s	Fits in VRAM
LiquidAI/LFM2.5-8B-A1B-GGUF	8.47B	BF16	Excellent	17.99 GB	25.3 t/s	Fits in VRAM
MaziyarPanahi/Mistral-7B-Instruct-v0.3-GGUF	7.25B	GGUF	Excellent	15.6 GB	29.6 t/s	Fits in VRAM
MaziyarPanahi/gemma-3-4b-it-GGUF	3.88B	GGUF	Excellent	8.98 GB	55.3 t/s	Fits in VRAM
MaziyarPanahi/Meta-Llama-3-8B-Instruct-GGUF	8.03B	GGUF	Excellent	17.13 GB	26.7 t/s	Fits in VRAM
MaziyarPanahi/Mixtral-8x22B-v0.1-GGUF	140.62B	IQ1_S	Very low	29.28 GB	14.5 t/s	Fits in VRAM
MaziyarPanahi/Qwen2.5-7B-Instruct-GGUF	7.62B	GGUF	Excellent	16.32 GB	28.2 t/s	Fits in VRAM
MaziyarPanahi/Phi-4-mini-instruct-GGUF	3.84B	GGUF	Excellent	8.9 GB	55.9 t/s	Fits in VRAM
MaziyarPanahi/Llama-3.3-70B-Instruct-GGUF	70.55B	Q2_K	Low	29.42 GB	16.3 t/s	Fits in VRAM
MaziyarPanahi/Yi-Coder-1.5B-Chat-GGUF	1.48B	GGUF	Excellent	4.14 GB	145.4 t/s	Fits in VRAM
MaziyarPanahi/DeepSeek-R1-0528-Qwen3-8B-GGUF	8.19B	GGUF	Excellent	17.44 GB	26.2 t/s	Fits in VRAM
MaziyarPanahi/Mistral-Nemo-Instruct-2407-GGUF	12.25B	GGUF	Excellent	25.31 GB	17.5 t/s	Fits in VRAM
MaziyarPanahi/Llama-3-8B-Instruct-32k-v0.1-GGUF	8.03B	GGUF	Excellent	17.13 GB	26.7 t/s	Fits in VRAM
MaziyarPanahi/gemma-3-1b-it-GGUF	1.0B	GGUF	Excellent	3.15 GB	214.0 t/s	Fits in VRAM
MaziyarPanahi/Meta-Llama-3.1-70B-Instruct-GGUF	70.55B	Q2_K	Low	29.42 GB	16.3 t/s	Fits in VRAM
TheBloke/Mistral-7B-Instruct-v0.2-GGUF	7.24B	Q8_0	Excellent	9.27 GB	55.8 t/s	Fits in VRAM
MaziyarPanahi/Yi-Coder-9B-Chat-GGUF	8.83B	GGUF	Excellent	18.68 GB	24.3 t/s	Fits in VRAM
MaziyarPanahi/gemma-3-12b-it-GGUF	11.77B	GGUF	Excellent	24.38 GB	18.2 t/s	Fits in VRAM
unsloth/gpt-oss-120b-GGUF	116.83B	F16	Good	61.96 GB	0.8 t/s	Offload

"Fits in VRAM" = fast, fully on GPU. "Offload" = part on system RAM, slower. Speeds are rough estimates in tokens per second (t/s).
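For context on those speed figures, single-user token generation is usually limited by memory bandwidth. Each new token has to read the active weights from VRAM once, which gives a simple upper bound. The sketch below assumes the RTX 5090's published bandwidth of about 1,792 GB/s. It is not the formula this page uses, and the table's estimates sit well below this ceiling.

```python
RTX_5090_BANDWIDTH_GB_S = 1792.0  # published memory bandwidth (assumption for this sketch)


def decode_ceiling_tps(bytes_read_per_token_gb: float,
                       bandwidth_gb_s: float = RTX_5090_BANDWIDTH_GB_S) -> float:
    """Upper bound on tokens/s when decoding is memory-bandwidth bound.

    bytes_read_per_token_gb should cover the weights touched per token
    (for mixture-of-experts models, only the active experts).
    """
    return bandwidth_gb_s / bytes_read_per_token_gb


# Llama 3.1 8B at Q8_0: 8.03B params x 8.5 bits per weight / 8 bits per byte.
print(round(decode_ceiling_tps(8.03 * 8.5 / 8)))  # -> 210
```

Real runtimes fall short of this bound because of kernel efficiency, KV-cache reads, and sampling overhead. Offloaded models are limited by the much slower system RAM and PCIe link instead, which is why gpt-oss-120b-GGUF drops below 1 t/s.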

Frequently asked questions

How much VRAM does the NVIDIA RTX 5090 have?

The NVIDIA RTX 5090 has 32 GB of VRAM, which determines how large a model it can run entirely on the GPU.
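The weights usually take up most of that budget, and their size is roughly parameter count times bits per weight. The sketch below uses nominal bits-per-weight figures for three llama.cpp GGUF block formats. Real "Q6_K" files mix tensor types, so actual file sizes differ somewhat. The table's Memory column is higher than these numbers because it presumably also covers the context cache and runtime buffers. That is an assumption, since the page's overhead model isn't stated.

```python
# Nominal bits per weight for some llama.cpp GGUF formats.
BITS_PER_WEIGHT = {
    "F16": 16.0,
    "Q8_0": 8.5,     # 32 int8 weights + one FP16 scale per 34-byte block
    "Q6_K": 6.5625,  # 210 bytes per 256-weight super-block
}


def weights_gb(params_billion: float, quant: str) -> float:
    """Approximate size of the model weights alone, in GB (1e9 bytes)."""
    return params_billion * BITS_PER_WEIGHT[quant] / 8


print(f"{weights_gb(32.76, 'Q6_K'):.1f} GB")  # Qwen3-32B -> 26.9 GB
print(f"{weights_gb(4.02, 'Q8_0'):.1f} GB")   # Qwen3-4B  -> 4.3 GB
```

Compare these with the table's 28.6 GB and 5.75 GB. The gap of roughly 1.5 to 2 GB is the non-weight overhead at a 4,096-token context.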

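What is the best LLM to run on an NVIDIA RTX 5090?

It depends on the task. The most popular model on the list, hugging-quants/Llama-3.2-1B-Instruct-Q8_0-GGUF, runs comfortably at Q8_0 (about 2.57 GB). The 32 GB of VRAM also holds much larger models entirely on the GPU, such as Qwen3-32B-GGUF at Q6_K (about 28.6 GB). Models beyond that size trade speed for capability through RAM offloading.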

Can an NVIDIA RTX 5090 run a 7–8B model?

Yes. A 7–8B model like Qwen3-8B-GGUF needs about 17.44 GB, so it fits entirely in the 32 GB of VRAM on an NVIDIA RTX 5090.

Can an NVIDIA RTX 5090 run a 13–14B model?

Yes. A 13–14B model like Qwen3-14B-GGUF needs about 30.17 GB, so it fits entirely in the 32 GB of VRAM on an NVIDIA RTX 5090, though with little room to spare.

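Can an NVIDIA RTX 5090 run a 70B model?

Yes, but only with aggressive quantization. Llama-3.3-70B-Instruct-GGUF fits entirely in VRAM at Q2_K (about 29.42 GB), at the cost of a Low quality rating. The larger 79.67B Qwen3-Coder-Next-GGUF also fits at IQ3_XXS (about 31.63 GB).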
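Working backwards shows why a 70B model needs such low-bit quants. The sketch below solves for the highest average bits per weight that still fits. It assumes Llama-3.3-70B-Instruct-GGUF's 70.55B parameters from the table and a hypothetical 3 GB allowance for the KV cache and runtime buffers.

```python
def max_bits_per_weight(params_billion: float, vram_gb: float,
                        overhead_gb: float) -> float:
    """Highest average bits per weight whose weights still fit in VRAM.

    overhead_gb is a hypothetical allowance for KV cache and buffers.
    """
    return (vram_gb - overhead_gb) * 8 / params_billion


# Llama-3.3-70B (70.55B params) on a 32 GB card, reserving an assumed 3 GB.
print(f"{max_bits_per_weight(70.55, 32.0, 3.0):.2f} bits/weight")  # -> 3.29 bits/weight
```

About 3.3 bits per weight is far below Q8_0's 8.5. That is why the table lists 70B models only at Q2_K, with a Low quality rating.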

Other graphics cards

NVIDIA RTX 4090 24 GB
NVIDIA RTX 3090 Ti 24 GB
NVIDIA RTX 3090 24 GB
NVIDIA RTX 5080 16 GB
NVIDIA RTX 5070 Ti 16 GB
NVIDIA RTX 4080 Super 16 GB
NVIDIA RTX 4080 16 GB
NVIDIA RTX 4070 Ti Super 16 GB
NVIDIA RTX 5060 Ti 16 GB
NVIDIA RTX 4060 Ti 16 GB
NVIDIA RTX 5070 12 GB
NVIDIA RTX 4070 Ti 12 GB
NVIDIA RTX 4070 Super 12 GB
NVIDIA RTX 4070 12 GB
NVIDIA RTX 3080 Ti 12 GB
NVIDIA RTX 3060 12 GB
NVIDIA RTX 2080 Ti 11 GB
NVIDIA RTX 3080 10 GB
NVIDIA RTX 5060 8 GB
NVIDIA RTX 4060 Ti 8 GB
NVIDIA RTX 4060 8 GB
NVIDIA RTX 3070 Ti 8 GB
NVIDIA RTX 3070 8 GB
NVIDIA RTX 3060 Ti 8 GB
NVIDIA RTX 2080 Super 8 GB
NVIDIA RTX 2070 Super 8 GB
NVIDIA RTX 2060 Super 8 GB
NVIDIA RTX 3050 8 GB
NVIDIA RTX 2060 6 GB
NVIDIA GTX 1660 Ti 6 GB
NVIDIA GTX 1660 Super 6 GB
NVIDIA GTX 1660 6 GB
NVIDIA GTX 1650 4 GB
AMD Radeon RX 7900 XTX 24 GB
AMD Radeon RX 7900 XT 20 GB
AMD Radeon RX 7800 XT 16 GB
AMD Radeon RX 7600 XT 16 GB
AMD Radeon RX 6950 XT 16 GB
AMD Radeon RX 6800 XT 16 GB
AMD Radeon RX 6800 16 GB
AMD Radeon RX 7700 XT 12 GB
AMD Radeon RX 6750 XT 12 GB
AMD Radeon RX 6700 XT 12 GB
AMD Radeon RX 7600 8 GB
AMD Radeon RX 6650 XT 8 GB
AMD Radeon RX 6600 8 GB
Intel Arc A770 16 GB
Intel Arc B580 12 GB
Intel Arc A750 8 GB