Which AI models run on an NVIDIA GTX 1650?

The popular models below can run locally on 4 GB of VRAM (assuming a 4,096-token context and about 16 GB of system RAM). They are ranked by popularity.

VRAM: 4 GB
Vendor: NVIDIA
Fits in VRAM: 17 models
Assumed RAM: 16.0 GB

The NVIDIA GTX 1650 comes with 4 GB of VRAM. Of the popular GGUF models we track, 17 fit entirely in VRAM, including Llama-3.2-1B-Instruct-Q8_0-GGUF, Jan-v3.5-4B-gguf, and gemma-2-2b-it-GGUF.

Larger models such as Qwen3-4B-GGUF (at Q8_0) still run on an NVIDIA GTX 1650, but part of the model must be offloaded to system RAM, which lowers speed. Models too large for VRAM and system RAM combined are not listed.
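The three outcomes described above (fits in VRAM, offload, not listed) can be sketched as a simple threshold check. This is a minimal illustration, not the site's actual rule: the function name `verdict`, the label "Not listed", and the assumption that the cutoff for offloading is VRAM plus system RAM combined are all hypothetical. The memory figures in the usage loop are taken from the table on this page.

```python
def verdict(required_gb: float, vram_gb: float = 4.0, ram_gb: float = 16.0) -> str:
    """Classify a model by where its estimated memory footprint can live.

    Assumption (not stated by the page): a model is listed as "Offload"
    whenever it fits in VRAM and system RAM combined.
    """
    if required_gb <= vram_gb:
        return "Fits in VRAM"      # whole model on the GPU
    if required_gb <= vram_gb + ram_gb:
        return "Offload"           # part of the model spills to system RAM
    return "Not listed"            # too large for VRAM and RAM together


# Memory estimates copied from the table on this page.
examples = [
    ("Llama-3.2-1B-Instruct Q8_0", 2.57),
    ("Qwen3-4B Q8_0", 5.75),
    ("Qwen3-Coder-30B-A3B-Instruct Q4_K_XL", 19.92),
]
for name, mem_gb in examples:
    print(f"{name}: {mem_gb} GB -> {verdict(mem_gb)}")
```

Under these assumptions the ceiling for a listed model is about 20 GB, which is consistent with the largest entry in the table (19.92 GB).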


Model | Size | Quant. | Quality | Memory | Speed (approx.) | Verdict
hugging-quants/Llama-3.2-1B-Instruct-Q8_0-GGUF | 1.24B | Q8_0 | Excellent | 2.57 GB | 325.1 t/s | Fits in VRAM
janhq/Jan-v3.5-4B-gguf | 4.41B | Q3_K_M | Fair | 3.9 GB | 191.5 t/s | Fits in VRAM
bartowski/gemma-2-2b-it-GGUF | 2.61B | Q6_K_L | Excellent | 3.72 GB | 187.2 t/s | Fits in VRAM
MaziyarPanahi/Qwen3-0.6B-GGUF | 0.75B | GGUF | Excellent | 2.62 GB | 284.6 t/s | Fits in VRAM
MaziyarPanahi/Qwen3-1.7B-GGUF | 2.03B | Q6_K | Excellent | 3.05 GB | 256.7 t/s | Fits in VRAM
Qwen/Qwen2.5-1.5B-Instruct-GGUF | 1.78B | Q8_0 | Excellent | 3.21 GB | 226.7 t/s | Fits in VRAM
MaziyarPanahi/Phi-3.5-mini-instruct-GGUF | 3.82B | Q4_K_M | Very good | 3.97 GB | 179.5 t/s | Fits in VRAM
Qwen/Qwen2.5-3B-Instruct-GGUF | 3.4B | Q5_K_M | Very good | 3.96 GB | 176.1 t/s | Fits in VRAM
bartowski/Llama-3.2-3B-Instruct-GGUF | 3.21B | Q5_K_L | Very good | 3.92 GB | 177.7 t/s | Fits in VRAM
Qwen/Qwen2.5-0.5B-Instruct-GGUF | 0.63B | GGUF | Excellent | 2.36 GB | 339.1 t/s | Fits in VRAM
MaziyarPanahi/Qwen3-4B-Instruct-2507-GGUF | 4.02B | Q3_K_L | Good | 3.85 GB | 191.8 t/s | Fits in VRAM
MaziyarPanahi/Mistral-7B-Instruct-v0.3-GGUF | 7.25B | IQ1_M | Very low | 3.74 GB | 244.4 t/s | Fits in VRAM
MaziyarPanahi/gemma-3-4b-it-GGUF | 3.88B | Q4_K_S | Good | 3.96 GB | 180.6 t/s | Fits in VRAM
MaziyarPanahi/Qwen2.5-7B-Instruct-GGUF | 7.62B | IQ1_S | Very low | 3.9 GB | 225.6 t/s | Fits in VRAM
MaziyarPanahi/Phi-4-mini-instruct-GGUF | 3.84B | Q4_K_S | Good | 3.92 GB | 183.7 t/s | Fits in VRAM
MaziyarPanahi/Yi-Coder-1.5B-Chat-GGUF | 1.48B | Q5_K_M | Very good | 2.41 GB | 390.4 t/s | Fits in VRAM
MaziyarPanahi/gemma-3-1b-it-GGUF | 1.0B | GGUF | Excellent | 3.15 GB | 214.0 t/s | Fits in VRAM
Qwen/Qwen3-4B-GGUF | 4.02B | Q8_0 | Excellent | 5.75 GB | 12.5 t/s | Offload
unsloth/gpt-oss-20b-GGUF | 20.91B | F16 | Very good | 13.83 GB | 3.9 t/s | Offload
MaziyarPanahi/Qwen3-14B-GGUF | 14.77B | Q6_K | Excellent | 13.94 GB | 4.4 t/s | Offload
MaziyarPanahi/Qwen3-8B-GGUF | 8.19B | GGUF | Excellent | 17.44 GB | 3.3 t/s | Offload
MaziyarPanahi/Qwen3-32B-GGUF | 32.76B | Q3_K_L | Good | 19.7 GB | 3.1 t/s | Offload
MaziyarPanahi/Qwen3-30B-A3B-GGUF | 30.53B | Q3_K_L | Fair | 18.27 GB | 3.4 t/s | Offload
bartowski/Meta-Llama-3.1-8B-Instruct-GGUF | 8.03B | Q8_0 | Excellent | 10.12 GB | 6.3 t/s | Offload
unsloth/Qwen3-Coder-30B-A3B-Instruct-GGUF | 30.53B | Q4_K_XL | Good | 19.92 GB | 3.0 t/s | Offload
LiquidAI/LFM2.5-8B-A1B-GGUF | 8.47B | BF16 | Excellent | 17.99 GB | 3.2 t/s | Offload
MaziyarPanahi/Meta-Llama-3-8B-Instruct-GGUF | 8.03B | GGUF | Excellent | 17.13 GB | 3.3 t/s | Offload
MaziyarPanahi/DeepSeek-R1-0528-Qwen3-8B-GGUF | 8.19B | GGUF | Excellent | 17.44 GB | 3.3 t/s | Offload
MaziyarPanahi/Mistral-Nemo-Instruct-2407-GGUF | 12.25B | Q8_0 | Excellent | 14.62 GB | 4.1 t/s | Offload
MaziyarPanahi/Llama-3-8B-Instruct-32k-v0.1-GGUF | 8.03B | GGUF | Excellent | 17.13 GB | 3.3 t/s | Offload
MaziyarPanahi/Meta-Llama-3.1-70B-Instruct-GGUF | 70.55B | IQ1_S | Very low | 19.14 GB | 3.5 t/s | Offload
TheBloke/Mistral-7B-Instruct-v0.2-GGUF | 7.24B | Q8_0 | Excellent | 9.27 GB | 7.0 t/s | Offload
MaziyarPanahi/Yi-Coder-9B-Chat-GGUF | 8.83B | GGUF | Excellent | 18.68 GB | 3.0 t/s | Offload
MaziyarPanahi/gemma-3-12b-it-GGUF | 11.77B | Q8_0 | Excellent | 14.11 GB | 4.3 t/s | Offload

"Fits in VRAM" = fast, fully on GPU. "Offload" = part on system RAM, slower. Speed is a rough estimate.

Frequently asked questions

How much VRAM does the NVIDIA GTX 1650 have?

The NVIDIA GTX 1650 has 4 GB of VRAM, which determines how large a model it can run entirely on the GPU.
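To see why 4 GB is a tight budget, it helps to estimate the size of the weights alone. The sketch below is a rough approximation for the Q8_0 format only, assuming every weight is stored as Q8_0 (blocks of 32 8-bit values plus one 16-bit scale, i.e. about 8.5 bits per weight) and that 1 GB means 10^9 bytes. The function name `q8_0_weights_gb` is hypothetical. This page does not state how its Memory column is computed; those figures are larger than the estimates here, presumably because they also cover the context cache and runtime overhead.

```python
# Q8_0 stores 32 int8 weights plus one 16-bit scale per block:
# (32 * 8 + 16) bits / 32 weights = 8.5 bits per weight.
Q8_0_BITS_PER_WEIGHT = 8.5


def q8_0_weights_gb(params_billion: float) -> float:
    """Approximate size of Q8_0 weights in GB (10^9 bytes).

    Assumption: all tensors are stored as Q8_0; real files keep a few
    small tensors at higher precision, so this is only an estimate.
    """
    total_bytes = params_billion * 1e9 * Q8_0_BITS_PER_WEIGHT / 8
    return total_bytes / 1e9


# Parameter counts copied from the table on this page.
for name, params_b in [("Llama-3.2-1B-Instruct", 1.24), ("Qwen3-4B", 4.02)]:
    print(f"{name}: ~{q8_0_weights_gb(params_b):.2f} GB of weights at Q8_0")
```

By this estimate, the 1.24B model needs about 1.32 GB for weights, leaving headroom on a 4 GB card, while the 4.02B model needs about 4.27 GB for weights alone, which already exceeds the card's VRAM before any context is allocated.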

What is the best LLM to run on an NVIDIA GTX 1650?

Among popular models, hugging-quants/Llama-3.2-1B-Instruct-Q8_0-GGUF runs well on an NVIDIA GTX 1650 using the Q8_0 quantization (about 2.57 GB, fully in VRAM). Larger models trade speed for capability via RAM offloading.

Can an NVIDIA GTX 1650 run a 7–8B model?

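Yes, but only at very aggressive quantization. A 7–8B model like Mistral-7B-Instruct-v0.3-GGUF fits entirely in the 4 GB of an NVIDIA GTX 1650 at IQ1_M (about 3.74 GB), a quantization rated "Very low" quality in the table above. Higher-quality quantizations, such as Q8_0 of Mistral-7B-Instruct-v0.2-GGUF (9.27 GB), require offloading.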

Can an NVIDIA GTX 1650 run a 13–14B model?

Only with offloading. A 13–14B model like Qwen3-14B-GGUF (13.94 GB at Q6_K) runs on an NVIDIA GTX 1650 by using system RAM in addition to its 4 GB of VRAM, which is slower (an estimated 4.4 t/s).

Can an NVIDIA GTX 1650 run a 70B model?

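Only with offloading, and only at very aggressive quantization. A 70B model like Meta-Llama-3.1-70B-Instruct-GGUF (19.14 GB at IQ1_S, rated "Very low" quality) runs on an NVIDIA GTX 1650 by using system RAM in addition to its 4 GB of VRAM, which is slower (an estimated 3.5 t/s).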
