Which AI models run on an NVIDIA GTX 1650?
With 4 GB of VRAM, here are the popular models you can run locally (4,096-token context, ~16.0 GB system RAM assumed), ranked by popularity.
The NVIDIA GTX 1650 comes with 4 GB of VRAM. Among the popular GGUF models we track, it can run 17 entirely in VRAM, including Llama-3.2-1B-Instruct-Q8_0-GGUF, Jan-v3.5-4B-gguf, and gemma-2-2b-it-GGUF.
Larger models such as Qwen3-4B-GGUF (Q8_0, 5.75 GB) still run on an NVIDIA GTX 1650 but require offloading part of the model to system RAM, which lowers speed. Models that exceed both VRAM and RAM are not listed.
| Model | Parameters | Quant. | Quality | Memory | Speed (est.) | Verdict |
|---|---|---|---|---|---|---|
| hugging-quants/Llama-3.2-1B-Instruct-Q8_0-GGUF | 1.24B | Q8_0 | Excellent | 2.57 GB | 325.1 t/s | Fits in VRAM |
| janhq/Jan-v3.5-4B-gguf | 4.41B | Q3_K_M | Fair | 3.9 GB | 191.5 t/s | Fits in VRAM |
| bartowski/gemma-2-2b-it-GGUF | 2.61B | Q6_K_L | Excellent | 3.72 GB | 187.2 t/s | Fits in VRAM |
| MaziyarPanahi/Qwen3-0.6B-GGUF | 0.75B | GGUF | Excellent | 2.62 GB | 284.6 t/s | Fits in VRAM |
| MaziyarPanahi/Qwen3-1.7B-GGUF | 2.03B | Q6_K | Excellent | 3.05 GB | 256.7 t/s | Fits in VRAM |
| Qwen/Qwen2.5-1.5B-Instruct-GGUF | 1.78B | Q8_0 | Excellent | 3.21 GB | 226.7 t/s | Fits in VRAM |
| MaziyarPanahi/Phi-3.5-mini-instruct-GGUF | 3.82B | Q4_K_M | Very good | 3.97 GB | 179.5 t/s | Fits in VRAM |
| Qwen/Qwen2.5-3B-Instruct-GGUF | 3.4B | Q5_K_M | Very good | 3.96 GB | 176.1 t/s | Fits in VRAM |
| bartowski/Llama-3.2-3B-Instruct-GGUF | 3.21B | Q5_K_L | Very good | 3.92 GB | 177.7 t/s | Fits in VRAM |
| Qwen/Qwen2.5-0.5B-Instruct-GGUF | 0.63B | GGUF | Excellent | 2.36 GB | 339.1 t/s | Fits in VRAM |
| MaziyarPanahi/Qwen3-4B-Instruct-2507-GGUF | 4.02B | Q3_K_L | Good | 3.85 GB | 191.8 t/s | Fits in VRAM |
| MaziyarPanahi/Mistral-7B-Instruct-v0.3-GGUF | 7.25B | IQ1_M | Very low | 3.74 GB | 244.4 t/s | Fits in VRAM |
| MaziyarPanahi/gemma-3-4b-it-GGUF | 3.88B | Q4_K_S | Good | 3.96 GB | 180.6 t/s | Fits in VRAM |
| MaziyarPanahi/Qwen2.5-7B-Instruct-GGUF | 7.62B | IQ1_S | Very low | 3.9 GB | 225.6 t/s | Fits in VRAM |
| MaziyarPanahi/Phi-4-mini-instruct-GGUF | 3.84B | Q4_K_S | Good | 3.92 GB | 183.7 t/s | Fits in VRAM |
| MaziyarPanahi/Yi-Coder-1.5B-Chat-GGUF | 1.48B | Q5_K_M | Very good | 2.41 GB | 390.4 t/s | Fits in VRAM |
| MaziyarPanahi/gemma-3-1b-it-GGUF | 1.0B | GGUF | Excellent | 3.15 GB | 214.0 t/s | Fits in VRAM |
| Qwen/Qwen3-4B-GGUF | 4.02B | Q8_0 | Excellent | 5.75 GB | 12.5 t/s | Offload |
| unsloth/gpt-oss-20b-GGUF | 20.91B | F16 | Very good | 13.83 GB | 3.9 t/s | Offload |
| MaziyarPanahi/Qwen3-14B-GGUF | 14.77B | Q6_K | Excellent | 13.94 GB | 4.4 t/s | Offload |
| MaziyarPanahi/Qwen3-8B-GGUF | 8.19B | GGUF | Excellent | 17.44 GB | 3.3 t/s | Offload |
| MaziyarPanahi/Qwen3-32B-GGUF | 32.76B | Q3_K_L | Good | 19.7 GB | 3.1 t/s | Offload |
| MaziyarPanahi/Qwen3-30B-A3B-GGUF | 30.53B | Q3_K_L | Fair | 18.27 GB | 3.4 t/s | Offload |
| bartowski/Meta-Llama-3.1-8B-Instruct-GGUF | 8.03B | Q8_0 | Excellent | 10.12 GB | 6.3 t/s | Offload |
| unsloth/Qwen3-Coder-30B-A3B-Instruct-GGUF | 30.53B | Q4_K_XL | Good | 19.92 GB | 3.0 t/s | Offload |
| LiquidAI/LFM2.5-8B-A1B-GGUF | 8.47B | BF16 | Excellent | 17.99 GB | 3.2 t/s | Offload |
| MaziyarPanahi/Meta-Llama-3-8B-Instruct-GGUF | 8.03B | GGUF | Excellent | 17.13 GB | 3.3 t/s | Offload |
| MaziyarPanahi/DeepSeek-R1-0528-Qwen3-8B-GGUF | 8.19B | GGUF | Excellent | 17.44 GB | 3.3 t/s | Offload |
| MaziyarPanahi/Mistral-Nemo-Instruct-2407-GGUF | 12.25B | Q8_0 | Excellent | 14.62 GB | 4.1 t/s | Offload |
| MaziyarPanahi/Llama-3-8B-Instruct-32k-v0.1-GGUF | 8.03B | GGUF | Excellent | 17.13 GB | 3.3 t/s | Offload |
| MaziyarPanahi/Meta-Llama-3.1-70B-Instruct-GGUF | 70.55B | IQ1_S | Very low | 19.14 GB | 3.5 t/s | Offload |
| TheBloke/Mistral-7B-Instruct-v0.2-GGUF | 7.24B | Q8_0 | Excellent | 9.27 GB | 7.0 t/s | Offload |
| MaziyarPanahi/Yi-Coder-9B-Chat-GGUF | 8.83B | GGUF | Excellent | 18.68 GB | 3.0 t/s | Offload |
| MaziyarPanahi/gemma-3-12b-it-GGUF | 11.77B | Q8_0 | Excellent | 14.11 GB | 4.3 t/s | Offload |
"Fits in VRAM" = the model runs entirely on the GPU (fast). "Offload" = part of the model sits in system RAM (slower). Speed figures are rough estimates in tokens per second (t/s).
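The verdict column can be approximated with a simple threshold check. The sketch below is an assumption about how such a rule might look, not the method this page actually uses: the function name `verdict`, the "Not listed" label, and the exact cut-offs (4 GB of VRAM, then VRAM plus 16 GB of system RAM) are illustrative. The memory figures are copied from the table above.

```python
# Hypothetical rule of thumb; the page does not state its exact thresholds.
VRAM_GB = 4.0         # GTX 1650 VRAM, as stated on this page
SYSTEM_RAM_GB = 16.0  # system RAM assumed on this page


def verdict(memory_gb: float,
            vram_gb: float = VRAM_GB,
            ram_gb: float = SYSTEM_RAM_GB) -> str:
    """Classify a model by its estimated memory footprint in GB."""
    if memory_gb <= vram_gb:
        return "Fits in VRAM"   # whole model stays on the GPU
    if memory_gb <= vram_gb + ram_gb:
        return "Offload"        # remainder spills into system RAM
    return "Not listed"         # exceeds VRAM and RAM combined


# Memory figures taken from the table above.
examples = {
    "Llama-3.2-1B-Instruct (Q8_0)": 2.57,
    "Phi-3.5-mini-instruct (Q4_K_M)": 3.97,
    "Qwen3-4B (Q8_0)": 5.75,
    "Qwen3-Coder-30B-A3B-Instruct (Q4_K_XL)": 19.92,
}

for name, memory_gb in examples.items():
    print(f"{name}: {memory_gb} GB -> {verdict(memory_gb)}")
```

Under these assumed thresholds, the four sample rows reproduce the verdicts shown in the table. A real check would likely reserve some VRAM for the display and driver, so the usable figure may be lower than 4 GB.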
Frequently asked questions
How much VRAM does the NVIDIA GTX 1650 have?
The NVIDIA GTX 1650 has 4 GB of VRAM, which determines how large a model it can run entirely on the GPU.
What is the best LLM to run on an NVIDIA GTX 1650?
Among popular models, hugging-quants/Llama-3.2-1B-Instruct-Q8_0-GGUF runs well on an NVIDIA GTX 1650 at Q8_0 quantization (about 2.57 GB, fully in VRAM). Larger models that do not fit in 4 GB rely on RAM offloading, trading speed for capability.
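Can an NVIDIA GTX 1650 run a 7–8B model?
Yes, but only at very aggressive quantization. A 7–8B model like Mistral-7B-Instruct-v0.3-GGUF fits entirely in the 4 GB of an NVIDIA GTX 1650 at IQ1_M (about 3.74 GB), which the table above rates as very low quality. Higher-quality quantizations, such as Mistral-7B-Instruct-v0.2-GGUF at Q8_0 (9.27 GB), require offloading.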
Can an NVIDIA GTX 1650 run a 13–14B model?
Only with offloading. A 13–14B model like Qwen3-14B-GGUF (Q6_K, about 13.94 GB) runs on an NVIDIA GTX 1650 by using system RAM in addition to its 4 GB of VRAM, which lowers the estimated speed to about 4.4 t/s.
Can an NVIDIA GTX 1650 run a 70B model?
Only with offloading, and only at very low quality. A 70B model like Meta-Llama-3.1-70B-Instruct-GGUF runs on an NVIDIA GTX 1650 at IQ1_S (about 19.14 GB) by using system RAM in addition to its 4 GB of VRAM, at an estimated 3.5 t/s.
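To give a feel for how much of an offloaded model ends up in system RAM, here is a small arithmetic sketch. It assumes, optimistically, that the full 4 GB of VRAM is available to the model and that everything beyond it goes to RAM. The helper name `offload_split` is hypothetical, and the memory figures come from the table above.

```python
# Rough arithmetic only; assumes all 4 GB of VRAM is usable by the model.
def offload_split(memory_gb: float, vram_gb: float = 4.0):
    """Return (GB kept on the GPU, GB placed in system RAM, RAM share 0..1)."""
    on_gpu = min(memory_gb, vram_gb)
    in_ram = max(memory_gb - vram_gb, 0.0)
    return on_gpu, in_ram, in_ram / memory_gb


# Memory figures taken from the table above.
for name, memory_gb in [
    ("Qwen3-14B-GGUF (Q6_K)", 13.94),
    ("Meta-Llama-3.1-70B-Instruct-GGUF (IQ1_S)", 19.14),
]:
    on_gpu, in_ram, share = offload_split(memory_gb)
    print(f"{name}: {on_gpu:.2f} GB on GPU, "
          f"{in_ram:.2f} GB in RAM ({share:.0%} offloaded)")
```

With these inputs, roughly 71% of the Qwen3-14B footprint and 79% of the Llama 3.1 70B footprint would sit in system RAM, which is consistent with the single-digit t/s estimates in the table.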