Which AI models run on an NVIDIA GTX 1660?

The popular models below can run locally on the card's 6 GB of VRAM, assuming a 4,096-token context and about 16 GB of system RAM. They are ranked by popularity.

VRAM: 6 GB
Vendor: NVIDIA
Fits in VRAM: 25 models
Assumed RAM: 16.0 GB

The NVIDIA GTX 1660 comes with 6 GB of VRAM. Among the popular GGUF models we track, it can run 25 entirely in VRAM, including Llama-3.2-1B-Instruct-Q8_0-GGUF, Qwen3-4B-GGUF, and Jan-v3.5-4B-gguf.

Larger models such as gpt-oss-20b-GGUF (13.83 GB) still run on an NVIDIA GTX 1660, but part of the model must be offloaded to system RAM, which cuts speed to an estimated 3.9 t/s. Models that exceed both VRAM and RAM are not listed.


Model | Size | Quant. | Quality | Memory | Speed (approx.) | Verdict
hugging-quants/Llama-3.2-1B-Instruct-Q8_0-GGUF | 1.24B | Q8_0 | Excellent | 2.57 GB | 325.1 t/s | Fits in VRAM
Qwen/Qwen3-4B-GGUF | 4.02B | Q8_0 | Excellent | 5.75 GB | 100.3 t/s | Fits in VRAM
janhq/Jan-v3.5-4B-gguf | 4.41B | Q6_K | Excellent | 5.19 GB | 118.5 t/s | Fits in VRAM
bartowski/gemma-2-2b-it-GGUF | 2.61B | Q8_0 | Excellent | 4.17 GB | 154.2 t/s | Fits in VRAM
MaziyarPanahi/Qwen3-0.6B-GGUF | 0.75B | GGUF | Excellent | 2.62 GB | 284.6 t/s | Fits in VRAM
MaziyarPanahi/Qwen3-8B-GGUF | 8.19B | Q2_K | Low | 5.24 GB | 130.9 t/s | Fits in VRAM
MaziyarPanahi/Qwen3-1.7B-GGUF | 2.03B | GGUF | Excellent | 5.28 GB | 105.5 t/s | Fits in VRAM
bartowski/Meta-Llama-3.1-8B-Instruct-GGUF | 8.03B | Q3_K_M | Fair | 5.91 GB | 106.9 t/s | Fits in VRAM
Qwen/Qwen2.5-1.5B-Instruct-GGUF | 1.78B | GGUF | Excellent | 4.76 GB | 120.6 t/s | Fits in VRAM
MaziyarPanahi/Phi-3.5-mini-instruct-GGUF | 3.82B | Q8_0 | Excellent | 5.53 GB | 105.8 t/s | Fits in VRAM
Qwen/Qwen2.5-3B-Instruct-GGUF | 3.4B | Q8_0 | Excellent | 5.06 GB | 118.8 t/s | Fits in VRAM
bartowski/Llama-3.2-3B-Instruct-GGUF | 3.21B | Q8_0 | Excellent | 4.85 GB | 125.5 t/s | Fits in VRAM
Qwen/Qwen2.5-0.5B-Instruct-GGUF | 0.63B | GGUF | Excellent | 2.36 GB | 339.1 t/s | Fits in VRAM
MaziyarPanahi/Qwen3-4B-Instruct-2507-GGUF | 4.02B | Q6_K | Excellent | 4.85 GB | 129.9 t/s | Fits in VRAM
MaziyarPanahi/Mistral-7B-Instruct-v0.3-GGUF | 7.25B | Q4_K_S | Good | 5.96 GB | 103.6 t/s | Fits in VRAM
MaziyarPanahi/gemma-3-4b-it-GGUF | 3.88B | Q8_0 | Excellent | 5.6 GB | 104.0 t/s | Fits in VRAM
MaziyarPanahi/Meta-Llama-3-8B-Instruct-GGUF | 8.03B | Q3_K_M | Fair | 5.91 GB | 106.9 t/s | Fits in VRAM
MaziyarPanahi/Qwen2.5-7B-Instruct-GGUF | 7.62B | Q3_K_L | Good | 5.94 GB | 105.1 t/s | Fits in VRAM
MaziyarPanahi/Phi-4-mini-instruct-GGUF | 3.84B | Q8_0 | Excellent | 5.55 GB | 105.1 t/s | Fits in VRAM
MaziyarPanahi/Yi-Coder-1.5B-Chat-GGUF | 1.48B | GGUF | Excellent | 4.14 GB | 145.4 t/s | Fits in VRAM
MaziyarPanahi/DeepSeek-R1-0528-Qwen3-8B-GGUF | 8.19B | Q2_K | Low | 5.24 GB | 130.9 t/s | Fits in VRAM
MaziyarPanahi/Llama-3-8B-Instruct-32k-v0.1-GGUF | 8.03B | Q3_K_M | Fair | 5.91 GB | 106.9 t/s | Fits in VRAM
MaziyarPanahi/gemma-3-1b-it-GGUF | 1.0B | GGUF | Excellent | 3.15 GB | 214.0 t/s | Fits in VRAM
TheBloke/Mistral-7B-Instruct-v0.2-GGUF | 7.24B | Q4_K_S | Good | 5.95 GB | 103.7 t/s | Fits in VRAM
MaziyarPanahi/Yi-Coder-9B-Chat-GGUF | 8.83B | Q3_K_S | Fair | 5.87 GB | 110.1 t/s | Fits in VRAM
unsloth/gpt-oss-20b-GGUF | 20.91B | F16 | Very good | 13.83 GB | 3.9 t/s | Offload
MaziyarPanahi/Qwen3-14B-GGUF | 14.77B | Q6_K | Excellent | 13.94 GB | 4.4 t/s | Offload
MaziyarPanahi/Qwen3-32B-GGUF | 32.76B | Q4_K_M | Good | 21.97 GB | 2.7 t/s | Offload
MaziyarPanahi/Qwen3-30B-A3B-GGUF | 30.53B | Q4_K_M | Good | 20.75 GB | 2.9 t/s | Offload
unsloth/Qwen3-Coder-30B-A3B-Instruct-GGUF | 30.53B | Q4_1 | Very good | 21.34 GB | 2.8 t/s | Offload
LiquidAI/LFM2.5-8B-A1B-GGUF | 8.47B | BF16 | Excellent | 17.99 GB | 3.2 t/s | Offload
MaziyarPanahi/Mistral-Nemo-Instruct-2407-GGUF | 12.25B | Q8_0 | Excellent | 14.62 GB | 4.1 t/s | Offload
MaziyarPanahi/Meta-Llama-3.1-70B-Instruct-GGUF | 70.55B | IQ1_M | Very low | 20.45 GB | 3.2 t/s | Offload
MaziyarPanahi/gemma-3-12b-it-GGUF | 11.77B | Q8_0 | Excellent | 14.11 GB | 4.3 t/s | Offload

"Fits in VRAM" = runs fully on the GPU, fast. "Offload" = partly in system RAM, slower. Speeds (t/s, tokens per second) are rough estimates.
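The verdict column can be reproduced with a simple comparison. The sketch below is an illustration, not the site's actual logic: the function name `verdict` is hypothetical, and the combined-budget rule (VRAM plus RAM) is an assumption inferred from the table, where a 21.97 GB model is still listed against 6 GB of VRAM and 16 GB of RAM.

```python
def verdict(required_gb: float, vram_gb: float = 6.0, ram_gb: float = 16.0) -> str:
    """Classify a model by its estimated memory requirement.

    Assumption: a model is listed as "Offload" when it fits in VRAM and
    system RAM combined. The page does not state the exact rule.
    """
    if required_gb <= vram_gb:
        return "Fits in VRAM"
    if required_gb <= vram_gb + ram_gb:
        return "Offload"
    return "Not listed"


# Memory figures copied from the table above.
examples = {
    "Mistral-7B-Instruct-v0.3-GGUF (Q4_K_S)": 5.96,
    "gpt-oss-20b-GGUF (F16)": 13.83,
    "Qwen3-32B-GGUF (Q4_K_M)": 21.97,
}

for name, gb in examples.items():
    print(f"{name}: {verdict(gb)}")
```

Changing `ram_gb` shows how the offload list would grow or shrink on a machine with more or less system memory than the 16 GB assumed here.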

Frequently asked questions

How much VRAM does the NVIDIA GTX 1660 have?

The NVIDIA GTX 1660 has 6 GB of VRAM, which determines how large a model it can run entirely on the GPU.
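As a rough illustration of why parameter count and quantization matter, the weights alone can be estimated as parameters times bits per weight. This is a sketch under stated assumptions: `weight_size_gb` is a hypothetical helper, only two formats are covered (Q8_0 at 8.5 bits per weight, since it stores 32 weights in 34 bytes, and BF16 at 16 bits), and the result excludes context memory and runtime overhead, which is presumably why the table's Memory figures are higher.

```python
# Bits per weight for two formats. Other quantizations are omitted on purpose
# because their effective bit widths vary by model.
BITS_PER_WEIGHT = {"Q8_0": 8.5, "BF16": 16.0}


def weight_size_gb(params_billion: float, quant: str) -> float:
    """Estimate the size of the weights alone, in GB (10^9 bytes).

    Excludes the KV cache for the context window and any runtime overhead.
    """
    bits = BITS_PER_WEIGHT[quant]
    return params_billion * bits / 8


# Parameter counts copied from the table.
print(round(weight_size_gb(1.24, "Q8_0"), 2))  # Llama-3.2-1B-Instruct
print(round(weight_size_gb(4.02, "Q8_0"), 2))  # Qwen3-4B
```

For Qwen3-4B at Q8_0 this gives roughly 4.27 GB of weights against the 5.75 GB listed, leaving little headroom on a 6 GB card once context is included.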

What is the best LLM to run on an NVIDIA GTX 1660?

Among popular models, hugging-quants/Llama-3.2-1B-Instruct-Q8_0-GGUF runs well on an NVIDIA GTX 1660 using the Q8_0 quantization (about 2.57 GB, an estimated 325.1 t/s). Larger models trade speed for capability via RAM offloading.

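Can an NVIDIA GTX 1660 run a 7–8B model?

Yes. A 7–8B model like Qwen3-8B-GGUF fits entirely in the 6 GB of an NVIDIA GTX 1660, but only at the Q2_K quantization (5.24 GB), which is rated Low quality. Mistral-7B-Instruct-v0.3-GGUF at Q4_K_S (5.96 GB, rated Good) also fits.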

Can an NVIDIA GTX 1660 run a 13–14B model?

Only with offloading. A 13–14B model like Qwen3-14B-GGUF (Q6_K, 13.94 GB) runs on an NVIDIA GTX 1660 by using system RAM in addition to its 6 GB of VRAM, which slows it to an estimated 4.4 t/s.
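To get a feel for how much of an offloaded model stays on the GPU, the split can be approximated by layer count. The sketch below is hypothetical: `gpu_layers` is an illustrative helper, it assumes all layers are the same size, it reserves 1 GB of VRAM for context and overhead, and the 40-layer figure for Qwen3-14B is an assumption to check against the model card. In llama.cpp, the resulting number is the kind of value passed to `--n-gpu-layers` (`-ngl`).

```python
import math


def gpu_layers(model_gb: float, n_layers: int,
               vram_gb: float = 6.0, reserve_gb: float = 1.0) -> int:
    """Estimate how many equally sized layers fit in VRAM.

    Assumptions: every layer takes model_gb / n_layers, and reserve_gb of
    VRAM is held back for the context and runtime overhead.
    """
    per_layer_gb = model_gb / n_layers
    usable_gb = max(vram_gb - reserve_gb, 0.0)
    return min(n_layers, math.floor(usable_gb / per_layer_gb))


# Qwen3-14B at Q6_K: 13.94 GB from the table, 40 layers assumed.
print(gpu_layers(13.94, 40))
```

Under these assumptions about a third of the layers sit on the GPU and the rest run from system RAM, which is consistent with the large drop in estimated speed for offloaded models.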

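Can an NVIDIA GTX 1660 run a 70B model?

Only with offloading, and only at a very aggressive quantization. A 70B model like Meta-Llama-3.1-70B-Instruct-GGUF runs on an NVIDIA GTX 1660 at IQ1_M (20.45 GB, rated Very low quality) by using system RAM in addition to its 6 GB of VRAM, at an estimated 3.2 t/s.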
