Which AI models run on an NVIDIA GTX 1660 Ti?

The popular models below can run locally on this card's 6 GB of VRAM, assuming a 4,096-token context and about 16 GB of system RAM. They are ranked by popularity.

VRAM: 6 GB
Vendor: NVIDIA
Fits in VRAM: 25 models
Assumed RAM: 16.0 GB

The NVIDIA GTX 1660 Ti comes with 6 GB of VRAM. Among the popular GGUF models we track, it can run 25 entirely in VRAM, including Llama-3.2-1B-Instruct-Q8_0-GGUF, Qwen3-4B-GGUF, and Jan-v3.5-4B-gguf.

Larger models such as gpt-oss-20b-GGUF still run on an NVIDIA GTX 1660 Ti, but part of the model must be offloaded to system RAM, which lowers speed. Models that exceed both VRAM and RAM are not listed.
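The three outcomes described above can be sketched as a small classifier. This is only an illustration, not the method this page uses: the function name `classify` is hypothetical, and the rule that a model is dropped once its footprint exceeds VRAM plus RAM combined is an assumption about what "exceed both VRAM and RAM" means. The defaults are the 6 GB and 16 GB figures stated on this page.

```python
def classify(model_gb: float, vram_gb: float = 6.0, ram_gb: float = 16.0) -> str:
    """Return a verdict for a model's estimated memory footprint in GB.

    Assumption: the cutoff for being listed is VRAM + RAM combined.
    """
    if model_gb <= vram_gb:
        return "Fits in VRAM"   # whole model sits on the GPU
    if model_gb <= vram_gb + ram_gb:
        return "Offload"        # remainder spills into system RAM
    return "Not listed"         # too large for VRAM and RAM together


# Memory figures taken from the table on this page.
for name, gb in [("Qwen3-4B-GGUF", 5.75), ("gpt-oss-20b-GGUF", 13.83)]:
    print(name, "->", classify(gb))
```

Under these assumptions a real setup would likely have less headroom, since the operating system and other programs also use RAM and VRAM.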


Model | Size | Quant. | Quality | Memory | Speed (est.) | Verdict
--- | --- | --- | --- | --- | --- | ---
hugging-quants/Llama-3.2-1B-Instruct-Q8_0-GGUF | 1.24B | Q8_0 | Excellent | 2.57 GB | 325.1 t/s | Fits in VRAM
Qwen/Qwen3-4B-GGUF | 4.02B | Q8_0 | Excellent | 5.75 GB | 100.3 t/s | Fits in VRAM
janhq/Jan-v3.5-4B-gguf | 4.41B | Q6_K | Excellent | 5.19 GB | 118.5 t/s | Fits in VRAM
bartowski/gemma-2-2b-it-GGUF | 2.61B | Q8_0 | Excellent | 4.17 GB | 154.2 t/s | Fits in VRAM
MaziyarPanahi/Qwen3-0.6B-GGUF | 0.75B | GGUF | Excellent | 2.62 GB | 284.6 t/s | Fits in VRAM
MaziyarPanahi/Qwen3-8B-GGUF | 8.19B | Q2_K | Low | 5.24 GB | 130.9 t/s | Fits in VRAM
MaziyarPanahi/Qwen3-1.7B-GGUF | 2.03B | GGUF | Excellent | 5.28 GB | 105.5 t/s | Fits in VRAM
bartowski/Meta-Llama-3.1-8B-Instruct-GGUF | 8.03B | Q3_K_M | Fair | 5.91 GB | 106.9 t/s | Fits in VRAM
Qwen/Qwen2.5-1.5B-Instruct-GGUF | 1.78B | GGUF | Excellent | 4.76 GB | 120.6 t/s | Fits in VRAM
MaziyarPanahi/Phi-3.5-mini-instruct-GGUF | 3.82B | Q8_0 | Excellent | 5.53 GB | 105.8 t/s | Fits in VRAM
Qwen/Qwen2.5-3B-Instruct-GGUF | 3.4B | Q8_0 | Excellent | 5.06 GB | 118.8 t/s | Fits in VRAM
bartowski/Llama-3.2-3B-Instruct-GGUF | 3.21B | Q8_0 | Excellent | 4.85 GB | 125.5 t/s | Fits in VRAM
Qwen/Qwen2.5-0.5B-Instruct-GGUF | 0.63B | GGUF | Excellent | 2.36 GB | 339.1 t/s | Fits in VRAM
MaziyarPanahi/Qwen3-4B-Instruct-2507-GGUF | 4.02B | Q6_K | Excellent | 4.85 GB | 129.9 t/s | Fits in VRAM
MaziyarPanahi/Mistral-7B-Instruct-v0.3-GGUF | 7.25B | Q4_K_S | Good | 5.96 GB | 103.6 t/s | Fits in VRAM
MaziyarPanahi/gemma-3-4b-it-GGUF | 3.88B | Q8_0 | Excellent | 5.6 GB | 104.0 t/s | Fits in VRAM
MaziyarPanahi/Meta-Llama-3-8B-Instruct-GGUF | 8.03B | Q3_K_M | Fair | 5.91 GB | 106.9 t/s | Fits in VRAM
MaziyarPanahi/Qwen2.5-7B-Instruct-GGUF | 7.62B | Q3_K_L | Good | 5.94 GB | 105.1 t/s | Fits in VRAM
MaziyarPanahi/Phi-4-mini-instruct-GGUF | 3.84B | Q8_0 | Excellent | 5.55 GB | 105.1 t/s | Fits in VRAM
MaziyarPanahi/Yi-Coder-1.5B-Chat-GGUF | 1.48B | GGUF | Excellent | 4.14 GB | 145.4 t/s | Fits in VRAM
MaziyarPanahi/DeepSeek-R1-0528-Qwen3-8B-GGUF | 8.19B | Q2_K | Low | 5.24 GB | 130.9 t/s | Fits in VRAM
MaziyarPanahi/Llama-3-8B-Instruct-32k-v0.1-GGUF | 8.03B | Q3_K_M | Fair | 5.91 GB | 106.9 t/s | Fits in VRAM
MaziyarPanahi/gemma-3-1b-it-GGUF | 1.0B | GGUF | Excellent | 3.15 GB | 214.0 t/s | Fits in VRAM
TheBloke/Mistral-7B-Instruct-v0.2-GGUF | 7.24B | Q4_K_S | Good | 5.95 GB | 103.7 t/s | Fits in VRAM
MaziyarPanahi/Yi-Coder-9B-Chat-GGUF | 8.83B | Q3_K_S | Fair | 5.87 GB | 110.1 t/s | Fits in VRAM
unsloth/gpt-oss-20b-GGUF | 20.91B | F16 | Very good | 13.83 GB | 3.9 t/s | Offload
MaziyarPanahi/Qwen3-14B-GGUF | 14.77B | Q6_K | Excellent | 13.94 GB | 4.4 t/s | Offload
MaziyarPanahi/Qwen3-32B-GGUF | 32.76B | Q4_K_M | Good | 21.97 GB | 2.7 t/s | Offload
MaziyarPanahi/Qwen3-30B-A3B-GGUF | 30.53B | Q4_K_M | Good | 20.75 GB | 2.9 t/s | Offload
unsloth/Qwen3-Coder-30B-A3B-Instruct-GGUF | 30.53B | Q4_1 | Very good | 21.34 GB | 2.8 t/s | Offload
LiquidAI/LFM2.5-8B-A1B-GGUF | 8.47B | BF16 | Excellent | 17.99 GB | 3.2 t/s | Offload
MaziyarPanahi/Mistral-Nemo-Instruct-2407-GGUF | 12.25B | Q8_0 | Excellent | 14.62 GB | 4.1 t/s | Offload
MaziyarPanahi/Meta-Llama-3.1-70B-Instruct-GGUF | 70.55B | IQ1_M | Very low | 20.45 GB | 3.2 t/s | Offload
MaziyarPanahi/gemma-3-12b-it-GGUF | 11.77B | Q8_0 | Excellent | 14.11 GB | 4.3 t/s | Offload

"Fits in VRAM" means the model runs fully on the GPU and is fast. "Offload" means part of the model sits in system RAM and runs slower. Speed values are rough estimates.

Frequently asked questions

How much VRAM does the NVIDIA GTX 1660 Ti have?

The NVIDIA GTX 1660 Ti has 6 GB of VRAM, which determines how large a model it can run entirely on the GPU.

What is the best LLM to run on an NVIDIA GTX 1660 Ti?

Among popular models, hugging-quants/Llama-3.2-1B-Instruct-Q8_0-GGUF runs well on an NVIDIA GTX 1660 Ti using the Q8_0 quantization (about 2.57 GB, with an estimated speed of 325.1 t/s). Larger models trade speed for capability via RAM offloading.

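Can an NVIDIA GTX 1660 Ti run a 7–8B model?

Yes, with a low-bit quantization. A 7–8B model like Qwen3-8B-GGUF fits entirely in the 6 GB of an NVIDIA GTX 1660 Ti at Q2_K (about 5.24 GB), which is rated Low quality. Mistral-7B-Instruct-v0.3-GGUF at Q4_K_S (about 5.96 GB, rated Good) also fits.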

Can an NVIDIA GTX 1660 Ti run a 13–14B model?

Only with offloading. A 13–14B model like Qwen3-14B-GGUF (Q6_K, about 13.94 GB) runs on an NVIDIA GTX 1660 Ti by using system RAM in addition to its 6 GB of VRAM, which is slower (an estimated 4.4 t/s).
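To make the offloading trade-off concrete, the sketch below works out how much of a model would sit in system RAM. It is a simplified illustration: the helper `offload_split` is hypothetical, and it assumes the full 6 GB of VRAM is available to the model, which real runtimes rarely achieve.

```python
def offload_split(model_gb: float, vram_gb: float = 6.0) -> tuple[float, float]:
    """Split a model's footprint into (GB on GPU, GB in system RAM).

    Assumption: all of the card's VRAM can be used for the model.
    """
    on_gpu = min(model_gb, vram_gb)
    in_ram = max(model_gb - vram_gb, 0.0)
    return on_gpu, in_ram


# Qwen3-14B-GGUF at Q6_K is listed at about 13.94 GB.
gpu_gb, ram_gb = offload_split(13.94)
print(f"{gpu_gb:.2f} GB on GPU, {ram_gb:.2f} GB in system RAM")
print(f"{ram_gb / 13.94:.0%} of the model is offloaded")
```

In this example more than half of the model ends up in system RAM, which is consistent with the large drop in estimated speed for the "Offload" rows.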

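Can an NVIDIA GTX 1660 Ti run a 70B model?

Only with offloading and an extreme quantization. A 70B model like Meta-Llama-3.1-70B-Instruct-GGUF at IQ1_M (about 20.45 GB, rated Very low quality) runs on an NVIDIA GTX 1660 Ti by using system RAM in addition to its 6 GB of VRAM, which is slower (an estimated 3.2 t/s).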
