Which AI models run on an NVIDIA RTX 2080 Ti?
The NVIDIA RTX 2080 Ti comes with 11 GB of VRAM. Among the popular GGUF models we track, it can run 30 entirely in VRAM, including Llama-3.2-1B-Instruct-Q8_0-GGUF, Qwen3-4B-GGUF, and Jan-v3.5-4B-gguf.
The table below lists the models you can run locally, ranked by popularity. It assumes a 4,096-token context and about 16 GB of system RAM.
Larger models such as gpt-oss-20b-GGUF still run on an NVIDIA RTX 2080 Ti, but part of the model must be offloaded to system RAM, which lowers speed. Models that exceed the combined VRAM and system RAM are not listed.
| Model | Parameters | Quant. | Quality | Memory | Speed (est.) | Verdict |
|---|---|---|---|---|---|---|
| hugging-quants/Llama-3.2-1B-Instruct-Q8_0-GGUF | 1.24B | Q8_0 | Excellent | 2.57 GB | 325.1 t/s | Fits in VRAM |
| Qwen/Qwen3-4B-GGUF | 4.02B | Q8_0 | Excellent | 5.75 GB | 100.3 t/s | Fits in VRAM |
| janhq/Jan-v3.5-4B-gguf | 4.41B | GGUF | Excellent | 10.04 GB | 48.6 t/s | Fits in VRAM |
| bartowski/gemma-2-2b-it-GGUF | 2.61B | Q8_0 | Excellent | 4.17 GB | 154.2 t/s | Fits in VRAM |
| MaziyarPanahi/Qwen3-0.6B-GGUF | 0.75B | GGUF | Excellent | 2.62 GB | 284.6 t/s | Fits in VRAM |
| MaziyarPanahi/Qwen3-14B-GGUF | 14.77B | Q3_K_L | Good | 10.01 GB | 54.4 t/s | Fits in VRAM |
| MaziyarPanahi/Qwen3-8B-GGUF | 8.19B | Q6_K | Excellent | 8.44 GB | 63.9 t/s | Fits in VRAM |
| MaziyarPanahi/Qwen3-1.7B-GGUF | 2.03B | GGUF | Excellent | 5.28 GB | 105.5 t/s | Fits in VRAM |
| bartowski/Meta-Llama-3.1-8B-Instruct-GGUF | 8.03B | Q8_0 | Excellent | 10.12 GB | 50.3 t/s | Fits in VRAM |
| Qwen/Qwen2.5-1.5B-Instruct-GGUF | 1.78B | GGUF | Excellent | 4.76 GB | 120.6 t/s | Fits in VRAM |
| unsloth/Qwen3-Coder-30B-A3B-Instruct-GGUF | 30.53B | Q1_0 | Very low | 10.92 GB | 53.7 t/s | Fits in VRAM |
| MaziyarPanahi/Phi-3.5-mini-instruct-GGUF | 3.82B | Q8_0 | Excellent | 5.53 GB | 105.8 t/s | Fits in VRAM |
| Qwen/Qwen2.5-3B-Instruct-GGUF | 3.4B | GGUF | Excellent | 8.02 GB | 63.2 t/s | Fits in VRAM |
| bartowski/Llama-3.2-3B-Instruct-GGUF | 3.21B | F16 | Excellent | 7.66 GB | 66.8 t/s | Fits in VRAM |
| Qwen/Qwen2.5-0.5B-Instruct-GGUF | 0.63B | GGUF | Excellent | 2.36 GB | 339.1 t/s | Fits in VRAM |
| MaziyarPanahi/Qwen3-4B-Instruct-2507-GGUF | 4.02B | GGUF | Excellent | 9.27 GB | 53.3 t/s | Fits in VRAM |
| LiquidAI/LFM2.5-8B-A1B-GGUF | 8.47B | Q8_0 | Excellent | 10.6 GB | 47.7 t/s | Fits in VRAM |
| MaziyarPanahi/Mistral-7B-Instruct-v0.3-GGUF | 7.25B | Q8_0 | Excellent | 9.27 GB | 55.8 t/s | Fits in VRAM |
| MaziyarPanahi/gemma-3-4b-it-GGUF | 3.88B | GGUF | Excellent | 8.98 GB | 55.3 t/s | Fits in VRAM |
| MaziyarPanahi/Meta-Llama-3-8B-Instruct-GGUF | 8.03B | Q8_0 | Excellent | 10.12 GB | 50.3 t/s | Fits in VRAM |
| MaziyarPanahi/Qwen2.5-7B-Instruct-GGUF | 7.62B | Q8_0 | Excellent | 9.67 GB | 53.0 t/s | Fits in VRAM |
| MaziyarPanahi/Phi-4-mini-instruct-GGUF | 3.84B | GGUF | Excellent | 8.9 GB | 55.9 t/s | Fits in VRAM |
| MaziyarPanahi/Yi-Coder-1.5B-Chat-GGUF | 1.48B | GGUF | Excellent | 4.14 GB | 145.4 t/s | Fits in VRAM |
| MaziyarPanahi/DeepSeek-R1-0528-Qwen3-8B-GGUF | 8.19B | Q6_K | Excellent | 8.44 GB | 63.9 t/s | Fits in VRAM |
| MaziyarPanahi/Mistral-Nemo-Instruct-2407-GGUF | 12.25B | Q5_K_M | Very good | 10.62 GB | 49.2 t/s | Fits in VRAM |
| MaziyarPanahi/Llama-3-8B-Instruct-32k-v0.1-GGUF | 8.03B | Q8_0 | Excellent | 10.12 GB | 50.3 t/s | Fits in VRAM |
| MaziyarPanahi/gemma-3-1b-it-GGUF | 1.0B | GGUF | Excellent | 3.15 GB | 214.0 t/s | Fits in VRAM |
| TheBloke/Mistral-7B-Instruct-v0.2-GGUF | 7.24B | Q8_0 | Excellent | 9.27 GB | 55.8 t/s | Fits in VRAM |
| MaziyarPanahi/Yi-Coder-9B-Chat-GGUF | 8.83B | Q5_K_M | Very good | 8.06 GB | 68.6 t/s | Fits in VRAM |
| MaziyarPanahi/gemma-3-12b-it-GGUF | 11.77B | Q5_K_M | Very good | 10.32 GB | 50.9 t/s | Fits in VRAM |
| unsloth/gpt-oss-20b-GGUF | 20.91B | F16 | Very good | 13.83 GB | 3.9 t/s | Offload |
| unsloth/Qwen3-Coder-Next-GGUF | 79.67B | IQ2_XXS | Very low | 26.82 GB | 2.3 t/s | Offload |
| MaziyarPanahi/Qwen3-32B-GGUF | 32.76B | Q5_K_M | Very good | 25.18 GB | 2.3 t/s | Offload |
| MaziyarPanahi/Qwen3-30B-A3B-GGUF | 30.53B | Q6_K | Excellent | 26.84 GB | 2.1 t/s | Offload |
| Qwen/Qwen2.5-Coder-32B-Instruct-GGUF | 32.76B | Q2_K | Very good | 26.5 GB | 2.2 t/s | Offload |
| MaziyarPanahi/Meta-Llama-3.1-70B-Instruct-GGUF | 70.55B | IQ2_XS | Very low | 24.54 GB | 2.5 t/s | Offload |
"Fits in VRAM" means the model runs entirely on the GPU and is fast. "Offload" means part of the model sits in system RAM, which is slower. Speed (t/s, tokens per second) is a rough estimate.
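The Verdict column can be approximated with a simple threshold check. The sketch below is an assumption about how such a rule could work, not the method used to build the table: it compares a model's estimated memory against the 11 GB of VRAM and the roughly 16 GB of system RAM stated above. The names `verdict` and `ram_spill_gb` are hypothetical, and the spill figure is only a rough lower bound because it ignores runtime overhead.

```python
VRAM_GB = 11.0        # RTX 2080 Ti VRAM, as stated on this page
SYSTEM_RAM_GB = 16.0  # system RAM assumed on this page


def verdict(memory_gb: float, vram_gb: float = VRAM_GB,
            ram_gb: float = SYSTEM_RAM_GB) -> str:
    """Classify a model by estimated memory (assumed rule, for illustration)."""
    if memory_gb <= vram_gb:
        return "Fits in VRAM"
    if memory_gb <= vram_gb + ram_gb:
        return "Offload"
    return "Not listed"  # exceeds VRAM and system RAM combined


def ram_spill_gb(memory_gb: float, vram_gb: float = VRAM_GB) -> float:
    """Rough amount of the model that cannot stay in VRAM."""
    return max(0.0, memory_gb - vram_gb)


# Memory figures copied from the table above.
examples = [
    ("unsloth/Qwen3-Coder-30B-A3B-Instruct-GGUF", 10.92),
    ("unsloth/gpt-oss-20b-GGUF", 13.83),
    ("MaziyarPanahi/Qwen3-30B-A3B-GGUF", 26.84),
]

for name, mem in examples:
    print(f"{name}: {verdict(mem)}, about {ram_spill_gb(mem):.2f} GB outside VRAM")
```

Under this assumed rule, every row in the table receives the same verdict it is listed with, since the largest listed model (26.84 GB) stays just under the combined 27 GB.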
Frequently asked questions
How much VRAM does the NVIDIA RTX 2080 Ti have?
The NVIDIA RTX 2080 Ti has 11 GB of VRAM, which determines how large a model it can run entirely on the GPU.
What is the best LLM to run on an NVIDIA RTX 2080 Ti?
Among popular models, hugging-quants/Llama-3.2-1B-Instruct-Q8_0-GGUF runs well on an NVIDIA RTX 2080 Ti using the Q8_0 quantization (about 2.57 GB). Models too large for the 11 GB of VRAM trade speed for capability through RAM offloading.
Can an NVIDIA RTX 2080 Ti run a 7–8B model?
Yes. A 7–8B model like Qwen3-8B-GGUF fits entirely in the 11 GB of an NVIDIA RTX 2080 Ti (Q6_K).
Can an NVIDIA RTX 2080 Ti run a 13–14B model?
Yes. A 13–14B model like Qwen3-14B-GGUF fits entirely in the 11 GB of an NVIDIA RTX 2080 Ti (Q3_K_L).
Can an NVIDIA RTX 2080 Ti run a 70B model?
Only with offloading. A 70B model like Meta-Llama-3.1-70B-Instruct-GGUF (IQ2_XS, about 24.54 GB) runs on an NVIDIA RTX 2080 Ti by using system RAM in addition to its 11 GB of VRAM, which is slower.
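The size-class questions above amount to filtering the table by parameter count and verdict. Here is a minimal sketch of that lookup, using a few rows copied from the table. The function name `answer` and the padded size bounds (for example, 7 to 9 for the "7–8B" class, since Qwen3-8B has 8.19B parameters) are assumptions made for illustration.

```python
# (model, parameters in billions, quantization, memory in GB, verdict)
# Rows copied from the table on this page.
ROWS = [
    ("MaziyarPanahi/Qwen3-8B-GGUF", 8.19, "Q6_K", 8.44, "Fits in VRAM"),
    ("MaziyarPanahi/Qwen3-14B-GGUF", 14.77, "Q3_K_L", 10.01, "Fits in VRAM"),
    ("unsloth/gpt-oss-20b-GGUF", 20.91, "F16", 13.83, "Offload"),
    ("MaziyarPanahi/Meta-Llama-3.1-70B-Instruct-GGUF", 70.55, "IQ2_XS",
     24.54, "Offload"),
]


def answer(lo_b: float, hi_b: float, rows=ROWS) -> str:
    """Answer 'can it run this size class?' from the listed rows."""
    matches = [r for r in rows if lo_b <= r[1] <= hi_b]
    if any(r[4] == "Fits in VRAM" for r in matches):
        return "Yes"
    if matches:
        return "Only with offloading"
    return "No data"  # no listed model falls in this size range


# Size bounds are padded because nominal sizes round the true counts.
for label, lo_b, hi_b in [("7-8B", 7, 9), ("13-14B", 13, 15), ("70B", 69, 72)]:
    print(f"{label}: {answer(lo_b, hi_b)}")
```

A "No data" result only means that no listed row falls in the range. It does not mean a model of that size cannot run.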