Which AI models run on an NVIDIA RTX 2070 Super?
These are the popular models an 8 GB card can run locally (4,096-token context, about 16 GB of system RAM assumed), ranked by popularity.
The NVIDIA RTX 2070 Super comes with 8 GB of VRAM. Among the popular GGUF models we track, it can run 28 entirely in VRAM, including Llama-3.2-1B-Instruct-Q8_0-GGUF, Qwen3-4B-GGUF, and Jan-v3.5-4B-gguf.
Larger models such as gpt-oss-20b-GGUF still run on an NVIDIA RTX 2070 Super, but part of the model must be offloaded to system RAM, which lowers speed. Models that exceed VRAM and system RAM combined are not listed.
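As a rough illustration of what offloading means in practice, the sketch below splits a model's memory footprint between the GPU and system RAM. It assumes the full 8 GB of VRAM is available to the model, which ignores whatever the driver and display reserve. The function name `offload_split` is hypothetical and is not part of any inference library.

```python
VRAM_GB = 8.0  # RTX 2070 Super, as stated in the text


def offload_split(memory_gb, vram_gb=VRAM_GB):
    """Return (gb_on_gpu, gb_in_system_ram) for a model needing memory_gb.

    Assumption: all of vram_gb is usable for the model.
    """
    on_gpu = min(memory_gb, vram_gb)
    in_ram = max(memory_gb - vram_gb, 0.0)
    return on_gpu, in_ram


# gpt-oss-20b-GGUF at F16 is listed with a 13.83 GB footprint.
gpu_gb, ram_gb = offload_split(13.83)
print(f"{gpu_gb:.2f} GB on GPU, {ram_gb:.2f} GB in system RAM")
```

The larger the share that lands in system RAM, the more of each token's work runs at system-memory speed, which matches the sharp drop in estimated speed for the offloaded rows.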
| Model | Parameters | Quant. | Quality | Memory | Speed (est.) | Verdict |
|---|---|---|---|---|---|---|
| hugging-quants/Llama-3.2-1B-Instruct-Q8_0-GGUF | 1.24B | Q8_0 | Excellent | 2.57 GB | 325.1 t/s | Fits in VRAM |
| Qwen/Qwen3-4B-GGUF | 4.02B | Q8_0 | Excellent | 5.75 GB | 100.3 t/s | Fits in VRAM |
| janhq/Jan-v3.5-4B-gguf | 4.41B | Q8_0 | Excellent | 6.18 GB | 91.5 t/s | Fits in VRAM |
| bartowski/gemma-2-2b-it-GGUF | 2.61B | Q8_0 | Excellent | 4.17 GB | 154.2 t/s | Fits in VRAM |
| MaziyarPanahi/Qwen3-0.6B-GGUF | 0.75B | GGUF | Excellent | 2.62 GB | 284.6 t/s | Fits in VRAM |
| MaziyarPanahi/Qwen3-8B-GGUF | 8.19B | Q5_K_M | Very good | 7.63 GB | 73.4 t/s | Fits in VRAM |
| MaziyarPanahi/Qwen3-1.7B-GGUF | 2.03B | GGUF | Excellent | 5.28 GB | 105.5 t/s | Fits in VRAM |
| bartowski/Meta-Llama-3.1-8B-Instruct-GGUF | 8.03B | Q5_K_L | Very good | 7.81 GB | 70.9 t/s | Fits in VRAM |
| Qwen/Qwen2.5-1.5B-Instruct-GGUF | 1.78B | GGUF | Excellent | 4.76 GB | 120.6 t/s | Fits in VRAM |
| MaziyarPanahi/Phi-3.5-mini-instruct-GGUF | 3.82B | Q8_0 | Excellent | 5.53 GB | 105.8 t/s | Fits in VRAM |
| Qwen/Qwen2.5-3B-Instruct-GGUF | 3.4B | Q8_0 | Excellent | 5.06 GB | 118.8 t/s | Fits in VRAM |
| bartowski/Llama-3.2-3B-Instruct-GGUF | 3.21B | F16 | Excellent | 7.66 GB | 66.8 t/s | Fits in VRAM |
| Qwen/Qwen2.5-0.5B-Instruct-GGUF | 0.63B | GGUF | Excellent | 2.36 GB | 339.1 t/s | Fits in VRAM |
| MaziyarPanahi/Qwen3-4B-Instruct-2507-GGUF | 4.02B | Q6_K | Excellent | 4.85 GB | 129.9 t/s | Fits in VRAM |
| LiquidAI/LFM2.5-8B-A1B-GGUF | 8.47B | Q5_K_M | Very good | 7.82 GB | 71.2 t/s | Fits in VRAM |
| MaziyarPanahi/Mistral-7B-Instruct-v0.3-GGUF | 7.25B | Q6_K | Excellent | 7.64 GB | 72.2 t/s | Fits in VRAM |
| MaziyarPanahi/gemma-3-4b-it-GGUF | 3.88B | Q8_0 | Excellent | 5.6 GB | 104.0 t/s | Fits in VRAM |
| MaziyarPanahi/Meta-Llama-3-8B-Instruct-GGUF | 8.03B | Q5_K_M | Very good | 7.51 GB | 74.9 t/s | Fits in VRAM |
| MaziyarPanahi/Qwen2.5-7B-Instruct-GGUF | 7.62B | Q6_K | Excellent | 7.96 GB | 68.7 t/s | Fits in VRAM |
| MaziyarPanahi/Phi-4-mini-instruct-GGUF | 3.84B | Q8_0 | Excellent | 5.55 GB | 105.1 t/s | Fits in VRAM |
| MaziyarPanahi/Yi-Coder-1.5B-Chat-GGUF | 1.48B | GGUF | Excellent | 4.14 GB | 145.4 t/s | Fits in VRAM |
| MaziyarPanahi/DeepSeek-R1-0528-Qwen3-8B-GGUF | 8.19B | Q5_K_M | Very good | 7.63 GB | 73.4 t/s | Fits in VRAM |
| MaziyarPanahi/Mistral-Nemo-Instruct-2407-GGUF | 12.25B | Q3_K_S | Fair | 7.64 GB | 77.6 t/s | Fits in VRAM |
| MaziyarPanahi/Llama-3-8B-Instruct-32k-v0.1-GGUF | 8.03B | Q5_K_M | Very good | 7.51 GB | 74.9 t/s | Fits in VRAM |
| MaziyarPanahi/gemma-3-1b-it-GGUF | 1.0B | GGUF | Excellent | 3.15 GB | 214.0 t/s | Fits in VRAM |
| TheBloke/Mistral-7B-Instruct-v0.2-GGUF | 7.24B | Q6_K | Excellent | 7.63 GB | 72.3 t/s | Fits in VRAM |
| MaziyarPanahi/Yi-Coder-9B-Chat-GGUF | 8.83B | Q5_K_S | Very good | 7.92 GB | 70.3 t/s | Fits in VRAM |
| MaziyarPanahi/gemma-3-12b-it-GGUF | 11.77B | Q3_K_S | Fair | 7.54 GB | 78.7 t/s | Fits in VRAM |
| unsloth/gpt-oss-20b-GGUF | 20.91B | F16 | Very good | 13.83 GB | 3.9 t/s | Offload |
| unsloth/Qwen3-Coder-Next-GGUF | 79.67B | Q1_0 | Very low | 22.75 GB | 2.8 t/s | Offload |
| MaziyarPanahi/Qwen3-14B-GGUF | 14.77B | Q6_K | Excellent | 13.94 GB | 4.4 t/s | Offload |
| MaziyarPanahi/Qwen3-32B-GGUF | 32.76B | Q4_K_M | Good | 21.97 GB | 2.7 t/s | Offload |
| MaziyarPanahi/Qwen3-30B-A3B-GGUF | 30.53B | Q5_K_M | Very good | 23.7 GB | 2.5 t/s | Offload |
| unsloth/Qwen3-Coder-30B-A3B-Instruct-GGUF | 30.53B | Q5_K_XL | Very good | 23.71 GB | 2.5 t/s | Offload |
| MaziyarPanahi/Meta-Llama-3.1-70B-Instruct-GGUF | 70.55B | IQ1_M | Very low | 20.45 GB | 3.2 t/s | Offload |
"Fits in VRAM" = fast, fully on GPU. "Offload" = part on system RAM, slower. Speed is a rough estimate.
Frequently asked questions
How much VRAM does the NVIDIA RTX 2070 Super have?
The NVIDIA RTX 2070 Super has 8 GB of VRAM, which determines how large a model it can run entirely on the GPU.
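The check behind the verdicts on this page can be sketched in a few lines. This is a minimal sketch, not the page's actual method: it assumes the cutoff for listing a model is VRAM plus system RAM combined (8 GB + 16 GB here), and the function name `verdict` and the "Not listed" label are hypothetical.

```python
def verdict(memory_gb, vram_gb=8.0, ram_gb=16.0):
    """Classify a model by its estimated memory footprint in GB.

    Assumption: a model is listed only if it fits in VRAM + system RAM.
    """
    if memory_gb <= vram_gb:
        return "Fits in VRAM"
    if memory_gb <= vram_gb + ram_gb:
        return "Offload"
    return "Not listed"


# Footprints taken from the table above.
print(verdict(7.96))   # Qwen2.5-7B-Instruct-GGUF at Q6_K
print(verdict(13.83))  # gpt-oss-20b-GGUF at F16
print(verdict(23.71))  # Qwen3-Coder-30B-A3B-Instruct-GGUF at Q5_K_XL
```

Under these assumptions the three calls give "Fits in VRAM", "Offload", and "Offload", in line with the table.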
What is the best LLM to run on an NVIDIA RTX 2070 Super?
Among popular models, hugging-quants/Llama-3.2-1B-Instruct-Q8_0-GGUF runs well on an NVIDIA RTX 2070 Super using the Q8_0 quantization (about 2.57 GB). Larger models trade speed for capability via RAM offloading.
Can an NVIDIA RTX 2070 Super run a 7–8B model?
Yes. A 7–8B model like Qwen3-8B-GGUF fits entirely in the 8 GB of an NVIDIA RTX 2070 Super at Q5_K_M (about 7.63 GB).
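Can an NVIDIA RTX 2070 Super run a 13–14B model?
Only partly, at the quantizations listed here. The closest model that fits entirely in the 8 GB of an NVIDIA RTX 2070 Super is the 12.25B Mistral-Nemo-Instruct-2407-GGUF at Q3_K_S (about 7.64 GB, Fair quality). A 14B-class model such as Qwen3-14B-GGUF at Q6_K needs about 13.94 GB and runs only with offloading to system RAM.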
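Can an NVIDIA RTX 2070 Super run a 70B model?
Only with offloading. A 70B model like Meta-Llama-3.1-70B-Instruct-GGUF (IQ1_M, about 20.45 GB) runs on an NVIDIA RTX 2070 Super by using system RAM in addition to its 8 GB, which is slower (an estimated 3.2 t/s) and comes with very low quantization quality.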