Which AI models run on an NVIDIA RTX 4060 Ti 8 GB?

With 8 GB of VRAM, here are the popular models you can run locally (4,096-token context, ~16.0 GB system RAM assumed), ranked by popularity.

VRAM: 8 GB
Vendor: NVIDIA
Fits in VRAM: 28 models
Assumed RAM: 16.0 GB

The NVIDIA RTX 4060 Ti 8 GB comes with 8 GB of VRAM. Among the popular GGUF models we track, 28 fit entirely in VRAM, including Llama-3.2-1B-Instruct-Q8_0-GGUF, Qwen3-4B-GGUF, and Jan-v3.5-4B-gguf.

Larger models such as gpt-oss-20b-GGUF still run on an NVIDIA RTX 4060 Ti 8 GB, but part of the model must be offloaded to system RAM, which lowers speed considerably (single-digit tokens per second in the estimates below). Models that exceed the combined VRAM and system RAM are not listed.
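The two verdicts used on this page can be sketched as a simple threshold check. This is a minimal illustration, not the page's actual method: it assumes the cutoff for "Offload" is the sum of VRAM and the assumed system RAM (8 GB + 16 GB), which is consistent with the rows listed but not stated explicitly. The function names `verdict` and `spill_to_ram_gb` are hypothetical.

```python
def verdict(model_gb: float, vram_gb: float = 8.0, ram_gb: float = 16.0) -> str:
    """Classify a model by the memory figure shown in the table.

    Assumption: a model is listed as "Offload" when it exceeds VRAM but
    still fits in VRAM plus system RAM combined.
    """
    if model_gb <= vram_gb:
        return "Fits in VRAM"
    if model_gb <= vram_gb + ram_gb:
        return "Offload"
    return "Not listed"


def spill_to_ram_gb(model_gb: float, vram_gb: float = 8.0) -> float:
    """Rough amount of the model that would have to live in system RAM."""
    return round(max(0.0, model_gb - vram_gb), 2)


# Memory figures taken from the table on this page.
print(verdict(7.96))           # Qwen2.5-7B-Instruct-GGUF at Q6_K
print(verdict(13.83))          # gpt-oss-20b-GGUF
print(spill_to_ram_gb(13.83))  # GB that would not fit on the GPU
```

In practice the usable VRAM is somewhat less than the nominal 8 GB because the driver and desktop also consume memory, so models close to the limit may behave differently than this sketch suggests.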

New to this? Read: How much VRAM do you need?

Model Size Quant. Quality Memory Speed (est.) Verdict
hugging-quants/Llama-3.2-1B-Instruct-Q8_0-GGUF 1.24B Q8_0 Excellent 2.57 GB 325.1 t/s Fits in VRAM
Qwen/Qwen3-4B-GGUF 4.02B Q8_0 Excellent 5.75 GB 100.3 t/s Fits in VRAM
janhq/Jan-v3.5-4B-gguf 4.41B Q8_0 Excellent 6.18 GB 91.5 t/s Fits in VRAM
bartowski/gemma-2-2b-it-GGUF 2.61B Q8_0 Excellent 4.17 GB 154.2 t/s Fits in VRAM
MaziyarPanahi/Qwen3-0.6B-GGUF 0.75B GGUF Excellent 2.62 GB 284.6 t/s Fits in VRAM
MaziyarPanahi/Qwen3-8B-GGUF 8.19B Q5_K_M Very good 7.63 GB 73.4 t/s Fits in VRAM
MaziyarPanahi/Qwen3-1.7B-GGUF 2.03B GGUF Excellent 5.28 GB 105.5 t/s Fits in VRAM
bartowski/Meta-Llama-3.1-8B-Instruct-GGUF 8.03B Q5_K_L Very good 7.81 GB 70.9 t/s Fits in VRAM
Qwen/Qwen2.5-1.5B-Instruct-GGUF 1.78B GGUF Excellent 4.76 GB 120.6 t/s Fits in VRAM
MaziyarPanahi/Phi-3.5-mini-instruct-GGUF 3.82B Q8_0 Excellent 5.53 GB 105.8 t/s Fits in VRAM
Qwen/Qwen2.5-3B-Instruct-GGUF 3.4B Q8_0 Excellent 5.06 GB 118.8 t/s Fits in VRAM
bartowski/Llama-3.2-3B-Instruct-GGUF 3.21B F16 Excellent 7.66 GB 66.8 t/s Fits in VRAM
Qwen/Qwen2.5-0.5B-Instruct-GGUF 0.63B GGUF Excellent 2.36 GB 339.1 t/s Fits in VRAM
MaziyarPanahi/Qwen3-4B-Instruct-2507-GGUF 4.02B Q6_K Excellent 4.85 GB 129.9 t/s Fits in VRAM
LiquidAI/LFM2.5-8B-A1B-GGUF 8.47B Q5_K_M Very good 7.82 GB 71.2 t/s Fits in VRAM
MaziyarPanahi/Mistral-7B-Instruct-v0.3-GGUF 7.25B Q6_K Excellent 7.64 GB 72.2 t/s Fits in VRAM
MaziyarPanahi/gemma-3-4b-it-GGUF 3.88B Q8_0 Excellent 5.6 GB 104.0 t/s Fits in VRAM
MaziyarPanahi/Meta-Llama-3-8B-Instruct-GGUF 8.03B Q5_K_M Very good 7.51 GB 74.9 t/s Fits in VRAM
MaziyarPanahi/Qwen2.5-7B-Instruct-GGUF 7.62B Q6_K Excellent 7.96 GB 68.7 t/s Fits in VRAM
MaziyarPanahi/Phi-4-mini-instruct-GGUF 3.84B Q8_0 Excellent 5.55 GB 105.1 t/s Fits in VRAM
MaziyarPanahi/Yi-Coder-1.5B-Chat-GGUF 1.48B GGUF Excellent 4.14 GB 145.4 t/s Fits in VRAM
MaziyarPanahi/DeepSeek-R1-0528-Qwen3-8B-GGUF 8.19B Q5_K_M Very good 7.63 GB 73.4 t/s Fits in VRAM
MaziyarPanahi/Mistral-Nemo-Instruct-2407-GGUF 12.25B Q3_K_S Fair 7.64 GB 77.6 t/s Fits in VRAM
MaziyarPanahi/Llama-3-8B-Instruct-32k-v0.1-GGUF 8.03B Q5_K_M Very good 7.51 GB 74.9 t/s Fits in VRAM
MaziyarPanahi/gemma-3-1b-it-GGUF 1.0B GGUF Excellent 3.15 GB 214.0 t/s Fits in VRAM
TheBloke/Mistral-7B-Instruct-v0.2-GGUF 7.24B Q6_K Excellent 7.63 GB 72.3 t/s Fits in VRAM
MaziyarPanahi/Yi-Coder-9B-Chat-GGUF 8.83B Q5_K_S Very good 7.92 GB 70.3 t/s Fits in VRAM
MaziyarPanahi/gemma-3-12b-it-GGUF 11.77B Q3_K_S Fair 7.54 GB 78.7 t/s Fits in VRAM
unsloth/gpt-oss-20b-GGUF 20.91B F16 Very good 13.83 GB 3.9 t/s Offload
unsloth/Qwen3-Coder-Next-GGUF 79.67B Q1_0 Very low 22.75 GB 2.8 t/s Offload
MaziyarPanahi/Qwen3-14B-GGUF 14.77B Q6_K Excellent 13.94 GB 4.4 t/s Offload
MaziyarPanahi/Qwen3-32B-GGUF 32.76B Q4_K_M Good 21.97 GB 2.7 t/s Offload
MaziyarPanahi/Qwen3-30B-A3B-GGUF 30.53B Q5_K_M Very good 23.7 GB 2.5 t/s Offload
unsloth/Qwen3-Coder-30B-A3B-Instruct-GGUF 30.53B Q5_K_XL Very good 23.71 GB 2.5 t/s Offload
MaziyarPanahi/Meta-Llama-3.1-70B-Instruct-GGUF 70.55B IQ1_M Very low 20.45 GB 3.2 t/s Offload

"Fits in VRAM" = fast, fully on GPU. "Offload" = part on system RAM, slower. Speed is a rough estimate.

Frequently asked questions

How much VRAM does the NVIDIA RTX 4060 Ti 8 GB have?

The NVIDIA RTX 4060 Ti 8 GB has 8 GB of VRAM, which determines how large a model it can run entirely on the GPU.

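What is the best LLM to run on an NVIDIA RTX 4060 Ti 8 GB?

The most popular model in this list, hugging-quants/Llama-3.2-1B-Instruct-Q8_0-GGUF, runs well on an NVIDIA RTX 4060 Ti 8 GB using the Q8_0 quantization (about 2.57 GB, an estimated 325 t/s). For more capability, 7–8B models such as Qwen3-8B-GGUF still fit in VRAM at Q5_K_M. Models that need more than 8 GB trade speed for capability via RAM offloading.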
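One way to make "best" concrete is to pick the largest model that still fits in VRAM at an acceptable quality level. The sketch below uses a handful of rows copied from the table; the ordering of the quality labels (`QUALITY_RANK`) and the helper `largest_fitting` are assumptions made for illustration, not something this page defines.

```python
# A few rows from the table: (model, params in billions, quant, quality, memory GB, verdict)
ROWS = [
    ("hugging-quants/Llama-3.2-1B-Instruct-Q8_0-GGUF", 1.24, "Q8_0", "Excellent", 2.57, "Fits in VRAM"),
    ("Qwen/Qwen3-4B-GGUF", 4.02, "Q8_0", "Excellent", 5.75, "Fits in VRAM"),
    ("MaziyarPanahi/Qwen3-8B-GGUF", 8.19, "Q5_K_M", "Very good", 7.63, "Fits in VRAM"),
    ("MaziyarPanahi/Mistral-Nemo-Instruct-2407-GGUF", 12.25, "Q3_K_S", "Fair", 7.64, "Fits in VRAM"),
    ("MaziyarPanahi/Qwen3-14B-GGUF", 14.77, "Q6_K", "Excellent", 13.94, "Offload"),
]

# Assumed ordering of the quality labels, lowest to highest.
QUALITY_RANK = {"Very low": 0, "Fair": 1, "Good": 2, "Very good": 3, "Excellent": 4}


def largest_fitting(rows, min_quality="Very good"):
    """Return the row with the most parameters that fits in VRAM
    and meets the minimum quality label, or None if nothing qualifies."""
    floor = QUALITY_RANK[min_quality]
    candidates = [
        r for r in rows
        if r[5] == "Fits in VRAM" and QUALITY_RANK[r[3]] >= floor
    ]
    return max(candidates, key=lambda r: r[1], default=None)


best = largest_fitting(ROWS)
print(best[0], best[2])
```

Lowering `min_quality` to "Fair" lets the heavily quantized 12B models through, which illustrates the trade-off between parameter count and quantization quality.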

Can an NVIDIA RTX 4060 Ti 8 GB run a 7–8B model?

Yes. A 7–8B model like Qwen3-8B-GGUF fits entirely in the 8 GB of VRAM on an NVIDIA RTX 4060 Ti 8 GB at Q5_K_M (about 7.63 GB).

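Can an NVIDIA RTX 4060 Ti 8 GB run a 13–14B model?

Only with offloading. A 14B model like Qwen3-14B-GGUF (Q6_K, about 13.94 GB) needs system RAM in addition to the 8 GB of VRAM. Slightly smaller 12B models such as Mistral-Nemo-Instruct-2407-GGUF (12.25B) do fit entirely in VRAM, but only at the aggressive Q3_K_S quantization (about 7.64 GB), which is rated Fair in quality.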

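Can an NVIDIA RTX 4060 Ti 8 GB run a 70B model?

Only with offloading. A 70B model like Meta-Llama-3.1-70B-Instruct-GGUF runs on an NVIDIA RTX 4060 Ti 8 GB at IQ1_M (about 20.45 GB) by using system RAM in addition to its 8 GB of VRAM. Expect it to be slow (an estimated 3.2 t/s) and the quality at that quantization to be very low.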
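As a rough illustration of what offloading looks like in practice, the sketch below builds a command line for llama.cpp's `llama-cli`, assuming that is the runtime used for these GGUF files (this page does not name one). The flags `-m` (model path), `-ngl` (number of layers placed on the GPU), and `-c` (context size) are llama.cpp options; the file names and the layer count of 12 are hypothetical and would need tuning for a real model.

```python
import shlex


def llama_cli_args(model_path: str, gpu_layers: int, ctx: int = 4096) -> list[str]:
    """Build an argument list for llama.cpp's llama-cli.

    gpu_layers controls how many layers go to the GPU; the remaining
    layers stay in system RAM and run on the CPU.
    """
    return ["llama-cli", "-m", model_path, "-ngl", str(gpu_layers), "-c", str(ctx)]


# Model that fits in VRAM: request more layers than the model has,
# a common way to put everything on the GPU.
full = llama_cli_args("Qwen3-8B.Q5_K_M.gguf", gpu_layers=99)

# Model that needs offloading: only some layers on the GPU (count is a guess).
partial = llama_cli_args("Meta-Llama-3.1-70B-Instruct.IQ1_M.gguf", gpu_layers=12)

print(shlex.join(full))
print(shlex.join(partial))
```

If the GPU runs out of memory at load time, lowering the `-ngl` value moves more layers to system RAM at the cost of speed.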
