Run unsloth/Qwen3-Coder-30B-A3B-Instruct-GGUF locally

License: apache-2.0
Downloads: 243,253
Likes: 763
Parameters: 30.53B
Context: 262,144 tokens

unsloth/Qwen3-Coder-30B-A3B-Instruct-GGUF is a code-focused language model with 30.53 billion parameters, built on the qwen3moe (mixture-of-experts) architecture. It is released under the apache-2.0 license and has been downloaded 243,253 times.

To run unsloth/Qwen3-Coder-30B-A3B-Instruct-GGUF locally at a 4,096-token context, its quantized versions need between 8.44 GB (Q1_0, lowest quality) and 57.89 GB (BF16, highest quality) of memory, weights plus KV cache and a system margin included.
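The memory totals on this page can be approximated from the bits-per-weight figure alone. The sketch below is a reconstruction under stated assumptions, not the page's actual calculator: it assumes the sizes labelled GB behave like binary gigabytes (2^30 bytes) and that the system margin is about 0.8 GB, both inferred from the quantization table rather than stated by the source. The function names `weights_gb` and `estimate_total_gb` are hypothetical.

```python
GIB = 2**30  # assumption: the page's "GB" figures behave like binary gigabytes


def weights_gb(params: float, bits_per_weight: float) -> float:
    """Weight size: each parameter costs bits_per_weight / 8 bytes."""
    return params * bits_per_weight / 8 / GIB


def estimate_total_gb(params: float, bits_per_weight: float,
                      kv_gb: float = 0.19, margin_gb: float = 0.8) -> float:
    """Weights plus KV cache plus system margin (margin_gb is an inferred value)."""
    return weights_gb(params, bits_per_weight) + kv_gb + margin_gb


PARAMS = 30.53e9  # parameter count stated on this page

for name, bits in [("Q1_0", 2.1), ("Q4_K_M", 4.86), ("BF16", 16.01)]:
    print(f"{name}: weights {weights_gb(PARAMS, bits):.2f} GB, "
          f"total {estimate_total_gb(PARAMS, bits):.2f} GB")
```

Under these assumptions the estimates land within a few hundredths of a GB of the listed values; the small differences come from the bits-per-weight figures being rounded to two decimals.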

For most users the best balance is Q5_K_XL, which needs about 21.23 GB and fits entirely in the VRAM of a 24 GB GPU. The smallest quantizations (IQ1_M and below) fit entirely on a 10 GB GPU, at very low quality.


All quantizations

| Quant. | Bits per weight | Quality | Weights | KV cache | Total | Speed (approx.) | Verdict |
|---|---|---|---|---|---|---|---|
| Q1_0 | 2.1 | Very low | 7.46 GB | 0.19 GB | 8.44 GB | 53.7 t/s | Fits in VRAM |
| IQ1_S | 2.34 | Very low | 8.3 GB | 0.19 GB | 9.29 GB | 48.2 t/s | Fits in VRAM |
| IQ1_M | 2.52 | Very low | 8.97 GB | 0.19 GB | 9.95 GB | 44.6 t/s | Fits in VRAM |
| IQ2_XXS | 2.71 | Low | 9.62 GB | 0.19 GB | 10.61 GB | 41.6 t/s | Fits in VRAM |
| IQ2_M | 2.84 | Low | 10.09 GB | 0.19 GB | 11.08 GB | 39.6 t/s | Fits in VRAM |
| Q2_K | 2.95 | Low | 10.49 GB | 0.19 GB | 11.47 GB | 38.1 t/s | Fits in VRAM |
| Q2_K_L | 2.97 | Low | 10.55 GB | 0.19 GB | 11.54 GB | 37.9 t/s | Fits in VRAM |
| Q2_K_XL | 3.09 | Low | 10.98 GB | 0.19 GB | 11.97 GB | 36.4 t/s | Fits in VRAM |
| IQ3_XXS | 3.37 | Fair | 11.97 GB | 0.19 GB | 12.95 GB | 33.4 t/s | Fits in VRAM |
| Q3_K_S | 3.48 | Fair | 12.38 GB | 0.19 GB | 13.37 GB | 32.3 t/s | Fits in VRAM |
| Q3_K_XL | 3.62 | Fair | 12.86 GB | 0.19 GB | 13.85 GB | 31.1 t/s | Fits in VRAM |
| Q3_K_M | 3.85 | Fair | 13.7 GB | 0.19 GB | 14.69 GB | 29.2 t/s | Fits in VRAM |
| IQ4_XS | 4.29 | Good | 15.25 GB | 0.19 GB | 16.24 GB | 26.2 t/s | Fits in VRAM |
| IQ4_NL | 4.54 | Good | 16.12 GB | 0.19 GB | 17.11 GB | 24.8 t/s | Fits in VRAM |
| Q4_0 | 4.55 | Good | 16.19 GB | 0.19 GB | 17.17 GB | 24.7 t/s | Fits in VRAM |
| Q4_K_S | 4.57 | Good | 16.26 GB | 0.19 GB | 17.24 GB | 24.6 t/s | Fits in VRAM |
| Q4_K_XL | 4.63 | Good | 16.45 GB | 0.19 GB | 17.44 GB | 24.3 t/s | Fits in VRAM |
| Q4_K_M | 4.86 | Good | 17.28 GB | 0.19 GB | 18.27 GB | 23.1 t/s | Fits in VRAM |
| Q4_1 | 5.03 | Very good | 17.87 GB | 0.19 GB | 18.86 GB | 22.4 t/s | Fits in VRAM |
| Q5_K_S | 5.52 | Very good | 19.63 GB | 0.19 GB | 20.62 GB | 20.4 t/s | Fits in VRAM |
| Q5_K_M | 5.69 | Very good | 20.23 GB | 0.19 GB | 21.22 GB | 19.8 t/s | Fits in VRAM |
| Q5_K_XL | 5.7 | Very good | 20.25 GB | 0.19 GB | 21.23 GB | 19.8 t/s | Fits in VRAM |
| Q6_K | 6.57 | Excellent | 23.37 GB | 0.19 GB | 24.36 GB | 2.1 t/s | Offload |
| Q6_K_XL | 6.9 | Excellent | 24.53 GB | 0.19 GB | 25.52 GB | 2.0 t/s | Offload |
| Q8_0 | 8.51 | Excellent | 30.25 GB | 0.19 GB | 31.24 GB | 1.7 t/s | Offload |
| Q8_K_XL | 9.43 | Excellent | 33.52 GB | 0.19 GB | 34.51 GB | 1.5 t/s | Offload |
| BF16 | 16.01 | Excellent | 56.9 GB | 0.19 GB | 57.89 GB | | Insufficient |

The KV cache is computed from the model's exact architecture at a 4,096-token context. Speed, in tokens per second (t/s), is a rough estimate bounded by memory bandwidth.
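Both estimates can be sketched in a few lines. This is an illustration under assumptions, not the page's actual method. The architecture values used below (48 layers, 4 key/value heads, head dimension 128) and the 1 byte per cached element are assumptions not given on this page; with them the formula happens to reproduce the 0.19 GB figure. The bandwidth constants (400 and 50 GB/s) are likewise inferred: they are roughly what the table's speed column implies for the "Fits in VRAM" and "Offload" rows. The function names are hypothetical.

```python
def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   context_tokens: int, bytes_per_element: int) -> int:
    """KV cache size: one key and one value vector per layer, KV head, and token."""
    return 2 * n_layers * n_kv_heads * head_dim * context_tokens * bytes_per_element


def tokens_per_second(weights_gb: float, bandwidth_gb_per_s: float) -> float:
    """Bandwidth-bound estimate: assumes each generated token reads the weights once."""
    return bandwidth_gb_per_s / weights_gb


# Assumed architecture values (not stated on this page).
kv_gb = kv_cache_bytes(n_layers=48, n_kv_heads=4, head_dim=128,
                       context_tokens=4096, bytes_per_element=1) / 2**30
print(f"KV cache at 4,096 tokens: {kv_gb:.2f} GB")

# Assumed bandwidths, inferred from the table's speed column.
print(f"Q1_0 in VRAM:   {tokens_per_second(7.46, 400):.1f} t/s")
print(f"Q6_K offloaded: {tokens_per_second(23.37, 50):.1f} t/s")
```

Because the KV cache formula is linear in `context_tokens`, the cache grows in proportion to the context length, so longer contexts need correspondingly more memory than the 4,096-token figures shown here.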

Frequently asked questions

How much VRAM do you need to run unsloth/Qwen3-Coder-30B-A3B-Instruct-GGUF?

At a 4,096-token context, the minimum is about 8.44 GB of VRAM, using Q1_0, the smallest quantization. A 10 GB GPU can hold up to IQ1_M (about 9.95 GB) entirely in VRAM. Both are very low quality; higher-quality quantizations need more memory.

Can I run unsloth/Qwen3-Coder-30B-A3B-Instruct-GGUF on an 8 GB GPU?

Partially. No quantization fits entirely in 8 GB of VRAM (the smallest, Q1_0, needs about 8.44 GB), so on an 8 GB GPU part of the model must be offloaded to system RAM. That works, for example with Q5_K_XL, but runs slower than a full-GPU setup.

Can I run unsloth/Qwen3-Coder-30B-A3B-Instruct-GGUF on a 16 GB GPU?

Yes. With 16 GB of VRAM you can run unsloth/Qwen3-Coder-30B-A3B-Instruct-GGUF fully on the GPU using Q3_K_M (about 14.69 GB).

Can I run unsloth/Qwen3-Coder-30B-A3B-Instruct-GGUF on a 24 GB GPU?

Yes. With 24 GB of VRAM you can run unsloth/Qwen3-Coder-30B-A3B-Instruct-GGUF fully on the GPU using Q5_K_XL (about 21.23 GB).

What is the best quantization for unsloth/Qwen3-Coder-30B-A3B-Instruct-GGUF?

If memory allows, more bits per weight means better quality. A common sweet spot is Q4_K_M or Q5_K_M, which keeps most of the quality while cutting weight memory by roughly 43% and 33% respectively versus Q8_0 (17.28 GB and 20.23 GB against 30.25 GB). Pick the quantization with the most bits per weight that still fits in your VRAM.
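The selection rule can be written as a small lookup. The sketch below copies the bits-per-weight and total-memory columns (4,096-token context) from the quantization table on this page; the `best_fit` helper is a hypothetical name, and it treats a quantization as fitting when its total is no larger than the VRAM size, with no extra headroom.

```python
# (name, bits per weight, total GB at a 4,096-token context), from the table
QUANTS = [
    ("Q1_0", 2.1, 8.44), ("IQ1_S", 2.34, 9.29), ("IQ1_M", 2.52, 9.95),
    ("IQ2_XXS", 2.71, 10.61), ("IQ2_M", 2.84, 11.08), ("Q2_K", 2.95, 11.47),
    ("Q2_K_L", 2.97, 11.54), ("Q2_K_XL", 3.09, 11.97), ("IQ3_XXS", 3.37, 12.95),
    ("Q3_K_S", 3.48, 13.37), ("Q3_K_XL", 3.62, 13.85), ("Q3_K_M", 3.85, 14.69),
    ("IQ4_XS", 4.29, 16.24), ("IQ4_NL", 4.54, 17.11), ("Q4_0", 4.55, 17.17),
    ("Q4_K_S", 4.57, 17.24), ("Q4_K_XL", 4.63, 17.44), ("Q4_K_M", 4.86, 18.27),
    ("Q4_1", 5.03, 18.86), ("Q5_K_S", 5.52, 20.62), ("Q5_K_M", 5.69, 21.22),
    ("Q5_K_XL", 5.7, 21.23), ("Q6_K", 6.57, 24.36), ("Q6_K_XL", 6.9, 25.52),
    ("Q8_0", 8.51, 31.24), ("Q8_K_XL", 9.43, 34.51), ("BF16", 16.01, 57.89),
]


def best_fit(vram_gb: float, quants=QUANTS):
    """Return the name of the fitting quantization with the most bits per weight."""
    fitting = [q for q in quants if q[2] <= vram_gb]
    if not fitting:
        return None  # nothing fits fully in VRAM; offloading would be needed
    return max(fitting, key=lambda q: q[1])[0]


for vram in (8, 10, 16, 24):
    print(f"{vram} GB GPU -> {best_fit(vram)}")
```

A `None` result means no listed quantization fits fully in VRAM, which corresponds to the offloading case described for the 8 GB GPU.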