Run unsloth/Qwen3-Coder-30B-A3B-Instruct-GGUF locally

License: apache-2.0 | Downloads: 243,253 | Likes: 763

Parameters: 30.53B

Context: 262,144 tokens

unsloth/Qwen3-Coder-30B-A3B-Instruct-GGUF is a code-focused mixture-of-experts language model with 30.53 billion parameters, built on the qwen3moe architecture. It is released under the apache-2.0 license and has been downloaded 243,253 times.

To run unsloth/Qwen3-Coder-30B-A3B-Instruct-GGUF locally at a 4,096-token context, you need between 8.44 GB (Q1_0, lowest quality) and 57.89 GB (BF16, highest quality) of memory, depending on the quantization. These totals include the weights, the KV cache, and a system margin.
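The totals on this page appear to follow a simple sum: weight size plus KV cache plus a fixed margin. Here is a minimal sketch of that arithmetic. It assumes the sizes behave like GiB (2^30 bytes) and that the margin is about 0.79 GB; both are inferred from the listed figures, not stated by the page. The function name `estimate_total_gb` is hypothetical.

```python
GIB = 2**30  # assumption: the page's "GB" figures behave like GiB


def estimate_total_gb(params: float, bits_per_weight: float,
                      kv_gb: float = 0.19, margin_gb: float = 0.79) -> float:
    """Estimate memory as weights + KV cache + system margin.

    kv_gb is the KV cache size the page lists for a 4,096-token context;
    margin_gb is an inferred value, not a documented constant.
    """
    weights_gb = params * bits_per_weight / 8 / GIB
    return weights_gb + kv_gb + margin_gb


# Q4_K_M: 4.86 bits per weight, 30.53 billion parameters
print(round(estimate_total_gb(30.53e9, 4.86), 2))
```

Under these assumptions the result for Q4_K_M lands within a few hundredths of the 18.27 GB listed for that quantization. The small gap comes from the rounded bits-per-weight figure.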

For most users the best balance is Q5_K_XL, which needs about 21.23 GB and runs fully on a 24 GB GPU. The smallest quantizations (Q1_0 through IQ1_M, 8.44 GB to 9.95 GB) fit entirely in the VRAM of a 10 GB GPU, at very low quality.


All quantizations

Quantization	Bits/weight	Quality	Weights	KV cache	Total	Est. speed	Verdict
Q1_0	2.1	Very low	7.46 GB	0.19 GB	8.44 GB	53.7 t/s	Fits in VRAM
IQ1_S	2.34	Very low	8.3 GB	0.19 GB	9.29 GB	48.2 t/s	Fits in VRAM
IQ1_M	2.52	Very low	8.97 GB	0.19 GB	9.95 GB	44.6 t/s	Fits in VRAM
IQ2_XXS	2.71	Low	9.62 GB	0.19 GB	10.61 GB	41.6 t/s	Fits in VRAM
IQ2_M	2.84	Low	10.09 GB	0.19 GB	11.08 GB	39.6 t/s	Fits in VRAM
Q2_K	2.95	Low	10.49 GB	0.19 GB	11.47 GB	38.1 t/s	Fits in VRAM
Q2_K_L	2.97	Low	10.55 GB	0.19 GB	11.54 GB	37.9 t/s	Fits in VRAM
Q2_K_XL	3.09	Low	10.98 GB	0.19 GB	11.97 GB	36.4 t/s	Fits in VRAM
IQ3_XXS	3.37	Fair	11.97 GB	0.19 GB	12.95 GB	33.4 t/s	Fits in VRAM
Q3_K_S	3.48	Fair	12.38 GB	0.19 GB	13.37 GB	32.3 t/s	Fits in VRAM
Q3_K_XL	3.62	Fair	12.86 GB	0.19 GB	13.85 GB	31.1 t/s	Fits in VRAM
Q3_K_M	3.85	Fair	13.7 GB	0.19 GB	14.69 GB	29.2 t/s	Fits in VRAM
IQ4_XS	4.29	Good	15.25 GB	0.19 GB	16.24 GB	26.2 t/s	Fits in VRAM
IQ4_NL	4.54	Good	16.12 GB	0.19 GB	17.11 GB	24.8 t/s	Fits in VRAM
Q4_0	4.55	Good	16.19 GB	0.19 GB	17.17 GB	24.7 t/s	Fits in VRAM
Q4_K_S	4.57	Good	16.26 GB	0.19 GB	17.24 GB	24.6 t/s	Fits in VRAM
Q4_K_XL	4.63	Good	16.45 GB	0.19 GB	17.44 GB	24.3 t/s	Fits in VRAM
Q4_K_M	4.86	Good	17.28 GB	0.19 GB	18.27 GB	23.1 t/s	Fits in VRAM
Q4_1	5.03	Very good	17.87 GB	0.19 GB	18.86 GB	22.4 t/s	Fits in VRAM
Q5_K_S	5.52	Very good	19.63 GB	0.19 GB	20.62 GB	20.4 t/s	Fits in VRAM
Q5_K_M	5.69	Very good	20.23 GB	0.19 GB	21.22 GB	19.8 t/s	Fits in VRAM
Q5_K_XL	5.7	Very good	20.25 GB	0.19 GB	21.23 GB	19.8 t/s	Fits in VRAM
Q6_K	6.57	Excellent	23.37 GB	0.19 GB	24.36 GB	2.1 t/s	Offload
Q6_K_XL	6.9	Excellent	24.53 GB	0.19 GB	25.52 GB	2.0 t/s	Offload
Q8_0	8.51	Excellent	30.25 GB	0.19 GB	31.24 GB	1.7 t/s	Offload
Q8_K_XL	9.43	Excellent	33.52 GB	0.19 GB	34.51 GB	1.5 t/s	Offload
BF16	16.01	Excellent	56.9 GB	0.19 GB	57.89 GB	—	Insufficient

The KV cache is computed from the model's exact architecture at a 4,096-token context. Speed is a rough estimate in tokens per second (t/s), bounded by memory bandwidth.
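Both notes can be illustrated with a short sketch. The page gives no architecture values, so the 48 layers, 4 KV heads, and head dimension of 128 used here are assumptions. So are the 1 byte per cached element and the 400 GB/s bandwidth, which were chosen only because they reproduce the table's figures. The function names are hypothetical.

```python
GIB = 2**30  # assumption: sizes behave like GiB


def kv_cache_gb(n_layers: int, n_kv_heads: int, head_dim: int,
                context: int, bytes_per_elem: float) -> float:
    """KV cache size: one key and one value vector per layer, KV head, and token."""
    elems = 2 * n_layers * n_kv_heads * head_dim * context
    return elems * bytes_per_elem / GIB


def tokens_per_second(weights_gb: float, bandwidth_gb_s: float) -> float:
    """Bandwidth-bound estimate: every generated token streams all weights once."""
    return bandwidth_gb_s / weights_gb


# Assumed architecture (not from this page): 48 layers, 4 KV heads, head_dim 128
print(kv_cache_gb(48, 4, 128, 4096, bytes_per_elem=1))

# Assumed 400 GB/s memory bandwidth with the Q1_0 weights (7.46 GB)
print(tokens_per_second(7.46, 400))
```

With these assumed values the cache comes out near the 0.19 GB shown in the table, and a 16-bit cache would be twice that. The speed figure treats the model as dense. A mixture-of-experts model reads only its active experts per token, so real throughput may differ from this estimate.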

Frequently asked questions

How much VRAM do you need to run unsloth/Qwen3-Coder-30B-A3B-Instruct-GGUF?

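You need about 9.95 GB of VRAM to run unsloth/Qwen3-Coder-30B-A3B-Instruct-GGUF entirely on the GPU using the IQ1_M quantization (at a 4,096-token context). Smaller quantizations lower the requirement at the cost of quality, down to about 8.44 GB with Q1_0. Higher-quality quantizations need more, for example about 21.23 GB with Q5_K_XL.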

Can I run unsloth/Qwen3-Coder-30B-A3B-Instruct-GGUF on an 8 GB GPU?

Partially. Even the smallest quantization (Q1_0, about 8.44 GB) exceeds 8 GB, so unsloth/Qwen3-Coder-30B-A3B-Instruct-GGUF only runs on an 8 GB GPU by offloading part of it to system RAM (for example with Q5_K_XL), which works but is slower.

Can I run unsloth/Qwen3-Coder-30B-A3B-Instruct-GGUF on a 16 GB GPU?

Yes. With 16 GB of VRAM you can run unsloth/Qwen3-Coder-30B-A3B-Instruct-GGUF fully on the GPU using Q3_K_M (about 14.69 GB).

Can I run unsloth/Qwen3-Coder-30B-A3B-Instruct-GGUF on a 24 GB GPU?

Yes. With 24 GB of VRAM you can run unsloth/Qwen3-Coder-30B-A3B-Instruct-GGUF fully on the GPU using Q5_K_XL (about 21.23 GB).

What is the best quantization for unsloth/Qwen3-Coder-30B-A3B-Instruct-GGUF?

If memory allows, higher bits-per-weight means better quality. A common sweet spot is a Q4_K_M or Q5_K_M quantization, which keeps most of the quality while cutting memory by roughly 30 to 40 percent versus 8-bit (about 18.27 GB and 21.22 GB, compared with 31.24 GB for Q8_0). Pick the highest quantization that still fits in your VRAM.
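The "highest quantization that still fits" rule can be sketched as a lookup over the totals listed in the table. This is only an illustration. The function name `best_fit` is hypothetical, and the strict "total at or below VRAM" cutoff with no extra headroom is an assumption.

```python
from typing import Optional

# Total memory in GB at a 4,096-token context, copied from the table above
TOTALS_GB = {
    "Q1_0": 8.44, "IQ1_S": 9.29, "IQ1_M": 9.95, "IQ2_XXS": 10.61,
    "IQ2_M": 11.08, "Q2_K": 11.47, "Q2_K_L": 11.54, "Q2_K_XL": 11.97,
    "IQ3_XXS": 12.95, "Q3_K_S": 13.37, "Q3_K_XL": 13.85, "Q3_K_M": 14.69,
    "IQ4_XS": 16.24, "IQ4_NL": 17.11, "Q4_0": 17.17, "Q4_K_S": 17.24,
    "Q4_K_XL": 17.44, "Q4_K_M": 18.27, "Q4_1": 18.86, "Q5_K_S": 20.62,
    "Q5_K_M": 21.22, "Q5_K_XL": 21.23, "Q6_K": 24.36, "Q6_K_XL": 25.52,
    "Q8_0": 31.24, "Q8_K_XL": 34.51, "BF16": 57.89,
}


def best_fit(vram_gb: float) -> Optional[str]:
    """Return the largest quantization whose total fits in vram_gb, or None."""
    fitting = {name: total for name, total in TOTALS_GB.items() if total <= vram_gb}
    if not fitting:
        return None  # nothing fits fully on the GPU; offloading would be needed
    return max(fitting, key=fitting.get)


for vram in (8, 16, 24):
    print(vram, best_fit(vram))
```

For 16 GB and 24 GB this picks Q3_K_M and Q5_K_XL, the same choices given in the answers above. For 8 GB it returns `None`, since no quantization fits without offloading.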