Run unsloth/Qwen3-30B-A3B-Instruct-2507-GGUF locally

License: apache-2.0
Downloads: 609,270
Likes: 311

Parameters: 30.53B

Context: 262,144 tokens

unsloth/Qwen3-30B-A3B-Instruct-2507-GGUF is an instruction-tuned chat model with 30.53 billion parameters, built on the qwen3moe (mixture-of-experts) architecture. It is released under the apache-2.0 license and has been downloaded 609,270 times.

To run unsloth/Qwen3-30B-A3B-Instruct-2507-GGUF locally at a 4,096-token context, the available quantizations need between 8.53 GB (Q1_0, lowest quality) and 57.89 GB (BF16, highest quality) of memory. These totals include the weights, the KV cache, and a system margin.
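The arithmetic behind these totals can be sketched as follows. This is a minimal reconstruction, not the page's actual calculator: the 0.80 margin is an assumption inferred from the table (Total minus Weights minus KV), and the sketch assumes the table's "GB" figures are binary gigabytes (2^30 bytes), which is what makes the weight sizes line up with parameters times bits per weight.

```python
GIB = 1024 ** 3  # assumption: the table's "GB" behaves like binary gigabytes


def weights_gib(params: float, bits_per_weight: float) -> float:
    """Size of the weights: parameters x bits per weight, converted to bytes."""
    return params * bits_per_weight / 8 / GIB


def total_gib(params: float, bits_per_weight: float,
              kv_gib: float = 0.19, margin_gib: float = 0.80) -> float:
    """Weights plus KV cache plus a fixed system margin.

    kv_gib is the table's value at a 4,096-token context.
    margin_gib is inferred from the table, not stated on the page.
    """
    return weights_gib(params, bits_per_weight) + kv_gib + margin_gib


PARAMS = 30.53e9  # parameter count from the page

for name, bpw in [("Q1_0", 2.12), ("Q4_K_M", 4.86), ("BF16", 16.01)]:
    print(f"{name}: weights {weights_gib(PARAMS, bpw):.2f}, "
          f"total {total_gib(PARAMS, bpw):.2f}")
```

Because the bits-per-weight values in the table are themselves rounded, the results agree with the listed figures only to within a few hundredths.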

For most users the best balance is Q5_K_XL, which needs about 21.23 GB and therefore fits entirely in the VRAM of a 24 GB GPU. Only the smallest quantizations (Q1_0 at 8.53 GB, IQ1_S at 9.41 GB) fit entirely on a 10 GB GPU, at very low quality.


All quantizations

Quantization	Bits/weight	Quality	Weights	KV cache	Total	Speed (approx.)	Verdict
Q1_0	2.12	Very low	7.54 GB	0.19 GB	8.53 GB	6.6 t/s	Offload
IQ1_S	2.37	Very low	8.42 GB	0.19 GB	9.41 GB	5.9 t/s	Offload
IQ1_M	2.54	Very low	9.02 GB	0.19 GB	10.01 GB	5.5 t/s	Offload
IQ2_XXS	2.71	Low	9.63 GB	0.19 GB	10.62 GB	5.2 t/s	Offload
IQ2_M	2.84	Low	10.1 GB	0.19 GB	11.09 GB	5.0 t/s	Offload
Q2_K	2.95	Low	10.49 GB	0.19 GB	11.47 GB	4.8 t/s	Offload
Q2_K_L	2.97	Low	10.55 GB	0.19 GB	11.54 GB	4.7 t/s	Offload
Q2_K_XL	3.09	Low	10.98 GB	0.19 GB	11.97 GB	4.6 t/s	Offload
IQ3_XXS	3.38	Fair	12.02 GB	0.19 GB	13.01 GB	4.2 t/s	Offload
Q3_K_S	3.48	Fair	12.38 GB	0.19 GB	13.37 GB	4.0 t/s	Offload
Q3_K_XL	3.62	Fair	12.88 GB	0.19 GB	13.87 GB	3.9 t/s	Offload
Q3_K_M	3.85	Fair	13.7 GB	0.19 GB	14.69 GB	3.6 t/s	Offload
IQ4_XS	4.29	Good	15.25 GB	0.19 GB	16.24 GB	3.3 t/s	Offload
IQ4_NL	4.54	Good	16.12 GB	0.19 GB	17.11 GB	3.1 t/s	Offload
Q4_0	4.55	Good	16.19 GB	0.19 GB	17.17 GB	3.1 t/s	Offload
Q4_K_S	4.57	Good	16.26 GB	0.19 GB	17.24 GB	3.1 t/s	Offload
Q4_K_XL	4.64	Good	16.48 GB	0.19 GB	17.46 GB	3.0 t/s	Offload
Q4_K_M	4.86	Good	17.28 GB	0.19 GB	18.27 GB	2.9 t/s	Offload
Q4_1	5.03	Very good	17.87 GB	0.19 GB	18.86 GB	2.8 t/s	Offload
Q5_K_S	5.52	Very good	19.63 GB	0.19 GB	20.62 GB	2.5 t/s	Offload
Q5_K_M	5.69	Very good	20.23 GB	0.19 GB	21.22 GB	2.5 t/s	Offload
Q5_K_XL	5.7	Very good	20.25 GB	0.19 GB	21.23 GB	2.5 t/s	Offload
Q6_K	6.57	Excellent	23.37 GB	0.19 GB	24.36 GB	—	Insufficient
Q6_K_XL	6.9	Excellent	24.53 GB	0.19 GB	25.52 GB	—	Insufficient
Q8_0	8.51	Excellent	30.25 GB	0.19 GB	31.24 GB	—	Insufficient
Q8_K_XL	9.43	Excellent	33.52 GB	0.19 GB	34.51 GB	—	Insufficient
BF16	16.01	Excellent	56.9 GB	0.19 GB	57.89 GB	—	Insufficient

The KV cache is computed from the model's exact architecture. Speed, in tokens per second (t/s), is a rough estimate bounded by memory bandwidth.
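Both estimates can be illustrated with a short sketch. Everything numeric here is an assumption rather than something the page states: the architecture values (48 layers, 4 key/value heads, head dimension 128), the 1 byte per cached element, and the 50 GB/s memory bandwidth, which is inferred from the table because speed times weight size stays near 50 across rows. The speed formula also assumes every generated token reads the full weight file once, which is how the table's figures appear to behave.

```python
GIB = 1024 ** 3  # assumption: the table's "GB" behaves like binary gigabytes


def kv_cache_gib(n_layers: int, n_kv_heads: int, head_dim: int,
                 context_tokens: int, bytes_per_element: float) -> float:
    """KV cache size: one key and one value vector per layer, head and token."""
    elements = 2 * n_layers * n_kv_heads * head_dim * context_tokens
    return elements * bytes_per_element / GIB


def est_tokens_per_second(weights_gb: float, bandwidth_gb_s: float) -> float:
    """Bandwidth-bound estimate: each token streams all weights from memory once."""
    if weights_gb <= 0:
        raise ValueError("weights_gb must be positive")
    return bandwidth_gb_s / weights_gb


# Assumed architecture and cache precision (not stated on the page).
kv = kv_cache_gib(n_layers=48, n_kv_heads=4, head_dim=128,
                  context_tokens=4096, bytes_per_element=1)
print(f"KV cache: {kv:.4f}")

ASSUMED_BANDWIDTH_GB_S = 50.0  # inferred from the table, hypothetical hardware
for name, weights in [("Q1_0", 7.54), ("Q4_K_M", 17.28), ("Q5_K_XL", 20.25)]:
    speed = est_tokens_per_second(weights, ASSUMED_BANDWIDTH_GB_S)
    print(f"{name}: {speed:.1f} t/s")
```

Under these assumptions the KV cache comes to 0.1875, close to the 0.19 GB listed, and the three speeds come to about 6.6, 2.9 and 2.5 t/s, in line with the table. A 16-bit cache would double the KV figure, and different hardware would scale the speeds proportionally.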

Frequently asked questions

How much VRAM do you need to run unsloth/Qwen3-30B-A3B-Instruct-2507-GGUF?

You need about 8.53 GB of VRAM to run unsloth/Qwen3-30B-A3B-Instruct-2507-GGUF entirely on the GPU using Q1_0, the smallest quantization listed, or about 9.41 GB using IQ1_S (both at a 4,096-token context). Larger quantizations improve quality but raise the requirement, up to 57.89 GB for BF16.

Can I run unsloth/Qwen3-30B-A3B-Instruct-2507-GGUF on an 8 GB GPU?

Partially. Even the smallest quantization (Q1_0, about 8.53 GB) exceeds 8 GB, so unsloth/Qwen3-30B-A3B-Instruct-2507-GGUF runs on an 8 GB GPU only by offloading part of the model to system RAM (for example with Q5_K_XL), which works but is slower.

Can I run unsloth/Qwen3-30B-A3B-Instruct-2507-GGUF on a 16 GB GPU?

Yes. With 16 GB of VRAM you can run unsloth/Qwen3-30B-A3B-Instruct-2507-GGUF fully on the GPU using Q3_K_M (about 14.69 GB).

Can I run unsloth/Qwen3-30B-A3B-Instruct-2507-GGUF on a 24 GB GPU?

Yes. With 24 GB of VRAM you can run unsloth/Qwen3-30B-A3B-Instruct-2507-GGUF fully on the GPU using Q5_K_XL (about 21.23 GB).

What is the best quantization for unsloth/Qwen3-30B-A3B-Instruct-2507-GGUF?

If memory allows, more bits per weight means better quality. A common sweet spot is Q4_K_M or Q5_K_M, which keeps most of the quality while needing roughly 30 to 40 percent less memory than 8-bit Q8_0 (18.27 GB or 21.22 GB versus 31.24 GB at a 4,096-token context). Pick the highest quantization that still fits in your VRAM.
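The rule of picking the highest quantization that fits can be written as a small lookup over the table above. This is an illustrative sketch: the function name `best_quant` is hypothetical, "highest" is taken to mean most bits per weight, and a quantization counts as fitting when its Total column (at a 4,096-token context) does not exceed the VRAM you pass in.

```python
from typing import Optional

# (name, bits per weight, total GB at a 4,096-token context), copied from the table
QUANTS = [
    ("Q1_0", 2.12, 8.53), ("IQ1_S", 2.37, 9.41), ("IQ1_M", 2.54, 10.01),
    ("IQ2_XXS", 2.71, 10.62), ("IQ2_M", 2.84, 11.09), ("Q2_K", 2.95, 11.47),
    ("Q2_K_L", 2.97, 11.54), ("Q2_K_XL", 3.09, 11.97), ("IQ3_XXS", 3.38, 13.01),
    ("Q3_K_S", 3.48, 13.37), ("Q3_K_XL", 3.62, 13.87), ("Q3_K_M", 3.85, 14.69),
    ("IQ4_XS", 4.29, 16.24), ("IQ4_NL", 4.54, 17.11), ("Q4_0", 4.55, 17.17),
    ("Q4_K_S", 4.57, 17.24), ("Q4_K_XL", 4.64, 17.46), ("Q4_K_M", 4.86, 18.27),
    ("Q4_1", 5.03, 18.86), ("Q5_K_S", 5.52, 20.62), ("Q5_K_M", 5.69, 21.22),
    ("Q5_K_XL", 5.70, 21.23), ("Q6_K", 6.57, 24.36), ("Q6_K_XL", 6.90, 25.52),
    ("Q8_0", 8.51, 31.24), ("Q8_K_XL", 9.43, 34.51), ("BF16", 16.01, 57.89),
]


def best_quant(vram_gb: float) -> Optional[str]:
    """Return the quantization with the most bits per weight whose total fits."""
    fitting = [q for q in QUANTS if q[2] <= vram_gb]
    if not fitting:
        return None  # nothing fits fully on the GPU; offloading would be needed
    return max(fitting, key=lambda q: q[1])[0]


for vram in (8, 16, 24):
    print(f"{vram} GB -> {best_quant(vram)}")
```

For 8, 16 and 24 GB this returns None, Q3_K_M and Q5_K_XL, matching the answers above. A longer context grows the KV cache, so the totals would need to be recomputed before reusing the lookup at other context sizes.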