Run unsloth/gemma-4-26B-A4B-it-GGUF locally

License: apache-2.0 · Downloads: 1,465,954 · Likes: 918

Parameters: 25.23B

Context: 262,144 tokens

unsloth/gemma-4-26B-A4B-it-GGUF is a large instruction-tuned chat model with 25.23 billion parameters, built on the gemma4 architecture. It is released under the apache-2.0 license and has been downloaded 1,465,954 times.

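To run unsloth/gemma-4-26B-A4B-it-GGUF locally at a 4,096-token context, its GGUF variants need between 12.46 GB (IQ2_XXS, lowest quality) and 52.17 GB (BF16, highest quality) of memory, including weights, KV cache, and a system margin. The table also lists rows labeled F16 and F32 at about 5 GB, but their bits-per-weight figures (0.65 and 0.73) are inconsistent with 16-bit and 32-bit formats, so those rows do not appear to describe the full model.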
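The totals in the quantization table can be reproduced approximately from the parameter count and the bits per weight. The sketch below assumes that sizes are reported in binary gigabytes (2^30 bytes) and that the system margin is a flat 0.8 GB. Both are inferences from the table's numbers, not documented behaviour, and the function names are hypothetical.

```python
GIB = 2**30  # assumption: the page reports sizes in binary gigabytes


def weights_gb(params: float, bits_per_weight: float) -> float:
    """Approximate weight size: parameters x bits per weight, as bytes, then GiB."""
    return params * bits_per_weight / 8 / GIB


def total_gb(params: float, bits_per_weight: float,
             kv_gb: float = 2.42, margin_gb: float = 0.8) -> float:
    """Weights plus KV cache plus system margin (margin value is an assumption)."""
    return weights_gb(params, bits_per_weight) + kv_gb + margin_gb


PARAMS = 25.23e9  # parameter count stated on the page

# Q4_K_M is listed at 5.37 bits per weight
print(f"weights: {weights_gb(PARAMS, 5.37):.2f} GB")
print(f"total:   {total_gb(PARAMS, 5.37):.2f} GB")
```

Under these assumptions the results land within a few hundredths of a GB of the listed Q4_K_M figures (15.78 GB weights, 19.01 GB total); the small gap comes from the bits-per-weight value being rounded to two decimals.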

For most users the best balance is Q4_K_M, needing about 19.01 GB. That means unsloth/gemma-4-26B-A4B-it-GGUF fits entirely in the VRAM of a 24 GB GPU, while smaller GPUs must offload part of the model to system RAM.


All quantizations

Quantization	Bits/weight	Quality	Weights	KV cache	Total	Speed (est.)	Verdict
F16	0.65	Very low	1.91 GB	2.42 GB	5.13 GB	209.7 t/s	Fits in VRAM
F32	0.73	Very low	2.13 GB	2.42 GB	5.36 GB	187.5 t/s	Fits in VRAM
IQ2_XXS	3.15	Low	9.24 GB	2.42 GB	12.46 GB	5.4 t/s	Offload
IQ2_M	3.18	Low	9.33 GB	2.42 GB	12.55 GB	5.4 t/s	Offload
Q2_K_XL	3.34	Fair	9.82 GB	2.42 GB	13.05 GB	5.1 t/s	Offload
IQ3_S	3.58	Fair	10.51 GB	2.42 GB	13.74 GB	4.8 t/s	Offload
IQ3_XXS	3.62	Fair	10.63 GB	2.42 GB	13.86 GB	4.7 t/s	Offload
Q3_K_M	4.04	Fair	11.85 GB	2.42 GB	15.08 GB	4.2 t/s	Offload
Q3_K_XL	4.09	Fair	12.02 GB	2.42 GB	15.24 GB	4.2 t/s	Offload
IQ4_XS	4.31	Good	12.66 GB	2.42 GB	15.89 GB	3.9 t/s	Offload
IQ4_NL	4.32	Good	12.68 GB	2.42 GB	15.9 GB	3.9 t/s	Offload
Q4_K_S	5.23	Very good	15.36 GB	2.42 GB	18.58 GB	3.3 t/s	Offload
Q4_K_M	5.37	Very good	15.78 GB	2.42 GB	19.01 GB	3.2 t/s	Offload
Q4_K_XL	5.39	Very good	15.84 GB	2.42 GB	19.07 GB	3.2 t/s	Offload
GGUF	5.39	Very good	15.84 GB	2.42 GB	19.07 GB	3.2 t/s	Offload
Q5_K_S	5.98	Very good	17.56 GB	2.42 GB	20.78 GB	2.8 t/s	Offload
Q5_K_M	6.71	Excellent	19.7 GB	2.42 GB	22.92 GB	2.5 t/s	Offload
Q5_K_XL	6.73	Excellent	19.76 GB	2.42 GB	22.98 GB	2.5 t/s	Offload
Q6_K	7.35	Excellent	21.58 GB	2.42 GB	24.8 GB	—	Insufficient
Q6_K_XL	7.39	Excellent	21.7 GB	2.42 GB	24.92 GB	—	Insufficient
Q8_0	8.66	Excellent	25.45 GB	2.42 GB	28.67 GB	—	Insufficient
Q8_K_XL	8.76	Excellent	25.74 GB	2.42 GB	28.96 GB	—	Insufficient
BF16	16.66	Excellent	48.95 GB	2.42 GB	52.17 GB	—	Insufficient

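The KV cache size is an estimate because the architecture details were unavailable. Speed is a rough estimate bounded by memory bandwidth; a dash means no estimate is given because the quantization does not fit.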
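A bandwidth-bound speed estimate of this kind can be sketched as memory bandwidth divided by the bytes read per generated token. The example below assumes every token reads all of the weights once and uses an effective bandwidth of 50 GB/s. That figure is not stated on the page; it is inferred from the offload rows of the table, and the function name is hypothetical.

```python
def tokens_per_second(weights_gb: float, bandwidth_gb_s: float) -> float:
    """Rough upper bound: bandwidth divided by the weight bytes read per token."""
    if weights_gb <= 0:
        raise ValueError("weights_gb must be positive")
    return bandwidth_gb_s / weights_gb


ASSUMED_BANDWIDTH_GB_S = 50.0  # assumption inferred from the table, not documented

for name, size_gb in [("IQ2_XXS", 9.24), ("Q4_K_M", 15.78), ("Q5_K_M", 19.7)]:
    print(name, round(tokens_per_second(size_gb, ASSUMED_BANDWIDTH_GB_S), 1))
# IQ2_XXS 5.4
# Q4_K_M 3.2
# Q5_K_M 2.5
```

These match the listed speeds for the three offload rows. Real throughput depends on the runtime, the split between GPU and system memory, and the prompt, so treat the result as a ceiling rather than a prediction.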

Frequently asked questions

How much VRAM do you need to run unsloth/gemma-4-26B-A4B-it-GGUF?

You need about 19.01 GB of VRAM to run unsloth/gemma-4-26B-A4B-it-GGUF entirely on the GPU using the Q4_K_M quantization (at a 4,096-token context). Smaller quantizations lower the requirement at the cost of quality, down to about 12.46 GB for IQ2_XXS.

Can I run unsloth/gemma-4-26B-A4B-it-GGUF on an 8 GB GPU?

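Not fully on the GPU. The smallest full-model quantization in the table, IQ2_XXS, needs about 12.46 GB, so with 8 GB of VRAM part of unsloth/gemma-4-26B-A4B-it-GGUF must be offloaded to system RAM, which reduces speed.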

Can I run unsloth/gemma-4-26B-A4B-it-GGUF on a 16 GB GPU?

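Yes, but only just. With 16 GB of VRAM you can run unsloth/gemma-4-26B-A4B-it-GGUF fully on the GPU using IQ4_NL (about 15.9 GB), which leaves roughly 0.1 GB of headroom at a 4,096-token context. Q3_K_XL (about 15.24 GB) leaves more room.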

Can I run unsloth/gemma-4-26B-A4B-it-GGUF on a 24 GB GPU?

Yes. With 24 GB of VRAM you can run unsloth/gemma-4-26B-A4B-it-GGUF fully on the GPU using Q5_K_XL (about 22.98 GB).

What is the best quantization for unsloth/gemma-4-26B-A4B-it-GGUF?

If memory allows, more bits per weight means better quality. A common sweet spot is Q4_K_M or Q5_K_M, which keeps most of the quality while using less memory than 8-bit: Q4_K_M needs about 19.01 GB in total, roughly a third less than the 28.67 GB of Q8_0. Pick the highest-bit quantization that still fits in your VRAM.
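The "highest quantization that still fits" rule can be sketched as a simple lookup. The totals below are copied from the table (4,096-token context) for a subset of quantizations, omitting the F16 and F32 rows because their sub-1-bit figures look mislabeled. The function name is hypothetical.

```python
from typing import Optional

# (quantization, total GB at a 4,096-token context), copied from the table
TOTALS = [
    ("IQ2_XXS", 12.46),
    ("Q3_K_M", 15.08),
    ("IQ4_NL", 15.9),
    ("Q4_K_M", 19.01),
    ("Q5_K_M", 22.92),
    ("Q5_K_XL", 22.98),
    ("Q6_K", 24.8),
    ("Q8_0", 28.67),
    ("BF16", 52.17),
]


def best_fit(vram_gb: float) -> Optional[str]:
    """Return the largest listed quantization whose total fits in vram_gb, or None."""
    fitting = [(total, name) for name, total in TOTALS if total <= vram_gb]
    return max(fitting)[1] if fitting else None


for vram in (8, 16, 24):
    print(vram, best_fit(vram))
# 8 None
# 16 IQ4_NL
# 24 Q5_K_XL
```

The comparison uses the full budget with no extra headroom, so a result that fits by a narrow margin may still run out of memory at longer contexts.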