Run HauhauCS/Gemma-4-E4B-Uncensored-HauhauCS-Aggressive locally

License: gemma · Downloads: 573,160 · Likes: 848

Parameters: 7.52B

Context: 131,072 tokens

HauhauCS/Gemma-4-E4B-Uncensored-HauhauCS-Aggressive is a mid-size language model with 7.52 billion parameters, built on the gemma4 architecture. It is released under the gemma license and has been downloaded 573,160 times.

To run HauhauCS/Gemma-4-E4B-Uncensored-HauhauCS-Aggressive locally at a 4,096-token context, the listed files need between 1.91 GB (the entry labeled F16, at 1.05 bits per weight, lowest quality) and 8.56 GB (Q8_K_P, highest quality) of memory. Each figure includes the weights, the KV cache, and a system margin.
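The totals can be reproduced from the Weights and KV columns of the table below. This is a minimal sketch, not the page's actual calculator: the 0.80 GB system margin is an assumption inferred from Total minus Weights minus KV in the listed rows (the page does not state it), and `total_memory_gb` is a hypothetical helper name.

```python
# Assumption: the margin is not stated on the page; 0.80 GB is inferred
# from the table rows (Total - Weights - KV is about 0.80 GB in each).
SYSTEM_MARGIN_GB = 0.80


def total_memory_gb(weights_gb: float, kv_cache_gb: float,
                    margin_gb: float = SYSTEM_MARGIN_GB) -> float:
    """Estimate required memory as weights + KV cache + system margin."""
    return weights_gb + kv_cache_gb + margin_gb


# Q4_K_P row: 5.0 GB of weights and 0.19 GB of KV cache at a 4,096-token context.
print(f"{total_memory_gb(5.0, 0.19):.2f} GB")  # the table lists 5.99 GB
```

A few rows differ from this sum by 0.01 GB, which is consistent with the table's columns being rounded independently.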

For most users the best balance is Q6_K_P, which needs about 6.81 GB. HauhauCS/Gemma-4-E4B-Uncensored-HauhauCS-Aggressive therefore fits entirely in the VRAM of an 8 GB or larger GPU and runs fully on the GPU. On a 6 GB GPU, the largest quantization that fits is Q4_K_P (about 5.99 GB).

All quantizations

Quantization	Bits/weight	Quality	Weights	KV cache	Total	Est. speed	Verdict
F16	1.05	Very low	0.92 GB	0.19 GB	1.91 GB	433.7 t/s	Fits in VRAM
Q2_K_P	4.72	Good	4.13 GB	0.19 GB	5.12 GB	96.9 t/s	Fits in VRAM
IQ3_M	5.02	Very good	4.39 GB	0.19 GB	5.38 GB	91.1 t/s	Fits in VRAM
Q3_K_M	5.16	Very good	4.52 GB	0.19 GB	5.5 GB	88.5 t/s	Fits in VRAM
Q3_K_P	5.2	Very good	4.55 GB	0.19 GB	5.54 GB	87.9 t/s	Fits in VRAM
IQ4_XS	5.4	Very good	4.72 GB	0.19 GB	5.71 GB	84.7 t/s	Fits in VRAM
Q4_K_M	5.68	Very good	4.97 GB	0.19 GB	5.96 GB	80.5 t/s	Fits in VRAM
Q4_K_P	5.71	Very good	5.0 GB	0.19 GB	5.99 GB	80.0 t/s	Fits in VRAM
Q5_K_M	6.13	Very good	5.37 GB	0.19 GB	6.35 GB	74.5 t/s	Fits in VRAM
Q5_K_P	6.19	Very good	5.41 GB	0.19 GB	6.4 GB	73.9 t/s	Fits in VRAM
Q6_K_P	6.65	Excellent	5.82 GB	0.19 GB	6.81 GB	68.7 t/s	Fits in VRAM
Q8_K_P	8.65	Excellent	7.57 GB	0.19 GB	8.56 GB	6.6 t/s	Offload

The KV cache size is computed from the model's exact architecture at a 4,096-token context. Speed is a rough estimate, bounded by memory bandwidth.
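A bandwidth-bound speed estimate treats each generated token as one full read of the weights, so tokens per second is roughly memory bandwidth divided by weight size. The sketch below illustrates that rule of thumb. The 400 GB/s figure is an assumption inferred from the table (Speed times Weights is about 400 for every row that fits in VRAM), not a value the page states, and `est_tokens_per_second` is a hypothetical name.

```python
# Assumption: about 400 GB/s, inferred from Speed x Weights in the table rows
# marked "Fits in VRAM". The page does not name the reference GPU.
ASSUMED_BANDWIDTH_GB_S = 400.0


def est_tokens_per_second(weights_gb: float,
                          bandwidth_gb_s: float = ASSUMED_BANDWIDTH_GB_S) -> float:
    """Upper-bound decode speed when every token reads all weights once."""
    if weights_gb <= 0:
        raise ValueError("weights_gb must be positive")
    return bandwidth_gb_s / weights_gb


for name, weights_gb in [("Q4_K_M", 4.97), ("Q6_K_P", 5.82)]:
    # The table lists 80.5 t/s for Q4_K_M and 68.7 t/s for Q6_K_P.
    print(f"{name}: {est_tokens_per_second(weights_gb):.1f} t/s")
```

This only models the case where the weights sit entirely in VRAM. The Q8_K_P row is marked Offload and lists 6.6 t/s, far below what this formula gives, because part of the model is then read over a slower path.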

Frequently asked questions

How much VRAM do you need to run HauhauCS/Gemma-4-E4B-Uncensored-HauhauCS-Aggressive?

You need about 5.99 GB of VRAM to run HauhauCS/Gemma-4-E4B-Uncensored-HauhauCS-Aggressive entirely on the GPU using the Q4_K_P quantization (at a 4,096-token context). Smaller quantizations lower the requirement at the cost of quality.

Can I run HauhauCS/Gemma-4-E4B-Uncensored-HauhauCS-Aggressive on an 8 GB GPU?

Yes. With 8 GB of VRAM you can run HauhauCS/Gemma-4-E4B-Uncensored-HauhauCS-Aggressive fully on the GPU using Q6_K_P (about 6.81 GB).

Can I run HauhauCS/Gemma-4-E4B-Uncensored-HauhauCS-Aggressive on a 16 GB GPU?

Yes. With 16 GB of VRAM you can run HauhauCS/Gemma-4-E4B-Uncensored-HauhauCS-Aggressive fully on the GPU using Q8_K_P (about 8.56 GB).

Can I run HauhauCS/Gemma-4-E4B-Uncensored-HauhauCS-Aggressive on a 24 GB GPU?

Yes. With 24 GB of VRAM you can run HauhauCS/Gemma-4-E4B-Uncensored-HauhauCS-Aggressive fully on the GPU using Q8_K_P (about 8.56 GB).

What is the best quantization for HauhauCS/Gemma-4-E4B-Uncensored-HauhauCS-Aggressive?

If memory allows, more bits per weight generally means better quality. A common sweet spot is Q4_K_M or Q5_K_M, which keeps most of the quality while cutting weight memory by roughly a third compared with Q8_K_P for this model (4.97 GB and 5.37 GB versus 7.57 GB). Pick the quantization with the most bits per weight whose total still fits in your VRAM.
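The selection rule in the last sentence can be sketched in a few lines. The data is copied from the table above (labels, bits per weight, and totals at a 4,096-token context); `best_quant_for` is a hypothetical helper, and the simple "total must not exceed VRAM" test is an assumption about how to define a fit.

```python
# (label, bits per weight, total GB at a 4,096-token context), copied from the table.
QUANTS = [
    ("F16", 1.05, 1.91), ("Q2_K_P", 4.72, 5.12), ("IQ3_M", 5.02, 5.38),
    ("Q3_K_M", 5.16, 5.50), ("Q3_K_P", 5.20, 5.54), ("IQ4_XS", 5.40, 5.71),
    ("Q4_K_M", 5.68, 5.96), ("Q4_K_P", 5.71, 5.99), ("Q5_K_M", 6.13, 6.35),
    ("Q5_K_P", 6.19, 6.40), ("Q6_K_P", 6.65, 6.81), ("Q8_K_P", 8.65, 8.56),
]


def best_quant_for(vram_gb: float):
    """Return the fitting entry with the most bits per weight, or None."""
    fitting = [q for q in QUANTS if q[2] <= vram_gb]
    return max(fitting, key=lambda q: q[1]) if fitting else None


for vram_gb in (6, 8, 16):
    choice = best_quant_for(vram_gb)
    print(f"{vram_gb} GB:", choice[0] if choice else "nothing fits")
```

For 6, 8, and 16 GB this selects Q4_K_P, Q6_K_P, and Q8_K_P, matching the answers above. The totals already include a system margin, but a longer context grows the KV cache, so a fit as tight as 5.99 GB on a 6 GB card leaves little room.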