Run HauhauCS/Gemma-4-E4B-Uncensored-HauhauCS-Aggressive locally

License: gemma
Downloads: 573,160
Likes: 848
Parameters: 7.52B
Context: 131,072 tokens

HauhauCS/Gemma-4-E4B-Uncensored-HauhauCS-Aggressive is a mid-size language model with 7.52 billion parameters, built on the gemma4 architecture. It is released under the gemma license and has been downloaded 573,160 times.

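To run HauhauCS/Gemma-4-E4B-Uncensored-HauhauCS-Aggressive locally at a 4,096-token context, the listed files need between 1.91 GB (the entry labeled F16, rated lowest quality) and 8.56 GB (Q8_K_P, highest quality) of memory. These totals include the weights, the KV cache, and a system margin. The F16 entry is listed at only 1.05 bits per weight, which does not match a true 16-bit format, so treat that row with caution.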
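The totals quoted here can be reproduced approximately from the parameter count and the bits per weight. The sketch below is an illustration, not the page's actual calculator. The 0.80 GB margin is inferred from the quantization table on this page (Total minus Weights minus KV), and the use of binary gigabytes (2^30 bytes) is an assumption that happens to match the listed figures. The function names are hypothetical.

```python
GIB = 1024 ** 3  # assumption: the page's "GB" figures behave like binary gigabytes


def weights_gb(params: float, bits_per_weight: float) -> float:
    """Approximate size of the quantized weights."""
    return params * bits_per_weight / 8 / GIB


def total_gb(weights: float, kv: float = 0.19, margin: float = 0.80) -> float:
    """Weights plus KV cache plus system margin.

    kv=0.19 is the table's value at a 4,096-token context.
    margin=0.80 is inferred from the table, not stated by the page.
    """
    return weights + kv + margin


# Q6_K_P: 7.52 billion parameters at 6.65 bits per weight
w = weights_gb(7.52e9, 6.65)
print(f"weights ~ {w:.2f} GB, total ~ {total_gb(w):.2f} GB")
```

Because the published bits-per-weight values are rounded, a result can differ from the table by about 0.01 GB for some rows.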

For most users the best balance is Q6_K_P, which needs about 6.81 GB. That fits entirely in the VRAM of an 8 GB GPU or larger, so the model runs fully on the GPU. On a 6 GB GPU, Q4_K_P (about 5.99 GB) is the largest option that still fits.


All quantizations

Quant.   Bits   Quality     Weights   KV        Total     Speed (approx.)   Verdict
F16      1.05   Very low    0.92 GB   0.19 GB   1.91 GB   433.7 t/s         Fits in VRAM
Q2_K_P   4.72   Good        4.13 GB   0.19 GB   5.12 GB   96.9 t/s          Fits in VRAM
IQ3_M    5.02   Very good   4.39 GB   0.19 GB   5.38 GB   91.1 t/s          Fits in VRAM
Q3_K_M   5.16   Very good   4.52 GB   0.19 GB   5.50 GB   88.5 t/s          Fits in VRAM
Q3_K_P   5.20   Very good   4.55 GB   0.19 GB   5.54 GB   87.9 t/s          Fits in VRAM
IQ4_XS   5.40   Very good   4.72 GB   0.19 GB   5.71 GB   84.7 t/s          Fits in VRAM
Q4_K_M   5.68   Very good   4.97 GB   0.19 GB   5.96 GB   80.5 t/s          Fits in VRAM
Q4_K_P   5.71   Very good   5.00 GB   0.19 GB   5.99 GB   80.0 t/s          Fits in VRAM
Q5_K_M   6.13   Very good   5.37 GB   0.19 GB   6.35 GB   74.5 t/s          Fits in VRAM
Q5_K_P   6.19   Very good   5.41 GB   0.19 GB   6.40 GB   73.9 t/s          Fits in VRAM
Q6_K_P   6.65   Excellent   5.82 GB   0.19 GB   6.81 GB   68.7 t/s          Fits in VRAM
Q8_K_P   8.65   Excellent   7.57 GB   0.19 GB   8.56 GB   6.6 t/s           Offload

The KV cache size is computed from the model's exact architecture at a 4,096-token context. Speed is a rough estimate, bounded by memory bandwidth rather than measured.
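A bandwidth-bound estimate of this kind usually assumes that every generated token reads all of the weights once, so the rate is roughly bandwidth divided by weight size. The sketch below illustrates that idea. The 400 GB/s default is not stated anywhere on the page: it is inferred from the table, where speed times weights comes out near 400 for every "Fits in VRAM" row. The function name is hypothetical.

```python
def tokens_per_second(weights_gb: float, bandwidth_gb_s: float = 400.0) -> float:
    """Upper-bound decode speed when generation is limited by memory bandwidth.

    bandwidth_gb_s=400.0 is an assumption inferred from the table,
    not a figure published by the page.
    """
    if weights_gb <= 0:
        raise ValueError("weights_gb must be positive")
    return bandwidth_gb_s / weights_gb


# Q4_K_P has 5.00 GB of weights in the table
print(f"{tokens_per_second(5.00):.1f} t/s")
```

The Q8_K_P row does not follow the same constant. It is marked Offload, and its speed times weights is about 50, which suggests a much lower effective bandwidth was assumed once part of the model leaves VRAM.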

Frequently asked questions

How much VRAM do you need to run HauhauCS/Gemma-4-E4B-Uncensored-HauhauCS-Aggressive?

You need about 5.99 GB of VRAM to run HauhauCS/Gemma-4-E4B-Uncensored-HauhauCS-Aggressive entirely on the GPU using the Q4_K_P quantization (at a 4,096-token context). Smaller quantizations lower the requirement at the cost of quality.

Can I run HauhauCS/Gemma-4-E4B-Uncensored-HauhauCS-Aggressive on an 8 GB GPU?

Yes. With 8 GB of VRAM you can run HauhauCS/Gemma-4-E4B-Uncensored-HauhauCS-Aggressive fully on the GPU using Q6_K_P (about 6.81 GB).

Can I run HauhauCS/Gemma-4-E4B-Uncensored-HauhauCS-Aggressive on a 16 GB GPU?

Yes. With 16 GB of VRAM you can run HauhauCS/Gemma-4-E4B-Uncensored-HauhauCS-Aggressive fully on the GPU using Q8_K_P (about 8.56 GB).

Can I run HauhauCS/Gemma-4-E4B-Uncensored-HauhauCS-Aggressive on a 24 GB GPU?

Yes. With 24 GB of VRAM you can run HauhauCS/Gemma-4-E4B-Uncensored-HauhauCS-Aggressive fully on the GPU using Q8_K_P (about 8.56 GB).

What is the best quantization for HauhauCS/Gemma-4-E4B-Uncensored-HauhauCS-Aggressive?

If memory allows, more bits per weight means better quality. A common sweet spot is Q4_K_M or Q5_K_M, which keeps most of the quality while needing about 5.96 to 6.35 GB here, roughly 26 to 30 percent less than the 8.56 GB that Q8_K_P needs. Pick the highest quantization that still fits in your VRAM.
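The rule "pick the highest quantization that still fits" can be sketched as a small lookup over the totals from the table above. This is an illustration with a hypothetical function name. It compares the listed total (which already includes the KV cache and margin at a 4,096-token context) against the card's nominal VRAM, and it assumes nothing else is using GPU memory.

```python
from typing import Optional

# (name, bits per weight, total GB at a 4,096-token context), copied from the table.
# The F16 row is left out: it is listed at 1.05 bits per weight, which does not
# match a 16-bit format, so ranking it by bits would be misleading.
QUANTS = [
    ("Q2_K_P", 4.72, 5.12),
    ("IQ3_M", 5.02, 5.38),
    ("Q3_K_M", 5.16, 5.50),
    ("Q3_K_P", 5.20, 5.54),
    ("IQ4_XS", 5.40, 5.71),
    ("Q4_K_M", 5.68, 5.96),
    ("Q4_K_P", 5.71, 5.99),
    ("Q5_K_M", 6.13, 6.35),
    ("Q5_K_P", 6.19, 6.40),
    ("Q6_K_P", 6.65, 6.81),
    ("Q8_K_P", 8.65, 8.56),
]


def best_fit(vram_gb: float) -> Optional[str]:
    """Return the quantization with the most bits per weight whose total fits."""
    fitting = [q for q in QUANTS if q[2] <= vram_gb]
    if not fitting:
        return None
    return max(fitting, key=lambda q: q[1])[0]


for vram in (6, 8, 16):
    print(vram, "GB ->", best_fit(vram))
```

For 6, 8, and 16 GB this picks Q4_K_P, Q6_K_P, and Q8_K_P, which agrees with the answers given in the questions above.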