Run antirez/deepseek-v4-gguf locally

License: MIT
Downloads: 6,434,367
Likes: 281
Parameters: 284.33B
Context: 1,048,576 tokens

antirez/deepseek-v4-gguf is a very large language model with 284.33 billion parameters, built on the deepseek4 architecture. It is released under the MIT license and has been downloaded 6,434,367 times.

To run antirez/deepseek-v4-gguf locally at a 4,096-token context, the listed quantizations need between 1027.81 GB (IQ2, the smallest) and 1239.64 GB (Q4, the largest) of memory. Each total includes the weights, the KV cache, and a system margin.
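The totals above can be broken down into their parts. The following is a minimal sketch that takes the weights, KV cache, and total figures from the table on this page and infers the system margin as the remainder. The page does not state the margin explicitly, so that value is an inference, and the function name `implied_margin_gb` is hypothetical.

```python
def implied_margin_gb(weights_gb: float, kv_gb: float, total_gb: float) -> float:
    """Return the part of the total not explained by weights and KV cache."""
    return round(total_gb - weights_gb - kv_gb, 2)


# Figures quoted on this page for a 4,096-token context.
quants = {
    "IQ2": {"weights_gb": 1026.96, "kv_gb": 0.04, "total_gb": 1027.81},
    "Q4": {"weights_gb": 1238.80, "kv_gb": 0.04, "total_gb": 1239.64},
}

# Inferred margin per quantization (an assumption, not a published figure).
margins = {name: implied_margin_gb(**q) for name, q in quants.items()}

for name, margin in margins.items():
    print(f"{name}: implied system margin of about {margin} GB")
```

Both rows leave a remainder of roughly 0.8 GB, which suggests the margin is a fixed allowance rather than a share of the model size.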


All quantizations

Quant.  Bits   Quality    Weights     KV       Total       Speed (approx.)  Verdict
IQ2     31.03  Excellent  1026.96 GB  0.04 GB  1027.81 GB  -                Insufficient
Q4      37.42  Excellent  1238.8 GB   0.04 GB  1239.64 GB  -                Insufficient

KV cache computed from the model's exact architecture. Speed is a rough estimate bounded by memory bandwidth.
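A bandwidth-bound speed estimate is commonly approximated by dividing memory bandwidth by the amount of data read per generated token. The sketch below illustrates that idea only; the page does not publish its formula. It assumes every weight is read once per token (a model that activates only part of its weights per token would read less), and the bandwidth figure is a hypothetical placeholder, not a measurement.

```python
def max_tokens_per_second(bandwidth_gb_s: float, gb_read_per_token: float) -> float:
    """Upper bound on decode speed when generation is limited by memory bandwidth."""
    if gb_read_per_token <= 0:
        raise ValueError("gb_read_per_token must be positive")
    return bandwidth_gb_s / gb_read_per_token


# Hypothetical memory bandwidth, chosen only for illustration.
HYPOTHETICAL_BANDWIDTH_GB_S = 1000.0

# Weights size of the IQ2 quantization, as listed in the table on this page.
IQ2_WEIGHTS_GB = 1026.96

iq2_bound = max_tokens_per_second(HYPOTHETICAL_BANDWIDTH_GB_S, IQ2_WEIGHTS_GB)
print(f"IQ2 upper bound: {iq2_bound:.2f} tokens/s")
```

Under these assumptions the bound lands just under one token per second. Real throughput would be lower once compute, cache misses, and any offloading over a slower bus are taken into account.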

Frequently asked questions

Can I run antirez/deepseek-v4-gguf on an 8 GB GPU?

No. antirez/deepseek-v4-gguf does not fit on an 8 GB GPU, even with the smallest quantization and system RAM offloading.

Can I run antirez/deepseek-v4-gguf on a 16 GB GPU?

No. antirez/deepseek-v4-gguf does not fit on a 16 GB GPU, even with the smallest quantization and system RAM offloading.

Can I run antirez/deepseek-v4-gguf on a 24 GB GPU?

No. antirez/deepseek-v4-gguf does not fit on a 24 GB GPU, even with the smallest quantization and system RAM offloading.
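The size of the gap behind these three answers can be made concrete. This is a rough sketch that pools GPU memory with system RAM and compares the sum against the smallest total from the table. The 128 GB of system RAM is a hypothetical value, since the page does not say how much RAM it assumes, and `shortfall_gb` is an illustrative name.

```python
SMALLEST_TOTAL_GB = 1027.81  # IQ2 total from the table on this page
HYPOTHETICAL_SYSTEM_RAM_GB = 128.0  # assumption; not stated on the page


def shortfall_gb(vram_gb: float, system_ram_gb: float,
                 required_gb: float = SMALLEST_TOTAL_GB) -> float:
    """Memory still missing after pooling VRAM and system RAM (0.0 if it fits)."""
    return max(0.0, required_gb - (vram_gb + system_ram_gb))


for vram in (8, 16, 24):
    missing = shortfall_gb(vram, HYPOTHETICAL_SYSTEM_RAM_GB)
    print(f"{vram} GB GPU + {HYPOTHETICAL_SYSTEM_RAM_GB:.0f} GB RAM: "
          f"short by {missing:.2f} GB")
```

With that assumed RAM, each configuration is short by hundreds of gigabytes, so the difference between an 8 GB and a 24 GB GPU does not change the answer.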

What is the best quantization for antirez/deepseek-v4-gguf?

If memory allows, higher bits-per-weight generally means better quality. For models that offer them, a common sweet spot is a Q4_K_M or Q5_K_M quantization, which keeps most of the quality while using roughly a third to a half less memory than 8-bit. Pick the highest quantization that still fits in your available memory. For this model, the table above lists only IQ2 and Q4.
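The selection rule above (take the highest quantization that fits) can be sketched as a small helper. The entries reuse the Bits and Total columns exactly as listed in the table on this page; the function name `best_quant` and the example budgets are hypothetical.

```python
from typing import Optional

# (name, bits as listed on this page, total memory in GB at a 4,096-token context)
QUANTS = [
    ("IQ2", 31.03, 1027.81),
    ("Q4", 37.42, 1239.64),
]


def best_quant(budget_gb: float) -> Optional[str]:
    """Return the listed quantization with the most bits that fits the budget."""
    fitting = [q for q in QUANTS if q[2] <= budget_gb]
    if not fitting:
        return None  # nothing fits: matches the "Insufficient" verdict
    return max(fitting, key=lambda q: q[1])[0]


# Example budgets in GB (illustrative values only).
for budget in (24, 1100, 1300):
    print(f"{budget} GB budget -> {best_quant(budget)}")
```

Returning `None` instead of raising keeps the "nothing fits" case easy to handle for callers that loop over several hardware configurations.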