Run MaziyarPanahi/Mixtral-8x22B-v0.1-GGUF locally

License: apache-2.0 · Downloads: 168,408 · Likes: 77

Parameters: 140.62B

Context: 65,536 tokens

MaziyarPanahi/Mixtral-8x22B-v0.1-GGUF is a very large language model with 140.62 billion parameters, built on the llama architecture. It is released under the apache-2.0 license and has been downloaded 168,408 times.

To run MaziyarPanahi/Mixtral-8x22B-v0.1-GGUF locally at a 4,096-token context, its quantized versions need between 29.28 GB (IQ1_S, lowest quality) and 263.6 GB (the 16-bit GGUF, highest quality) of memory. These totals include the weights, the KV cache, and a system margin.
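The totals above can be approximated from the parameter count and the bits per weight of each quantization. The following is a minimal sketch, not the page's actual calculator. It assumes that "GB" here means GiB (2^30 bytes) and that the system margin is about 0.8 GB; both are inferred from the published figures rather than stated. The function names are hypothetical.

```python
GIB = 1024 ** 3  # assumption: the page's "GB" figures are GiB

PARAMS = 140.62e9  # parameter count stated on this page


def weights_gib(params, bits_per_weight):
    """Size of the weights alone: parameters x bits per weight, converted to GiB."""
    return params * bits_per_weight / 8 / GIB


def total_gib(params, bits_per_weight, kv_gib=0.88, margin_gib=0.8):
    """Weights plus KV cache plus a fixed system margin.

    kv_gib is the page's value for a 4,096-token context.
    margin_gib is an assumed value inferred from the published totals.
    """
    return weights_gib(params, bits_per_weight) + kv_gib + margin_gib


# Q8_0 is listed at 8.5 bits per weight
print(round(weights_gib(PARAMS, 8.5), 2))  # 139.15
print(round(total_gib(PARAMS, 8.5), 2))    # 140.83
```

For the low-bit quantizations the result can differ from the published value by a few hundredths of a GB, because the bits-per-weight figures are rounded.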

For most users the best balance is IQ3_XS, needing about 55.9 GB. That is more than the VRAM of a 32 GB GPU, so MaziyarPanahi/Mixtral-8x22B-v0.1-GGUF does not run fully on such a GPU: part of the model must be offloaded to system RAM, which is why the table below marks IQ3_XS as Offload.


All quantizations

Quantization	Bits per weight	Quality	Weights	KV cache	Total	Speed (est.)	Verdict
IQ1_S	1.69	Very low	27.61 GB	0.88 GB	29.28 GB	1.8 t/s	Offload
IQ1_M	1.86	Very low	30.48 GB	0.88 GB	32.16 GB	1.6 t/s	Offload
Q2_K	2.96	Low	48.52 GB	0.88 GB	50.2 GB	1.0 t/s	Offload
IQ3_XS	3.31	Fair	54.23 GB	0.88 GB	55.9 GB	0.9 t/s	Offload
Q3_K_S	3.5	Fair	57.27 GB	0.88 GB	58.95 GB	—	Insufficient
Q3_K_M	3.86	Fair	63.13 GB	0.88 GB	64.81 GB	—	Insufficient
Q3_K_L	4.13	Fair	67.6 GB	0.88 GB	69.27 GB	—	Insufficient
IQ4_XS	4.34	Good	71.11 GB	0.88 GB	72.79 GB	—	Insufficient
Q4_K_S	4.58	Good	74.95 GB	0.88 GB	76.63 GB	—	Insufficient
Q4_K_M	4.87	Good	79.71 GB	0.88 GB	81.38 GB	—	Insufficient
Q5_K_S	5.52	Very good	90.31 GB	0.88 GB	91.99 GB	—	Insufficient
Q5_K_M	5.69	Very good	93.1 GB	0.88 GB	94.78 GB	—	Insufficient
Q6_K	6.57	Excellent	107.6 GB	0.88 GB	109.27 GB	—	Insufficient
Q8_0	8.5	Excellent	139.15 GB	0.88 GB	140.83 GB	—	Insufficient
GGUF	16.0	Excellent	261.93 GB	0.88 GB	263.6 GB	—	Insufficient

KV cache computed from the model's exact architecture. Speed is a rough estimate bounded by memory bandwidth.
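As a rough illustration of both notes, the sketch below computes a KV cache size from an architecture and a bandwidth-bound speed. The architecture values (56 layers, 8 KV heads, head dimension 128, 16-bit cache entries) are assumptions not stated on this page; they happen to reproduce the 0.88 GB figure. The 49 GB/s effective bandwidth is likewise an assumption, chosen because it matches the Speed column. Function names are hypothetical.

```python
GIB = 1024 ** 3  # assumption: the page's "GB" figures are GiB


def kv_cache_gib(n_layers, n_kv_heads, head_dim, context_tokens, bytes_per_value=2):
    """KV cache size: one key and one value vector per layer, KV head, and token."""
    n_values = 2 * n_layers * n_kv_heads * head_dim * context_tokens  # 2 = keys + values
    return n_values * bytes_per_value / GIB


def tokens_per_second(weights_gb, bandwidth_gb_per_s):
    """Upper bound on speed if every generated token reads all weights once."""
    return bandwidth_gb_per_s / weights_gb


# Assumed Mixtral-8x22B shape (not given on this page)
LAYERS, KV_HEADS, HEAD_DIM = 56, 8, 128

print(kv_cache_gib(LAYERS, KV_HEADS, HEAD_DIM, 4096))   # 0.875
print(kv_cache_gib(LAYERS, KV_HEADS, HEAD_DIM, 65536))  # 14.0

# Assumed effective bandwidth of 49 GB/s, inferred from the table
print(round(tokens_per_second(27.61, 49.0), 1))  # 1.8 (IQ1_S)
print(round(tokens_per_second(54.23, 49.0), 1))  # 0.9 (IQ3_XS)
```

Under these assumptions the KV cache grows linearly with context, so the full 65,536-token context would need about 14 GB of cache instead of 0.88 GB.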

Frequently asked questions

How much VRAM do you need to run MaziyarPanahi/Mixtral-8x22B-v0.1-GGUF?

You need about 29.28 GB of VRAM to run MaziyarPanahi/Mixtral-8x22B-v0.1-GGUF entirely on the GPU using the IQ1_S quantization (at a 4,096-token context). IQ1_S is the smallest quantization available, so this is the minimum; larger quantizations improve quality but need more memory.

Can I run MaziyarPanahi/Mixtral-8x22B-v0.1-GGUF on an 8 GB GPU?

No. MaziyarPanahi/Mixtral-8x22B-v0.1-GGUF does not fit on an 8 GB GPU, even with the smallest quantization and system RAM offloading.

Can I run MaziyarPanahi/Mixtral-8x22B-v0.1-GGUF on a 16 GB GPU?

Partially. MaziyarPanahi/Mixtral-8x22B-v0.1-GGUF only fits on a 16 GB GPU by offloading part of it to system RAM (with IQ1_M), which runs but is slower.

Can I run MaziyarPanahi/Mixtral-8x22B-v0.1-GGUF on a 24 GB GPU?

Partially. MaziyarPanahi/Mixtral-8x22B-v0.1-GGUF only fits on a 24 GB GPU by offloading part of it to system RAM (with Q3_K_L), which runs but is slower.
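To get a feel for what partial offloading means in practice, the sketch below estimates how many transformer layers fit in VRAM and how much of the weights spill into system RAM. It is a simplification under stated assumptions: 56 equally sized layers (not given on this page), the whole KV cache and a 0.8 GB margin kept on the GPU, and non-layer tensors such as embeddings ignored. Runtimes such as llama.cpp expose a comparable setting as the number of GPU layers (`--n-gpu-layers`); the function name here is hypothetical.

```python
import math


def gpu_layer_split(weights_gb, vram_gb, n_layers, kv_gb=0.88, margin_gb=0.8):
    """Return (layers that fit on the GPU, GB of weights left in system RAM).

    Assumes all layers are the same size and that the KV cache and
    system margin stay in VRAM. Both are simplifications.
    """
    per_layer_gb = weights_gb / n_layers
    usable_gb = vram_gb - kv_gb - margin_gb
    on_gpu = max(0, min(n_layers, math.floor(usable_gb / per_layer_gb)))
    ram_gb = weights_gb - on_gpu * per_layer_gb
    return on_gpu, ram_gb


# IQ1_M (30.48 GB of weights) on a 16 GB GPU, assuming 56 layers
layers, ram = gpu_layer_split(30.48, 16, 56)
print(layers, round(ram, 2))  # 26 16.33

# Q3_K_L (67.6 GB of weights) on a 24 GB GPU, assuming 56 layers
layers, ram = gpu_layer_split(67.6, 24, 56)
print(layers, round(ram, 2))  # 18 45.87
```

The more weights remain in system RAM, the more the slower RAM bandwidth dominates generation speed, which is why offloaded configurations run but are slower.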

What is the best quantization for MaziyarPanahi/Mixtral-8x22B-v0.1-GGUF?

If memory allows, more bits per weight means better quality. A common sweet spot is a Q4_K_M or Q5_K_M quantization, which keeps most of the quality while needing considerably less memory than 8-bit: for this model, 81.38 GB (Q4_K_M) or 94.78 GB (Q5_K_M) versus 140.83 GB for Q8_0. Pick the highest-bit quantization that still fits in your VRAM.
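The "highest quantization that still fits" rule can be sketched as a simple lookup over the totals from the table above. This is a minimal illustration: the `best_fit` helper is hypothetical, and the totals apply only to a 4,096-token context.

```python
# (name, bits per weight, total GB at a 4,096-token context), copied from the table
QUANTS = [
    ("IQ1_S", 1.69, 29.28),
    ("IQ1_M", 1.86, 32.16),
    ("Q2_K", 2.96, 50.2),
    ("IQ3_XS", 3.31, 55.9),
    ("Q3_K_S", 3.5, 58.95),
    ("Q3_K_M", 3.86, 64.81),
    ("Q3_K_L", 4.13, 69.27),
    ("IQ4_XS", 4.34, 72.79),
    ("Q4_K_S", 4.58, 76.63),
    ("Q4_K_M", 4.87, 81.38),
    ("Q5_K_S", 5.52, 91.99),
    ("Q5_K_M", 5.69, 94.78),
    ("Q6_K", 6.57, 109.27),
    ("Q8_0", 8.5, 140.83),
    ("GGUF", 16.0, 263.6),
]


def best_fit(budget_gb, quants=QUANTS):
    """Return the name of the highest-bit quantization whose total fits the budget."""
    fitting = [q for q in quants if q[2] <= budget_gb]
    if not fitting:
        return None  # nothing fits entirely in this much memory
    return max(fitting, key=lambda q: q[1])[0]


print(best_fit(32))  # IQ1_S
print(best_fit(96))  # Q5_K_M
print(best_fit(24))  # None
```

A result of `None` means no quantization fits entirely in that budget, so the model would need offloading to system RAM or cannot run at all.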