Run MaziyarPanahi/Mixtral-8x22B-v0.1-GGUF locally
MaziyarPanahi/Mixtral-8x22B-v0.1-GGUF is a very large language model with 140.62 billion parameters, built on the llama architecture. It is released under the apache-2.0 license and has been downloaded 168,408 times.
To run MaziyarPanahi/Mixtral-8x22B-v0.1-GGUF locally at a 4,096-token context, you need between 29.28 GB (IQ1_S, lowest quality) and 263.6 GB (the 16-bit GGUF, highest quality) of memory. These totals include the weights, the KV cache, and a system margin.
All quantizations
| Quant. | Bits | Quality | Weights | KV | Total | Speed~ | Verdict |
|---|---|---|---|---|---|---|---|
| IQ1_S | 1.69 | Very low | 27.61 GB | 0.88 GB | 29.28 GB | — | Insufficient |
| IQ1_M | 1.86 | Very low | 30.48 GB | 0.88 GB | 32.16 GB | — | Insufficient |
| Q2_K | 2.96 | Low | 48.52 GB | 0.88 GB | 50.2 GB | — | Insufficient |
| IQ3_XS | 3.31 | Fair | 54.23 GB | 0.88 GB | 55.9 GB | — | Insufficient |
| Q3_K_S | 3.5 | Fair | 57.27 GB | 0.88 GB | 58.95 GB | — | Insufficient |
| Q3_K_M | 3.86 | Fair | 63.13 GB | 0.88 GB | 64.81 GB | — | Insufficient |
| Q3_K_L | 4.13 | Fair | 67.6 GB | 0.88 GB | 69.27 GB | — | Insufficient |
| IQ4_XS | 4.34 | Good | 71.11 GB | 0.88 GB | 72.79 GB | — | Insufficient |
| Q4_K_S | 4.58 | Good | 74.95 GB | 0.88 GB | 76.63 GB | — | Insufficient |
| Q4_K_M | 4.87 | Good | 79.71 GB | 0.88 GB | 81.38 GB | — | Insufficient |
| Q5_K_S | 5.52 | Very good | 90.31 GB | 0.88 GB | 91.99 GB | — | Insufficient |
| Q5_K_M | 5.69 | Very good | 93.1 GB | 0.88 GB | 94.78 GB | — | Insufficient |
| Q6_K | 6.57 | Excellent | 107.6 GB | 0.88 GB | 109.27 GB | — | Insufficient |
| Q8_0 | 8.5 | Excellent | 139.15 GB | 0.88 GB | 140.83 GB | — | Insufficient |
| GGUF | 16.0 | Excellent | 261.93 GB | 0.88 GB | 263.6 GB | — | Insufficient |
The KV cache size is computed from the model's exact architecture at a 4,096-token context. Speed is a rough estimate bounded by memory bandwidth.
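The columns above can be approximated with a few lines of arithmetic. The following is a minimal sketch, not the page's actual calculator. The parameter count comes from this page. The attention shape (56 layers, 8 KV heads, head dimension 128), the 16-bit KV cache, and the 0.8 GB margin are assumptions: the first three are the commonly published Mixtral-8x22B values, and the margin is inferred from Total minus Weights minus KV. The sketch also assumes the table's "GB" means binary gigabytes (2^30 bytes), which is what makes the numbers line up.

```python
GIB = 2**30  # assumption: the table's "GB" is binary gigabytes

PARAMS = 140.62e9  # parameter count stated on this page

# Assumed Mixtral-8x22B attention shape (not stated on this page).
N_LAYERS = 56
N_KV_HEADS = 8
HEAD_DIM = 128
KV_BYTES_PER_VALUE = 2  # assumes a 16-bit KV cache

MARGIN_GIB = 0.8  # assumed system margin, inferred from the table


def weights_gib(bits_per_weight):
    """Weight memory: parameters times bits per weight, converted to GiB."""
    return PARAMS * bits_per_weight / 8 / GIB


def kv_cache_gib(context_tokens):
    """KV cache: one key and one value tensor per layer, per token."""
    values = 2 * N_LAYERS * N_KV_HEADS * HEAD_DIM * context_tokens
    return values * KV_BYTES_PER_VALUE / GIB


def total_gib(bits_per_weight, context_tokens=4096):
    """Weights plus KV cache plus the assumed fixed margin."""
    return weights_gib(bits_per_weight) + kv_cache_gib(context_tokens) + MARGIN_GIB


if __name__ == "__main__":
    for name, bits in [("Q4_K_M", 4.87), ("Q8_0", 8.5), ("GGUF 16-bit", 16.0)]:
        print(f"{name}: {total_gib(bits):.2f}")
```

Under these assumptions the results land within a few hundredths of a GB of the table. The low-bit rows drift slightly more because the Bits column is rounded to two decimals.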
Frequently asked questions
How much VRAM do you need to run MaziyarPanahi/Mixtral-8x22B-v0.1-GGUF?
You need about 29.28 GB of VRAM to run MaziyarPanahi/Mixtral-8x22B-v0.1-GGUF entirely on the GPU using the IQ1_S quantization (at a 4,096-token context). IQ1_S is the smallest quantization listed, so this is the minimum; higher-quality quantizations need more memory.
Can I run MaziyarPanahi/Mixtral-8x22B-v0.1-GGUF on an 8 GB GPU?
No. MaziyarPanahi/Mixtral-8x22B-v0.1-GGUF does not fit on an 8 GB GPU, even with the smallest quantization and system RAM offloading.
Can I run MaziyarPanahi/Mixtral-8x22B-v0.1-GGUF on a 16 GB GPU?
Partially. MaziyarPanahi/Mixtral-8x22B-v0.1-GGUF only fits on a 16 GB GPU by offloading part of it to system RAM (with IQ1_M), which runs but is slower.
Can I run MaziyarPanahi/Mixtral-8x22B-v0.1-GGUF on a 24 GB GPU?
Partially. MaziyarPanahi/Mixtral-8x22B-v0.1-GGUF only fits on a 24 GB GPU by offloading part of it to system RAM (with Q3_K_L), which runs but is slower.
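The "No" and "Partially" answers above follow from comparing a quantization's total memory against GPU memory alone, then against GPU memory plus system RAM. Here is a minimal sketch of that comparison. The function name `verdict` and the system RAM figures in the example are hypothetical; this page does not state how much system RAM it assumes.

```python
def verdict(total_gb, vram_gb, system_ram_gb):
    """Classify one quantization against a hypothetical GPU plus RAM budget."""
    if total_gb <= vram_gb:
        return "fits on GPU"
    if total_gb <= vram_gb + system_ram_gb:
        return "partial (offload to system RAM)"
    return "does not fit"


if __name__ == "__main__":
    # Totals are taken from the table; the 16 GB and 48 GB of system RAM
    # are illustrative assumptions only.
    print(verdict(29.28, vram_gb=8, system_ram_gb=16))   # IQ1_S on an 8 GB GPU
    print(verdict(69.27, vram_gb=24, system_ram_gb=48))  # Q3_K_L on a 24 GB GPU
```

A partial fit still runs, but every layer kept in system RAM is limited by the slower CPU memory path, which is why offloaded setups are slower.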
What is the best quantization for MaziyarPanahi/Mixtral-8x22B-v0.1-GGUF?
If memory allows, more bits per weight generally means better quality. A common sweet spot is a Q4_K_M or Q5_K_M quantization, which keeps most of the quality while cutting memory by roughly 30 to 40 percent versus 8-bit (81.38 GB and 94.78 GB, against 140.83 GB for Q8_0). Pick the highest quantization that still fits in your VRAM.
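The rule "pick the highest quantization that still fits" can be sketched as a lookup over the Total column. The totals below are copied from the table on this page (4,096-token context); the function name `best_fit` and the example budgets are hypothetical.

```python
# (quantization, total memory in GB), ordered from lowest to highest quality.
TOTALS_GB = [
    ("IQ1_S", 29.28), ("IQ1_M", 32.16), ("Q2_K", 50.2), ("IQ3_XS", 55.9),
    ("Q3_K_S", 58.95), ("Q3_K_M", 64.81), ("Q3_K_L", 69.27), ("IQ4_XS", 72.79),
    ("Q4_K_S", 76.63), ("Q4_K_M", 81.38), ("Q5_K_S", 91.99), ("Q5_K_M", 94.78),
    ("Q6_K", 109.27), ("Q8_0", 140.83), ("GGUF", 263.6),
]


def best_fit(budget_gb):
    """Return the highest-quality quantization whose total fits the budget."""
    fitting = [name for name, total in TOTALS_GB if total <= budget_gb]
    return fitting[-1] if fitting else None


if __name__ == "__main__":
    for budget in (24, 80, 96):
        print(budget, best_fit(budget))
```

A longer context grows the KV cache, so the totals, and therefore the pick, shift for contexts beyond 4,096 tokens.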