Run unsloth/gpt-oss-120b-GGUF locally
unsloth/gpt-oss-120b-GGUF is a large language model with 116.83 billion parameters, built on the gpt-oss architecture. It is released under the Apache 2.0 license and has been downloaded 241,119 times.
Running unsloth/gpt-oss-120b-GGUF locally with a 4,096-token context takes between 59.35 GB (Q3_K_S, the smallest quantization) and 61.96 GB (F16, the largest) of memory. Each total covers the weights, the KV cache, and a system margin.
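As a rough check, the totals in the table below can be reproduced by adding the three components. This is only a minimal sketch: the 0.80 GB margin is not stated on the page but inferred from the rows (59.35 - 58.27 - 0.28), and `estimate_total_gb` is a hypothetical helper name.

```python
# Assumed system margin in GB, inferred from the table rows
# (Total - Weights - KV); the page does not state it explicitly.
SYSTEM_MARGIN_GB = 0.80


def estimate_total_gb(weights_gb: float, kv_cache_gb: float,
                      margin_gb: float = SYSTEM_MARGIN_GB) -> float:
    """Return the memory needed: weights + KV cache + system margin."""
    return round(weights_gb + kv_cache_gb + margin_gb, 2)


# Q3_K_S and F16 rows from the table, 4,096-token context.
print(estimate_total_gb(58.27, 0.28))  # 59.35
print(estimate_total_gb(60.88, 0.28))  # 61.96
```

A few rows differ from this sum by 0.01 GB, presumably because the published figures are rounded independently.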
All quantizations
| Quantization | Bits per weight | Quality | Weights | KV cache | Total | Speed (est.) | Verdict |
|---|---|---|---|---|---|---|---|
| Q3_K_S | 4.28 | Good | 58.27 GB | 0.28 GB | 59.35 GB | — | Insufficient |
| Q2_K | 4.28 | Good | 58.27 GB | 0.28 GB | 59.35 GB | — | Insufficient |
| Q4_0 | 4.29 | Good | 58.32 GB | 0.28 GB | 59.4 GB | — | Insufficient |
| Q3_K_M | 4.29 | Good | 58.33 GB | 0.28 GB | 59.41 GB | — | Insufficient |
| Q4_1 | 4.29 | Good | 58.41 GB | 0.28 GB | 59.49 GB | — | Insufficient |
| Q4_K_S | 4.3 | Good | 58.45 GB | 0.28 GB | 59.53 GB | — | Insufficient |
| Q4_K_M | 4.3 | Good | 58.46 GB | 0.28 GB | 59.54 GB | — | Insufficient |
| Q2_K_L | 4.3 | Good | 58.54 GB | 0.28 GB | 59.62 GB | — | Insufficient |
| Q5_K_S | 4.31 | Good | 58.56 GB | 0.28 GB | 59.64 GB | — | Insufficient |
| Q5_K_M | 4.31 | Good | 58.57 GB | 0.28 GB | 59.65 GB | — | Insufficient |
| Q4_K_XL | 4.32 | Good | 58.69 GB | 0.28 GB | 59.77 GB | — | Insufficient |
| Q6_K | 4.33 | Good | 58.94 GB | 0.28 GB | 60.02 GB | — | Insufficient |
| Q6_K_XL | 4.33 | Good | 58.94 GB | 0.28 GB | 60.02 GB | — | Insufficient |
| Q8_0 | 4.34 | Good | 59.03 GB | 0.28 GB | 60.12 GB | — | Insufficient |
| Q8_K_XL | 4.41 | Good | 60.05 GB | 0.28 GB | 61.13 GB | — | Insufficient |
| F16 | 4.48 | Good | 60.88 GB | 0.28 GB | 61.96 GB | — | Insufficient |
KV cache computed from the model's exact architecture. Speed is a rough estimate bounded by memory bandwidth.
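The KV and Bits columns can be approximately reproduced with two short formulas. The sketch below is illustrative rather than the page's actual calculator: the architecture values (36 layers, 8 key/value heads, head dimension 64) and the 16-bit cache are assumptions not stated here, it assumes a full-length cache on every layer, and the function names are hypothetical. The numbers only line up if the table's "GB" is read as binary gigabytes (2^30 bytes), which is also an inference.

```python
GIB = 2**30  # assumption: the table's "GB" values are binary gigabytes


def kv_cache_gib(n_layers: int, n_kv_heads: int, head_dim: int,
                 context_tokens: int, bytes_per_value: int = 2) -> float:
    """Size of the key and value caches for one sequence, in GiB."""
    # Factor 2: one tensor for keys and one for values in every layer.
    total_bytes = (2 * n_layers * n_kv_heads * head_dim
                   * context_tokens * bytes_per_value)
    return total_bytes / GIB


def bits_per_weight(weights_gib: float, n_params: float) -> float:
    """Average number of bits stored per parameter."""
    return weights_gib * GIB * 8 / n_params


# Assumed architecture values (not stated on this page), 4,096-token context.
print(round(kv_cache_gib(36, 8, 64, 4096), 2))     # 0.28
# Q3_K_S row: 58.27 of weights, 116.83 billion parameters.
print(round(bits_per_weight(58.27, 116.83e9), 2))  # 4.28
```

The KV cache grows linearly with context length, so doubling the context to 8,192 tokens would double the 0.28 figure under the same assumptions.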
Frequently asked questions
Can I run unsloth/gpt-oss-120b-GGUF on an 8 GB GPU?
No. unsloth/gpt-oss-120b-GGUF does not fit on an 8 GB GPU, even with the smallest quantization and system RAM offloading.
Can I run unsloth/gpt-oss-120b-GGUF on a 16 GB GPU?
No. unsloth/gpt-oss-120b-GGUF does not fit on a 16 GB GPU, even with the smallest quantization and system RAM offloading.
Can I run unsloth/gpt-oss-120b-GGUF on a 24 GB GPU?
Partially. unsloth/gpt-oss-120b-GGUF (with F16) runs on a 24 GB GPU only if part of the model is offloaded to system RAM. This works, but it is slower than running entirely in VRAM.
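To get a feel for how much would have to live in system RAM, the arithmetic can be sketched as below. This is a simplification with a hypothetical helper name: real runtimes typically offload whole layers and keep some VRAM free for buffers, so the actual split will differ.

```python
def split_for_offload(total_gb: float, vram_gb: float) -> tuple[float, float]:
    """Return (gb_on_gpu, gb_in_system_ram) for a given VRAM size."""
    on_gpu = min(total_gb, vram_gb)
    in_ram = max(0.0, total_gb - vram_gb)
    return round(on_gpu, 2), round(in_ram, 2)


# F16 total from the table (61.96) on a 24 GB GPU.
print(split_for_offload(61.96, 24.0))  # (24.0, 37.96)
```

Under these assumptions, well over half of the model would be served from system RAM.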
What is the best quantization for unsloth/gpt-oss-120b-GGUF?
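If memory allows, more bits per weight generally means better quality. For most models, a common sweet spot is Q4_K_M or Q5_K_M, which keeps most of the quality while roughly halving the memory needed versus 8-bit. For unsloth/gpt-oss-120b-GGUF, however, the totals in the table above span only 59.35 GB to 61.96 GB, so the choice of quantization changes memory use by less than 3 GB. Pick the highest quantization that still fits in your VRAM.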
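The "highest quantization that still fits" rule can be sketched as a small lookup over the table. The rows are copied from the table above; `best_quant_that_fits` is a hypothetical helper, and using bits per weight as the ranking key is an assumption about what "highest" means here.

```python
from typing import Optional

# (name, bits per weight, total memory at 4,096 tokens), from the table.
QUANTS = [
    ("Q3_K_S", 4.28, 59.35), ("Q2_K", 4.28, 59.35),
    ("Q4_0", 4.29, 59.40), ("Q3_K_M", 4.29, 59.41),
    ("Q4_1", 4.29, 59.49), ("Q4_K_S", 4.30, 59.53),
    ("Q4_K_M", 4.30, 59.54), ("Q2_K_L", 4.30, 59.62),
    ("Q5_K_S", 4.31, 59.64), ("Q5_K_M", 4.31, 59.65),
    ("Q4_K_XL", 4.32, 59.77), ("Q6_K", 4.33, 60.02),
    ("Q6_K_XL", 4.33, 60.02), ("Q8_0", 4.34, 60.12),
    ("Q8_K_XL", 4.41, 61.13), ("F16", 4.48, 61.96),
]


def best_quant_that_fits(memory_gb: float, quants=QUANTS) -> Optional[str]:
    """Return the highest-bits quantization whose total fits, or None."""
    fitting = [q for q in quants if q[2] <= memory_gb]
    if not fitting:
        return None
    # Rank by bits per weight; on ties, max() keeps the first row listed.
    return max(fitting, key=lambda q: q[1])[0]


print(best_quant_that_fits(64.0))  # F16
print(best_quant_that_fits(60.0))  # Q4_K_XL
print(best_quant_that_fits(24.0))  # None
```

With a memory budget below 59.35, nothing fits without offloading, which matches the answers for 8 GB, 16 GB, and 24 GB GPUs above.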