Run unsloth/gpt-oss-120b-GGUF locally

License: apache-2.0 · Downloads: 241,119 · Likes: 273
Parameters: 116.83B
Context: 131,072 tokens

unsloth/gpt-oss-120b-GGUF is a very large language model with 116.83 billion parameters, built on the gpt-oss architecture. It is released under the apache-2.0 license and has been downloaded 241,119 times.

Running unsloth/gpt-oss-120b-GGUF locally at a 4,096-token context takes between 59.35 GB (Q3_K_S, lowest quality) and 61.96 GB (F16, highest quality) of memory, depending on the quantization. These totals include the weights, the KV cache, and a system margin.
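The way these totals add up can be sketched in a few lines. This is a minimal illustration, not the page's actual calculator: the function name is hypothetical, and the 0.80 GB margin is inferred from the table rows (Total minus Weights minus KV), since the page does not state it.

```python
# Margin inferred from the table rows (Total - Weights - KV is about 0.80 GB).
# The page does not state this value explicitly, so treat it as an assumption.
SYSTEM_MARGIN_GB = 0.80


def total_memory_gb(weights_gb: float, kv_cache_gb: float,
                    margin_gb: float = SYSTEM_MARGIN_GB) -> float:
    """Return weights plus KV cache plus a fixed system margin, in GB."""
    return weights_gb + kv_cache_gb + margin_gb


# Weights and KV cache figures taken from the Q3_K_S and F16 rows.
print(round(total_memory_gb(58.27, 0.28), 2))  # Q3_K_S
print(round(total_memory_gb(60.88, 0.28), 2))  # F16
```

With the inferred margin, both calls land on the totals quoted above (59.35 GB and 61.96 GB).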

For most users the best balance is F16, which needs about 61.96 GB, only about 2.6 GB more than the smallest quantization. On consumer GPUs, running it generally requires offloading part of the model to system RAM, which works but is slower.

All quantizations

Quant.    Bits  Quality  Weights   KV       Total     Speed (approx.)  Verdict
Q3_K_S    4.28  Good     58.27 GB  0.28 GB  59.35 GB  0.9 t/s          Offload
Q2_K      4.28  Good     58.27 GB  0.28 GB  59.35 GB  0.9 t/s          Offload
Q4_0      4.29  Good     58.32 GB  0.28 GB  59.4 GB   0.9 t/s          Offload
Q3_K_M    4.29  Good     58.33 GB  0.28 GB  59.41 GB  0.9 t/s          Offload
Q4_1      4.29  Good     58.41 GB  0.28 GB  59.49 GB  0.9 t/s          Offload
Q4_K_S    4.3   Good     58.45 GB  0.28 GB  59.53 GB  0.9 t/s          Offload
Q4_K_M    4.3   Good     58.46 GB  0.28 GB  59.54 GB  0.9 t/s          Offload
Q2_K_L    4.3   Good     58.54 GB  0.28 GB  59.62 GB  0.9 t/s          Offload
Q5_K_S    4.31  Good     58.56 GB  0.28 GB  59.64 GB  0.9 t/s          Offload
Q5_K_M    4.31  Good     58.57 GB  0.28 GB  59.65 GB  0.9 t/s          Offload
Q4_K_XL   4.32  Good     58.69 GB  0.28 GB  59.77 GB  0.9 t/s          Offload
Q6_K      4.33  Good     58.94 GB  0.28 GB  60.02 GB  0.8 t/s          Offload
Q6_K_XL   4.33  Good     58.94 GB  0.28 GB  60.02 GB  0.8 t/s          Offload
Q8_0      4.34  Good     59.03 GB  0.28 GB  60.12 GB  0.8 t/s          Offload
Q8_K_XL   4.41  Good     60.05 GB  0.28 GB  61.13 GB  0.8 t/s          Offload
F16       4.48  Good     60.88 GB  0.28 GB  61.96 GB  0.8 t/s          Offload
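
The Bits column appears to be the weight size divided by the parameter count. The sketch below shows that arithmetic. It assumes the weight sizes are binary gigabytes (2^30 bytes), which is an inference: that reading reproduces the Bits values, while decimal gigabytes would give about 3.99 for Q3_K_S. The function name is hypothetical.

```python
PARAMS = 116.83e9  # parameter count from the model summary


def bits_per_weight(weights_gib: float, params: float = PARAMS) -> float:
    """Average bits stored per parameter for a file of the given size.

    Assumes weights_gib is in binary gigabytes (2**30 bytes). This unit is
    inferred from the table, not stated by the page.
    """
    return weights_gib * 2**30 * 8 / params


print(round(bits_per_weight(58.27), 2))  # Q3_K_S weights
print(round(bits_per_weight(60.88), 2))  # F16 weights
```

Under that assumption the two calls give 4.28 and 4.48, matching the Q3_K_S and F16 rows.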

The KV cache is computed from the model's exact architecture. Speed is a rough estimate bounded by memory bandwidth.
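
Both estimates can be sketched with standard formulas. This is an illustration under stated assumptions, not the page's own code. The architecture values (36 layers, 8 KV heads, head dimension 64, 16-bit cache entries) are not given on this page; they are assumed here because they reproduce the 0.28 figure at a 4,096-token context. The 50 GB/s bandwidth is likewise a hypothetical value, and the function names are made up for the example.

```python
def kv_cache_gib(n_layers: int, n_kv_heads: int, head_dim: int,
                 context_tokens: int, bytes_per_value: int = 2) -> float:
    """Size of a full-attention KV cache in binary gigabytes (2**30 bytes)."""
    # Factor 2: one key vector and one value vector per layer, KV head and token.
    total_bytes = (2 * n_layers * n_kv_heads * head_dim
                   * context_tokens * bytes_per_value)
    return total_bytes / 2**30


def tokens_per_second(gb_read_per_token: float,
                      bandwidth_gb_per_s: float) -> float:
    """Upper bound on decode speed when every token streams the given data."""
    return bandwidth_gb_per_s / gb_read_per_token


# Assumed architecture values (not from this page), 4,096-token context.
print(kv_cache_gib(n_layers=36, n_kv_heads=8, head_dim=64, context_tokens=4096))

# Hypothetical 50 GB/s link, streaming the 58.27 GB of Q3_K_S weights per token.
print(round(tokens_per_second(58.27, 50.0), 1))
```

The bandwidth bound explains why the Speed column barely changes across rows: the weight sizes differ by only a few percent, so the estimated tokens per second do too.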

Frequently asked questions

Can I run unsloth/gpt-oss-120b-GGUF on an 8 GB GPU?

No. unsloth/gpt-oss-120b-GGUF does not fit on an 8 GB GPU, even with the smallest quantization and system RAM offloading.

Can I run unsloth/gpt-oss-120b-GGUF on a 16 GB GPU?

No. unsloth/gpt-oss-120b-GGUF does not fit on a 16 GB GPU, even with the smallest quantization and system RAM offloading.

Can I run unsloth/gpt-oss-120b-GGUF on a 24 GB GPU?

Partially. unsloth/gpt-oss-120b-GGUF only fits on a 24 GB GPU by offloading part of it to system RAM (with F16), which runs but is slower.
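
To get a feel for how much of the model is offloaded in that case, here is a rough sketch. It assumes the KV cache and the system margin stay on the GPU and that the rest of the VRAM holds weights; the 0.80 GB margin is inferred from the table, and the function name is hypothetical.

```python
def gpu_fraction(weights_gb: float, vram_gb: float,
                 kv_cache_gb: float = 0.28, margin_gb: float = 0.80) -> float:
    """Rough share of the weights that fits in VRAM, clamped to [0, 1].

    Assumes the KV cache and system margin are kept on the GPU; the margin
    value is inferred from the table, not stated by the page.
    """
    usable_gb = vram_gb - kv_cache_gb - margin_gb
    return max(0.0, min(1.0, usable_gb / weights_gb))


# F16 weights (60.88 GB) on a 24 GB GPU.
print(round(gpu_fraction(60.88, 24.0), 2))
```

Under these assumptions only about 38% of the F16 weights sit in VRAM, so most of the model is read from system RAM, which is why this setup is slower.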

What is the best quantization for unsloth/gpt-oss-120b-GGUF?

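If memory allows, higher bits-per-weight means better quality. For most models a common sweet spot is a Q4_K_M or Q5_K_M quantization, which keeps most of the quality while roughly halving the memory versus 8-bit. For unsloth/gpt-oss-120b-GGUF, however, every quantization in the table sits between 4.28 and 4.48 bits per weight and the totals differ by less than 3 GB, so the saving from a lower quantization is small. Pick the highest quantization that still fits in your available memory.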
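
The "highest quantization that fits" rule can be sketched as a small selection function. The rows below are a subset copied from the table; the function name and the idea of passing a single memory budget (VRAM, or VRAM plus system RAM when offloading) are assumptions made for illustration.

```python
from typing import Optional

# (name, bits per weight, total memory in GB) copied from a few table rows.
QUANTS = [
    ("Q3_K_S", 4.28, 59.35),
    ("Q4_K_M", 4.30, 59.54),
    ("Q8_0", 4.34, 60.12),
    ("F16", 4.48, 61.96),
]


def best_quant(memory_gb: float, quants=QUANTS) -> Optional[str]:
    """Return the highest bits-per-weight quantization that fits, or None."""
    fitting = [q for q in quants if q[2] <= memory_gb]
    if not fitting:
        return None
    # Among the rows that fit, prefer the most bits per weight.
    return max(fitting, key=lambda q: q[1])[0]


print(best_quant(64.0))  # enough room for every row
print(best_quant(60.0))  # F16 and Q8_0 no longer fit
print(best_quant(48.0))  # nothing fits
```

With these rows, a 64 GB budget selects F16, a 60 GB budget falls back to Q4_K_M, and a 48 GB budget returns None.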