Run MaziyarPanahi/Yi-Coder-1.5B-Chat-GGUF locally

Downloads: 163,018 · Likes: 18

Parameters: 1.48B

Context: 131,072 tokens

MaziyarPanahi/Yi-Coder-1.5B-Chat-GGUF is a compact, code-focused language model with 1.48 billion parameters, built on the Llama architecture. It has been downloaded 163,018 times.

To run MaziyarPanahi/Yi-Coder-1.5B-Chat-GGUF locally at a 4,096-token context, you need between 2.01 GB of memory (IQ1_S, lowest quality) and 4.3 GB (the GGUF file at about 16 bits per weight, highest quality). These figures include the weights, the KV cache, and a system margin.
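The totals quoted here appear to be a plain sum of three parts. The following is a minimal sketch of that arithmetic, assuming a fixed system margin of 0.80 GB. That value is not stated on this page; it is inferred from the table below, where Total minus Weights minus KV comes to 0.80 GB in every row. The function and constant names are illustrative.

```python
# Sketch of the memory arithmetic: total = weights + KV cache + system margin.
# SYSTEM_MARGIN_GB is an assumption inferred from the table (Total - Weights - KV).
SYSTEM_MARGIN_GB = 0.80
KV_CACHE_GB = 0.75  # KV cache at a 4,096-token context, taken from the table


def total_memory_gb(weights_gb: float,
                    kv_gb: float = KV_CACHE_GB,
                    margin_gb: float = SYSTEM_MARGIN_GB) -> float:
    """Return the estimated memory footprint in GB, rounded to two decimals."""
    return round(weights_gb + kv_gb + margin_gb, 2)


if __name__ == "__main__":
    # Weight sizes copied from the table for three representative rows.
    for name, weights in [("IQ1_S", 0.46), ("Q5_K_M", 1.02), ("GGUF", 2.75)]:
        print(f"{name}: {total_memory_gb(weights)} GB")
```

With the table's weight sizes, this reproduces the 2.01 GB, 2.57 GB, and 4.3 GB totals quoted on this page.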

For most users the best balance is Q5_K_M, which needs about 2.57 GB. At that size MaziyarPanahi/Yi-Coder-1.5B-Chat-GGUF fits entirely in the VRAM of a 6 GB or larger GPU and runs fully on the GPU.


All quantizations

Quant.	Bits/weight	Quality	Weights	KV cache	Total	Speed (est.)	Verdict
IQ1_S	2.66	Low	0.46 GB	0.75 GB	2.01 GB	874.4 t/s	Fits in VRAM
IQ1_M	2.76	Low	0.47 GB	0.75 GB	2.02 GB	844.5 t/s	Fits in VRAM
IQ2_XS	3.06	Low	0.53 GB	0.75 GB	2.08 GB	761.6 t/s	Fits in VRAM
Q2_K	3.44	Fair	0.59 GB	0.75 GB	2.14 GB	676.7 t/s	Fits in VRAM
IQ3_XS	3.77	Fair	0.65 GB	0.75 GB	2.2 GB	618.0 t/s	Fits in VRAM
Q3_K_S	3.92	Fair	0.67 GB	0.75 GB	2.22 GB	593.7 t/s	Fits in VRAM
Q3_K_M	4.26	Good	0.73 GB	0.75 GB	2.28 GB	546.6 t/s	Fits in VRAM
Q3_K_L	4.48	Good	0.77 GB	0.75 GB	2.32 GB	519.9 t/s	Fits in VRAM
IQ4_XS	4.51	Good	0.78 GB	0.75 GB	2.33 GB	515.9 t/s	Fits in VRAM
Q4_K_S	4.9	Good	0.84 GB	0.75 GB	2.39 GB	475.0 t/s	Fits in VRAM
Q4_K_M	5.22	Very good	0.9 GB	0.75 GB	2.45 GB	445.7 t/s	Fits in VRAM
Q5_K_S	5.7	Very good	0.98 GB	0.75 GB	2.53 GB	408.6 t/s	Fits in VRAM
Q5_K_M	5.96	Very good	1.02 GB	0.75 GB	2.57 GB	390.4 t/s	Fits in VRAM
GGUF	16.01	Excellent	2.75 GB	0.75 GB	4.3 GB	18.2 t/s	Offload

The KV cache is computed from the model's exact architecture at a 4,096-token context. Speed is a rough estimate bounded by memory bandwidth.
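Both estimates can be sketched in a few lines. The page does not give its formulas, so what follows is the commonly used approximation rather than the site's own method: the KV cache stores one key and one value vector per layer, per KV head, per token, and decode speed is bounded by how fast the weights can be read from memory once per token. The architecture values (24 layers, 16 KV heads, head dimension 128, 16-bit cache entries) and the 400 GB/s bandwidth are assumptions for illustration, not figures stated here.

```python
# Sketch: KV cache size and a bandwidth-bound speed estimate.
# All architecture and bandwidth values used below are assumptions.

GIB = 1024 ** 3


def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   context_tokens: int, bytes_per_value: int = 2) -> int:
    """Bytes needed to cache keys and values for every layer and token."""
    # The factor 2 accounts for storing both a key and a value vector.
    return 2 * n_layers * n_kv_heads * head_dim * context_tokens * bytes_per_value


def tokens_per_second(bandwidth_gb_s: float, weights_gb: float) -> float:
    """Upper bound on decode speed if each token reads all weights once."""
    return bandwidth_gb_s / weights_gb


if __name__ == "__main__":
    # Assumed architecture: 24 layers, 16 KV heads, head dimension 128.
    kv = kv_cache_bytes(n_layers=24, n_kv_heads=16, head_dim=128,
                        context_tokens=4096)
    print(f"KV cache: {kv / GIB:.2f} GiB")
    # Assumed memory bandwidth of 400 GB/s, chosen only for illustration.
    print(f"Q5_K_M estimate: {tokens_per_second(400.0, 1.02):.1f} t/s")
```

Under these assumed values the KV cache comes to 0.75 GiB, which agrees with the KV column above. The speed function gives only an upper bound; it ignores compute time, KV cache reads, and any offloading to system memory.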

Frequently asked questions

How much VRAM do you need to run MaziyarPanahi/Yi-Coder-1.5B-Chat-GGUF?

You need about 4.3 GB of VRAM to run the highest-quality GGUF file (about 16 bits per weight) of MaziyarPanahi/Yi-Coder-1.5B-Chat-GGUF entirely on the GPU at a 4,096-token context. Smaller quantizations lower the requirement at the cost of quality; Q5_K_M, for example, needs about 2.57 GB.

Can I run MaziyarPanahi/Yi-Coder-1.5B-Chat-GGUF on an 8 GB GPU?

Yes. With 8 GB of VRAM you can run MaziyarPanahi/Yi-Coder-1.5B-Chat-GGUF fully on the GPU, even with the highest-quality GGUF file (about 4.3 GB).

Can I run MaziyarPanahi/Yi-Coder-1.5B-Chat-GGUF on a 16 GB GPU?

Yes. With 16 GB of VRAM you can run MaziyarPanahi/Yi-Coder-1.5B-Chat-GGUF fully on the GPU, even with the highest-quality GGUF file (about 4.3 GB).

Can I run MaziyarPanahi/Yi-Coder-1.5B-Chat-GGUF on a 24 GB GPU?

Yes. With 24 GB of VRAM you can run MaziyarPanahi/Yi-Coder-1.5B-Chat-GGUF fully on the GPU, even with the highest-quality GGUF file (about 4.3 GB).

What is the best quantization for MaziyarPanahi/Yi-Coder-1.5B-Chat-GGUF?

If memory allows, more bits per weight means better quality. A common sweet spot is Q4_K_M or Q5_K_M, which keeps most of the quality while cutting the weights to roughly a third of the 16-bit file (0.9 GB and 1.02 GB versus 2.75 GB in the table above). Pick the highest-bit quantization that still fits in your VRAM.
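The selection rule above can be written as a short lookup. This is a minimal sketch that treats the Total column of the table as the full memory requirement at a 4,096-token context; the names `QUANTS` and `best_fit` are illustrative and not part of any library.

```python
# Sketch: pick the highest bits-per-weight quantization that fits a VRAM budget.
# Rows are (name, bits per weight, total GB), copied from the table above.
from typing import Optional

QUANTS = [
    ("IQ1_S", 2.66, 2.01),
    ("IQ1_M", 2.76, 2.02),
    ("IQ2_XS", 3.06, 2.08),
    ("Q2_K", 3.44, 2.14),
    ("IQ3_XS", 3.77, 2.20),
    ("Q3_K_S", 3.92, 2.22),
    ("Q3_K_M", 4.26, 2.28),
    ("Q3_K_L", 4.48, 2.32),
    ("IQ4_XS", 4.51, 2.33),
    ("Q4_K_S", 4.90, 2.39),
    ("Q4_K_M", 5.22, 2.45),
    ("Q5_K_S", 5.70, 2.53),
    ("Q5_K_M", 5.96, 2.57),
    ("GGUF", 16.01, 4.30),  # the 16-bit file, labeled "GGUF" in the table
]


def best_fit(vram_gb: float, quants=QUANTS) -> Optional[str]:
    """Return the name of the highest-bit quantization that fits, or None."""
    fitting = [q for q in quants if q[2] <= vram_gb]
    if not fitting:
        return None
    # Among the rows that fit, prefer the one with the most bits per weight.
    return max(fitting, key=lambda q: q[1])[0]


if __name__ == "__main__":
    for budget in (2.0, 4.0, 8.0):
        print(f"{budget} GB -> {best_fit(budget)}")
```

A budget below the smallest total returns `None`, which signals that no listed quantization fits without offloading. In practice you may want to leave extra headroom for other applications that share the GPU.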