# Quantization
Quantization can significantly accelerate model inference. Because it causes only a small reduction in accuracy (performance), quantizing your model is usually well worth it.

To quantize a model, call the static method `Quantize` of `LLamaQuantizer`:
```cs
string srcPath = "<model.bin>";
string dstPath = "<model_q4_0.bin>";
LLamaQuantizer.Quantize(srcPath, dstPath, "q4_0");
// The following overload, which takes an LLamaFtype value instead of a string, also works:
// LLamaQuantizer.Quantize(srcPath, dstPath, LLamaFtype.LLAMA_FTYPE_MOSTLY_Q4_0);
```
After the call returns, the quantized model file is saved to `dstPath`.
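
To give a feel for what the `"q4_0"` type does to the weights, here is a simplified sketch of q4_0-style block quantization, assuming the scheme used by llama.cpp: weights are grouped into blocks of 32, each block stores one scale derived from its largest-magnitude weight, and every weight becomes a 4-bit code. The local functions `QuantizeBlockQ4_0` and `DequantizeBlockQ4_0` are hypothetical and are not part of LLamaSharp. For simplicity the sketch keeps the scale as a `float` and stores one code per byte, whereas the real file format packs two codes per byte (and, in recent llama.cpp versions, stores the scale in half precision).

```cs
using System;
using System.Linq;

// Hypothetical illustration of q4_0-style quantization (not LLamaSharp's code).
const int BlockSize = 32;

// Quantize one block of weights into a shared scale plus 4-bit codes (0..15).
(float Scale, byte[] Codes) QuantizeBlockQ4_0(float[] block)
{
    // Find the weight with the largest magnitude, keeping its sign.
    float max = 0f;
    foreach (float x in block)
    {
        if (Math.Abs(x) > Math.Abs(max)) max = x;
    }

    // Choose the scale so that this weight maps to code 0 (i.e. -8 before the +8 offset).
    float d = max / -8f;
    float inv = d != 0f ? 1f / d : 0f;

    var codes = new byte[block.Length];
    for (int i = 0; i < block.Length; i++)
    {
        // Scale, shift by 8 and round to nearest; clamp the top end to 15.
        int q = (int)(block[i] * inv + 8.5f);
        codes[i] = (byte)Math.Min(15, q);
    }
    return (d, codes);
}

// Reconstruct approximate weights: (code - 8) * scale.
float[] DequantizeBlockQ4_0(float d, byte[] codes) =>
    codes.Select(q => (q - 8) * d).ToArray();

// Quantize a block of random weights in [-1, 1) and measure the round-trip error.
var rng = new Random(42);
float[] weights = Enumerable.Range(0, BlockSize)
    .Select(_ => (float)(rng.NextDouble() * 2 - 1))
    .ToArray();

var (blockScale, blockCodes) = QuantizeBlockQ4_0(weights);
float[] restored = DequantizeBlockQ4_0(blockScale, blockCodes);
float maxErr = weights.Zip(restored, (a, b) => Math.Abs(a - b)).Max();

Console.WriteLine($"scale = {blockScale}, max abs error = {maxErr}");
```

In this sketch each restored weight is off by at most one scale step (usually at most half a step), which is where the small accuracy loss comes from.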

Five quantization types are currently supported:
- q4_0
- q4_1
- q5_0
- q5_1
- q8_0
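
As a rough guide to how much each type shrinks a model, the sketch below computes bits per weight from block layouts assumed to match current ggml/llama.cpp formats (32 weights per block, half-precision scale and minimum fields; older format versions differed) and estimates the file size of a hypothetical 7-billion-parameter model. It is plain arithmetic, not a LLamaSharp API, and real files may come out somewhat larger because some tensors are typically not quantized to the target type.

```cs
using System;
using System.Collections.Generic;

// Assumed ggml-style layout: every block holds 32 weights.
const int WeightsPerBlock = 32;

// Assumed bytes per block: half-precision (2-byte) scale/min fields + packed codes.
var bytesPerBlock = new Dictionary<string, int>
{
    ["q4_0"] = 2 + 16,         // scale + 32 x 4-bit codes
    ["q4_1"] = 2 + 2 + 16,     // scale + min + 32 x 4-bit codes
    ["q5_0"] = 2 + 4 + 16,     // scale + 32 fifth bits + 32 x 4-bit low parts
    ["q5_1"] = 2 + 2 + 4 + 16, // scale + min + 32 fifth bits + 32 x 4-bit low parts
    ["q8_0"] = 2 + 32,         // scale + 32 x 8-bit codes
};

// Average storage cost of one weight, including the per-block metadata.
double BitsPerWeight(string quantType) => bytesPerBlock[quantType] * 8.0 / WeightsPerBlock;

// Rough size estimate for a hypothetical 7-billion-parameter model.
const long ParamCount = 7_000_000_000;
foreach (string quantName in bytesPerBlock.Keys)
{
    double bpw = BitsPerWeight(quantName);
    double gib = ParamCount * bpw / 8 / (1L << 30);
    Console.WriteLine($"{quantName}: {bpw:F1} bits/weight, about {gib:F2} GiB");
}
```

The `_1` variants spend extra bytes on a per-block minimum, which lets them represent weight ranges that are not centered on zero, at the cost of a slightly larger file.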