Namespace: LLama.Native
Quantizer parameters used in the native API
public struct LLamaModelQuantizeParams
Inheritance Object → ValueType → LLamaModelQuantizeParams
number of threads to use for quantizing; if <=0, std::thread::hardware_concurrency() is used
public int nthread;
quantize to this llama_ftype
public LLamaFtype ftype;
allow quantizing non-f32/f16 tensors
public bool allow_requantize { get; set; }
whether to quantize the output.weight tensor
public bool quantize_output_tensor { get; set; }
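As a sketch of how these fields fit together, the struct could be populated like this before being handed to a quantize call. This is an illustrative fragment, not a complete program: the LLamaFtype member shown is one example value, and the surrounding quantize API is assumed from context rather than documented here.

```csharp
// Hypothetical usage sketch: configure quantization parameters.
var quantizeParams = new LLamaModelQuantizeParams
{
    nthread = 0,                                  // <=0: use all hardware threads
    ftype = LLamaFtype.LLAMA_FTYPE_MOSTLY_Q4_0,   // example target format (assumption)
    allow_requantize = false,                     // only quantize f32/f16 tensors
    quantize_output_tensor = true,                // also quantize output.weight
};
// The populated struct would then be passed to the native quantization entry point.
```

Setting `allow_requantize` to true permits quantizing tensors that are already quantized, which can further degrade quality; leaving it false restricts quantization to full- and half-precision tensors.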