LLamaSharp is the C#/.NET binding of llama.cpp. It provides APIs to inference the LLaMa Models and deploy it on native environment or Web. It could help C# developers to deploy the LLM (Large Language Model) locally and integrate with C# apps.
If you are new to LLM, here're some tips for you to help you to get start with LLamaSharp. If you are experienced in this field, we'd still recommend you to take a few minutes to read it because some things perform differently compared to cpp/python.
LLamaSharp and LLama.Backend. After installing LLamaSharp, please install one of LLama.Backend.Cpu, LLama.Backend.Cuda11 or LLama.Backend.Cuda12. If you use the source code, dynamic libraries can be found in LLama/Runtimes. Rename the one you want to use to libllama.dll.LLaMa originally refers to the weights released by Meta (Facebook Research). After that, many models are fine-tuned based on it, such as Vicuna, GPT4All, and Pyglion. Though all of these models are supported by LLamaSharp, some steps are necessary with different file formats. There're mainly three kinds of files, which are .pth, .bin (ggml), .bin (quantized). If you have the .bin (quantized) file, it could be used directly by LLamaSharp. If you have the .bin (ggml) file, you could use it directly but get higher inference speed after the quantization. If you have the .pth file, you need to follow the instructions in llama.cpp to convert it to .bin (ggml) file at first.Community effort is always one of the most important things in open-source projects. Any contribution in any way is welcomed here. For example, the following things mean a lot for LLamaSharp:
If you'd like to get deeply involved in development, please touch us in discord channel or send email to AsakusaRinne@gmail.com. :)