# LLamaSharp - .NET Binding for llama.cpp

[Discord](https://discord.gg/7wNVU65ZDY)
[QQ group](http://qm.qq.com/cgi-bin/qm/qr?_wv=1027&k=sN9VVMwbWjs5L0ATpizKKxOcZdEPMrp8&authKey=RLDw41bLTrEyEgZZi%2FzT4pYk%2BwmEFgFcrhs8ZbkiVY7a4JFckzJefaYNW6Lk4yPX&noverify=0&group_code=985366726)
[LLamaSharp on NuGet](https://www.nuget.org/packages/LLamaSharp)
[LLamaSharp.Backend.Cpu on NuGet](https://www.nuget.org/packages/LLamaSharp.Backend.Cpu)
[LLamaSharp.Backend.Cuda11 on NuGet](https://www.nuget.org/packages/LLamaSharp.Backend.Cuda11)
[LLamaSharp.Backend.Cuda12 on NuGet](https://www.nuget.org/packages/LLamaSharp.Backend.Cuda12)

**The C#/.NET binding of [llama.cpp](https://github.com/ggerganov/llama.cpp). It provides APIs to run inference with LLaMA models and to deploy them in local environments. It works on Windows, Linux and macOS without requiring you to compile llama.cpp yourself, and its performance is close to that of llama.cpp.**

**Furthermore, it integrates with other projects such as [BotSharp](https://github.com/SciSharp/BotSharp) to provide higher-level applications and UIs.**

## Documentation

- [Quick start](https://scisharp.github.io/LLamaSharp/0.4/GetStarted/)
- [Tricks for FAQ](https://scisharp.github.io/LLamaSharp/0.4/Tricks/)
- [Full documentation](https://scisharp.github.io/LLamaSharp/0.4/)
- [API reference](https://scisharp.github.io/LLamaSharp/0.4/xmldocs/)
- [Examples](./LLama.Examples/NewVersion/)

## Installation

First, search for `LLamaSharp` in the NuGet package manager and install it:

```
PM> Install-Package LLamaSharp
```

Then, search for and install one of the following backends:

```
LLamaSharp.Backend.Cpu
LLamaSharp.Backend.Cuda11
LLamaSharp.Backend.Cuda12
```
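
Equivalently, with the .NET CLI (using the CPU backend as an example):

```
dotnet add package LLamaSharp
dotnet add package LLamaSharp.Backend.Cpu
```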

Here's the mapping between backend packages, LLamaSharp versions and verified model resources. If you're not sure which models work with a version, please try our sample models.

| LLamaSharp.Backend | LLamaSharp | Verified Model Resources | llama.cpp commit id |
| - | - | -- | - |
| - | v0.2.0 | This version is not recommended. | - |
| - | v0.2.1 | [WizardLM](https://huggingface.co/TheBloke/wizardLM-7B-GGML/tree/previous_llama), [Vicuna (filenames with "old")](https://huggingface.co/eachadea/ggml-vicuna-13b-1.1/tree/main) | - |
| v0.2.2 | v0.2.2, v0.2.3 | [WizardLM](https://huggingface.co/TheBloke/wizardLM-7B-GGML/tree/previous_llama_ggmlv2), [Vicuna (filenames without "old")](https://huggingface.co/eachadea/ggml-vicuna-13b-1.1/tree/main) | 63d2046 |
| v0.3.0, v0.3.1 | v0.3.0, v0.4.0 | [LLamaSharpSamples v0.3.0](https://huggingface.co/AsakusaRinne/LLamaSharpSamples/tree/v0.3.0), [WizardLM](https://huggingface.co/TheBloke/wizardLM-7B-GGML/tree/main) | 7e4ea5b |
| v0.4.1-preview (cpu only) | v0.4.1-preview | [Open llama 3b](https://huggingface.co/SlyEcho/open_llama_3b_ggml), [Open Buddy](https://huggingface.co/OpenBuddy/openbuddy-llama-ggml) | aacdbd4 |
| v0.4.2-preview (cpu, cuda11) | v0.4.2-preview | [Llama2 7b](https://huggingface.co/TheBloke/llama-2-7B-Guanaco-QLoRA-GGML) | 332311234a0aa2974b2450710e22e09d90dd6b0b |

Many hands make light work. If you have found any other model resource that works with a version, we'd appreciate it if you opened a PR about it! 😊

We publish backends for CPU, CUDA 11 and CUDA 12 because they are the most popular ones. If none of them matches your device, please compile [llama.cpp](https://github.com/ggerganov/llama.cpp)
from source and put the compiled `libllama` library under your project's output path ([guide](https://scisharp.github.io/LLamaSharp/0.4/ContributingGuide/)).

## FAQ

1. GPU out of memory: please try setting `n_gpu_layers` to a smaller number, as shown in the sketch after this list.
2. Unsupported model: `llama.cpp` is under rapid development and often has breaking changes. Please check the release date of the model and find a suitable version of LLamaSharp to install, or use a model we provide [on huggingface](https://huggingface.co/AsakusaRinne/LLamaSharpSamples).
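
In LLamaSharp, `n_gpu_layers` corresponds to the `gpuLayerCount` parameter of `ModelParams` (as used in the chat session example below). Here's a minimal sketch with placeholder values:

```cs
using LLama;
using LLama.Common;

// Offload fewer layers to the GPU to reduce VRAM usage;
// gpuLayerCount: 0 keeps the whole model on the CPU.
var parameters = new ModelParams("<Your model path>", contextSize: 1024, gpuLayerCount: 0);
var model = new LLamaModel(parameters);
```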

## Usages

#### Model Inference and Chat Session

LLamaSharp provides two ways to run inference: `LLamaExecutor` and `ChatSession`. The chat session is a higher-level wrapper around the executor and the model. Here's a simple example of using a chat session.

```cs
using LLama.Common;
using LLama;

string modelPath = "<Your model path>"; // change it to your own model path
var prompt = "Transcript of a dialog, where the User interacts with an Assistant named Bob. Bob is helpful, kind, honest, good at writing, and never fails to answer the User's requests immediately and with precision.\r\n\r\nUser: Hello, Bob.\r\nBob: Hello. How may I help you today?\r\nUser: Please tell me the largest city in Europe.\r\nBob: Sure. The largest city in Europe is Moscow, the capital of Russia.\r\nUser:"; // use the "chat-with-bob" prompt here.

// Initialize a chat session
var ex = new InteractiveExecutor(new LLamaModel(new ModelParams(modelPath, contextSize: 1024, seed: 1337, gpuLayerCount: 5)));
ChatSession session = new ChatSession(ex);

// show the prompt
Console.WriteLine();
Console.Write(prompt);

// run the inference in a loop to chat with the LLM
while (prompt != "stop")
{
    foreach (var text in session.Chat(prompt, new InferenceParams() { Temperature = 0.6f, AntiPrompts = new List<string> { "User:" } }))
    {
        Console.Write(text);
    }
    prompt = Console.ReadLine() ?? "stop"; // type "stop" to end the chat
}

// save the session
session.SaveSession("SavedSessionPath");
```
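
If you don't need session management, the executor can also be called directly. Here's a minimal sketch, assuming the executor exposes an `Infer` method that takes the prompt text and an `InferenceParams` (as in the examples of this version); treat the method name as an assumption if you're on a different release:

```cs
using LLama;
using LLama.Common;

// A minimal sketch: one-shot inference with the executor, no ChatSession.
var model = new LLamaModel(new ModelParams("<Your model path>", contextSize: 1024));
var executor = new InteractiveExecutor(model);

// Infer streams the generated tokens as strings (assumed API, see note above).
foreach (var text in executor.Infer("What is the capital of France?", new InferenceParams() { Temperature = 0.6f }))
{
    Console.Write(text);
}
```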

#### Quantization

The following example shows how to quantize a model. With LLamaSharp you don't need to compile a C++ project and run scripts to quantize the model; instead, just run it in C#.

```cs
string srcFilename = "<Your source path>";
string dstFilename = "<Your destination path>";
string ftype = "q4_0";
if (Quantizer.Quantize(srcFilename, dstFilename, ftype))
{
    Console.WriteLine("Quantization succeeded!");
}
else
{
    Console.WriteLine("Quantization failed!");
}
```
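
The `ftype` string selects the quantization format. llama.cpp commonly supports formats such as `q4_0`, `q4_1`, `q5_0`, `q5_1` and `q8_0`; the exact set available depends on the llama.cpp commit that your backend package wraps.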

For more usage examples, please refer to [Examples](./LLama.Examples).

#### Web API

We provide an ASP.NET Core integration [here](./LLama.WebAPI). Since the API is currently not stable, please clone the repo to use it. We'll publish it on NuGet in the future.

Since we are short of hands, if you're familiar with ASP.NET Core, we'd appreciate your help in upgrading the Web API integration.
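
In the meantime, here's a minimal sketch of wiring the model into your own ASP.NET Core app by registering a shared `ChatSession` in the DI container. The endpoint shape and model path are placeholders, not the API of LLama.WebAPI:

```cs
using LLama;
using LLama.Common;

var builder = WebApplication.CreateBuilder(args);

// Load the model once at startup and share one ChatSession across requests.
// The model path is a placeholder; a real app would read it from configuration.
builder.Services.AddSingleton(_ =>
{
    var model = new LLamaModel(new ModelParams("<Your model path>", contextSize: 1024));
    return new ChatSession(new InteractiveExecutor(model));
});

var app = builder.Build();

// A placeholder endpoint: collects the whole reply into a single string.
app.MapGet("/chat", (ChatSession session, string prompt) =>
    string.Concat(session.Chat(prompt, new InferenceParams() { AntiPrompts = new List<string> { "User:" } })));

app.Run();
```

Note that a single session is not safe under concurrent requests; handling that properly is one reason to prefer the maintained LLama.WebAPI project.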

## Demo

## Roadmap

---

✅: completed. ⚠️: outdated for the latest release but will be updated. 🔳: not completed.

---

✅ LLaMa model inference

✅ Embeddings generation, tokenization and detokenization

✅ Chat session

✅ Quantization

✅ State saving and loading

⚠️ BotSharp integration

✅ ASP.NET Core integration

⚠️ Semantic Kernel integration

🔳 Fine-tuning

🔳 Local document search

🔳 MAUI integration

🔳 Keep up with llama.cpp and improve performance

## Assets

Some extra model resources can be found below:

- [Quantized models provided by the LLamaSharp authors](https://huggingface.co/AsakusaRinne/LLamaSharpSamples)
- [eachadea/ggml-vicuna-13b-1.1](https://huggingface.co/eachadea/ggml-vicuna-13b-1.1/tree/main)
- [TheBloke/wizardLM-7B-GGML](https://huggingface.co/TheBloke/wizardLM-7B-GGML)
- Magnet: [magnet:?xt=urn:btih:b8287ebfa04f879b048d4d4404108cf3e8014352&dn=LLaMA](magnet:?xt=urn:btih:b8287ebfa04f879b048d4d4404108cf3e8014352&dn=LLaMA)

The weights included in the magnet are exactly the weights from [Facebook LLaMa](https://github.com/facebookresearch/llama).

Some prompts can be found below:

- [llama.cpp prompts](https://github.com/ggerganov/llama.cpp/tree/master/prompts)
- [ChatGPT_DAN](https://github.com/0xk1h0/ChatGPT_DAN)
- [awesome-chatgpt-prompts](https://github.com/f/awesome-chatgpt-prompts)
- [awesome-chatgpt-prompts-zh](https://github.com/PlexPt/awesome-chatgpt-prompts-zh) (Chinese)

## Contributing

Any contribution is welcome! Please read the [contributing guide](https://scisharp.github.io/LLamaSharp/0.4/ContributingGuide/). You can do any of the following to help us make `LLamaSharp` better:

- Add a model link that works with a version. (This is very important!)
- Star and share `LLamaSharp` to let others know about it.
- Add a feature or fix a bug.
- Help develop the Web API and UI integrations.
- Just open an issue about a problem you've met!

## Contact us

Join our chat on [Discord](https://discord.gg/7wNVU65ZDY).

Join our [QQ group](http://qm.qq.com/cgi-bin/qm/qr?_wv=1027&k=sN9VVMwbWjs5L0ATpizKKxOcZdEPMrp8&authKey=RLDw41bLTrEyEgZZi%2FzT4pYk%2BwmEFgFcrhs8ZbkiVY7a4JFckzJefaYNW6Lk4yPX&noverify=0&group_code=985366726).

## License

This project is licensed under the terms of the MIT license.
|