[LLamaSharp.Backend.Cuda12](https://www.nuget.org/packages/LLamaSharp.Backend.Cuda12)
**The C#/.NET binding of [llama.cpp](https://github.com/ggerganov/llama.cpp). It provides APIs to run inference with LLaMa models and deploy them in a local environment. It works on Windows, Linux and macOS without requiring you to compile llama.cpp yourself, and its performance is close to that of llama.cpp.**
- LLaMa model inference
- APIs for chat sessions
- Model quantization
- Embedding generation, tokenization and detokenization (see the sketch below)
- ASP.NET Core integration

**Furthermore, it provides integrations with other projects such as [BotSharp](https://github.com/SciSharp/BotSharp) to build higher-level applications and UIs.**
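As a quick illustration of the embedding feature listed above, the sketch below shows roughly what embedding generation looks like. It assumes a `LLamaEmbedder` class with a `GetEmbeddings` method and the `ModelParams` type used later in this README; exact names and signatures may differ between LLamaSharp versions, so treat it as an outline rather than the definitive API.

```cs
using LLama;
using LLama.Common;

// Sketch of embedding generation. Assumes LLamaEmbedder/GetEmbeddings as
// described above; adjust to the API of your installed LLamaSharp version.
string modelPath = "<Your model path>"; // change it to your own model path
var embedder = new LLamaEmbedder(new ModelParams(modelPath));
float[] embeddings = embedder.GetEmbeddings("Hello, LLamaSharp!");
Console.WriteLine($"Generated an embedding vector with {embeddings.Length} dimensions.");
```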
## Documentation

- [Quick start](https://scisharp.github.io/LLamaSharp/0.4/GetStarted/)
- [Tricks for FAQ](https://scisharp.github.io/LLamaSharp/0.4/Tricks/)
- [Full documentation](https://scisharp.github.io/LLamaSharp/0.4/)
- [API reference](https://scisharp.github.io/LLamaSharp/0.4/xmldocs/)
- [Examples](./LLama.Examples/NewVersion/)
## Installation
Here's the mapping of them and the corresponding model samples provided by `LLamaSharp`:

| LLamaSharp.Backend | LLamaSharp | Verified Model Resources | llama.cpp commit id |
| - | - | - | - |
| - | v0.2.0 | This version is not recommended to use. | - |
| - | v0.2.1 | [WizardLM](https://huggingface.co/TheBloke/wizardLM-7B-GGML/tree/previous_llama), [Vicuna (filenames with "old")](https://huggingface.co/eachadea/ggml-vicuna-13b-1.1/tree/main) | - |
| v0.2.2 | v0.2.2, v0.2.3 | [WizardLM](https://huggingface.co/TheBloke/wizardLM-7B-GGML/tree/previous_llama_ggmlv2), [Vicuna (filenames without "old")](https://huggingface.co/eachadea/ggml-vicuna-13b-1.1/tree/main) | 63d2046 |
| v0.3.0, v0.3.1 | v0.3.0, v0.4.0 | [LLamaSharpSamples v0.3.0](https://huggingface.co/AsakusaRinne/LLamaSharpSamples/tree/v0.3.0), [WizardLM](https://huggingface.co/TheBloke/wizardLM-7B-GGML/tree/main) | 7e4ea5b |
We publish backends for cpu, cuda11 and cuda12 because they are the most popular ones. If none of them matches your device, please compile [llama.cpp](https://github.com/ggerganov/llama.cpp) from source and put the compiled `libllama` library under your project's output path. When building from source, please add `-DBUILD_SHARED_LIBS=ON` to enable generation of the shared library.
2. Unsupported model: `llama.cpp` is under rapid development and often has breaking changes. Please check the release date of the model and find a suitable version of LLamaSharp to install, or use the model we provide [on huggingface](https://huggingface.co/AsakusaRinne/LLamaSharpSamples).
## Simple Benchmark

Currently this is only a rough benchmark, intended to indicate that the performance of `LLamaSharp` is close to that of `llama.cpp`. The experiments were run on a machine with an Intel i7-12700 and an RTX 3060 Ti, using a 7B model. Note that the benchmark uses `LLamaModel` instead of `LLamaModelV1`.

#### Windows

- llama.cpp: 2.98 words / second
- LLamaSharp: 2.94 words / second
## Usages

#### Model Inference and Chat Session
`LLamaSharp` provides two ways to run inference: `LLamaExecutor` and `ChatSession`. The chat session is a higher-level wrapper around an executor and a model. Here's a simple example of using a chat session; for all examples, please refer to [Examples](./LLama.Examples/NewVersion/).
```cs
using LLama.Common;
using LLama;

string modelPath = "<Your model path>"; // change it to your own model path
var prompt = "Transcript of a dialog, where the User interacts with an Assistant named Bob. Bob is helpful, kind, honest, good at writing, and never fails to answer the User's requests immediately and with precision.\r\n\r\nUser: Hello, Bob.\r\nBob: Hello. How may I help you today?\r\nUser: Please tell me the largest city in Europe.\r\nBob: Sure. The largest city in Europe is Moscow, the capital of Russia.\r\nUser:"; // use the "chat-with-bob" prompt here.

// Initialize a chat session
var ex = new InteractiveExecutor(new LLamaModel(new ModelParams(modelPath, contextSize: 1024, seed: 1337, gpuLayerCount: 5)));
ChatSession session = new ChatSession(ex);

// show the prompt
Console.WriteLine();
Console.Write(prompt);

// run the inference in a loop to chat with the LLM
while (prompt != "stop")
{
    foreach (var text in session.Chat(prompt, new InferenceParams() { Temperature = 0.6f, AntiPrompts = new List<string> { "User:" } }))
    {
        Console.Write(text);
    }
    prompt = Console.ReadLine();
}

// save the session
session.SaveSession("SavedSessionPath");
```
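If you don't need a chat session, you can also call the executor directly. The snippet below is only a minimal sketch: it assumes the executor exposes an `Infer(prompt, inferenceParams)` method and reuses the `ModelParams`/`InferenceParams` types from the example above; check the [Examples](./LLama.Examples/NewVersion/) for the exact API of your installed version.

```cs
using LLama.Common;
using LLama;

// Sketch of executor-only inference (no ChatSession). Assumes the executor
// exposes Infer(prompt, inferenceParams) returning the generated text in
// pieces; adjust to your installed LLamaSharp version.
string modelPath = "<Your model path>"; // change it to your own model path
var executor = new InteractiveExecutor(new LLamaModel(new ModelParams(modelPath, contextSize: 1024)));

var inferenceParams = new InferenceParams() { Temperature = 0.8f, MaxTokens = 128 };
foreach (var text in executor.Infer("Question: What is the largest city in Europe?\nAnswer:", inferenceParams))
{
    Console.Write(text);
}
```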
#### Quantization
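As a rough sketch of what quantization looks like in code (assuming a static `LLamaQuantizer.Quantize(srcPath, dstPath, quantizeType)` helper that returns `bool`; the class name and signature may differ between LLamaSharp versions):

```cs
using LLama;

// Sketch of model quantization. Assumes a LLamaQuantizer.Quantize helper
// taking source path, destination path and quantization type; names may
// differ between LLamaSharp versions.
string srcPath = "<The source model file>";
string dstPath = "<The quantized model file to write>";
if (LLamaQuantizer.Quantize(srcPath, dstPath, "q4_0"))
{
    Console.WriteLine("Quantization succeeded!");
}
else
{
    Console.WriteLine("Quantization failed!");
}
```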
## Roadmap
---

✅: completed. ⚠️: outdated but will be updated. 🔳: not completed

---
✅ LLaMa model inference

✅ Embeddings generation, tokenization and detokenization
✅ State saving and loading
⚠️ ASP.NET core Integration

⚠️ Semantic-kernel Integration
🔳 MAUI Integration
## Contributing
Any contribution is welcome! Please read the [contributing guide](https://scisharp.github.io/LLamaSharp/0.4/ContributingGuide/). You can do one of the following to help us make `LLamaSharp` better:
- Append a model link that is available for a version. (This is very important!)
- Star and share `LLamaSharp` to let others know about it.