- Async loading supports cancellation through a `CancellationToken`. If loading is cancelled, an `OperationCanceledException` is thrown; if it fails for any other reason, a `LoadWeightsFailedException` is thrown (see the sketch below).
- Updated examples to use `LoadFromFileAsync`
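For illustration, a minimal sketch of a cancellable async load (the model path is a placeholder, and `LoadWeightsFailedException` is assumed to live in `LLama.Exceptions`; treat the exact overload shape as illustrative):

```csharp
using System;
using System.Threading;
using LLama;
using LLama.Common;
using LLama.Exceptions; // assumed namespace for LoadWeightsFailedException

var parameters = new ModelParams("model.gguf"); // placeholder path

// Cancel the load if it takes longer than 30 seconds
using var cts = new CancellationTokenSource(TimeSpan.FromSeconds(30));

try
{
    using var weights = await LLamaWeights.LoadFromFileAsync(parameters, cts.Token);
    // ... use the weights ...
}
catch (OperationCanceledException)
{
    // Loading was cancelled through the token
}
catch (LoadWeightsFailedException)
{
    // Loading failed for some other reason
}
```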
* Added the ability to save and load individual conversations in a batched executor.
- New example
- Added `BatchedExecutor.Load(filepath)` method
- Added `Conversation.Save(filepath)` method
- Added new (currently internal) `SaveState`/`LoadState` methods in `LLamaContext` which can stash some extra binary data in the header
* Added ability to save/load a `Conversation` to an in-memory state, instead of to file.
* Moved the new save/load methods out to an extension class specifically for the batched executor.
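A rough sketch of how the save/load surface fits together (file names are placeholders; the parameterless in-memory `Save()` overload and the exact extension-method call shapes are assumptions based on the notes above):

```csharp
using LLama;
using LLama.Batched;
using LLama.Common;

var parameters = new ModelParams("model.gguf"); // placeholder path
using var weights = LLamaWeights.LoadFromFile(parameters);
using var executor = new BatchedExecutor(weights, parameters);

using var conversation = executor.Create();
conversation.Prompt("The quick brown fox");
await executor.Infer();

// Save the conversation to disk...
conversation.Save("conversation.bin");

// ...or capture an in-memory state object instead of touching the filesystem
var state = conversation.Save();

// Later: load a saved conversation back into an executor
using var restored = executor.Load("conversation.bin");
```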
* Removed unnecessary spaces
* Updated binaries, using [this build](https://github.com/SciSharp/LLamaSharp/actions/runs/8654672719/job/23733195669) for llama.cpp commit `f7001ccc5aa359fcf41bba19d1c99c3d25c9bcc7`.
- Added all of the new native functions.
- Moved some functions (e.g. `SafeLlamaModelHandle` specific functions) into `SafeLlamaModelHandle.cs`
- Exposed tokens on `SafeLlamaModelHandle` and `LLamaWeights` through a `Tokens` property. As new special tokens are added in the future, they can be added here.
- Changed all token properties to return nullable tokens, to handle models which do not define certain tokens (see the sketch below).
- Fixed `DefaultSamplingPipeline` to handle models which have no newline token.
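For example, callers now have to account for models which lack a given token (the `Tokens.Newline` member name here is illustrative):

```csharp
using LLama;
using LLama.Common;

var parameters = new ModelParams("model.gguf"); // placeholder path
using var weights = LLamaWeights.LoadFromFile(parameters);

// Token properties are nullable because a model may not define them at all
var newline = weights.Tokens.Newline;
if (newline is null)
{
    // No newline token: e.g. DefaultSamplingPipeline skips its newline handling
}
```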
* Moved native methods to more specific locations.
- Context specific things have been moved into `SafeLLamaContextHandle.cs` and made private - they're exposed through C# properties and methods already.
- Checking that GPU layer count is zero if GPU offload is not supported.
- Moved methods for creating default structs (`llama_model_quantize_default_params` and `llama_context_default_params`) into relevant structs.
* Removed exception if `GpuLayerCount > 0` when GPU is not supported.
* Added low level wrapper methods for new per-sequence state load/save in `SafeLLamaContextHandle`
- Added high level wrapper methods (save/load with `State` object or memory mapped file) in `LLamaContext`
- Moved native methods for per-sequence state load/save into `SafeLLamaContextHandle`
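A sketch of how the higher level per-sequence wrappers might be used; the method names and the overloads taking a `LLamaSeqId` are assumptions based on the description above, not the exact public surface:

```csharp
using LLama;
using LLama.Common;
using LLama.Native;

var parameters = new ModelParams("model.gguf"); // placeholder path
using var weights = LLamaWeights.LoadFromFile(parameters);
using var context = weights.CreateContext(parameters);

var seq = (LLamaSeqId)0;

// In-memory round trip for a single sequence (hypothetical method names)
var state = context.GetState(seq);
context.LoadState(state, seq);

// File based round trip, backed by a memory mapped file (hypothetical method names)
context.SaveState("sequence0.state", seq);
context.LoadState("sequence0.state", seq);
```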
* Added update and defrag methods for KV cache in `SafeLLamaContextHandle`
* Updated submodule to `f7001ccc5aa359fcf41bba19d1c99c3d25c9bcc7`
* Passing the sequence ID when saving a single sequence state
Replaced `BatchedExecutor.Prompt(string)` method with `BatchedExecutor.Create()` method. This improves the API in two ways:
- A conversation can be created, without immediately prompting it
- Other prompting overloads (e.g. prompt with token list) can be used without duplicating all the overloads onto `BatchedExecutor`
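The old one-step prompt therefore becomes a two-step create-then-prompt, roughly:

```csharp
using LLama;
using LLama.Batched;
using LLama.Common;

var parameters = new ModelParams("model.gguf"); // placeholder path
using var weights = LLamaWeights.LoadFromFile(parameters);
using var executor = new BatchedExecutor(weights, parameters);

// Create a conversation without prompting it immediately...
using var conversation = executor.Create();

// ...then prompt it later with whichever overload is convenient
conversation.Prompt("Once upon a time");

await executor.Infer();
```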
Added `BatchSize` property to `LLamaContext`
Modified `LLamaBatch` to not share tokens with other sequences if logits is true. This ensures that the logit span at the end is used by exactly one sequence, so it's safe to mutate. This removes the need for copying _very_ large arrays (vocab size) and simplifies sampling pipelines.
* Added a `Guidance` method to `LLamaTokenDataArray` which applies classifier free guidance (see the sketch below)
* Factored out a safer `llama_sample_apply_guidance` method based on spans
* Created a guided sampling demo using the batched executor
* Fixed a comment: "classifier free", not "context free"
* Rebased onto master and fixed breakage due to changes in `BaseSamplingPipeline`
* Asking user for guidance weight
* Progress bar in batched fork demo
* Improved fork example (using tree display)
* Added proper disposal of resources in batched examples
* Added some more comments in BatchedExecutorGuidance
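Conceptually, classifier free guidance blends the main context's logits with those of a guidance (negative prompt) context. A minimal, standalone sketch of that blend (the standard CFG interpolation, shown here for illustration rather than as the exact native implementation):

```csharp
using System;

public static class GuidanceSketch
{
    // Blend the main logits toward/away from the guidance (negative prompt) logits,
    // in place. A weight of 1 leaves the logits unchanged; larger weights push the
    // result further away from the guidance output.
    public static void ApplyGuidance(Span<float> logits, ReadOnlySpan<float> guidanceLogits, float weight)
    {
        if (logits.Length != guidanceLogits.Length)
            throw new ArgumentException("Logit spans must have the same length");

        for (var i = 0; i < logits.Length; i++)
            logits[i] = guidanceLogits[i] + weight * (logits[i] - guidanceLogits[i]);
    }
}
```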
* Modified `ISamplingPipeline` to accept a `ReadOnlySpan<float>` of logits directly. This moves the responsibility for copying the logits into the pipeline.
- Added a flag to `BaseSamplingPipeline` indicating whether a logit copy is necessary, skipping it in most cases.
* Fixed `RestoreProtectedTokens` not working if logit processing is skipped
* Implemented a new greedy sampling pipeline (always sample the most likely token); a minimal sketch of the idea follows below.
- Moved `Grammar` into `BaseSamplingPipeline`
- Removed the "protected tokens" concept from `BaseSamplingPipeline`; it was introducing a lot of incidental complexity.
- Implemented newline logit save/restore in `DefaultSamplingPipeline` (the only place protected tokens were used)
* Implemented pipelines for mirostat v1 and v2
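As an illustration of the span-based pipeline shape, greedy sampling reduces to an argmax over the logits. This is a standalone sketch of the idea, not the library's implementation:

```csharp
using System;

public static class GreedySketch
{
    // Greedy sampling: return the index (token id) of the largest logit.
    public static int SampleGreedy(ReadOnlySpan<float> logits)
    {
        var best = 0;
        for (var i = 1; i < logits.Length; i++)
        {
            if (logits[i] > logits[best])
                best = i;
        }
        return best;
    }
}
```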
* LLama.Examples: cleaned up several examples
- UserSettings: simplified the validation/re-ask loop down to one call
- Program: added colour to the figlet title
- Batched examples: show the default prompt
- ExampleRunner: reset state after running an example
* LLama.Examples: disable console logging
* LLama.Examples: rename titles to signal grouped topics
* LLama.Examples: add additional PDF for Q&A
* LLama.Examples: improve kernel memory demo (multi-document ingestion)
* LLama.Examples: improve message before resetting to main menu
* LLama.Examples: document Q&A with local memory
* LLama.Examples: RepoUtils.cs → ConsoleLogger.cs
* LLama.Examples: Examples/Runner.cs → ExampleRunner.cs
* LLama.Examples: delete unused console logger
* LLama.Examples: improve splash screen appearance
- `llama_empty_call()` no longer shows configuration information on startup; it is displayed automatically the first time a model is used
* LLama.Examples: Runner → ExampleRunner
* LLama.Examples: improve model path prompt
- The last used model is stored in a config file and re-used when a blank path is provided
* LLama.Examples: call `NativeApi.llama_empty_call()` at startup
* LLama.Examples: reduce console noise when saving model path
* Embeddings example: set `EmbeddingMode` to true
- Prevents an exception from being thrown when `GetEmbeddings()` is called
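A minimal sketch of that setup (the model path is a placeholder; the constructor and return types follow the existing embedder API but treat the details as illustrative):

```csharp
using LLama;
using LLama.Common;

var parameters = new ModelParams("model.gguf") // placeholder path
{
    EmbeddingMode = true // must be enabled before creating the embedder
};

using var weights = LLamaWeights.LoadFromFile(parameters);
using var embedder = new LLamaEmbedder(weights, parameters);

var embeddings = embedder.GetEmbeddings("Hello, world");
```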
* Embeddings example: improve documentation and styling
* docs: improve GetEmbeddings page
- If `EmbeddingMode` is not set to true, `GetEmbeddings()` throws an exception
* docs: improve GetEmbeddings page
- The previous commit 6c9ff3158c was inaccurate
* Embeddings example: improve styling
- Displays the example description after the model is loaded, so the text is on screen when the prompt is first requested
Conversations can be "forked" to create a copy of a conversation at a given point. This allows e.g. prompting a conversation with a system prefix just once and then forking it for each individual conversation. Conversations can also be "rewound" to an earlier state.
Added two new examples demonstrating forking and rewinding (a sketch of the workflow follows the notes below).
- Added a `DecodeAsync` overload which runs the work in a task
- Replaced some `NativeHandle` usage in `BatchedDecoding` with higher level equivalents.
- Made the `LLamaBatch` grow when token capacity is exceeded, removing the need to manage token capacity externally.
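A sketch of the fork/rewind workflow (prompt text and the token count passed to `Rewind` are arbitrary; treat the exact call shapes as illustrative):

```csharp
using LLama;
using LLama.Batched;
using LLama.Common;

var parameters = new ModelParams("model.gguf"); // placeholder path
using var weights = LLamaWeights.LoadFromFile(parameters);
using var executor = new BatchedExecutor(weights, parameters);

// Evaluate a shared system prefix exactly once
using var root = executor.Create();
root.Prompt("You are a helpful assistant.");
await executor.Infer();

// Fork copies of that state for each individual conversation
using var chatA = root.Fork();
using var chatB = root.Fork();
chatA.Prompt("First question");
chatB.Prompt("Second question");

// A conversation can also be rewound to an earlier state
chatA.Rewind(5); // e.g. discard the last 5 tokens
```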