* Updated binaries, using [this build](https://github.com/SciSharp/LLamaSharp/actions/runs/8654672719/job/23733195669) for llama.cpp commit `f7001ccc5aa359fcf41bba19d1c99c3d25c9bcc7`.
- Added bindings for all of the new functions.
- Moved some functions (e.g. `SafeLlamaModelHandle`-specific functions) into `SafeLlamaModelHandle.cs`.
- Exposed tokens on `SafeLlamaModelHandle` and `LLamaWeights` through a `Tokens` property. As new special tokens are added in the future they can be added here.
- Changed all token properties to return nullable tokens, since some models lack certain tokens (see the sketch after this list).
- Fixed `DefaultSamplingPipeline` to handle models which have no newline token.
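A minimal sketch of how the nullable token properties might be used from user code; the `Newline` property name is an assumption based on the sampling fix above, not confirmed API:

```csharp
// Sketch only: assumes a nullable `Newline` property on the `Tokens` collection.
using System;
using LLama;
using LLama.Common;
using LLama.Native;

var parameters = new ModelParams("model.gguf");
using var weights = LLamaWeights.LoadFromFile(parameters);

// Token properties are nullable because some models do not define them.
LLamaToken? newline = weights.Tokens.Newline;
if (newline is null)
    Console.WriteLine("This model defines no newline token.");
```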
* Moved native methods to more specific locations.
- Context specific things have been moved into `SafeLLamaContextHandle.cs` and made private - they're exposed through C# properties and methods already.
- Added a check that the GPU layer count is zero if GPU offload is not supported.
- Moved methods for creating default structs (`llama_model_quantize_default_params` and `llama_context_default_params`) into the relevant structs (see the sketch below).
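A sketch of the new call style; the `Default()` factory names below are assumptions inferred from the description, not confirmed API:

```csharp
// Sketch only: `Default()` names are assumed.
using LLama.Native;

// Previously reached via NativeApi.llama_context_default_params().
var contextParams = LLamaContextParams.Default();

// Previously reached via NativeApi.llama_model_quantize_default_params().
var quantizeParams = LLamaModelQuantizeParams.Default();
```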
* Removed exception if `GpuLayerCount > 0` when GPU is not supported.
* Added low level wrapper methods for new per-sequence state load/save in `SafeLLamaContextHandle`
- Added high level wrapper methods (save/load with `State` object or memory mapped file) in `LLamaContext` (see the sketch after this list)
- Moved native methods for per-sequence state load/save into `SafeLLamaContextHandle`
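A hedged sketch of how the high level per-sequence save/load might be called; the exact overload shapes (file based, taking a `LLamaSeqId`) are assumptions based on the description above:

```csharp
// Sketch: assumes SaveState/LoadState overloads that accept a sequence id.
using LLama;
using LLama.Native;

void RoundTripSequence(LLamaContext context, LLamaSeqId seq)
{
    // Save the state of a single sequence to a (memory mapped) file.
    context.SaveState("sequence.bin", seq);

    // ... later, restore just that sequence ...
    context.LoadState("sequence.bin", seq);
}
```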
* Added update and defrag methods for KV cache in `SafeLLamaContextHandle`
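These presumably wrap `llama_kv_cache_defrag` and `llama_kv_cache_update`; the C# method names in this sketch are assumptions:

```csharp
// Sketch: wrapper names below are assumed, not confirmed API.
using LLama.Native;

void MaintainKvCache(SafeLLamaContextHandle ctx)
{
    ctx.KvCacheDefrag(); // queue a defragmentation of the KV cache
    ctx.KvCacheUpdate(); // apply pending cache updates (e.g. the defrag)
}
```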
* Updated submodule to `f7001ccc5aa359fcf41bba19d1c99c3d25c9bcc7`
* Passing the sequence ID when saving a single sequence state
- Properly displaying `LLamaToken`
- Removed all tokenisation code in `SafeLLamaContextHandle` - just pass it all through to the `SafeLlamaModelHandle`
- Improved `SafeLlamaModelHandle` tokenisation:
- Renting an array, for one less allocation
- Not using `&tokens[0]` to take a pointer to an array; this is redundant and doesn't work on empty arrays
- Made `llama_backend_init` private. It is called automatically; there is no way it can correctly be used externally.
- Made `llama_token_to_piece` safe (takes a `Span` instead of a pointer; see the sketch below)
- Made `NativeApi` into a `static class` (it's not intended to be instantiated)
- Moved `LLamaTokenType` enum out into a separate file
- Made `LLamaSeqId` and `LLamaPos` into `record struct`s, convenient for equality comparisons etc.
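A sketch of the safer `llama_token_to_piece`; the exact parameter list and return type are assumptions:

```csharp
// Sketch: assumed signature (model handle, token, destination byte span).
using System;
using System.Text;
using LLama.Native;

string TokenToText(SafeLlamaModelHandle model, LLamaToken token)
{
    // Decode into a stack buffer instead of handing out raw pointers.
    Span<byte> buffer = stackalloc byte[64];
    var written = NativeApi.llama_token_to_piece(model, token, buffer);
    return Encoding.UTF8.GetString(buffer.Slice(0, (int)written));
}
```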
- `AntipromptProcessor` accepts chunks of text and returns a value indicating whether any antiprompt has been detected.
- `StreamingTokenDecoder` decodes tokens into text, maintaining some internal state to handle single characters which are encoded as multiple tokens.
Added tests for these classes and updated `StatelessExecutor` to use them.
Removed most `DeTokenize` methods and marked the rest as obsolete (a `StreamingTokenDecoder` should always be used instead).
- Minimal number of characters converted at a time
- Allocation free
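A hedged sketch of how the two classes compose in a generation loop; the constructor and method shapes (`Add`, `Read`) are assumptions based on the description above:

```csharp
// Sketch: constructor/method names are assumed, not confirmed API.
using System.Collections.Generic;
using LLama;
using LLama.Native;

bool StreamTokens(LLamaContext context, IEnumerable<LLamaToken> tokens)
{
    var decoder = new StreamingTokenDecoder(context);
    var antiprompts = new AntipromptProcessor(new[] { "User:" });

    foreach (var token in tokens)
    {
        decoder.Add(token);        // may buffer characters split across tokens
        var text = decoder.Read(); // returns only fully decoded characters
        if (antiprompts.Add(text)) // true once any antiprompt has been seen
            return true;           // signal the caller to stop generating
    }
    return false;
}
```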
- Added `TokensToSpan` to `SafeLlamaModelHandle` which converts as many tokens as possible into a character span
- Allocation free
- Just convert it to a `string`, nice and simple
- Write the bytes to a `Span<byte>`, no allocations
- Write the chars to a `StringBuilder`, potentially no allocations
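A sketch of the three conversion styles listed above; the overload names and signatures are assumptions:

```csharp
// Sketch: method names/signatures below are assumed, not confirmed API.
using System;
using System.Text;
using LLama.Native;

void ConvertToken(SafeLlamaModelHandle model, LLamaToken token)
{
    // 1. Simple: allocate a string.
    string text = model.TokenToString(token, Encoding.UTF8);

    // 2. Allocation free: write the raw bytes into a caller-supplied span.
    Span<byte> bytes = stackalloc byte[64];
    model.TokenToSpan(token, bytes);

    // 3. Potentially allocation free: append chars to a reused StringBuilder.
    var builder = new StringBuilder();
    model.TokenToString(token, Encoding.UTF8, builder);
}
```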
This change _only_ implements the low level API and makes no effort to update the LLamaSharp higher level abstraction.
It is built upon llama.cpp `b3f138d`; the necessary DLLs are **not** included in this commit.