LLamaSharp

Commit Graph

Author	SHA1	Message	Date
Rinne	495177fd0f	fix: typos.	1 year ago
Martin Evans	c325ac9127	April 2024 Binary Update (#662) * Updated binaries, using [this build](https://github.com/SciSharp/LLamaSharp/actions/runs/8654672719/job/23733195669) for llama.cpp commit `f7001ccc5aa359fcf41bba19d1c99c3d25c9bcc7`. - Added all new functions. - Moved some functions (e.g. `SafeLlamaModelHandle` specific functions) into `SafeLlamaModelHandle.cs` - Exposed tokens on `SafeLlamaModelHandle` and `LLamaWeights` through a `Tokens` property. As new special tokens are added in the future they can be added here. - Changed all token properties to return nullable tokens, to handle some models not having some tokens. - Fixed `DefaultSamplingPipeline` to handle no newline token in some models. * Moved native methods to more specific locations. - Context specific things have been moved into `SafeLLamaContextHandle.cs` and made private - they're exposed through C# properties and methods already. - Checking that GPU layer count is zero if GPU offload is not supported. - Moved methods for creating default structs (`llama_model_quantize_default_params` and `llama_context_default_params`) into relevant structs. * Removed exception if `GpuLayerCount > 0` when GPU is not supported. * - Added low level wrapper methods for new per-sequence state load/save in `SafeLLamaContextHandle` - Added high level wrapper methods (save/load with `State` object or memory mapped file) in `LLamaContext` - Moved native methods for per-sequence state load/save into `SafeLLamaContextHandle` * Added update and defrag methods for KV cache in `SafeLLamaContextHandle` * Updated submodule to `f7001ccc5aa359fcf41bba19d1c99c3d25c9bcc7` * Passing the sequence ID when saving a single sequence state	1 year ago
Martin Evans	a8ba9f05b3	March Binary Update (#565) * Updated binaries to llama.cpp `3ab8b3a92ede46df88bc5a2dfca3777de4a2b2b6` (build run: https://github.com/SciSharp/LLamaSharp/actions/runs/8118890586) * Added abort callback * Added properties to get/set thread count on `LLamaContext` * Fixed LLamaLogLevel numbering	1 year ago
Martin Evans	15a98b36d8	Updated everything to work with llama.cpp `ce32060198`	1 year ago
Martin Evans	9b995510d6	Removed all setters in `IModelParams` and `IContextParams`, allowing implementations to be immutable.	1 year ago
Martin Evans	402a110a3a	Merge pull request #404 from martindevans/switched_to_LLamaToken_struct LLamaToken Struct	1 year ago
Steven Kennedy	cf2e9e35f8	Updating the GpuLayerCount to mirror the Python Port of Llama.cpp	1 year ago
Martin Evans	42be9b136d	Switched from using raw integers to a `LLamaToken` struct	1 year ago
Martin Evans	48ef3bb080	Added runtime checks that UseMemoryLock and UseMemorymap are actually supported.	1 year ago
Martin Evans	f860f88c36	Code cleanup driven by R# suggestions: - Made `NativeApi` into a `static class` (it's not intended to be instantiated) - Moved `LLamaTokenType` enum out into a separate file - Made `LLamaSeqId` and `LLamaPos` into `record struct`, convenient to have equality etc	1 year ago
Martin Evans	3fc0f34cbe	Fixed some issues which were causing metadata overrides not to work (most importantly, converting the key was failing so all keys were null bytes and thus ignored).	1 year ago
Martin Evans	47e4fcef2a	Fixed GetString on netstandard2	1 year ago
Martin Evans	2f0deeadcd	Implemented serialization for `MetadataOverride`. Deserialization is broken (converter is never called)	1 year ago
Martin Evans	b868b056f7	Added metadata overrides to `IModelParams`	1 year ago
Martin Evans	b22d8b7495	- Added `GroupDisposable` to dispose a collection of items all together - Renamed `LLamaModelKvOverride` to `LLamaModelMetadataOverride`	1 year ago
Martin Evans	439d14a061	Updated binaries: - build run: https://github.com/SciSharp/LLamaSharp/actions/runs/7196891440 - commit: `9fb13f9584`	1 year ago
Martin Evans	89fef05362	This commit (`5fe721bdbe`) accidentally removed a load of stuff that it shouldn't. Fixed that. Originally from these PRs: - https://github.com/SciSharp/LLamaSharp/pull/263 - https://github.com/SciSharp/LLamaSharp/pull/259	2 years ago
SignalRT	97006a214f	Merge remote-tracking branch 'upstream/master' into RuntimeDetection	2 years ago
Martin Evans	31244ae691	Merge branch 'master' into YaRN_scaling_parameters	2 years ago
SignalRT	5fe721bdbe	Revert "Merge branch 'pr/268' into RuntimeDetection" This reverts commit 091b8d58b3502a99b3bfbec9db457c92cc736beb, reversing changes made to `9b2ca9cf8e`.	2 years ago
Martin Evans	db1bc741b0	Modified `ContextSize` in parameters to be nullable. A null value means autodetect from the model.	2 years ago
Udayshankar Ravikumar	4071c1f5fc	Updated preprocessor directives	2 years ago
Udayshankar Ravikumar	df310e15da	Fixed preprocessor directives	2 years ago
Martin Evans	04ee64a6be	Exposed YaRN scaling parameters in IContextParams	2 years ago
Udayshankar Ravikumar	1dad1ff834	Enhance framework compatibility	2 years ago
Martin Evans	529b06b35b	- Fixed rope frequency/base to use the values in the model by default, instead of always overriding them by default!	2 years ago
Martin Evans	321d0b58c4	Merge pull request #202 from martindevans/multi_gpu Multi GPU	2 years ago
Martin Evans	51d4411a58	Added two new classes for detokenization tasks: - `AntipromptProcessor` accepts chunks of text and returns a value indicating if any antiprompt has been detected. - `StreamingTokenDecoder` decodes tokens into text, maintaining some internal state to handle single characters which are encoded as multiple tokens. Added tests for these classes and updated StatelessExecutor to use them. Removed most DeTokenize methods, marked the rest as obsolete (should always use a `StreamingTokenDecoder`).	2 years ago
Martin Evans	6a4cd506bd	Added a safe `TensorSplitsCollection` to the params which prevents incorrectly setting the `tensor_splits` collection	2 years ago
Martin Evans	15db194c17	Added multi GPU support	2 years ago
Martin Evans	9daf586ba8	Assorted cleanup leftover after the huge change in the last PR (comments, syntax style, etc)	2 years ago
Martin Evans	d8434ea9d6	Merge pull request #185 from martindevans/wip_major_api_change Major llama.cpp API Change	2 years ago
Martin Evans	b8f0eff080	- Added `GetCharCountImpl` tests, fixed handling of empty strings - Added ifdef to remove `Deconstruct` extension on everything except `NETSTANDARD2_0`	2 years ago
Martin Evans	2a38808bca	- Added threads to context params, replaced all thread args with `uint?` - Replaced all binaries	2 years ago
Martin Evans	4e9b1f8cdc	- Split extension methods into separate files	2 years ago
Martin Evans	669ae47ef7	- Split parameters into two interfaces - params contains a list of loras, instead of just one	2 years ago
Martin Evans	bca55eace0	Initial changes to match the llama.cpp changes	2 years ago
Martin Evans	08f1615e60	- Converted LLamaStatelessExecutor to run `Exec` calls inside an awaited task. This unblocks async callers while the model is being evaluated. - Added a "spinner" to the `StatelessModeExecute` demo, which spins while waiting for the next token (demonstrating that it's not blocked).	2 years ago
Martin Evans	fe54f6764f	- Added unit tests for extension methods - Removed unused `AddRangeSpan` extension	2 years ago
Martin Evans	d08a125020	Using the `TokensEndsWithAnyString` extensions for antiprompt checking in instruct executor. Simpler and more efficient.	2 years ago
Martin Evans	77bd090150	Simplified `LLamaInteractExecutor` antiprompt matching by using new extension method	2 years ago
Martin Evans	614ba40948	- Added a `TokensEndsWithAnyString` extension to `IReadOnlyList<int>` which efficiently checks if a set of tokens ends with one of a set of strings. - Minimal amount of characters converted - Allocation free - Added `TokensToSpan` to `SafeLlamaModelHandle` which converts as many tokens as possible into a character span - Allocation free	2 years ago
Rinne	4e83e48ad1	Merge pull request #122 from martindevans/gguf Add GGUF support	2 years ago
Martin Evans	a70c7170dd	- Created a higher level `Grammar` class which is immutable and contains a list of grammar rules. This is the main "entry point" to the grammar system. - Made all the mechanics of grammar parsing (GBNFGrammarParser, ParseState) internal. Just call `Grammar.Parse("whatever")`. - Added a `GrammarRule` class which validates elements on construction (this allows constructing grammar without parsing GBNF). - It should be impossible for a `GrammarRule` to represent an invalid rule.	2 years ago
Martin Evans	2056078aef	Initial changes required for GGUF support	2 years ago
Martin Evans	cf4754db44	Removed unnecessary parameters from some low level sampler methods	2 years ago
Martin Evans	4738c26299	- Reduced context size of test, to speed it up - Removed some unnecessary `ToArray` calls - Initial pass on LLamaStatelessExecutor, the context overflow management is broken but I think I found where it's ported from	2 years ago
Martin Evans	91bcefc852	comment on IModelParamsExtensions	2 years ago
Martin Evans	9cdc72aa67	Fixed `ToLlamaContextParams` using the wrong parameter for `use_mmap`	2 years ago
sa_ddam213	2d1269cae9	Access to IModelParamsExtensions	2 years ago


53 Commits (master)