68 Commits (6f9097f25bdb9726335a2aecf1f087cc6e2a4990)

Author SHA1 Message Date
  ksanchez 46a9d603f4 Add method to get BOS token. 1 year ago
  Rinne 495177fd0f fix: typos. 1 year ago
  Martin Evans c325ac9127 April 2024 Binary Update (#662) 1 year ago
  Rinne b677cdc6a3 Merge pull request #560 from eublefar/feature/chat-session-state-management 1 year ago
  Martin Evans ad682fbebd `BatchedExecutor.Create()` method (#613) 1 year ago
  eublefar a31391edd7 Polymorphic serialization for executor state and transforms 1 year ago
  Martin Evans a8ba9f05b3 March Binary Update (#565) 1 year ago
  eublefar e05d5d4e14 Remove state-resetting ops and make SessionState.ExecutorState and SessionState.ContextState non-nullable 1 year ago
  eublefar b2f7dbb39b AddPromptAsync method for stateful executors; chat session initialization from history and system-message processing methods for pre-processing prompts. Serialize executor state to JSON to prevent saved states from being updated by reference. 1 year ago
  eublefar 35153a77dd Chat session Get/Load in-memory state operations, reset state ops for stateful executors and context 1 year ago
  Martin Evans 8ac1634233 Removed `llama_eval`. It is going to be completely removed in the next version of llama.cpp (#553) 1 year ago
  Martin Evans 91a7967869 `ReadOnlySpan<float>` in ISamplingPipeline (#538) 1 year ago
  Martin Evans b0acecf080 Created a new `BatchedExecutor` which processes multiple "Conversations" in one single inference batch. This is faster, even when the conversations are unrelated, and is much faster if the conversations share some overlap (e.g. a common system prompt prefix). 1 year ago
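The `BatchedExecutor` commit above explains that multiple conversations can be processed in a single inference batch, and that a shared prompt prefix (e.g. a common system prompt) makes this much faster. A hypothetical usage sketch of that shape — the method names below are assumptions for illustration, not verified signatures from the library:

```csharp
// Hypothetical sketch: two conversations driven through one batched executor.
// Because both prompts share a system-prompt prefix, the overlapping tokens
// can be evaluated once rather than once per conversation.
var executor = new BatchedExecutor(model, parameters);

var a = executor.Create();   // conversation A
var b = executor.Create();   // conversation B

a.Prompt("You are a helpful bot. User: hello");
b.Prompt("You are a helpful bot. User: goodbye");

// A single inference call evaluates both pending prompts together.
await executor.Infer();
```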
  Martin Evans 15a98b36d8 Updated everything to work with llama.cpp ce32060198 1 year ago
  Martin Evans 5da2a2f64b - Removed one of the constructors of `SafeLLamaHandleBase`, which implicitly states that memory is owned. Better to be explicit about this kind of thing! 1 year ago
  Martin Evans a2e29d393c Swapped `StatelessExecutor` to use `llama_decode`! 1 year ago
  Martin Evans 99969e538e - Removed some unused `eval` methods. 1 year ago
  Martin Evans 2eb52b1630 made casts to/from int explicit, fixed places affected 1 year ago
  Martin Evans 42be9b136d Switched from using raw integers to a `LLamaToken` struct 1 year ago
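The `LLamaToken` commit above (together with the explicit-cast commit just before it) replaces raw integer token ids with a dedicated struct. A minimal sketch of the pattern — not the library's actual definition:

```csharp
// Sketch of the pattern: a readonly wrapper over the raw token id with
// explicit casts in both directions, so mixing up plain ints and tokens
// becomes a compile-time error instead of a silent bug.
public readonly record struct Token(int Value)
{
    public static explicit operator int(Token t) => t.Value;
    public static explicit operator Token(int v) => new(v);
}

// Usage: both directions require an explicit cast.
Token t = (Token)42;
int raw = (int)t;
```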
  Martin Evans f860f88c36 Code cleanup driven by R# suggestions: 1 year ago
  Martin Evans dc8e5d88f7 Update LLama/LLamaContext.cs 1 year ago
  Martin Evans 2df3e7617e Added a method to set the RNG seed on the context 1 year ago
  Martin Evans b34f72a883 - Added `SamplingPipeline` to inference params which overrides all other options with an entirely custom pipeline. 1 year ago
  Martin Evans 16ab33ba3c Added Obsolete markings to all `Eval` overloads 2 years ago
  Martin Evans d743516070 - Added support for the MinP sampler 2 years ago
  Martin Evans dcc82e582e Fixed `Eval` on platforms < dotnet 5 2 years ago
  Martin Evans e81b3023d5 Rewritten sampling API to be accessed through the `LLamaTokenDataArray` object 2 years ago
  Martin Evans a024d2242e It works! 2 years ago
  Martin Evans 36c71abcfb Fixed `LLama.StreamingTokenDecoder` spam in all executors except Stateless. 2 years ago
  Martin Evans 51d4411a58 Added two new classes for detokenization tasks: 2 years ago
  Martin Evans efdf3d630c - Removed all `TokenToString` methods (it's never correct to use them, because sometimes one single character may be represented by multiple tokens). 2 years ago
  Martin Evans 9daf586ba8 Assorted cleanup leftover after the huge change in the last PR (comments, syntax style, etc) 2 years ago
  Martin Evans d8434ea9d6 Merge pull request #185 from martindevans/wip_major_api_change 2 years ago
  Martin Evans 1f8c94e386 Added in the `special` parameter to the tokenizer (introduced in https://github.com/ggerganov/llama.cpp/pull/3538) 2 years ago
  Martin Evans 669ae47ef7 - Split parameters into two interfaces 2 years ago
  Martin Evans 9a0a0ae9fe Removed cloning support 2 years ago
  Martin Evans 0d40338692 Fixed out-of-context handling in stateless executor 2 years ago
  Martin Evans ce1fc51163 Added some more native methods 2 years ago
  Martin Evans bca55eace0 Initial changes to match the llama.cpp changes 2 years ago
  Martin Evans 08f1615e60 - Converted LLamaStatelessExecutor to run `Exec` calls inside an awaited task. This unblocks async callers while the model is being evaluated. 2 years ago
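The stateless-executor commit above moves blocking `Exec` calls into an awaited task so that async callers are not blocked while the model evaluates. A minimal sketch of that pattern, assuming a hypothetical blocking `EvalBlocking` method:

```csharp
// Sketch: offload the blocking native evaluation to the thread pool so the
// awaiting caller's thread stays free while the model runs.
public Task<int> EvalAsync(int[] tokens, CancellationToken ct = default)
    => Task.Run(() => EvalBlocking(tokens), ct);
```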
  redthing1 b78044347c fix opaque GetState (fixes #176) 2 years ago
  Martin Evans 466722dcff Merge pull request #165 from martindevans/better_instruct_antiprompt_checking 2 years ago
  Martin Evans d08a125020 Using the `TokensEndsWithAnyString` extensions for antiprompt checking in instruct executor. Simpler and more efficient. 2 years ago
  Martin Evans bba801f4b7 Added a property to get the KV cache size from a context 2 years ago
  Martin Evans 4dac142bd5 Merge pull request #160 from martindevans/GetState_fix 2 years ago
  Martin Evans 832bf7dbe0 Simplified implementation of `GetState` and fixed a memory leak (`bigMemory` was never freed) 2 years ago
  Martin Evans 4f7b6ffdcc Removed `GenerateResult` method that was only used in one place 2 years ago
  sa_ddam213 949b0cde16 Replace ILLamaLogger with ILogger 2 years ago
  Martin Evans 31287b5e6e Rewritten TokenToSpan/TokenToString to better fit the new way it's done in llama.cpp with a few different options: 2 years ago
  Martin Evans 0c98ae1955 Passing ctx to `llama_token_nl(_ctx)` 2 years ago