* Updated binaries, using [this build](https://github.com/SciSharp/LLamaSharp/actions/runs/8654672719/job/23733195669) for llama.cpp commit `f7001ccc5aa359fcf41bba19d1c99c3d25c9bcc7`.
- Added all new functions.
  - Moved some functions (e.g. `SafeLlamaModelHandle`-specific functions) into `SafeLlamaModelHandle.cs`.
- Exposed tokens on `SafeLlamaModelHandle` and `LLamaWeights` through a `Tokens` property. As new special tokens are added in the future they can be added here.
  - Changed all token properties to return nullable tokens, to handle models which lack some tokens (see the sketch after this list).
  - Fixed `DefaultSamplingPipeline` to handle models that have no newline token.
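As a usage sketch of the nullable token properties (the `Tokens.Newline` property name is taken from these notes; treat the details as assumptions rather than a confirmed API surface):

```csharp
// Sketch: reading a (now nullable) special token through the new `Tokens` property.
using System;
using LLama;
using LLama.Native;

static class TokenInspection
{
    public static void PrintNewlineToken(LLamaWeights model)
    {
        // Nullable: some models simply do not define this token.
        LLamaToken? newline = model.Tokens.Newline;

        if (newline is null)
            Console.WriteLine("Model has no newline token; samplers must handle its absence.");
        else
            Console.WriteLine($"Newline token: {newline.Value}");
    }
}
```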
* Moved native methods to more specific locations.
  - Context-specific things have been moved into `SafeLLamaContextHandle.cs` and made private - they're already exposed through C# properties and methods.
- Checking that GPU layer count is zero if GPU offload is not supported.
- Moved methods for creating default structs (`llama_model_quantize_default_params` and `llama_context_default_params`) into relevant structs.
* Removed exception if `GpuLayerCount > 0` when GPU is not supported.
* Added low-level wrapper methods for new per-sequence state load/save in `SafeLLamaContextHandle`
  - Added high-level wrapper methods (save/load with `State` object or memory mapped file) in `LLamaContext` (see the sketch after this list)
- Moved native methods for per-sequence state load/save into `SafeLLamaContextHandle`
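A rough sketch of the high-level API; the exact overload shapes (e.g. `SaveState`/`LoadState` taking a `LLamaSeqId`) are assumptions inferred from these notes:

```csharp
// Sketch: save and restore the state of a single sequence.
// The overloads shown here (taking a LLamaSeqId) are assumed shapes.
using LLama;
using LLama.Native;

static class SequenceStateExample
{
    public static void SaveAndRestore(LLamaContext context, LLamaSeqId sequence)
    {
        // Save only this sequence's state to a memory mapped file...
        context.SaveState("sequence.state", sequence);

        // ...and later load it back into the same sequence.
        context.LoadState("sequence.state", sequence);
    }
}
```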
* Added update and defrag methods for KV cache in `SafeLLamaContextHandle`
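A sketch of how these might be called; the method names are assumptions mirroring the native `llama_kv_cache_defrag`/`llama_kv_cache_update` calls:

```csharp
// Sketch: schedule a KV cache defragmentation, then apply all pending cache
// operations in one update pass (method names assumed).
using LLama.Native;

static class KvCacheExample
{
    public static void Defragment(SafeLLamaContextHandle ctx)
    {
        ctx.KvCacheDefrag();  // mark the cache for defragmentation
        ctx.KvCacheUpdate();  // apply pending updates (defrag, shifts, etc.)
    }
}
```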
* Updated submodule to `f7001ccc5aa359fcf41bba19d1c99c3d25c9bcc7`
* Passing the sequence ID when saving a single sequence's state
* Added `NativeLogConfig` which allows overriding the llama.cpp log callback
- Delaying binding of this into llama.cpp until after `NativeLibraryConfig` has loaded
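A minimal sketch of registering the override (the exact `NativeLogConfig` overloads are assumed from these notes; the `ILogger` form is the redirection described further below):

```csharp
// Sketch: override the llama.cpp log callback before any native calls are made.
using LLama.Native;
using Microsoft.Extensions.Logging;

static class LoggingSetupExample
{
    public static void Configure(ILoggerFactory loggerFactory)
    {
        // Binding into llama.cpp itself is deferred until NativeLibraryConfig
        // has actually loaded the native library.
        NativeLogConfig.llama_log_set(loggerFactory.CreateLogger("llama.cpp"));
    }
}
```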
* Using the log callback to show log messages during loading.
* Registering log callbacks before any calls to llama.cpp except `llama_empty_call`; this method is specifically chosen because it does nothing and exists only to trigger DLL loading.
* Removed much of the complexity of logging from `NativeApi.Load`. It always calls whatever log callbacks you have registered.
  - Removed the alternative `ILogger` path in `NativeLibraryConfig`; instead it redirects by wrapping the logger in a delegate.
* Saving a GC handle to keep the log callback alive
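This is the standard .NET pattern for callbacks handed to native code: if nothing managed references the delegate, the GC may collect it while native code still holds the function pointer. A generic sketch (not the library's exact code):

```csharp
// Generic sketch: keep a native callback delegate alive with a GCHandle.
using System;
using System.Runtime.InteropServices;

static class CallbackHolder
{
    private static GCHandle _handle;

    public static void Hold(Delegate callback)
    {
        // Free any previously held callback before replacing it.
        if (_handle.IsAllocated)
            _handle.Free();

        // A normal (non-pinning) handle is enough: it keeps the delegate
        // reachable so the GC cannot collect it while native code uses it.
        _handle = GCHandle.Alloc(callback);
    }
}
```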
* Removed the message prefix; the logger should already add one.
* Buffering messages until a newline is encountered before passing the log message to the `ILogger`.
* Added trailing `\n` to log messages from loading.
- Using `ThreadLocal<StringBuilder>` to ensure messages from separate threads don't get mixed together.
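A self-contained sketch of that buffering technique (the real implementation may differ): each thread appends into its own `StringBuilder`, and only complete lines are forwarded to the `ILogger`.

```csharp
// Sketch: newline-buffered logging with per-thread buffers (assumed shape,
// not the library's exact implementation).
using System.Text;
using System.Threading;
using Microsoft.Extensions.Logging;

class BufferedLogWriter
{
    private readonly ILogger _logger;
    private readonly ThreadLocal<StringBuilder> _buffer = new(() => new StringBuilder());

    public BufferedLogWriter(ILogger logger) => _logger = logger;

    public void Write(string fragment)
    {
        var sb = _buffer.Value!;
        sb.Append(fragment);

        // Flush complete lines; keep any trailing partial line buffered.
        int newline;
        while ((newline = IndexOfNewline(sb)) >= 0)
        {
            _logger.LogInformation("{Message}", sb.ToString(0, newline));
            sb.Remove(0, newline + 1);
        }
    }

    private static int IndexOfNewline(StringBuilder sb)
    {
        for (var i = 0; i < sb.Length; i++)
            if (sb[i] == '\n')
                return i;
        return -1;
    }
}
```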
This commit was originally made by lcarrere in https://github.com/SciSharp/LLamaSharp/issues/180. I have confirmed this modification works on my Windows 11 laptop, and made this commit at the request of AsakusaRinne.
- Re-implemented `Rewind` as an extension method using `Modify` internally (see the sketch after this list)
- Implemented `ShiftLeft`, which shifts everything over except for some starting tokens. This is the same as the `StatelessExecutor` out-of-context handling.
- Starting the batch at epoch 1 ensures that conversations (which start at zero) are below the current epoch. It also means `0` can always be used as a value guaranteed to be below the current epoch.
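An illustrative sketch of `Rewind` built on `Modify`; the callback signature and the KV accessor's `Remove` shape here are assumptions based on these notes, not a confirmed API:

```csharp
// Illustrative sketch: Rewind as an extension method over a Modify-style API.
// The Modify callback signature and kv.Remove(start, count) are assumed shapes.
using LLama.Batched;

public static class ConversationRewindExtensions
{
    public static void Rewind(this Conversation conversation, int tokens)
    {
        conversation.Modify((end, kv) =>
        {
            // Drop the last `tokens` tokens from the KV cache
            // (start position, count - assumed signature)...
            kv.Remove(end.Value - tokens, tokens);

            // ...and report the new end position of the conversation.
            return end.Value - tokens;
        });
    }
}
```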
- Made `llama_backend_init` private. It is called automatically; there is no way it can correctly be used externally.
- Made `llama_token_to_piece` safe (takes a `Span` instead of a pointer); see the sketch after this list
- Made `NativeApi` into a `static class` (it's not intended to be instantiated)
- Moved `LLamaTokenType` enum out into a separate file
- Made `LLamaSeqId` and `LLamaPos` into `record struct`s, which conveniently provides equality comparisons etc.
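The Span-safe wrapper follows a common P/Invoke pattern; a generic sketch with simplified names and signature (not the library's exact code):

```csharp
// Generic sketch: wrapping a pointer-based native call with a safe Span overload.
using System;
using System.Runtime.InteropServices;

static unsafe class NativeWrapper
{
    // Raw native signature: writes up to `length` bytes into `buffer` and
    // returns the number of bytes written (illustrative, simplified).
    [DllImport("llama")]
    private static extern int llama_token_to_piece(IntPtr model, int token, byte* buffer, int length);

    // Safe overload: callers pass a Span instead of a raw pointer.
    public static int TokenToPiece(IntPtr model, int token, Span<byte> dest)
    {
        fixed (byte* ptr = dest)
            return llama_token_to_piece(model, token, ptr, dest.Length);
    }
}
```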