* Updated binaries, using [this build](https://github.com/SciSharp/LLamaSharp/actions/runs/8654672719/job/23733195669) for llama.cpp commit `f7001ccc5aa359fcf41bba19d1c99c3d25c9bcc7`.
- Added all new functions.
- Moved some functions (e.g. `SafeLlamaModelHandle` specific functions) into `SafeLlamaModelHandle.cs`
- Exposed tokens on `SafeLlamaModelHandle` and `LLamaWeights` through a `Tokens` property. As new special tokens are added in the future they can be added here.
- Changed all token properties to return nullable tokens, to handle some models not having some tokens.
- Fixed `DefaultSamplingPipeline` to handle no newline token in some models.
* Moved native methods to more specific locations.
- Context specific things have been moved into `SafeLLamaContextHandle.cs` and made private - they're exposed through C# properties and methods already.
- Checking that GPU layer count is zero if GPU offload is not supported.
- Moved methods for creating default structs (`llama_model_quantize_default_params` and `llama_context_default_params`) into relevant structs.
* Removed exception if `GpuLayerCount > 0` when GPU is not supported.
* - Added low level wrapper methods for new per-sequence state load/save in `SafeLLamaContextHandle`
- Added high level wrapper methods (save/load with `State` object or memory mapped file) in `LLamaContext`
- Moved native methods for per-sequence state load/save into `SafeLLamaContextHandle`
* Added update and defrag methods for KV cache in `SafeLLamaContextHandle`
* Updated submodule to `f7001ccc5aa359fcf41bba19d1c99c3d25c9bcc7`
* Passing the sequence ID when saving a single sequence state
Modified LLamaBatch to not share tokens with other sequences if logits is true. This ensures that the logit span at the end in used by exactly one sequence - therefore it's safe to mutate. This removes the need for copying _very_ large arrays (vocab size) and simplifies sampling pipelines.
* - Modified ISamplingPipeline to accept `ReadOnlySpan<float>` of logits directly. This moves responsibility to copy the logits into the pipeline.
- Added a flag to `BaseSamplingPipeline` indicating if a logit copy is necessary. Skipping it in most cases.
* Fixed `RestoreProtectedTokens` not working if logit processing is skipped
* - Implemented a new greedy sampling pipeline (always sample most likely token)
- Moved `Grammar` into `BaseSamplingPipeline`
- Removed "protected tokens" concept from `BaseSamplingPipeline`. Was introducing a lot of incidental complexity.
- Implemented newline logit save/restore in `DefaultSamplingPipeline` (only place protected tokens was used)
* Implemented pipelines for mirostat v1 and v2
Conversations can be "forked", to create a copy of a conversation at a given point. This allows e.g. prompting a conversation with a system prefix just once and then forking it again and again for each individual conversation. Conversations can also be "rewound" to an earlier state.
Added two new examples, demonstrating forking and rewinding.
- Added BaseSamplingPipeline which provides a base impl of `ISamplingPipeline`
- Added `DefaultSamplingPipeline` which mimics normal llama.cpp sampling