- Properly displaying `LLamaToken`
- Removed all tokenisation code in `SafeLLamaContextHandle` - just pass it all through to the `SafeLlamaModelHandle`
- Improved `SafeLlamaModelHandle` tokenisation:
- Renting an array, for one less allocation
- Not using `&tokens[0]` to take a pointer to an array, this is redundant and doesn't work on empty arrays
- made `llama_backend_init` private. This is automatically called, there is no way it can correctly be used externally.
- made `llama_token_to_piece` safe (Span instead of pointer)
- Made `NativeApi` into a `static class` (it's not intended to be instantiated)
- Moved `LLamaTokenType` enum out into a separate file
- Made `LLamaSeqId` and `LLamaPos` into `record struct`, convenient to have equality etc
- `AntipromptProcessor` accepts chunks of text and returns a value indicating if any antiprompt has been detected.
- `StreamingTokenDecoder` decodes tokens into text, maintaining some internal state to handle single characters which are encoded as multiple tokens.
Added tests for these classes and updated StatelessExecutor to use them.
Removed most DeTokenize methods, marked the rest as obsolete (should always use a `StreamingTokenDecoder`).
- Minimal amount of characters converted
- Allocation free
- Added `TokensToSpan` to `SafeLlamaModelHandle` which converts as many tokens as possible into a character span
- Allocation free
- Just convert it to a `string`, nice and simple
- Write the bytes to a `Span<byte>` no allocations
- Write the chars to a `StringBuilder` potentially no allocations
This change _only_ implements the low level API and makes no effort to update the LlamaSharp higher level abstraction.
It is built upon llama `b3f138d`, necessary DLLs are **not** included in this commit.