LLamaSharp

333 MB

Tree: ec8f832365

Author	SHA1	Message	Date
Martin Evans	e2705be6c8	Fixed off by one error in LLamaBatch sampling position (#626 )	1 year ago
Martin Evans	91d72e7465	Keeping track of positions where logits will be generated in a batch and what sequence those logits are associated with. (#624 )	1 year ago
Martin Evans	f0b0bbcbb7	Mutable Logits (#586 ) Modified LLamaBatch to not share tokens with other sequences if logits is true. This ensures that the logit span at the end in used by exactly one sequence - therefore it's safe to mutate. This removes the need for copying _very_ large arrays (vocab size) and simplifies sampling pipelines.	1 year ago
Martin Evans	b0acecf080	Created a new `BatchedExecutor` which processes multiple "Conversations" in one single inference batch. This is faster, even when the conversations are unrelated, and is much faster if the conversations share some overlap (e.g. a common system prompt prefix). Conversations can be "forked", to create a copy of a conversation at a given point. This allows e.g. prompting a conversation with a system prefix just once and then forking it again and again for each individual conversation. Conversations can also be "rewound" to an earlier state. Added two new examples, demonstrating forking and rewinding.	1 year ago
Martin Evans	92b9bbe779	Added methods to `SafeLLamaContextHandle` for KV cache manipulation	1 year ago
Martin Evans	9fe878ae1f	- Fixed example - Growing more than double, if necessary	1 year ago
Martin Evans	9ede1bedc2	Automatically growing batch n_seq_max when exceeded. This means no parameters need to be picked when the batch is created.	1 year ago
Martin Evans	99969e538e	- Removed some unused `eval` methods. - Added a `DecodeAsync` overload which runs the work in a task - Replaced some `NativeHandle` usage in `BatchedDecoding` with higher level equivalents. - Made the `LLamaBatch` grow when token capacity is exceeded, removing the need to manage token capacity externally.	1 year ago
Martin Evans	36a9335588	Removed `LLamaBatchSafeHandle` (using unmanaged memory, created by llama.cpp) and replaced it with a fully managed `LLamaBatch`. Modified the `BatchedDecoding` example to use new managed batch.	1 year ago

9 Commits (ec8f83236545a1989df2f75da4e1d8d0345b0407)