9 Commits (ec8f83236545a1989df2f75da4e1d8d0345b0407)

Author SHA1 Message Date
  Martin Evans e2705be6c8 Fixed off by one error in LLamaBatch sampling position (#626) 1 year ago
  Martin Evans 91d72e7465 Keeping track of positions where logits will be generated in a batch and what sequence those logits are associated with. (#624) 1 year ago
  Martin Evans f0b0bbcbb7 Mutable Logits (#586) 1 year ago
  Martin Evans b0acecf080 Created a new `BatchedExecutor` which processes multiple "Conversations" in one single inference batch. This is faster, even when the conversations are unrelated, and is much faster if the conversations share some overlap (e.g. a common system prompt prefix). 1 year ago
  Martin Evans 92b9bbe779 Added methods to `SafeLLamaContextHandle` for KV cache manipulation 1 year ago
  Martin Evans 9fe878ae1f - Fixed example 1 year ago
  Martin Evans 9ede1bedc2 Automatically growing batch n_seq_max when exceeded. This means no parameters need to be picked when the batch is created. 1 year ago
  Martin Evans 99969e538e - Removed some unused `eval` methods. 1 year ago
  Martin Evans 36a9335588 Removed `LLamaBatchSafeHandle` (using unmanaged memory, created by llama.cpp) and replaced it with a fully managed `LLamaBatch`. Modified the `BatchedDecoding` example to use new managed batch. 1 year ago