Modified LLamaBatch to not share tokens with other sequences if logits is true. This ensures that the logit span at the end in used by exactly one sequence - therefore it's safe to mutate. This removes the need for copying _very_ large arrays (vocab size) and simplifies sampling pipelines.
Conversations can be "forked", to create a copy of a conversation at a given point. This allows e.g. prompting a conversation with a system prefix just once and then forking it again and again for each individual conversation. Conversations can also be "rewound" to an earlier state.
Added two new examples, demonstrating forking and rewinding.
- Added a `DecodeAsync` overload which runs the work in a task
- Replaced some `NativeHandle` usage in `BatchedDecoding` with higher level equivalents.
- Made the `LLamaBatch` grow when token capacity is exceeded, removing the need to manage token capacity externally.