Namespace: LLama.Batched
A batched executor that can infer multiple separate "conversations" simultaneously.
public sealed class BatchedExecutor : System.IDisposable
Inheritance Object → BatchedExecutor
Implements IDisposable
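A rough usage sketch follows. The ModelParams type, LLamaWeights.LoadFromFile, and the token-based Conversation.Prompt overload come from the wider LLamaSharp API and are assumptions beyond what this page documents:

```csharp
using System.Threading;
using LLama;
using LLama.Batched;
using LLama.Common;

// Load the weights once; the executor creates its own context from the same parameters.
var parameters = new ModelParams("models/llama-2-7b.gguf");
using var weights = LLamaWeights.LoadFromFile(parameters);
using var executor = new BatchedExecutor(weights, parameters);

// Two independent conversations sharing one context and one batch.
using var left = executor.Create();
using var right = executor.Create();

left.Prompt(executor.Context.Tokenize("Translate to French: cheese"));
right.Prompt(executor.Context.Tokenize("Translate to German: cheese"));

// A single call evaluates the pending tokens of every conversation at once.
var result = await executor.Infer(CancellationToken.None);
```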
The LLamaContext this executor is using
public LLamaContext Context { get; }
The LLamaWeights this executor is using
public LLamaWeights Model { get; }
Get the number of tokens currently queued in the batch, waiting for BatchedExecutor.Infer(CancellationToken) to be called
public int BatchedTokenCount { get; }
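For example (a fragment assuming an executor whose conversations have already been prompted), a caller can skip a decode pass when nothing is queued:

```csharp
// Only pay for a decode when at least one conversation has queued tokens.
if (executor.BatchedTokenCount > 0)
    await executor.Infer(CancellationToken.None);
```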
Check if this executor has been disposed.
public bool IsDisposed { get; private set; }
Create a new batched executor
public BatchedExecutor(LLamaWeights model, IContextParams contextParams)
model LLamaWeights
The model to use
contextParams IContextParams
Parameters to create a new context
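A minimal construction sketch; ModelParams and its ContextSize property come from the wider LLamaSharp API and are used here only as one example of an IContextParams implementation:

```csharp
using System;
using LLama;
using LLama.Batched;
using LLama.Common;

// ModelParams implements IContextParams, so it can configure the executor's context.
var contextParams = new ModelParams("models/llama-2-7b.gguf") { ContextSize = 4096 };

using var weights = LLamaWeights.LoadFromFile(contextParams);
using var executor = new BatchedExecutor(weights, contextParams);

// The executor owns a single context built from these parameters; its KV cache
// is shared by every conversation created from this executor.
Console.WriteLine(executor.Context.ContextSize);
```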
Start a new Conversation with the given prompt
Caution: this method is obsolete; use BatchedExecutor.Create instead
public Conversation Prompt(string prompt)
prompt String
Start a new Conversation
public Conversation Create()
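A sketch of starting one conversation per prompt; StartConversations is a hypothetical helper, and the token-based Conversation.Prompt overload is assumed from the wider LLamaSharp API:

```csharp
using System.Collections.Generic;
using LLama.Batched;

// Hypothetical helper: one conversation (and one KV cache sequence) per prompt.
static List<Conversation> StartConversations(BatchedExecutor executor, IEnumerable<string> prompts)
{
    var conversations = new List<Conversation>();
    foreach (var prompt in prompts)
    {
        var conversation = executor.Create();
        conversation.Prompt(executor.Context.Tokenize(prompt));
        conversations.Add(conversation);
    }
    return conversations;
}
```

The queued prompt tokens are not evaluated until the next call to Infer.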
Run inference for all conversations in the batch which have pending tokens.
If the result is NoKvSlot then there is not enough KV cache memory for inference; try disposing some conversations and running inference again.
public Task<DecodeResult> Infer(CancellationToken cancellation)
cancellation CancellationToken
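A sketch of an inference loop that follows the remark above: when NoKvSlot is returned, dispose a conversation to free KV cache space and retry. InferWithRetryAsync is a hypothetical helper, and DecodeResult.Ok is assumed to be the success value:

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;
using LLama.Batched;
using LLama.Native;

// Hypothetical helper: run one inference pass, freeing conversations on NoKvSlot.
static async Task InferWithRetryAsync(BatchedExecutor executor, List<Conversation> conversations, CancellationToken cancellation)
{
    while (executor.BatchedTokenCount > 0)
    {
        var result = await executor.Infer(cancellation);

        if (result == DecodeResult.Ok)
            return;

        if (result == DecodeResult.NoKvSlot && conversations.Count > 0)
        {
            // Not enough KV cache memory: dispose one conversation to release its
            // sequence, then run inference over the remaining batch again.
            var victim = conversations[^1];
            conversations.RemoveAt(conversations.Count - 1);
            victim.Dispose();
            continue;
        }

        throw new InvalidOperationException($"Inference failed: {result}");
    }
}
```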
public void Dispose()
internal LLamaSeqId GetNextSequenceId()