# NativeApi
Namespace: LLama.Native
Direct translation of the llama.cpp API
```csharp
public static class NativeApi
```
Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) → [NativeApi](./llama.native.nativeapi.md)
## Methods
### **llama_sample_token_mirostat(SafeLLamaContextHandle, LLamaTokenDataArrayNative&, Single, Single, Int32, Single&)**
Mirostat 1.0 algorithm described in the paper https://arxiv.org/abs/2007.14966. Uses tokens instead of words.
```csharp
public static LLamaToken llama_sample_token_mirostat(SafeLLamaContextHandle ctx, LLamaTokenDataArrayNative& candidates, float tau, float eta, int m, Single& mu)
```
#### Parameters
`ctx` [SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)
`candidates` [LLamaTokenDataArrayNative&](./llama.native.llamatokendataarraynative&.md)
A vector of `llama_token_data` containing the candidate tokens, their probabilities (p), and log-odds (logit) for the current position in the generated text.
`tau` [Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)
The target cross-entropy (or surprise) value you want to achieve for the generated text. A higher value corresponds to more surprising or less predictable text, while a lower value corresponds to less surprising or more predictable text.
`eta` [Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)
The learning rate used to update `mu` based on the error between the target and observed surprisal of the sampled word. A larger learning rate will cause `mu` to be updated more quickly, while a smaller learning rate will result in slower updates.
`m` [Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
The number of tokens considered in the estimation of `s_hat`. This is an arbitrary value that is used to calculate `s_hat`, which in turn helps to calculate the value of `k`. In the paper, they use `m = 100`, but you can experiment with different values to see how it affects the performance of the algorithm.
`mu` [Single&](https://docs.microsoft.com/en-us/dotnet/api/system.single&)
Maximum cross-entropy. This value is initialized to be twice the target cross-entropy (`2 * tau`) and is updated in the algorithm based on the error between the target and observed surprisal.
#### Returns
[LLamaToken](./llama.native.llamatoken.md)
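For illustration, a minimal sketch of driving this entry point (not part of the documented API): `mu` is initialized to `2 * tau` and threaded through successive calls so the controller can converge. The helper below assumes `candidates` has already been populated from the logits of the last decode.
```csharp
// Hypothetical helper: one Mirostat 1.0 sampling step.
// `mu` must persist between calls; initialize it to 2 * tau.
static LLamaToken SampleMirostat(SafeLLamaContextHandle ctx,
                                 ref LLamaTokenDataArrayNative candidates,
                                 ref float mu,
                                 float tau = 5.0f,   // target surprise
                                 float eta = 0.1f,   // learning rate for mu
                                 int m = 100)        // value used in the paper
{
    return NativeApi.llama_sample_token_mirostat(ctx, ref candidates, tau, eta, m, ref mu);
}
```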
### **llama_sample_token_mirostat_v2(SafeLLamaContextHandle, LLamaTokenDataArrayNative&, Single, Single, Single&)**
Mirostat 2.0 algorithm described in the paper https://arxiv.org/abs/2007.14966. Uses tokens instead of words.
```csharp
public static LLamaToken llama_sample_token_mirostat_v2(SafeLLamaContextHandle ctx, LLamaTokenDataArrayNative& candidates, float tau, float eta, Single& mu)
```
#### Parameters
`ctx` [SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)
`candidates` [LLamaTokenDataArrayNative&](./llama.native.llamatokendataarraynative&.md)
A vector of `llama_token_data` containing the candidate tokens, their probabilities (p), and log-odds (logit) for the current position in the generated text.
`tau` [Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)
The target cross-entropy (or surprise) value you want to achieve for the generated text. A higher value corresponds to more surprising or less predictable text, while a lower value corresponds to less surprising or more predictable text.
`eta` [Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)
The learning rate used to update `mu` based on the error between the target and observed surprisal of the sampled word. A larger learning rate will cause `mu` to be updated more quickly, while a smaller learning rate will result in slower updates.
`mu` [Single&](https://docs.microsoft.com/en-us/dotnet/api/system.single&)
Maximum cross-entropy. This value is initialized to be twice the target cross-entropy (`2 * tau`) and is updated in the algorithm based on the error between the target and observed surprisal.
#### Returns
[LLamaToken](./llama.native.llamatoken.md)
### **llama_sample_token_greedy(SafeLLamaContextHandle, LLamaTokenDataArrayNative&)**
Selects the token with the highest probability.
```csharp
public static LLamaToken llama_sample_token_greedy(SafeLLamaContextHandle ctx, LLamaTokenDataArrayNative& candidates)
```
#### Parameters
`ctx` [SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)
`candidates` [LLamaTokenDataArrayNative&](./llama.native.llamatokendataarraynative&.md)
Pointer to LLamaTokenDataArray
#### Returns
[LLamaToken](./llama.native.llamatoken.md)
### **llama_sample_token(SafeLLamaContextHandle, LLamaTokenDataArrayNative&)**
Randomly selects a token from the candidates based on their probabilities.
```csharp
public static LLamaToken llama_sample_token(SafeLLamaContextHandle ctx, LLamaTokenDataArrayNative& candidates)
```
#### Parameters
`ctx` [SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)
`candidates` [LLamaTokenDataArrayNative&](./llama.native.llamatokendataarraynative&.md)
Pointer to LLamaTokenDataArray
#### Returns
[LLamaToken](./llama.native.llamatoken.md)
### **llama_set_n_threads(SafeLLamaContextHandle, UInt32, UInt32)**
Set the number of threads used for decoding
```csharp
public static void llama_set_n_threads(SafeLLamaContextHandle ctx, uint n_threads, uint n_threads_batch)
```
#### Parameters
`ctx` [SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)
`n_threads` [UInt32](https://docs.microsoft.com/en-us/dotnet/api/system.uint32)
n_threads is the number of threads used for generation (single token)
`n_threads_batch` [UInt32](https://docs.microsoft.com/en-us/dotnet/api/system.uint32)
n_threads_batch is the number of threads used for prompt and batch processing (multiple tokens)
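For example (thread counts are illustrative; pick values appropriate for the host):
```csharp
// Use 8 threads for single-token generation, 16 for prompt/batch processing.
NativeApi.llama_set_n_threads(ctx, n_threads: 8, n_threads_batch: 16);
```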
### **llama_vocab_type(SafeLlamaModelHandle)**
```csharp
public static LLamaVocabType llama_vocab_type(SafeLlamaModelHandle model)
```
#### Parameters
`model` [SafeLlamaModelHandle](./llama.native.safellamamodelhandle.md)
#### Returns
[LLamaVocabType](./llama.native.llamavocabtype.md)
### **llama_rope_type(SafeLlamaModelHandle)**
```csharp
public static LLamaRopeType llama_rope_type(SafeLlamaModelHandle model)
```
#### Parameters
`model` [SafeLlamaModelHandle](./llama.native.safellamamodelhandle.md)
#### Returns
[LLamaRopeType](./llama.native.llamaropetype.md)
### **llama_grammar_init(LLamaGrammarElement**, UInt64, UInt64)**
Create a new grammar from the given set of grammar rules
```csharp
public static IntPtr llama_grammar_init(LLamaGrammarElement** rules, ulong n_rules, ulong start_rule_index)
```
#### Parameters
`rules` [LLamaGrammarElement**](./llama.native.llamagrammarelement**.md)
`n_rules` [UInt64](https://docs.microsoft.com/en-us/dotnet/api/system.uint64)
`start_rule_index` [UInt64](https://docs.microsoft.com/en-us/dotnet/api/system.uint64)
#### Returns
[IntPtr](https://docs.microsoft.com/en-us/dotnet/api/system.intptr)
### **llama_grammar_free(IntPtr)**
Free all memory from the given SafeLLamaGrammarHandle
```csharp
public static void llama_grammar_free(IntPtr grammar)
```
#### Parameters
`grammar` [IntPtr](https://docs.microsoft.com/en-us/dotnet/api/system.intptr)
### **llama_grammar_copy(SafeLLamaGrammarHandle)**
Create a copy of an existing grammar instance
```csharp
public static IntPtr llama_grammar_copy(SafeLLamaGrammarHandle grammar)
```
#### Parameters
`grammar` [SafeLLamaGrammarHandle](./llama.native.safellamagrammarhandle.md)
#### Returns
[IntPtr](https://docs.microsoft.com/en-us/dotnet/api/system.intptr)
### **llama_sample_grammar(SafeLLamaContextHandle, LLamaTokenDataArrayNative&, SafeLLamaGrammarHandle)**
Apply constraints from grammar
```csharp
public static void llama_sample_grammar(SafeLLamaContextHandle ctx, LLamaTokenDataArrayNative& candidates, SafeLLamaGrammarHandle grammar)
```
#### Parameters
`ctx` [SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)
`candidates` [LLamaTokenDataArrayNative&](./llama.native.llamatokendataarraynative&.md)
`grammar` [SafeLLamaGrammarHandle](./llama.native.safellamagrammarhandle.md)
### **llama_grammar_accept_token(SafeLLamaContextHandle, SafeLLamaGrammarHandle, LLamaToken)**
Accepts the sampled token into the grammar
```csharp
public static void llama_grammar_accept_token(SafeLLamaContextHandle ctx, SafeLLamaGrammarHandle grammar, LLamaToken token)
```
#### Parameters
`ctx` [SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)
`grammar` [SafeLLamaGrammarHandle](./llama.native.safellamagrammarhandle.md)
`token` [LLamaToken](./llama.native.llamatoken.md)
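Taken together, `llama_sample_grammar`, the token samplers, and `llama_grammar_accept_token` form a sample/accept loop. A sketch, under the assumption that `candidates` is rebuilt from fresh logits on every iteration:
```csharp
// Hypothetical grammar-constrained sampling step.
static LLamaToken SampleWithGrammar(SafeLLamaContextHandle ctx,
                                    ref LLamaTokenDataArrayNative candidates,
                                    SafeLLamaGrammarHandle grammar)
{
    // Mask out candidates that would violate the grammar.
    NativeApi.llama_sample_grammar(ctx, ref candidates, grammar);

    // Pick a token from the remaining candidates.
    var token = NativeApi.llama_sample_token(ctx, ref candidates);

    // Advance the grammar state so the next step is constrained correctly.
    NativeApi.llama_grammar_accept_token(ctx, grammar, token);
    return token;
}
```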
### **llava_validate_embed_size(SafeLLamaContextHandle, SafeLlavaModelHandle)**
Sanity check for clip <-> llava embed size match
```csharp
public static bool llava_validate_embed_size(SafeLLamaContextHandle ctxLlama, SafeLlavaModelHandle ctxClip)
```
#### Parameters
`ctxLlama` [SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)
LLama Context
`ctxClip` [SafeLlavaModelHandle](./llama.native.safellavamodelhandle.md)
Llava Model
#### Returns
[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
True if validation succeeds
### **llava_image_embed_make_with_bytes(SafeLlavaModelHandle, Int32, Byte[], Int32)**
Build an image embed from image file bytes
```csharp
public static SafeLlavaImageEmbedHandle llava_image_embed_make_with_bytes(SafeLlavaModelHandle ctx_clip, int n_threads, Byte[] image_bytes, int image_bytes_length)
```
#### Parameters
`ctx_clip` [SafeLlavaModelHandle](./llama.native.safellavamodelhandle.md)
SafeHandle to the Clip Model
`n_threads` [Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
Number of threads
`image_bytes` [Byte[]](https://docs.microsoft.com/en-us/dotnet/api/system.byte)
Binary image in JPEG format
`image_bytes_length` [Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
Byte length of the image
#### Returns
[SafeLlavaImageEmbedHandle](./llama.native.safellavaimageembedhandle.md)
SafeHandle to the Embeddings
### **llava_image_embed_make_with_filename(SafeLlavaModelHandle, Int32, String)**
Build an image embed from a path to an image filename
```csharp
public static SafeLlavaImageEmbedHandle llava_image_embed_make_with_filename(SafeLlavaModelHandle ctx_clip, int n_threads, string image_path)
```
#### Parameters
`ctx_clip` [SafeLlavaModelHandle](./llama.native.safellavamodelhandle.md)
SafeHandle to the Clip Model
`n_threads` [Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
Number of threads
`image_path` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
Path to the image file (JPEG) to generate embeddings from
#### Returns
[SafeLlavaImageEmbedHandle](./llama.native.safellavaimageembedhandle.md)
SafeHandle to the embeddings
### **llava_image_embed_free(IntPtr)**
Free an embedding made with llava_image_embed_make_*
```csharp
public static void llava_image_embed_free(IntPtr embed)
```
#### Parameters
`embed` [IntPtr](https://docs.microsoft.com/en-us/dotnet/api/system.intptr)
Embeddings to release
### **llava_eval_image_embed(SafeLLamaContextHandle, SafeLlavaImageEmbedHandle, Int32, Int32&)**
Write the image represented by embed into the llama context with batch size n_batch, starting at context
pos n_past. On completion, n_past points to the next position in the context after the image embed.
```csharp
public static bool llava_eval_image_embed(SafeLLamaContextHandle ctx_llama, SafeLlavaImageEmbedHandle embed, int n_batch, Int32& n_past)
```
#### Parameters
`ctx_llama` [SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)
Llama Context
`embed` [SafeLlavaImageEmbedHandle](./llama.native.safellavaimageembedhandle.md)
Embedding handle
`n_batch` [Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
`n_past` [Int32&](https://docs.microsoft.com/en-us/dotnet/api/system.int32&)
#### Returns
[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
True on success
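The llava entry points above compose into a small pipeline: validate, embed, evaluate. A sketch, assuming `ctxLlama` and `ctxClip` are already loaded and that disposing the returned safe handle releases the embedding:
```csharp
// Hypothetical llava flow (thread count, batch size and path are illustrative).
if (!NativeApi.llava_validate_embed_size(ctxLlama, ctxClip))
    throw new InvalidOperationException("clip and llama embedding sizes do not match");

using var embed = NativeApi.llava_image_embed_make_with_filename(ctxClip, 4, "image.jpg");

int n_past = 0; // current position in the context
if (!NativeApi.llava_eval_image_embed(ctxLlama, embed, 512, ref n_past))
    throw new InvalidOperationException("failed to evaluate image embedding");
// n_past now points to the next position after the image embed.
```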
### **llama_model_quantize(String, String, LLamaModelQuantizeParams*)**
Returns 0 on success
```csharp
public static uint llama_model_quantize(string fname_inp, string fname_out, LLamaModelQuantizeParams* param)
```
#### Parameters
`fname_inp` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
`fname_out` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
`param` [LLamaModelQuantizeParams*](./llama.native.llamamodelquantizeparams*.md)
#### Returns
[UInt32](https://docs.microsoft.com/en-us/dotnet/api/system.uint32)
Returns 0 on success
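A sketch of a quantization call using the default parameter struct; the commented field is hypothetical, consult LLamaModelQuantizeParams for the real members:
```csharp
// Quantize model-f16.gguf into model-q4.gguf with default settings.
unsafe
{
    var p = NativeApi.llama_model_quantize_default_params();
    // p.ftype = ...; // optionally select a quantization type here (illustrative)
    uint result = NativeApi.llama_model_quantize("model-f16.gguf", "model-q4.gguf", &p);
    if (result != 0)
        throw new InvalidOperationException($"quantization failed (code {result})");
}
```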
### **llama_sample_repetition_penalties(SafeLLamaContextHandle, LLamaTokenDataArrayNative&, LLamaToken*, UInt64, Single, Single, Single)**
Repetition penalty described in CTRL academic paper https://arxiv.org/abs/1909.05858, with negative logit fix.
Frequency and presence penalties described in OpenAI API https://platform.openai.com/docs/api-reference/parameter-details.
```csharp
public static void llama_sample_repetition_penalties(SafeLLamaContextHandle ctx, LLamaTokenDataArrayNative& candidates, LLamaToken* last_tokens, ulong last_tokens_size, float penalty_repeat, float penalty_freq, float penalty_present)
```
#### Parameters
`ctx` [SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)
`candidates` [LLamaTokenDataArrayNative&](./llama.native.llamatokendataarraynative&.md)
Pointer to LLamaTokenDataArray
`last_tokens` [LLamaToken*](./llama.native.llamatoken*.md)
`last_tokens_size` [UInt64](https://docs.microsoft.com/en-us/dotnet/api/system.uint64)
`penalty_repeat` [Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)
Repetition penalty described in CTRL academic paper https://arxiv.org/abs/1909.05858, with negative logit fix.
`penalty_freq` [Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)
Frequency penalty described in OpenAI API https://platform.openai.com/docs/api-reference/parameter-details.
`penalty_present` [Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)
Presence penalty described in OpenAI API https://platform.openai.com/docs/api-reference/parameter-details.
### **llama_sample_apply_guidance(SafeLLamaContextHandle, Span<Single>, ReadOnlySpan<Single>, Single)**
Apply classifier-free guidance to the logits as described in academic paper "Stay on topic with Classifier-Free Guidance" https://arxiv.org/abs/2306.17806
```csharp
public static void llama_sample_apply_guidance(SafeLLamaContextHandle ctx, Span<float> logits, ReadOnlySpan<float> logits_guidance, float scale)
```
#### Parameters
`ctx` [SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)
`logits` [Span<Single>](https://docs.microsoft.com/en-us/dotnet/api/system.span-1)
Logits extracted from the original generation context.
`logits_guidance` [ReadOnlySpan<Single>](https://docs.microsoft.com/en-us/dotnet/api/system.readonlyspan-1)
Logits extracted from a separate context from the same model.
Other than a negative prompt at the beginning, it should have all generated and user input tokens copied from the main context.
`scale` [Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)
Guidance strength. 1.0f means no guidance. Higher values mean stronger guidance.
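A sketch of one guidance step, assuming `guidanceCtx` is a second context over the same model that decoded a negative prompt followed by the same tokens, and `n_vocab` is the model's vocabulary size:
```csharp
// Blend guidance-context logits into the main-context logits.
unsafe
{
    var logits = new Span<float>(NativeApi.llama_get_logits(ctx), n_vocab);
    var guidance = new ReadOnlySpan<float>(NativeApi.llama_get_logits(guidanceCtx), n_vocab);
    NativeApi.llama_sample_apply_guidance(ctx, logits, guidance, 2.0f); // scale > 1: stronger guidance
}
```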
### **llama_sample_apply_guidance(SafeLLamaContextHandle, Single*, Single*, Single)**
Apply classifier-free guidance to the logits as described in academic paper "Stay on topic with Classifier-Free Guidance" https://arxiv.org/abs/2306.17806
```csharp
public static void llama_sample_apply_guidance(SafeLLamaContextHandle ctx, Single* logits, Single* logits_guidance, float scale)
```
#### Parameters
`ctx` [SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)
`logits` [Single*](https://docs.microsoft.com/en-us/dotnet/api/system.single*)
Logits extracted from the original generation context.
`logits_guidance` [Single*](https://docs.microsoft.com/en-us/dotnet/api/system.single*)
Logits extracted from a separate context from the same model.
Other than a negative prompt at the beginning, it should have all generated and user input tokens copied from the main context.
`scale` [Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)
Guidance strength. 1.0f means no guidance. Higher values mean stronger guidance.
### **llama_sample_softmax(SafeLLamaContextHandle, LLamaTokenDataArrayNative&)**
Sorts candidate tokens by their logits in descending order and calculates probabilities based on the logits.
```csharp
public static void llama_sample_softmax(SafeLLamaContextHandle ctx, LLamaTokenDataArrayNative& candidates)
```
#### Parameters
`ctx` [SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)
`candidates` [LLamaTokenDataArrayNative&](./llama.native.llamatokendataarraynative&.md)
Pointer to LLamaTokenDataArray
### **llama_sample_top_k(SafeLLamaContextHandle, LLamaTokenDataArrayNative&, Int32, UInt64)**
Top-K sampling described in academic paper "The Curious Case of Neural Text Degeneration" https://arxiv.org/abs/1904.09751
```csharp
public static void llama_sample_top_k(SafeLLamaContextHandle ctx, LLamaTokenDataArrayNative& candidates, int k, ulong min_keep)
```
#### Parameters
`ctx` [SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)
`candidates` [LLamaTokenDataArrayNative&](./llama.native.llamatokendataarraynative&.md)
Pointer to LLamaTokenDataArray
`k` [Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
`min_keep` [UInt64](https://docs.microsoft.com/en-us/dotnet/api/system.uint64)
### **llama_sample_top_p(SafeLLamaContextHandle, LLamaTokenDataArrayNative&, Single, UInt64)**
Nucleus sampling described in academic paper "The Curious Case of Neural Text Degeneration" https://arxiv.org/abs/1904.09751
```csharp
public static void llama_sample_top_p(SafeLLamaContextHandle ctx, LLamaTokenDataArrayNative& candidates, float p, ulong min_keep)
```
#### Parameters
`ctx` [SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)
`candidates` [LLamaTokenDataArrayNative&](./llama.native.llamatokendataarraynative&.md)
Pointer to LLamaTokenDataArray
`p` [Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)
`min_keep` [UInt64](https://docs.microsoft.com/en-us/dotnet/api/system.uint64)
### **llama_sample_min_p(SafeLLamaContextHandle, LLamaTokenDataArrayNative&, Single, UInt64)**
Minimum P sampling as described in https://github.com/ggerganov/llama.cpp/pull/3841
```csharp
public static void llama_sample_min_p(SafeLLamaContextHandle ctx, LLamaTokenDataArrayNative& candidates, float p, ulong min_keep)
```
#### Parameters
`ctx` [SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)
`candidates` [LLamaTokenDataArrayNative&](./llama.native.llamatokendataarraynative&.md)
Pointer to LLamaTokenDataArray
`p` [Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)
`min_keep` [UInt64](https://docs.microsoft.com/en-us/dotnet/api/system.uint64)
### **llama_sample_tail_free(SafeLLamaContextHandle, LLamaTokenDataArrayNative&, Single, UInt64)**
Tail Free Sampling described in https://www.trentonbricken.com/Tail-Free-Sampling/.
```csharp
public static void llama_sample_tail_free(SafeLLamaContextHandle ctx, LLamaTokenDataArrayNative& candidates, float z, ulong min_keep)
```
#### Parameters
`ctx` [SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)
`candidates` [LLamaTokenDataArrayNative&](./llama.native.llamatokendataarraynative&.md)
Pointer to LLamaTokenDataArray
`z` [Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)
`min_keep` [UInt64](https://docs.microsoft.com/en-us/dotnet/api/system.uint64)
### **llama_sample_typical(SafeLLamaContextHandle, LLamaTokenDataArrayNative&, Single, UInt64)**
Locally Typical Sampling implementation described in the paper https://arxiv.org/abs/2202.00666.
```csharp
public static void llama_sample_typical(SafeLLamaContextHandle ctx, LLamaTokenDataArrayNative& candidates, float p, ulong min_keep)
```
#### Parameters
`ctx` [SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)
`candidates` [LLamaTokenDataArrayNative&](./llama.native.llamatokendataarraynative&.md)
Pointer to LLamaTokenDataArray
`p` [Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)
`min_keep` [UInt64](https://docs.microsoft.com/en-us/dotnet/api/system.uint64)
### **llama_sample_entropy(SafeLLamaContextHandle, LLamaTokenDataArrayNative&, Single, Single, Single)**
Dynamic temperature implementation described in the paper https://arxiv.org/abs/2309.02772.
```csharp
public static void llama_sample_entropy(SafeLLamaContextHandle ctx, LLamaTokenDataArrayNative& candidates, float min_temp, float max_temp, float exponent_val)
```
#### Parameters
`ctx` [SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)
`candidates` [LLamaTokenDataArrayNative&](./llama.native.llamatokendataarraynative&.md)
Pointer to LLamaTokenDataArray
`min_temp` [Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)
`max_temp` [Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)
`exponent_val` [Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)
### **llama_sample_temp(SafeLLamaContextHandle, LLamaTokenDataArrayNative&, Single)**
Modify logits by temperature
```csharp
public static void llama_sample_temp(SafeLLamaContextHandle ctx, LLamaTokenDataArrayNative& candidates, float temp)
```
#### Parameters
`ctx` [SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)
`candidates` [LLamaTokenDataArrayNative&](./llama.native.llamatokendataarraynative&.md)
`temp` [Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)
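The truncation filters above are typically chained before a final `llama_sample_token` call. One possible ordering (a sketch with illustrative parameter values, not a prescription):
```csharp
// Hypothetical sampling chain over a populated candidates array.
static LLamaToken SampleChain(SafeLLamaContextHandle ctx, ref LLamaTokenDataArrayNative candidates)
{
    NativeApi.llama_sample_top_k(ctx, ref candidates, 40, 1);
    NativeApi.llama_sample_tail_free(ctx, ref candidates, 1.0f, 1);
    NativeApi.llama_sample_typical(ctx, ref candidates, 1.0f, 1);
    NativeApi.llama_sample_top_p(ctx, ref candidates, 0.95f, 1);
    NativeApi.llama_sample_min_p(ctx, ref candidates, 0.05f, 1);
    NativeApi.llama_sample_temp(ctx, ref candidates, 0.8f);
    return NativeApi.llama_sample_token(ctx, ref candidates);
}
```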
### **llama_get_embeddings(SafeLLamaContextHandle)**
Get the embeddings for the input
```csharp
public static Span<float> llama_get_embeddings(SafeLLamaContextHandle ctx)
```
#### Parameters
`ctx` [SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)
#### Returns
[Span<Single>](https://docs.microsoft.com/en-us/dotnet/api/system.span-1)
### **llama_chat_apply_template(SafeLlamaModelHandle, Char*, LLamaChatMessage*, IntPtr, Boolean, Char*, Int32)**
Apply chat template. Inspired by hf apply_chat_template() in Python.
Both "model" and "custom_template" are optional, but at least one is required. "custom_template" has higher precedence than "model".
NOTE: This function does not use a jinja parser. It only supports a pre-defined list of templates. See more: https://github.com/ggerganov/llama.cpp/wiki/Templates-supported-by-llama_chat_apply_template
```csharp
public static int llama_chat_apply_template(SafeLlamaModelHandle model, Char* tmpl, LLamaChatMessage* chat, IntPtr n_msg, bool add_ass, Char* buf, int length)
```
#### Parameters
`model` [SafeLlamaModelHandle](./llama.native.safellamamodelhandle.md)
`tmpl` [Char*](https://docs.microsoft.com/en-us/dotnet/api/system.char*)
A Jinja template to use for this chat. If this is nullptr, the model’s default chat template will be used instead.
`chat` [LLamaChatMessage*](./llama.native.llamachatmessage*.md)
Pointer to a list of multiple llama_chat_message
`n_msg` [IntPtr](https://docs.microsoft.com/en-us/dotnet/api/system.intptr)
Number of llama_chat_message in this chat
`add_ass` [Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
Whether to end the prompt with the token(s) that indicate the start of an assistant message.
`buf` [Char*](https://docs.microsoft.com/en-us/dotnet/api/system.char*)
A buffer to hold the output formatted prompt. The recommended alloc size is 2 * (total number of characters of all messages)
`length` [Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
The size of the allocated buffer
#### Returns
[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
The total number of bytes of the formatted prompt. If it is larger than the size of the buffer, you may need to re-allocate the buffer and then re-apply the template.
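The return value supports a grow-and-retry pattern. A sketch, assuming `model`, `chat` and `n_msg` are in scope and glossing over text encoding (the binding as documented takes Char* buffers):
```csharp
// Apply the model's default template (tmpl = null), retrying once if the
// initial buffer was too small.
unsafe
{
    var buf = new char[1024]; // starting size is illustrative
    int written;
    fixed (char* bufPtr = buf)
        written = NativeApi.llama_chat_apply_template(model, null, chat, n_msg, true, bufPtr, buf.Length);

    if (written > buf.Length)
    {
        buf = new char[written]; // grow to the reported size and re-apply
        fixed (char* bufPtr = buf)
            written = NativeApi.llama_chat_apply_template(model, null, chat, n_msg, true, bufPtr, buf.Length);
    }
}
```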
### **llama_token_bos(SafeLlamaModelHandle)**
Get the "Beginning of sentence" token
```csharp
public static LLamaToken llama_token_bos(SafeLlamaModelHandle model)
```
#### Parameters
`model` [SafeLlamaModelHandle](./llama.native.safellamamodelhandle.md)
#### Returns
[LLamaToken](./llama.native.llamatoken.md)
### **llama_token_eos(SafeLlamaModelHandle)**
Get the "End of sentence" token
```csharp
public static LLamaToken llama_token_eos(SafeLlamaModelHandle model)
```
#### Parameters
`model` [SafeLlamaModelHandle](./llama.native.safellamamodelhandle.md)
#### Returns
[LLamaToken](./llama.native.llamatoken.md)
### **llama_token_nl(SafeLlamaModelHandle)**
Get the "new line" token
```csharp
public static LLamaToken llama_token_nl(SafeLlamaModelHandle model)
```
#### Parameters
`model` [SafeLlamaModelHandle](./llama.native.safellamamodelhandle.md)
#### Returns
[LLamaToken](./llama.native.llamatoken.md)
### **llama_add_bos_token(SafeLlamaModelHandle)**
Returns -1 if unknown, 1 for true or 0 for false.
```csharp
public static int llama_add_bos_token(SafeLlamaModelHandle model)
```
#### Parameters
`model` [SafeLlamaModelHandle](./llama.native.safellamamodelhandle.md)
#### Returns
[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
### **llama_add_eos_token(SafeLlamaModelHandle)**
Returns -1 if unknown, 1 for true or 0 for false.
```csharp
public static int llama_add_eos_token(SafeLlamaModelHandle model)
```
#### Parameters
`model` [SafeLlamaModelHandle](./llama.native.safellamamodelhandle.md)
#### Returns
[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
### **llama_token_prefix(SafeLlamaModelHandle)**
Codellama infill tokens: beginning of infill prefix
```csharp
public static int llama_token_prefix(SafeLlamaModelHandle model)
```
#### Parameters
`model` [SafeLlamaModelHandle](./llama.native.safellamamodelhandle.md)
#### Returns
[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
### **llama_token_middle(SafeLlamaModelHandle)**
Codellama infill tokens: beginning of infill middle
```csharp
public static int llama_token_middle(SafeLlamaModelHandle model)
```
#### Parameters
`model` [SafeLlamaModelHandle](./llama.native.safellamamodelhandle.md)
#### Returns
[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
### **llama_token_suffix(SafeLlamaModelHandle)**
Codellama infill tokens: beginning of infill suffix
```csharp
public static int llama_token_suffix(SafeLlamaModelHandle model)
```
#### Parameters
`model` [SafeLlamaModelHandle](./llama.native.safellamamodelhandle.md)
#### Returns
[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
### **llama_token_eot(SafeLlamaModelHandle)**
Codellama infill tokens: end of infill middle
```csharp
public static int llama_token_eot(SafeLlamaModelHandle model)
```
#### Parameters
`model` [SafeLlamaModelHandle](./llama.native.safellamamodelhandle.md)
#### Returns
[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
### **llama_print_timings(SafeLLamaContextHandle)**
Print out timing information for this context
```csharp
public static void llama_print_timings(SafeLLamaContextHandle ctx)
```
#### Parameters
`ctx` [SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)
### **llama_reset_timings(SafeLLamaContextHandle)**
Reset all collected timing information for this context
```csharp
public static void llama_reset_timings(SafeLLamaContextHandle ctx)
```
#### Parameters
`ctx` [SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)
### **llama_print_system_info()**
Print system information
```csharp
public static IntPtr llama_print_system_info()
```
#### Returns
[IntPtr](https://docs.microsoft.com/en-us/dotnet/api/system.intptr)
### **llama_token_to_piece(SafeLlamaModelHandle, LLamaToken, Span<Byte>)**
Convert a single token into text
```csharp
public static int llama_token_to_piece(SafeLlamaModelHandle model, LLamaToken llamaToken, Span<byte> buffer)
```
#### Parameters
`model` [SafeLlamaModelHandle](./llama.native.safellamamodelhandle.md)
`llamaToken` [LLamaToken](./llama.native.llamatoken.md)
`buffer` [Span<Byte>](https://docs.microsoft.com/en-us/dotnet/api/system.span-1)
buffer to write string into
#### Returns
[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
The length written, or, if the buffer is too small, a negative value indicating the required length
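A sketch of the retry pattern the return value implies, assuming `model` and `token` are in scope (UTF-8 decoding is an assumption about the byte contents):
```csharp
// Decode one token to text, growing the buffer if the first attempt fails.
Span<byte> buffer = stackalloc byte[32];
int n = NativeApi.llama_token_to_piece(model, token, buffer);
if (n < 0)
{
    buffer = new byte[-n]; // negative return value indicates the required length
    n = NativeApi.llama_token_to_piece(model, token, buffer);
}
string piece = System.Text.Encoding.UTF8.GetString(buffer.Slice(0, n));
```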
### **llama_tokenize(SafeLlamaModelHandle, Byte*, Int32, LLamaToken*, Int32, Boolean, Boolean)**
Convert text into tokens
```csharp
public static int llama_tokenize(SafeLlamaModelHandle model, Byte* text, int text_len, LLamaToken* tokens, int n_max_tokens, bool add_bos, bool special)
```
#### Parameters
`model` [SafeLlamaModelHandle](./llama.native.safellamamodelhandle.md)
`text` [Byte*](https://docs.microsoft.com/en-us/dotnet/api/system.byte*)
`text_len` [Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
`tokens` [LLamaToken*](./llama.native.llamatoken*.md)
`n_max_tokens` [Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
`add_bos` [Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
`special` [Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
Allow tokenizing special and/or control tokens which otherwise are not exposed and treated as plaintext. Does not insert a leading space.
#### Returns
[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
Returns the number of tokens on success (no more than n_max_tokens).
Returns a negative number on failure; its absolute value is the number of tokens that would have been returned.
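The negative failure return enables two-pass tokenization: probe with an empty buffer, then allocate exactly. A sketch, assuming `model` and a UTF-8 `text` string:
```csharp
// Two-pass tokenization: the first call reports the required count as a
// negative number, the second call fills the exactly-sized array.
unsafe
{
    var bytes = System.Text.Encoding.UTF8.GetBytes(text);
    fixed (byte* textPtr = bytes)
    {
        int n = NativeApi.llama_tokenize(model, textPtr, bytes.Length, null, 0, true, false);
        var tokens = new LLamaToken[n < 0 ? -n : n];
        fixed (LLamaToken* tokenPtr = tokens)
            n = NativeApi.llama_tokenize(model, textPtr, bytes.Length, tokenPtr, tokens.Length, true, false);
    }
}
```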
### **llama_log_set(LLamaLogCallback)**
Register a callback to receive llama log messages
```csharp
public static void llama_log_set(LLamaLogCallback logCallback)
```
#### Parameters
`logCallback` [LLamaLogCallback](./llama.native.llamalogcallback.md)
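For example, to surface native log output (this assumes the LLamaLogCallback delegate takes a log level and a message string):
```csharp
// Route llama.cpp log messages to stderr.
NativeApi.llama_log_set((level, message) => Console.Error.Write(message));
```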
### **llama_kv_cache_clear(SafeLLamaContextHandle)**
Clear the KV cache
```csharp
public static void llama_kv_cache_clear(SafeLLamaContextHandle ctx)
```
#### Parameters
`ctx` [SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)
### **llama_kv_cache_seq_rm(SafeLLamaContextHandle, LLamaSeqId, LLamaPos, LLamaPos)**
Removes all tokens that belong to the specified sequence and have positions in [p0, p1)
```csharp
public static void llama_kv_cache_seq_rm(SafeLLamaContextHandle ctx, LLamaSeqId seq, LLamaPos p0, LLamaPos p1)
```
#### Parameters
`ctx` [SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)
`seq` [LLamaSeqId](./llama.native.llamaseqid.md)
`p0` [LLamaPos](./llama.native.llamapos.md)
`p1` [LLamaPos](./llama.native.llamapos.md)
### **llama_kv_cache_seq_cp(SafeLLamaContextHandle, LLamaSeqId, LLamaSeqId, LLamaPos, LLamaPos)**
Copy all tokens that belong to the specified sequence to another sequence
Note that this does not allocate extra KV cache memory - it simply assigns the tokens to the new sequence
```csharp
public static void llama_kv_cache_seq_cp(SafeLLamaContextHandle ctx, LLamaSeqId src, LLamaSeqId dest, LLamaPos p0, LLamaPos p1)
```
#### Parameters
`ctx` [SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)
`src` [LLamaSeqId](./llama.native.llamaseqid.md)
`dest` [LLamaSeqId](./llama.native.llamaseqid.md)
`p0` [LLamaPos](./llama.native.llamapos.md)
`p1` [LLamaPos](./llama.native.llamapos.md)
### **llama_kv_cache_seq_keep(SafeLLamaContextHandle, LLamaSeqId)**
Removes all tokens that do not belong to the specified sequence
```csharp
public static void llama_kv_cache_seq_keep(SafeLLamaContextHandle ctx, LLamaSeqId seq)
```
#### Parameters
`ctx` [SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)
`seq` [LLamaSeqId](./llama.native.llamaseqid.md)
### **llama_kv_cache_seq_add(SafeLLamaContextHandle, LLamaSeqId, LLamaPos, LLamaPos, Int32)**
Adds relative position "delta" to all tokens that belong to the specified sequence and have positions in [p0, p1)
If the KV cache is RoPEd, the KV data is updated accordingly:
- lazily on next llama_decode()
- explicitly with llama_kv_cache_update()
```csharp
public static void llama_kv_cache_seq_add(SafeLLamaContextHandle ctx, LLamaSeqId seq, LLamaPos p0, LLamaPos p1, int delta)
```
#### Parameters
`ctx` [SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)
`seq` [LLamaSeqId](./llama.native.llamaseqid.md)
`p0` [LLamaPos](./llama.native.llamapos.md)
`p1` [LLamaPos](./llama.native.llamapos.md)
`delta` [Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
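`llama_kv_cache_seq_rm` and `llama_kv_cache_seq_add` combine into the common "context shifting" trick: drop the oldest generated tokens and slide the rest left so new tokens fit. A sketch (positions are illustrative, and int conversions for LLamaPos/LLamaSeqId are assumed):
```csharp
// Keep the first n_keep tokens, discard the next n_discard, shift the rest.
var seq = (LLamaSeqId)0;
int n_keep = 64;
int n_discard = 256;

// Remove positions [n_keep, n_keep + n_discard) from the sequence...
NativeApi.llama_kv_cache_seq_rm(ctx, seq, (LLamaPos)n_keep, (LLamaPos)(n_keep + n_discard));
// ...then shift everything after the gap left by n_discard (p1 < 0 means "to the end").
NativeApi.llama_kv_cache_seq_add(ctx, seq, (LLamaPos)(n_keep + n_discard), (LLamaPos)(-1), -n_discard);
```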
### **llama_kv_cache_seq_div(SafeLLamaContextHandle, LLamaSeqId, LLamaPos, LLamaPos, Int32)**
Integer division of the positions by a factor of `d > 1`.
If the KV cache is RoPEd, the KV data is updated accordingly:
- lazily on next llama_decode()
- explicitly with llama_kv_cache_update()

Position bounds: `p0 < 0` means `[0, p1]`; `p1 < 0` means `[p0, inf)`.
```csharp
public static void llama_kv_cache_seq_div(SafeLLamaContextHandle ctx, LLamaSeqId seq, LLamaPos p0, LLamaPos p1, int d)
```
#### Parameters
`ctx` [SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)
`seq` [LLamaSeqId](./llama.native.llamaseqid.md)
`p0` [LLamaPos](./llama.native.llamapos.md)
`p1` [LLamaPos](./llama.native.llamapos.md)
`d` [Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
### **llama_kv_cache_seq_pos_max(SafeLLamaContextHandle, LLamaSeqId)**
Returns the largest position present in the KV cache for the specified sequence
```csharp
public static LLamaPos llama_kv_cache_seq_pos_max(SafeLLamaContextHandle ctx, LLamaSeqId seq)
```
#### Parameters
`ctx` [SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)
`seq` [LLamaSeqId](./llama.native.llamaseqid.md)
#### Returns
[LLamaPos](./llama.native.llamapos.md)
### **llama_kv_cache_defrag(SafeLLamaContextHandle)**
Defragment the KV cache. This will be applied:
- lazily on next llama_decode()
- explicitly with llama_kv_cache_update()
```csharp
public static LLamaPos llama_kv_cache_defrag(SafeLLamaContextHandle ctx)
```
#### Parameters
`ctx` [SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)
#### Returns
[LLamaPos](./llama.native.llamapos.md)
### **llama_kv_cache_update(SafeLLamaContextHandle)**
Apply the KV cache updates (such as K-shifts, defragmentation, etc.)
```csharp
public static void llama_kv_cache_update(SafeLLamaContextHandle ctx)
```
#### Parameters
`ctx` [SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)
### **llama_batch_init(Int32, Int32, Int32)**
Allocates a batch of tokens on the heap
Each token can be assigned up to n_seq_max sequence ids
The batch has to be freed with llama_batch_free()
If embd != 0, llama_batch.embd will be allocated with size of n_tokens * embd * sizeof(float)
Otherwise, llama_batch.token will be allocated to store n_tokens llama_token
The rest of the llama_batch members are allocated with size n_tokens
All members are left uninitialized
```csharp
public static LLamaNativeBatch llama_batch_init(int n_tokens, int embd, int n_seq_max)
```
#### Parameters
`n_tokens` [Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
`embd` [Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
`n_seq_max` [Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
Each token can be assigned up to n_seq_max sequence ids
#### Returns
[LLamaNativeBatch](./llama.native.llamanativebatch.md)
### **llama_batch_free(LLamaNativeBatch)**
Frees a batch of tokens allocated with llama_batch_init()
```csharp
public static void llama_batch_free(LLamaNativeBatch batch)
```
#### Parameters
`batch` [LLamaNativeBatch](./llama.native.llamanativebatch.md)
### **llama_decode(SafeLLamaContextHandle, LLamaNativeBatch)**
```csharp
public static int llama_decode(SafeLLamaContextHandle ctx, LLamaNativeBatch batch)
```
#### Parameters
`ctx` [SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)
`batch` [LLamaNativeBatch](./llama.native.llamanativebatch.md)
#### Returns
[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
Positive return values do not indicate a fatal error, but rather a warning:
- 0: success
- 1: could not find a KV slot for the batch (try reducing the size of the batch or increase the context)
- < 0: error
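A sketch of the batch lifecycle around a decode call (the fill step is elided; member names in the comment are illustrative):
```csharp
// Allocate a token batch, decode it, and always free it afterwards.
var batch = NativeApi.llama_batch_init(512, 0, 1); // token mode (embd == 0), 1 sequence id per token
try
{
    // ... fill the batch members (tokens, positions, sequence ids, logit flags) ...
    int rc = NativeApi.llama_decode(ctx, batch);
    if (rc == 1)
        Console.WriteLine("warning: no KV slot found - reduce the batch or grow the context");
    else if (rc < 0)
        throw new InvalidOperationException($"llama_decode failed ({rc})");
}
finally
{
    NativeApi.llama_batch_free(batch);
}
```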
### **llama_kv_cache_view_init(SafeLLamaContextHandle, Int32)**
Create an empty KV cache view. (use only for debugging purposes)
```csharp
public static LLamaKvCacheView llama_kv_cache_view_init(SafeLLamaContextHandle ctx, int n_max_seq)
```
#### Parameters
`ctx` [SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)
`n_max_seq` [Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
#### Returns
[LLamaKvCacheView](./llama.native.llamakvcacheview.md)
### **llama_kv_cache_view_free(LLamaKvCacheView&)**
Free a KV cache view. (use only for debugging purposes)
```csharp
public static void llama_kv_cache_view_free(LLamaKvCacheView& view)
```
#### Parameters
`view` [LLamaKvCacheView&](./llama.native.llamakvcacheview&.md)
### **llama_kv_cache_view_update(SafeLLamaContextHandle, LLamaKvCacheView&)**
Update the KV cache view structure with the current state of the KV cache. (use only for debugging purposes)
```csharp
public static void llama_kv_cache_view_update(SafeLLamaContextHandle ctx, LLamaKvCacheView& view)
```
#### Parameters
`ctx` [SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)
`view` [LLamaKvCacheView&](./llama.native.llamakvcacheview&.md)
### **llama_get_kv_cache_token_count(SafeLLamaContextHandle)**
Returns the number of tokens in the KV cache (slow, use only for debug)
If a KV cell has multiple sequences assigned to it, it will be counted multiple times
```csharp
public static int llama_get_kv_cache_token_count(SafeLLamaContextHandle ctx)
```
#### Parameters
`ctx` [SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)
#### Returns
[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
### **llama_get_kv_cache_used_cells(SafeLLamaContextHandle)**
Returns the number of used KV cells (i.e. have at least one sequence assigned to them)
```csharp
public static int llama_get_kv_cache_used_cells(SafeLLamaContextHandle ctx)
```
#### Parameters
`ctx` [SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)
#### Returns
[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
### **llama_beam_search(SafeLLamaContextHandle, LLamaBeamSearchCallback, IntPtr, UInt64, Int32, Int32, Int32)**
Deterministically returns the entire sentence constructed by a beam search.
```csharp
public static void llama_beam_search(SafeLLamaContextHandle ctx, LLamaBeamSearchCallback callback, IntPtr callback_data, ulong n_beams, int n_past, int n_predict, int n_threads)
```
#### Parameters
`ctx` [SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)
Pointer to the llama_context.
`callback` [LLamaBeamSearchCallback](./llama.native.nativeapi.llamabeamsearchcallback.md)
Invoked for each iteration of the beam_search loop, passing in beams_state.
`callback_data` [IntPtr](https://docs.microsoft.com/en-us/dotnet/api/system.intptr)
A pointer that is simply passed back to callback.
`n_beams` [UInt64](https://docs.microsoft.com/en-us/dotnet/api/system.uint64)
Number of beams to use.
`n_past` [Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
Number of tokens already evaluated.
`n_predict` [Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
Maximum number of tokens to predict. EOS may occur earlier.
`n_threads` [Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
Number of threads.
### **llama_empty_call()**
A method that does nothing. This is a native method; calling it will force the llama native dependencies to be loaded.
```csharp
public static void llama_empty_call()
```
### **llama_max_devices()**
Get the maximum number of devices supported by llama.cpp
```csharp
public static long llama_max_devices()
```
#### Returns
[Int64](https://docs.microsoft.com/en-us/dotnet/api/system.int64)
### **llama_model_default_params()**
Create a LLamaModelParams with default values
```csharp
public static LLamaModelParams llama_model_default_params()
```
#### Returns
[LLamaModelParams](./llama.native.llamamodelparams.md)
### **llama_context_default_params()**
Create a LLamaContextParams with default values
```csharp
public static LLamaContextParams llama_context_default_params()
```
#### Returns
[LLamaContextParams](./llama.native.llamacontextparams.md)
### **llama_model_quantize_default_params()**
Create a LLamaModelQuantizeParams with default values
```csharp
public static LLamaModelQuantizeParams llama_model_quantize_default_params()
```
#### Returns
[LLamaModelQuantizeParams](./llama.native.llamamodelquantizeparams.md)
### **llama_supports_mmap()**
Check if memory mapping is supported
```csharp
public static bool llama_supports_mmap()
```
#### Returns
[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
### **llama_supports_mlock()**
Check if memory locking is supported
```csharp
public static bool llama_supports_mlock()
```
#### Returns
[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
### **llama_supports_gpu_offload()**
Check if GPU offload is supported
```csharp
public static bool llama_supports_gpu_offload()
```
#### Returns
[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
### **llama_set_rng_seed(SafeLLamaContextHandle, UInt32)**
Sets the current rng seed.
```csharp
public static void llama_set_rng_seed(SafeLLamaContextHandle ctx, uint seed)
```
#### Parameters
`ctx` [SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)
`seed` [UInt32](https://docs.microsoft.com/en-us/dotnet/api/system.uint32)
### **llama_get_state_size(SafeLLamaContextHandle)**
Returns the maximum size in bytes of the state (rng, logits, embedding
and kv_cache); the actual size will often be smaller after compacting tokens.
```csharp
public static ulong llama_get_state_size(SafeLLamaContextHandle ctx)
```
#### Parameters
`ctx` [SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)
#### Returns
[UInt64](https://docs.microsoft.com/en-us/dotnet/api/system.uint64)
### **llama_copy_state_data(SafeLLamaContextHandle, Byte*)**
Copies the state to the specified destination address.
The destination needs to have enough memory allocated.
```csharp
public static ulong llama_copy_state_data(SafeLLamaContextHandle ctx, Byte* dest)
```
#### Parameters
`ctx` [SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)
`dest` [Byte*](https://docs.microsoft.com/en-us/dotnet/api/system.byte*)
#### Returns
[UInt64](https://docs.microsoft.com/en-us/dotnet/api/system.uint64)
The number of bytes copied
### **llama_set_state_data(SafeLLamaContextHandle, Byte*)**
Set the state reading from the specified address
```csharp
public static ulong llama_set_state_data(SafeLLamaContextHandle ctx, Byte* src)
```
#### Parameters
`ctx` [SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)
`src` [Byte*](https://docs.microsoft.com/en-us/dotnet/api/system.byte*)
#### Returns
[UInt64](https://docs.microsoft.com/en-us/dotnet/api/system.uint64)
The number of bytes read
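`llama_get_state_size`, `llama_copy_state_data` and `llama_set_state_data` together implement snapshot/restore. A sketch:
```csharp
// Snapshot the context state into a managed buffer, then restore it later.
unsafe
{
    ulong size = NativeApi.llama_get_state_size(ctx);
    var state = new byte[size];
    fixed (byte* statePtr = state)
    {
        ulong copied = NativeApi.llama_copy_state_data(ctx, statePtr); // save
        // ... later, on a context created with the same model and parameters ...
        ulong read = NativeApi.llama_set_state_data(ctx, statePtr);    // restore
    }
}
```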
### **llama_load_session_file(SafeLLamaContextHandle, String, LLamaToken[], UInt64, UInt64&)**
Load session file
```csharp
public static bool llama_load_session_file(SafeLLamaContextHandle ctx, string path_session, LLamaToken[] tokens_out, ulong n_token_capacity, UInt64& n_token_count_out)
```
#### Parameters
`ctx` [SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)
`path_session` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
`tokens_out` [LLamaToken[]](./llama.native.llamatoken.md)
`n_token_capacity` [UInt64](https://docs.microsoft.com/en-us/dotnet/api/system.uint64)
`n_token_count_out` [UInt64&](https://docs.microsoft.com/en-us/dotnet/api/system.uint64&)
#### Returns
[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
### **llama_save_session_file(SafeLLamaContextHandle, String, LLamaToken[], UInt64)**
Save session file
```csharp
public static bool llama_save_session_file(SafeLLamaContextHandle ctx, string path_session, LLamaToken[] tokens, ulong n_token_count)
```
#### Parameters
`ctx` [SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)
`path_session` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
`tokens` [LLamaToken[]](./llama.native.llamatoken.md)
`n_token_count` [UInt64](https://docs.microsoft.com/en-us/dotnet/api/system.uint64)
#### Returns
[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
### **llama_token_get_text(SafeLlamaModelHandle, LLamaToken)**
```csharp
public static Byte* llama_token_get_text(SafeLlamaModelHandle model, LLamaToken token)
```
#### Parameters
`model` [SafeLlamaModelHandle](./llama.native.safellamamodelhandle.md)
`token` [LLamaToken](./llama.native.llamatoken.md)
#### Returns
[Byte*](https://docs.microsoft.com/en-us/dotnet/api/system.byte*)
### **llama_token_get_score(SafeLlamaModelHandle, LLamaToken)**
```csharp
public static float llama_token_get_score(SafeLlamaModelHandle model, LLamaToken token)
```
#### Parameters
`model` [SafeLlamaModelHandle](./llama.native.safellamamodelhandle.md)
`token` [LLamaToken](./llama.native.llamatoken.md)
#### Returns
[Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)
### **llama_token_get_type(SafeLlamaModelHandle, LLamaToken)**
```csharp
public static LLamaTokenType llama_token_get_type(SafeLlamaModelHandle model, LLamaToken token)
```
#### Parameters
`model` [SafeLlamaModelHandle](./llama.native.safellamamodelhandle.md)
`token` [LLamaToken](./llama.native.llamatoken.md)
#### Returns
[LLamaTokenType](./llama.native.llamatokentype.md)
### **llama_n_ctx(SafeLLamaContextHandle)**
Get the size of the context window for this context
```csharp
public static uint llama_n_ctx(SafeLLamaContextHandle ctx)
```
#### Parameters
`ctx` [SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)
#### Returns
[UInt32](https://docs.microsoft.com/en-us/dotnet/api/system.uint32)
### **llama_n_batch(SafeLLamaContextHandle)**
Get the batch size for this context
```csharp
public static uint llama_n_batch(SafeLLamaContextHandle ctx)
```
#### Parameters
`ctx` [SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)
#### Returns
[UInt32](https://docs.microsoft.com/en-us/dotnet/api/system.uint32)
### **llama_get_logits(SafeLLamaContextHandle)**
Token logits obtained from the last call to llama_decode
The logits for the last token are stored in the last row
Can be mutated in order to change the probabilities of the next token.
Rows: n_tokens
Cols: n_vocab
```csharp
public static Single* llama_get_logits(SafeLLamaContextHandle ctx)
```
#### Parameters
`ctx` [SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)
#### Returns
[Single*](https://docs.microsoft.com/en-us/dotnet/api/system.single*)
### **llama_get_logits_ith(SafeLLamaContextHandle, Int32)**
Logits for the ith token. Equivalent to: llama_get_logits(ctx) + i*n_vocab
```csharp
public static Single* llama_get_logits_ith(SafeLLamaContextHandle ctx, int i)
```
#### Parameters
`ctx` [SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)
`i` [Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
#### Returns
[Single*](https://docs.microsoft.com/en-us/dotnet/api/system.single*)
### **llama_get_embeddings_ith(SafeLLamaContextHandle, Int32)**
Get the embeddings for the ith sequence. Equivalent to: llama_get_embeddings(ctx) + i*n_embd
```csharp
public static Single* llama_get_embeddings_ith(SafeLLamaContextHandle ctx, int i)
```
#### Parameters
`ctx` [SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)
`i` [Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
#### Returns
[Single*](https://docs.microsoft.com/en-us/dotnet/api/system.single*)