diff --git a/LLama/LLamaSharp.csproj b/LLama/LLamaSharp.csproj index 5cbee538..d1f12f42 100644 --- a/LLama/LLamaSharp.csproj +++ b/LLama/LLamaSharp.csproj @@ -7,8 +7,8 @@ AnyCPU;x64;Arm64 True - 0.4.2 - Yaohui Liu, Haiping Chen + 0.5.0 + Yaohui Liu, Martin Devans, Haiping Chen SciSharp STACK true MIT, SciSharp STACK $([System.DateTime]::UtcNow.ToString(yyyy)) @@ -21,7 +21,7 @@ weights to run, please go to https://github.com/SciSharp/LLamaSharp for more information. - LLamaSharp 0.4.1 followed up the master branch of llama.cpp. (commit id: aacdbd4) + LLamaSharp 0.5.0 adds support for GGUF, grammars and integration with semantic-kernel. MIT packages diff --git a/docs/Architecture.md b/docs/Architecture.md index 8d12556c..b0c6226b 100644 --- a/docs/Architecture.md +++ b/docs/Architecture.md @@ -4,9 +4,9 @@ The figure below shows the core framework structure, which is separated to four levels. -- **LLamaModel**: The holder of a model which directly interact with native library and provide some basic APIs such as tokenization and embedding. Currently it includes three classes: `LLamaModel`, `LLamaEmbedder` and `LLamaQuantizer`. +- **LLamaContext**: The holder of a model which directly interacts with the native library and provides some basic APIs such as tokenization and embedding. Currently it includes three classes: `LLamaContext`, `LLamaEmbedder` and `LLamaQuantizer`. - **LLamaExecutors**: Executors which define the way to run the LLama model. It provides text-to-text APIs to make it easy to use. Currently we provide three kinds of executors: `InteractiveExecutor`, `InstructuExecutor` and `StatelessExecutor`. -- **ChatSession**: A wrapping for `InteractiveExecutor` and `LLamaModel`, which supports interactive tasks and saving/re-loading sessions. It also provides a flexible way to customize the text process by `IHistoryTransform`, `ITextTransform` and `ITextStreamTransform`. +- **ChatSession**: A wrapper around `InteractiveExecutor` and `LLamaContext`, which supports interactive tasks and saving/re-loading sessions. It also provides a flexible way to customize text processing with `IHistoryTransform`, `ITextTransform` and `ITextStreamTransform`. - **High-level Applications**: Some applications that provides higher-level integration. For example, [BotSharp](https://github.com/SciSharp/BotSharp) provides integration for vector search, Chatbot UI and Web APIs. [semantic-kernel](https://github.com/microsoft/semantic-kernel) provides various APIs for manipulations related with LLM. If you've made an integration, please tell us and add it to the doc! @@ -14,7 +14,7 @@ The figure below shows the core framework structure, which is separated to four ## Recommended Use -Since `LLamaModel` interact with native library, it's not recommended to use the methods of it directly unless you know what you are doing. So does the `NativeApi`, which is not included in the architecture figure above. +Since `LLamaContext` interacts with the native library, it's not recommended to use its methods directly unless you know what you are doing. The same applies to `NativeApi`, which is not included in the architecture figure above. `ChatSession` is recommended to be used when you want to build an application similar to ChatGPT, or the ChatBot, because it works best with `InteractiveExecutor`. Though other executors are also allowed to passed as a parameter to initialize a `ChatSession`, it's not encouraged if you are new to LLamaSharp and LLM.
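As a quick orientation for the recommended stack, here is a minimal sketch in C#. It follows the class names documented in the xmldocs below; the model file path is a hypothetical placeholder, and the exact factory methods (`LLamaWeights.LoadFromFile`, `CreateContext`) are assumptions based on the `LLamaWeights` entry in the class index.

```csharp
using LLama;
using LLama.Common;

// Sketch only: load weights, create a context, and drive it through
// the recommended InteractiveExecutor + ChatSession combination.
var parameters = new ModelParams("models/llama-2-7b.Q4_K_M.gguf") // hypothetical path
{
    ContextSize = 1024,
    GpuLayerCount = 5, // number of layers to offload to the GPU
};
using var weights = LLamaWeights.LoadFromFile(parameters);  // assumed factory method
using var context = weights.CreateContext(parameters);      // assumed factory method
var session = new ChatSession(new InteractiveExecutor(context));
```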
diff --git a/docs/HighLevelApps/semantic-kernel.md b/docs/HighLevelApps/semantic-kernel.md new file mode 100644 index 00000000..b6ebe65c --- /dev/null +++ b/docs/HighLevelApps/semantic-kernel.md @@ -0,0 +1,3 @@ +# The Usage of semantic-kernel Integration + +Please see [this doc](../../LLama.SemanticKernel/README.md) \ No newline at end of file diff --git a/docs/index.md b/docs/index.md index 97e1008f..26fb68c0 100644 --- a/docs/index.md +++ b/docs/index.md @@ -9,6 +9,7 @@ LLamaSharp is the C#/.NET binding of [llama.cpp](https://github.com/ggerganov/ll - Model inference - Model quantization - Generating embeddings +- Grammar parsing - Interactive/Instruct/Stateless executor mode - Chat session APIs - Save/load the state diff --git a/docs/media/structure.jpg b/docs/media/structure.jpg index a0b708bc..74173977 100644 Binary files a/docs/media/structure.jpg and b/docs/media/structure.jpg differ diff --git a/docs/media/structure.vsdx b/docs/media/structure.vsdx index e703b502..c36500eb 100644 Binary files a/docs/media/structure.vsdx and b/docs/media/structure.vsdx differ diff --git a/docs/xmldocs/index.md b/docs/xmldocs/index.md index 7bc5a746..68daac1b 100644 --- a/docs/xmldocs/index.md +++ b/docs/xmldocs/index.md @@ -8,26 +8,32 @@ [InteractiveExecutor](./llama.interactiveexecutor.md) -[LLamaEmbedder](./llama.llamaembedder.md) +[LLamaContext](./llama.llamacontext.md) -[LLamaModel](./llama.llamamodel.md) +[LLamaEmbedder](./llama.llamaembedder.md) [LLamaQuantizer](./llama.llamaquantizer.md) [LLamaTransforms](./llama.llamatransforms.md) -[ResettableLLamaModel](./llama.resettablellamamodel.md) +[LLamaWeights](./llama.llamaweights.md) [StatefulExecutorBase](./llama.statefulexecutorbase.md) [StatelessExecutor](./llama.statelessexecutor.md) +[Utils](./llama.utils.md) + ## LLama.Abstractions [IHistoryTransform](./llama.abstractions.ihistorytransform.md) +[IInferenceParams](./llama.abstractions.iinferenceparams.md) + [ILLamaExecutor](./llama.abstractions.illamaexecutor.md) +[IModelParams](./llama.abstractions.imodelparams.md) + [ITextStreamTransform](./llama.abstractions.itextstreamtransform.md) [ITextTransform](./llama.abstractions.itexttransform.md) @@ -46,17 +52,45 @@ [LLamaDefaultLogger](./llama.common.llamadefaultlogger.md) -[MiroStateType](./llama.common.mirostatetype.md) +[MirostatType](./llama.common.mirostattype.md) [ModelParams](./llama.common.modelparams.md) ## LLama.Exceptions +[GrammarExpectedName](./llama.exceptions.grammarexpectedname.md) + +[GrammarExpectedNext](./llama.exceptions.grammarexpectednext.md) + +[GrammarExpectedPrevious](./llama.exceptions.grammarexpectedprevious.md) + +[GrammarFormatException](./llama.exceptions.grammarformatexception.md) + +[GrammarUnexpectedCharAltElement](./llama.exceptions.grammarunexpectedcharaltelement.md) + +[GrammarUnexpectedCharRngElement](./llama.exceptions.grammarunexpectedcharrngelement.md) + +[GrammarUnexpectedEndElement](./llama.exceptions.grammarunexpectedendelement.md) + +[GrammarUnexpectedEndOfInput](./llama.exceptions.grammarunexpectedendofinput.md) + +[GrammarUnexpectedHexCharsCount](./llama.exceptions.grammarunexpectedhexcharscount.md) + +[GrammarUnknownEscapeCharacter](./llama.exceptions.grammarunknownescapecharacter.md) + [RuntimeError](./llama.exceptions.runtimeerror.md) ## LLama.Extensions -[DictionaryExtension](./llama.extensions.dictionaryextension.md) +[IModelParamsExtensions](./llama.extensions.imodelparamsextensions.md) + +[KeyValuePairExtensions](./llama.extensions.keyvaluepairextensions.md) + +## LLama.Grammars + 
+[Grammar](./llama.grammars.grammar.md) + +[GrammarRule](./llama.grammars.grammarrule.md) ## LLama.Native @@ -64,6 +98,12 @@ [LLamaFtype](./llama.native.llamaftype.md) +[LLamaGrammarElement](./llama.native.llamagrammarelement.md) + +[LLamaGrammarElementType](./llama.native.llamagrammarelementtype.md) + +[LLamaModelQuantizeParams](./llama.native.llamamodelquantizeparams.md) + [LLamaTokenData](./llama.native.llamatokendata.md) [LLamaTokenDataArray](./llama.native.llamatokendataarray.md) @@ -74,8 +114,14 @@ [SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md) +[SafeLLamaGrammarHandle](./llama.native.safellamagrammarhandle.md) + [SafeLLamaHandleBase](./llama.native.safellamahandlebase.md) +[SafeLlamaModelHandle](./llama.native.safellamamodelhandle.md) + +[SamplingApi](./llama.native.samplingapi.md) + ## LLama.OldVersion [ChatCompletion](./llama.oldversion.chatcompletion.md) diff --git a/docs/xmldocs/llama.abstractions.iinferenceparams.md b/docs/xmldocs/llama.abstractions.iinferenceparams.md new file mode 100644 index 00000000..5b48b8d5 --- /dev/null +++ b/docs/xmldocs/llama.abstractions.iinferenceparams.md @@ -0,0 +1,268 @@ +# IInferenceParams + +Namespace: LLama.Abstractions + +The paramters used for inference. + +```csharp +public interface IInferenceParams +``` + +## Properties + +### **TokensKeep** + +number of tokens to keep from initial prompt + +```csharp +public abstract int TokensKeep { get; set; } +``` + +#### Property Value + +[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
+ +### **MaxTokens** + +how many new tokens to predict (n_predict), set to -1 to infinitely generate a response + until it completes. + +```csharp +public abstract int MaxTokens { get; set; } +``` + +#### Property Value + +[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)<br>
+ +### **LogitBias** + +logit bias for specific tokens + +```csharp +public abstract Dictionary<int, float> LogitBias { get; set; } +``` + +#### Property Value + +[Dictionary<Int32, Single>](https://docs.microsoft.com/en-us/dotnet/api/system.collections.generic.dictionary-2)<br>
+ +### **AntiPrompts** + +Sequences where the model will stop generating further tokens. + +```csharp +public abstract IEnumerable<string> AntiPrompts { get; set; } +``` + +#### Property Value + +[IEnumerable<String>](https://docs.microsoft.com/en-us/dotnet/api/system.collections.generic.ienumerable-1)<br>
+ +### **PathSession** + +path to file for saving/loading model eval state + +```csharp +public abstract string PathSession { get; set; } +``` + +#### Property Value + +[String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
+ +### **InputSuffix** + +string to suffix user inputs with + +```csharp +public abstract string InputSuffix { get; set; } +``` + +#### Property Value + +[String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
+ +### **InputPrefix** + +string to prefix user inputs with + +```csharp +public abstract string InputPrefix { get; set; } +``` + +#### Property Value + +[String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
+ +### **TopK** + +0 or lower to use vocab size + +```csharp +public abstract int TopK { get; set; } +``` + +#### Property Value + +[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
+ +### **TopP** + +1.0 = disabled + +```csharp +public abstract float TopP { get; set; } +``` + +#### Property Value + +[Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)
+ +### **TfsZ** + +1.0 = disabled + +```csharp +public abstract float TfsZ { get; set; } +``` + +#### Property Value + +[Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)
+ +### **TypicalP** + +1.0 = disabled + +```csharp +public abstract float TypicalP { get; set; } +``` + +#### Property Value + +[Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)
+ +### **Temperature** + +1.0 = disabled + +```csharp +public abstract float Temperature { get; set; } +``` + +#### Property Value + +[Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)
+ +### **RepeatPenalty** + +1.0 = disabled + +```csharp +public abstract float RepeatPenalty { get; set; } +``` + +#### Property Value + +[Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)
+ +### **RepeatLastTokensCount** + +last n tokens to penalize (0 = disable penalty, -1 = context size) (repeat_last_n) + +```csharp +public abstract int RepeatLastTokensCount { get; set; } +``` + +#### Property Value + +[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
+ +### **FrequencyPenalty** + +frequency penalty coefficient + 0.0 = disabled + +```csharp +public abstract float FrequencyPenalty { get; set; } +``` + +#### Property Value + +[Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)
+ +### **PresencePenalty** + +presence penalty coefficient + 0.0 = disabled + +```csharp +public abstract float PresencePenalty { get; set; } +``` + +#### Property Value + +[Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)
+ +### **Mirostat** + +Mirostat uses tokens instead of words. + algorithm described in the paper https://arxiv.org/abs/2007.14966. + 0 = disabled, 1 = mirostat, 2 = mirostat 2.0 + +```csharp +public abstract MirostatType Mirostat { get; set; } +``` + +#### Property Value + +[MirostatType](./llama.common.mirostattype.md)
+ +### **MirostatTau** + +target entropy + +```csharp +public abstract float MirostatTau { get; set; } +``` + +#### Property Value + +[Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)
+ +### **MirostatEta** + +learning rate + +```csharp +public abstract float MirostatEta { get; set; } +``` + +#### Property Value + +[Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)
+ +### **PenalizeNL** + +consider newlines as a repeatable token (penalize_nl) + +```csharp +public abstract bool PenalizeNL { get; set; } +``` + +#### Property Value + +[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
+ +### **Grammar** + +Grammar to constrain possible tokens + +```csharp +public abstract SafeLLamaGrammarHandle Grammar { get; set; } +``` + +#### Property Value + +[SafeLLamaGrammarHandle](./llama.native.safellamagrammarhandle.md)
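To make the shape of these properties concrete, here is a minimal sketch using the `InferenceParams` implementation documented later in this changeset. The GBNF string is a toy example, and the `Grammar.Parse`/`CreateInstance` calls from `LLama.Grammars` are assumptions based on the class index above.

```csharp
using LLama.Common;
using LLama.Grammars;

// Sketch only: typical sampling settings for a generation call.
var inferenceParams = new InferenceParams
{
    MaxTokens = 256,                 // stop after 256 new tokens (-1 = unbounded)
    Temperature = 0.7f,
    TopP = 0.95f,
    RepeatPenalty = 1.1f,
    Mirostat = MirostatType.Disable, // or Mirostat / Mirostat2
    AntiPrompts = new[] { "User:" }, // stop when this sequence is generated
};

// Assumed grammar API: constrain sampling to "yes" or "no".
var grammar = Grammar.Parse("root ::= \"yes\" | \"no\"", "root");
inferenceParams.Grammar = grammar.CreateInstance();
```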
diff --git a/docs/xmldocs/llama.abstractions.illamaexecutor.md b/docs/xmldocs/llama.abstractions.illamaexecutor.md index 9ddaaa45..3091b6f3 100644 --- a/docs/xmldocs/llama.abstractions.illamaexecutor.md +++ b/docs/xmldocs/llama.abstractions.illamaexecutor.md @@ -10,26 +10,26 @@ public interface ILLamaExecutor ## Properties -### **Model** +### **Context** -The loaded model for this executor. +The loaded context for this executor. ```csharp -public abstract LLamaModel Model { get; } +public abstract LLamaContext Context { get; } ``` #### Property Value -[LLamaModel](./llama.llamamodel.md)
+[LLamaContext](./llama.llamacontext.md)
## Methods -### **Infer(String, InferenceParams, CancellationToken)** +### **Infer(String, IInferenceParams, CancellationToken)** Infers a response from the model. ```csharp -IEnumerable Infer(string text, InferenceParams inferenceParams, CancellationToken token) +IEnumerable<string> Infer(string text, IInferenceParams inferenceParams, CancellationToken token) ``` #### Parameters @@ -37,7 +37,7 @@ IEnumerable Infer(string text, InferenceParams inferenceParams, Cancella `text` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)<br>
Your prompt -`inferenceParams` [InferenceParams](./llama.common.inferenceparams.md)
+`inferenceParams` [IInferenceParams](./llama.abstractions.iinferenceparams.md)
Any additional parameters `token` [CancellationToken](https://docs.microsoft.com/en-us/dotnet/api/system.threading.cancellationtoken)
@@ -47,19 +47,24 @@ A cancellation token. [IEnumerable<String>](https://docs.microsoft.com/en-us/dotnet/api/system.collections.generic.ienumerable-1)
-### **InferAsync(String, InferenceParams, CancellationToken)** +### **InferAsync(String, IInferenceParams, CancellationToken)** + +Asynchronously infers a response from the model. ```csharp -IAsyncEnumerable InferAsync(string text, InferenceParams inferenceParams, CancellationToken token) +IAsyncEnumerable<string> InferAsync(string text, IInferenceParams inferenceParams, CancellationToken token) ``` #### Parameters `text` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)<br>
+Your prompt -`inferenceParams` [InferenceParams](./llama.common.inferenceparams.md)
+`inferenceParams` [IInferenceParams](./llama.abstractions.iinferenceparams.md)
+Any additional parameters `token` [CancellationToken](https://docs.microsoft.com/en-us/dotnet/api/system.threading.cancellationtoken)
+A cancellation token. #### Returns diff --git a/docs/xmldocs/llama.abstractions.imodelparams.md b/docs/xmldocs/llama.abstractions.imodelparams.md new file mode 100644 index 00000000..140cfaf1 --- /dev/null +++ b/docs/xmldocs/llama.abstractions.imodelparams.md @@ -0,0 +1,276 @@ +# IModelParams + +Namespace: LLama.Abstractions + +The parameters for initializing a LLama model. + +```csharp +public interface IModelParams +``` + +## Properties + +### **ContextSize** + +Model context size (n_ctx) + +```csharp +public abstract int ContextSize { get; set; } +``` + +#### Property Value + +[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
+ +### **MainGpu** + +the GPU that is used for scratch and small tensors + +```csharp +public abstract int MainGpu { get; set; } +``` + +#### Property Value + +[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
+ +### **LowVram** + +if true, reduce VRAM usage at the cost of performance + +```csharp +public abstract bool LowVram { get; set; } +``` + +#### Property Value + +[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
+ +### **GpuLayerCount** + +Number of layers to run in VRAM / GPU memory (n_gpu_layers) + +```csharp +public abstract int GpuLayerCount { get; set; } +``` + +#### Property Value + +[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
+ +### **Seed** + +Seed for the random number generator (seed) + +```csharp +public abstract int Seed { get; set; } +``` + +#### Property Value + +[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
+ +### **UseFp16Memory** + +Use f16 instead of f32 for memory kv (memory_f16) + +```csharp +public abstract bool UseFp16Memory { get; set; } +``` + +#### Property Value + +[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
+ +### **UseMemorymap** + +Use mmap for faster loads (use_mmap) + +```csharp +public abstract bool UseMemorymap { get; set; } +``` + +#### Property Value + +[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
+ +### **UseMemoryLock** + +Use mlock to keep model in memory (use_mlock) + +```csharp +public abstract bool UseMemoryLock { get; set; } +``` + +#### Property Value + +[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
+ +### **Perplexity** + +Compute perplexity over the prompt (perplexity) + +```csharp +public abstract bool Perplexity { get; set; } +``` + +#### Property Value + +[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
+ +### **ModelPath** + +Model path (model) + +```csharp +public abstract string ModelPath { get; set; } +``` + +#### Property Value + +[String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
+ +### **ModelAlias** + +model alias + +```csharp +public abstract string ModelAlias { get; set; } +``` + +#### Property Value + +[String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
+ +### **LoraAdapter** + +lora adapter path (lora_adapter) + +```csharp +public abstract string LoraAdapter { get; set; } +``` + +#### Property Value + +[String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
+ +### **LoraBase** + +base model path for the lora adapter (lora_base) + +```csharp +public abstract string LoraBase { get; set; } +``` + +#### Property Value + +[String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
+ +### **Threads** + +Number of threads (-1 = autodetect) (n_threads) + +```csharp +public abstract int Threads { get; set; } +``` + +#### Property Value + +[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
+ +### **BatchSize** + +batch size for prompt processing (must be >=32 to use BLAS) (n_batch) + +```csharp +public abstract int BatchSize { get; set; } +``` + +#### Property Value + +[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
+ +### **ConvertEosToNewLine** + +Whether to convert eos to newline during the inference. + +```csharp +public abstract bool ConvertEosToNewLine { get; set; } +``` + +#### Property Value + +[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
+ +### **EmbeddingMode** + +Whether to use embedding mode. (embedding) Note that if this is set to true, + The LLamaModel won't produce text responses anymore. + +```csharp +public abstract bool EmbeddingMode { get; set; } +``` + +#### Property Value + +[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)<br>
+ +### **TensorSplits** + +how split tensors should be distributed across GPUs + +```csharp +public abstract Single[] TensorSplits { get; set; } +``` + +#### Property Value + +[Single[]](https://docs.microsoft.com/en-us/dotnet/api/system.single)
+ +### **RopeFrequencyBase** + +RoPE base frequency + +```csharp +public abstract float RopeFrequencyBase { get; set; } +``` + +#### Property Value + +[Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)
+ +### **RopeFrequencyScale** + +RoPE frequency scaling factor + +```csharp +public abstract float RopeFrequencyScale { get; set; } +``` + +#### Property Value + +[Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)
+ +### **MulMatQ** + +Use experimental mul_mat_q kernels + +```csharp +public abstract bool MulMatQ { get; set; } +``` + +#### Property Value + +[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
+ +### **Encoding** + +The encoding to use for models + +```csharp +public abstract Encoding Encoding { get; set; } +``` + +#### Property Value + +[Encoding](https://docs.microsoft.com/en-us/dotnet/api/system.text.encoding)
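For orientation, a minimal sketch of filling these parameters through the `ModelParams` implementation documented later in this changeset; per its obsoletion notice, optional values go in an object initializer. The model path is a hypothetical placeholder.

```csharp
using LLama.Common;

// Sketch only: everything except the model path is optional.
var parameters = new ModelParams("models/llama-2-7b.Q4_K_M.gguf") // hypothetical path
{
    ContextSize = 2048,   // n_ctx
    GpuLayerCount = 20,   // n_gpu_layers
    Seed = 1337,
    UseMemorymap = true,  // use mmap for faster loads
    BatchSize = 512,      // must be >= 32 to use BLAS
};
```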
diff --git a/docs/xmldocs/llama.chatsession.md b/docs/xmldocs/llama.chatsession.md index f81e17f2..dcd818b8 100644 --- a/docs/xmldocs/llama.chatsession.md +++ b/docs/xmldocs/llama.chatsession.md @@ -161,19 +161,19 @@ public void LoadSession(string path) `path` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
The directory name to load the session. -### **Chat(ChatHistory, InferenceParams, CancellationToken)** +### **Chat(ChatHistory, IInferenceParams, CancellationToken)** Get the response from the LLama model with chat histories. ```csharp -public IEnumerable Chat(ChatHistory history, InferenceParams inferenceParams, CancellationToken cancellationToken) +public IEnumerable<string> Chat(ChatHistory history, IInferenceParams inferenceParams, CancellationToken cancellationToken) ``` #### Parameters `history` [ChatHistory](./llama.common.chathistory.md)<br>
-`inferenceParams` [InferenceParams](./llama.common.inferenceparams.md)
+`inferenceParams` [IInferenceParams](./llama.abstractions.iinferenceparams.md)
`cancellationToken` [CancellationToken](https://docs.microsoft.com/en-us/dotnet/api/system.threading.cancellationtoken)
@@ -181,20 +181,20 @@ public IEnumerable Chat(ChatHistory history, InferenceParams inferencePa [IEnumerable<String>](https://docs.microsoft.com/en-us/dotnet/api/system.collections.generic.ienumerable-1)
-### **Chat(String, InferenceParams, CancellationToken)** +### **Chat(String, IInferenceParams, CancellationToken)** Get the response from the LLama model. Note that prompt could not only be the preset words, but also the question you want to ask. ```csharp -public IEnumerable Chat(string prompt, InferenceParams inferenceParams, CancellationToken cancellationToken) +public IEnumerable<string> Chat(string prompt, IInferenceParams inferenceParams, CancellationToken cancellationToken) ``` #### Parameters `prompt` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)<br>
-`inferenceParams` [InferenceParams](./llama.common.inferenceparams.md)
+`inferenceParams` [IInferenceParams](./llama.abstractions.iinferenceparams.md)
`cancellationToken` [CancellationToken](https://docs.microsoft.com/en-us/dotnet/api/system.threading.cancellationtoken)
@@ -202,19 +202,19 @@ public IEnumerable Chat(string prompt, InferenceParams inferenceParams, [IEnumerable<String>](https://docs.microsoft.com/en-us/dotnet/api/system.collections.generic.ienumerable-1)
-### **ChatAsync(ChatHistory, InferenceParams, CancellationToken)** +### **ChatAsync(ChatHistory, IInferenceParams, CancellationToken)** Get the response from the LLama model with chat histories. ```csharp -public IAsyncEnumerable ChatAsync(ChatHistory history, InferenceParams inferenceParams, CancellationToken cancellationToken) +public IAsyncEnumerable<string> ChatAsync(ChatHistory history, IInferenceParams inferenceParams, CancellationToken cancellationToken) ``` #### Parameters `history` [ChatHistory](./llama.common.chathistory.md)<br>
-`inferenceParams` [InferenceParams](./llama.common.inferenceparams.md)
+`inferenceParams` [IInferenceParams](./llama.abstractions.iinferenceparams.md)
`cancellationToken` [CancellationToken](https://docs.microsoft.com/en-us/dotnet/api/system.threading.cancellationtoken)
@@ -222,19 +222,19 @@ public IAsyncEnumerable ChatAsync(ChatHistory history, InferenceParams i [IAsyncEnumerable<String>](https://docs.microsoft.com/en-us/dotnet/api/system.collections.generic.iasyncenumerable-1)
-### **ChatAsync(String, InferenceParams, CancellationToken)** +### **ChatAsync(String, IInferenceParams, CancellationToken)** Get the response from the LLama model with chat histories asynchronously. ```csharp -public IAsyncEnumerable ChatAsync(string prompt, InferenceParams inferenceParams, CancellationToken cancellationToken) +public IAsyncEnumerable<string> ChatAsync(string prompt, IInferenceParams inferenceParams, CancellationToken cancellationToken) ``` #### Parameters `prompt` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)<br>
-`inferenceParams` [InferenceParams](./llama.common.inferenceparams.md)
+`inferenceParams` [IInferenceParams](./llama.abstractions.iinferenceparams.md)
`cancellationToken` [CancellationToken](https://docs.microsoft.com/en-us/dotnet/api/system.threading.cancellationtoken)
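A usage sketch for the overloads above; the `session` and `inferenceParams` objects are assumed to be constructed as in the earlier sketches.

```csharp
using System;
using System.Threading;

// Sketch only: synchronous streaming chat...
foreach (var token in session.Chat("Hello, who are you?", inferenceParams, CancellationToken.None))
    Console.Write(token);

// ...or the asynchronous equivalent (inside an async method):
await foreach (var token in session.ChatAsync("What can you do?", inferenceParams, CancellationToken.None))
    Console.Write(token);
```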
diff --git a/docs/xmldocs/llama.common.authorrole.md b/docs/xmldocs/llama.common.authorrole.md index 10fc2b6d..da1881f4 100644 --- a/docs/xmldocs/llama.common.authorrole.md +++ b/docs/xmldocs/llama.common.authorrole.md @@ -2,6 +2,8 @@ Namespace: LLama.Common +Role of the message author, e.g. user/assistant/system + ```csharp public enum AuthorRole ``` @@ -13,3 +15,7 @@ Implements [IComparable](https://docs.microsoft.com/en-us/dotnet/api/system.icom | Name | Value | Description | | --- | --: | --- | +| Unknown | -1 | Role is unknown | +| System | 0 | Message comes from a "system" prompt, not written by a user or language model | +| User | 1 | Message comes from the user | +| Assistant | 2 | Message was generated by the language model | diff --git a/docs/xmldocs/llama.common.fixedsizequeue-1.md b/docs/xmldocs/llama.common.fixedsizequeue-1.md index c3d1a354..32ba6ecf 100644 --- a/docs/xmldocs/llama.common.fixedsizequeue-1.md +++ b/docs/xmldocs/llama.common.fixedsizequeue-1.md @@ -20,6 +20,8 @@ Implements IEnumerable<T>, [IEnumerable](https://docs.microsoft.com/en-us/ ### **Count** +Number of items in this queue + ```csharp public int Count { get; } ``` @@ -30,6 +32,8 @@ public int Count { get; } ### **Capacity** +Maximum number of items allowed in this queue + ```csharp public int Capacity { get; } ``` @@ -42,6 +46,8 @@ public int Capacity { get; } ## Constructors ### **FixedSizeQueue(Int32)** +Create a new queue + ```csharp public FixedSizeQueue(int size) ``` @@ -49,9 +55,12 @@ public FixedSizeQueue(int size) #### Parameters `size` [Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)<br>
+the maximum number of items to store in this queue ### **FixedSizeQueue(Int32, IEnumerable<T>)** + +Fill the queue with the given data. Please ensure that data.Count <= size + +```csharp public FixedSizeQueue(int size, IEnumerable data) ``` @@ -66,6 +75,8 @@ public FixedSizeQueue(int size, IEnumerable data) ### **FillWith(T)** +Replace every item in the queue with the given value + ```csharp public FixedSizeQueue FillWith(T value) ``` @@ -73,10 +84,12 @@ public FixedSizeQueue FillWith(T value) #### Parameters `value` T<br>
+The value to replace all items with #### Returns [FixedSizeQueue<T>](./llama.common.fixedsizequeue-1.md)
+returns this ### **Enqueue(T)** @@ -90,16 +103,6 @@ public void Enqueue(T item) `item` T
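A small sketch of the intended behavior; the drop-oldest semantics on overflow are an assumption inferred from the class name and the `Capacity` documentation above.

```csharp
using LLama.Common;

// Sketch only: a queue that never grows beyond its fixed capacity.
var recentTokens = new FixedSizeQueue<int>(3);
recentTokens.Enqueue(1);
recentTokens.Enqueue(2);
recentTokens.Enqueue(3);
recentTokens.Enqueue(4); // assumed: the oldest item (1) is dropped so Count stays at 3
```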
-### **ToArray()** - -```csharp -public T[] ToArray() -``` - -#### Returns - -T[]
### **GetEnumerator()** ```csharp public IEnumerator GetEnumerator() ``` diff --git a/docs/xmldocs/llama.common.illamalogger.md b/docs/xmldocs/llama.common.illamalogger.md index 4ede2f5a..e35a9417 100644 --- a/docs/xmldocs/llama.common.illamalogger.md +++ b/docs/xmldocs/llama.common.illamalogger.md @@ -2,6 +2,8 @@ Namespace: LLama.Common +receives log messages from LLamaSharp + ```csharp public interface ILLamaLogger ``` @@ -10,7 +12,7 @@ public interface ILLamaLogger ### **Log(String, String, LogLevel)** -Write the log in cosutomized way +Write the log in customized way ```csharp void Log(string source, string message, LogLevel level) diff --git a/docs/xmldocs/llama.common.inferenceparams.md b/docs/xmldocs/llama.common.inferenceparams.md index ac9d7bf2..f8142332 100644 --- a/docs/xmldocs/llama.common.inferenceparams.md +++ b/docs/xmldocs/llama.common.inferenceparams.md @@ -2,11 +2,14 @@ Namespace: LLama.Common +The parameters used for inference. + ```csharp -public class InferenceParams +public class InferenceParams : LLama.Abstractions.IInferenceParams ``` -Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) → [InferenceParams](./llama.common.inferenceparams.md) +Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) → [InferenceParams](./llama.common.inferenceparams.md)<br>
+Implements [IInferenceParams](./llama.abstractions.iinferenceparams.md) ## Properties @@ -212,12 +215,12 @@ Mirostat uses tokens instead of words. 0 = disabled, 1 = mirostat, 2 = mirostat 2.0 ```csharp -public MiroStateType Mirostat { get; set; } +public MirostatType Mirostat { get; set; } ``` #### Property Value -[MiroStateType](./llama.common.mirostatetype.md)
+[MirostatType](./llama.common.mirostattype.md)
### **MirostatTau** @@ -255,6 +258,18 @@ public bool PenalizeNL { get; set; } [Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
+### **Grammar** + +A grammar to constrain the possible tokens + +```csharp +public SafeLLamaGrammarHandle Grammar { get; set; } +``` + +#### Property Value + +[SafeLLamaGrammarHandle](./llama.native.safellamagrammarhandle.md)
## Constructors ### **InferenceParams()** diff --git a/docs/xmldocs/llama.common.llamadefaultlogger.md b/docs/xmldocs/llama.common.llamadefaultlogger.md index 2159852f..aeef13b0 100644 --- a/docs/xmldocs/llama.common.llamadefaultlogger.md +++ b/docs/xmldocs/llama.common.llamadefaultlogger.md @@ -2,8 +2,8 @@ Namespace: LLama.Common -The default logger of LLamaSharp. On default it write to console. User methods of `LLamaLogger.Default` to change the behavior. - It's more recommended to inherit `ILLamaLogger` to cosutomize the behavior. +The default logger of LLamaSharp. By default it writes to the console. Use methods of `LLamaLogger.Default` to change the behavior. + It's recommended to inherit `ILLamaLogger` to customize the behavior. ```csharp public sealed class LLamaDefaultLogger : ILLamaLogger ``` @@ -16,6 +16,8 @@ Implements [ILLamaLogger](./llama.common.illamalogger.md) ### **Default** +Get the default logger instance + ```csharp public static LLamaDefaultLogger Default { get; } ``` @@ -26,8 +28,22 @@ public static LLamaDefaultLogger Default { get; } ## Methods +### **EnableNative()** + +Enable logging output from llama.cpp + +```csharp +public LLamaDefaultLogger EnableNative() +``` + +#### Returns + +[LLamaDefaultLogger](./llama.common.llamadefaultlogger.md)<br>
+ ### **EnableConsole()** +Enable writing log messages to console + ```csharp public LLamaDefaultLogger EnableConsole() ``` @@ -38,6 +54,8 @@ public LLamaDefaultLogger EnableConsole() ### **DisableConsole()** +Disable writing messages to console + ```csharp public LLamaDefaultLogger DisableConsole() ``` @@ -48,6 +66,8 @@ public LLamaDefaultLogger DisableConsole() ### **EnableFile(String, FileMode)** +Enable writing log messages to file + ```csharp public LLamaDefaultLogger EnableFile(string filename, FileMode mode) ``` @@ -64,6 +84,14 @@ public LLamaDefaultLogger EnableFile(string filename, FileMode mode) ### **DisableFile(String)** +#### Caution + +Use DisableFile method without 'filename' parameter + +--- + +Disable writing log messages to file + ```csharp public LLamaDefaultLogger DisableFile(string filename) ``` @@ -71,6 +99,19 @@ public LLamaDefaultLogger DisableFile(string filename) #### Parameters `filename` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
+unused! + +#### Returns + +[LLamaDefaultLogger](./llama.common.llamadefaultlogger.md)
+ +### **DisableFile()** + +Disable writing log messages to file + +```csharp +public LLamaDefaultLogger DisableFile() +``` #### Returns @@ -78,6 +119,8 @@ public LLamaDefaultLogger DisableFile(string filename) ### **Log(String, String, LogLevel)** +Log a message + ```csharp public void Log(string source, string message, LogLevel level) ``` @@ -85,13 +128,18 @@ public void Log(string source, string message, LogLevel level) #### Parameters `source` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
+The source of this message (e.g. class name) `message` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
+The message to log `level` [LogLevel](./llama.common.illamalogger.loglevel.md)
+Severity level of this message ### **Info(String)** +Write a log message with "Info" severity + ```csharp public void Info(string message) ``` @@ -102,6 +150,8 @@ public void Info(string message) ### **Warn(String)** +Write a log message with "Warn" severity + ```csharp public void Warn(string message) ``` @@ -112,6 +162,8 @@ public void Warn(string message) ### **Error(String)** +Write a log message with "Error" severity + ```csharp public void Error(string message) ``` diff --git a/docs/xmldocs/llama.common.mirostatetype.md b/docs/xmldocs/llama.common.mirostattype.md similarity index 61% rename from docs/xmldocs/llama.common.mirostatetype.md rename to docs/xmldocs/llama.common.mirostattype.md index b72aafc3..6d54c181 100644 --- a/docs/xmldocs/llama.common.mirostatetype.md +++ b/docs/xmldocs/llama.common.mirostattype.md @@ -1,15 +1,21 @@ -# MiroStateType +# MirostatType Namespace: LLama.Common +Type of "mirostat" sampling to use. + https://github.com/basusourya/mirostat + ```csharp -public enum MiroStateType +public enum MirostatType ``` -Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) → [ValueType](https://docs.microsoft.com/en-us/dotnet/api/system.valuetype) → [Enum](https://docs.microsoft.com/en-us/dotnet/api/system.enum) → [MiroStateType](./llama.common.mirostatetype.md)
+Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) → [ValueType](https://docs.microsoft.com/en-us/dotnet/api/system.valuetype) → [Enum](https://docs.microsoft.com/en-us/dotnet/api/system.enum) → [MirostatType](./llama.common.mirostattype.md)
Implements [IComparable](https://docs.microsoft.com/en-us/dotnet/api/system.icomparable), [IFormattable](https://docs.microsoft.com/en-us/dotnet/api/system.iformattable), [IConvertible](https://docs.microsoft.com/en-us/dotnet/api/system.iconvertible) ## Fields | Name | Value | Description | | --- | --: | --- | +| Disable | 0 | Disable Mirostat sampling | +| Mirostat | 1 | Original mirostat algorithm | +| Mirostat2 | 2 | Mirostat 2.0 algorithm | diff --git a/docs/xmldocs/llama.common.modelparams.md b/docs/xmldocs/llama.common.modelparams.md index d041faf2..85dc655e 100644 --- a/docs/xmldocs/llama.common.modelparams.md +++ b/docs/xmldocs/llama.common.modelparams.md @@ -2,11 +2,14 @@ Namespace: LLama.Common +The parameters for initializing a LLama model. + ```csharp -public class ModelParams +public class ModelParams : LLama.Abstractions.IModelParams, System.IEquatable`1[[LLama.Common.ModelParams, LLamaSharp, Version=0.5.0.0, Culture=neutral, PublicKeyToken=null]] ``` -Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) → [ModelParams](./llama.common.modelparams.md) +Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) → [ModelParams](./llama.common.modelparams.md)
+Implements [IModelParams](./llama.abstractions.imodelparams.md), [IEquatable<ModelParams>](https://docs.microsoft.com/en-us/dotnet/api/system.iequatable-1) ## Properties @@ -22,6 +25,30 @@ public int ContextSize { get; set; } [Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
+### **MainGpu** + +the GPU that is used for scratch and small tensors + +```csharp +public int MainGpu { get; set; } +``` + +#### Property Value + +[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
+ +### **LowVram** + +if true, reduce VRAM usage at the cost of performance + +```csharp +public bool LowVram { get; set; } +``` + +#### Property Value + +[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
+ ### **GpuLayerCount** Number of layers to run in VRAM / GPU memory (n_gpu_layers) @@ -106,6 +133,18 @@ public string ModelPath { get; set; } [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
+### **ModelAlias** + +model alias + +```csharp +public string ModelAlias { get; set; } +``` + +#### Property Value + +[String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
+ ### **LoraAdapter** lora adapter path (lora_adapter) @@ -179,14 +218,93 @@ public bool EmbeddingMode { get; set; } [Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
+### **TensorSplits** + +how split tensors should be distributed across GPUs + +```csharp +public Single[] TensorSplits { get; set; } +``` + +#### Property Value + +[Single[]](https://docs.microsoft.com/en-us/dotnet/api/system.single)
+ +### **RopeFrequencyBase** + +RoPE base frequency + +```csharp +public float RopeFrequencyBase { get; set; } +``` + +#### Property Value + +[Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)
+ +### **RopeFrequencyScale** + +RoPE frequency scaling factor + +```csharp +public float RopeFrequencyScale { get; set; } +``` + +#### Property Value + +[Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)
+ +### **MulMatQ** + +Use experimental mul_mat_q kernels + +```csharp +public bool MulMatQ { get; set; } +``` + +#### Property Value + +[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
+ +### **Encoding** + +The encoding to use to convert text for the model + +```csharp +public Encoding Encoding { get; set; } +``` + +#### Property Value + +[Encoding](https://docs.microsoft.com/en-us/dotnet/api/system.text.encoding)
+ ## Constructors -### **ModelParams(String, Int32, Int32, Int32, Boolean, Boolean, Boolean, Boolean, String, String, Int32, Int32, Boolean, Boolean)** +### **ModelParams(String)** ```csharp -public ModelParams(string modelPath, int contextSize, int gpuLayerCount, int seed, bool useFp16Memory, bool useMemorymap, bool useMemoryLock, bool perplexity, string loraAdapter, string loraBase, int threads, int batchSize, bool convertEosToNewLine, bool embeddingMode) +public ModelParams(string modelPath) +``` + +#### Parameters + +`modelPath` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
+The model path. + +### **ModelParams(String, Int32, Int32, Int32, Boolean, Boolean, Boolean, Boolean, String, String, Int32, Int32, Boolean, Boolean, Single, Single, Boolean, String)** + +#### Caution + +Use object initializer to set all optional parameters + +--- + + + +```csharp +public ModelParams(string modelPath, int contextSize, int gpuLayerCount, int seed, bool useFp16Memory, bool useMemorymap, bool useMemoryLock, bool perplexity, string loraAdapter, string loraBase, int threads, int batchSize, bool convertEosToNewLine, bool embeddingMode, float ropeFrequencyBase, float ropeFrequencyScale, bool mulMatQ, string encoding) ``` #### Parameters @@ -232,3 +350,89 @@ Whether to convert eos to newline during the inference. `embeddingMode` [Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
Whether to use embedding mode. (embedding) Note that if this is set to true, The LLamaModel won't produce text response anymore. + +`ropeFrequencyBase` [Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)
+RoPE base frequency. + +`ropeFrequencyScale` [Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)
+RoPE frequency scaling factor + +`mulMatQ` [Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
+Use experimental mul_mat_q kernels + +`encoding` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
+The encoding to use to convert text for the model + +## Methods + +### **ToString()** + +```csharp +public string ToString() +``` + +#### Returns + +[String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
+ +### **PrintMembers(StringBuilder)** + +```csharp +protected bool PrintMembers(StringBuilder builder) +``` + +#### Parameters + +`builder` [StringBuilder](https://docs.microsoft.com/en-us/dotnet/api/system.text.stringbuilder)
+ +#### Returns + +[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
+ +### **GetHashCode()** + +```csharp +public int GetHashCode() +``` + +#### Returns + +[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
+ +### **Equals(Object)** + +```csharp +public bool Equals(object obj) +``` + +#### Parameters + +`obj` [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object)
+ +#### Returns + +[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
+ +### **Equals(ModelParams)** + +```csharp +public bool Equals(ModelParams other) +``` + +#### Parameters + +`other` [ModelParams](./llama.common.modelparams.md)
+ +#### Returns + +[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
+ +### **<Clone>$()** + +```csharp +public ModelParams <Clone>$() +``` + +#### Returns + +[ModelParams](./llama.common.modelparams.md)<br>
diff --git a/docs/xmldocs/llama.exceptions.grammarexpectedname.md b/docs/xmldocs/llama.exceptions.grammarexpectedname.md new file mode 100644 index 00000000..8ad5fd21 --- /dev/null +++ b/docs/xmldocs/llama.exceptions.grammarexpectedname.md @@ -0,0 +1,94 @@ +# GrammarExpectedName + +Namespace: LLama.Exceptions + +Failed to parse a "name" element when one was expected + +```csharp +public class GrammarExpectedName : GrammarFormatException, System.Runtime.Serialization.ISerializable +``` + +Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) → [Exception](https://docs.microsoft.com/en-us/dotnet/api/system.exception) → [GrammarFormatException](./llama.exceptions.grammarformatexception.md) → [GrammarExpectedName](./llama.exceptions.grammarexpectedname.md)
+Implements [ISerializable](https://docs.microsoft.com/en-us/dotnet/api/system.runtime.serialization.iserializable) + +## Properties + +### **TargetSite** + +```csharp +public MethodBase TargetSite { get; } +``` + +#### Property Value + +[MethodBase](https://docs.microsoft.com/en-us/dotnet/api/system.reflection.methodbase)
+ +### **Message** + +```csharp +public string Message { get; } +``` + +#### Property Value + +[String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
+ +### **Data** + +```csharp +public IDictionary Data { get; } +``` + +#### Property Value + +[IDictionary](https://docs.microsoft.com/en-us/dotnet/api/system.collections.idictionary)
+ +### **InnerException** + +```csharp +public Exception InnerException { get; } +``` + +#### Property Value + +[Exception](https://docs.microsoft.com/en-us/dotnet/api/system.exception)
+ +### **HelpLink** + +```csharp +public string HelpLink { get; set; } +``` + +#### Property Value + +[String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
+ +### **Source** + +```csharp +public string Source { get; set; } +``` + +#### Property Value + +[String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
+ +### **HResult** + +```csharp +public int HResult { get; set; } +``` + +#### Property Value + +[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
+ +### **StackTrace** + +```csharp +public string StackTrace { get; } +``` + +#### Property Value + +[String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
diff --git a/docs/xmldocs/llama.exceptions.grammarexpectednext.md b/docs/xmldocs/llama.exceptions.grammarexpectednext.md new file mode 100644 index 00000000..bdf2df13 --- /dev/null +++ b/docs/xmldocs/llama.exceptions.grammarexpectednext.md @@ -0,0 +1,94 @@ +# GrammarExpectedNext + +Namespace: LLama.Exceptions + +A specified string was expected when parsing + +```csharp +public class GrammarExpectedNext : GrammarFormatException, System.Runtime.Serialization.ISerializable +``` + +Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) → [Exception](https://docs.microsoft.com/en-us/dotnet/api/system.exception) → [GrammarFormatException](./llama.exceptions.grammarformatexception.md) → [GrammarExpectedNext](./llama.exceptions.grammarexpectednext.md)
+Implements [ISerializable](https://docs.microsoft.com/en-us/dotnet/api/system.runtime.serialization.iserializable) + +## Properties + +### **TargetSite** + +```csharp +public MethodBase TargetSite { get; } +``` + +#### Property Value + +[MethodBase](https://docs.microsoft.com/en-us/dotnet/api/system.reflection.methodbase)
+ +### **Message** + +```csharp +public string Message { get; } +``` + +#### Property Value + +[String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
+ +### **Data** + +```csharp +public IDictionary Data { get; } +``` + +#### Property Value + +[IDictionary](https://docs.microsoft.com/en-us/dotnet/api/system.collections.idictionary)
+ +### **InnerException** + +```csharp +public Exception InnerException { get; } +``` + +#### Property Value + +[Exception](https://docs.microsoft.com/en-us/dotnet/api/system.exception)
+ +### **HelpLink** + +```csharp +public string HelpLink { get; set; } +``` + +#### Property Value + +[String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
+ +### **Source** + +```csharp +public string Source { get; set; } +``` + +#### Property Value + +[String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
+ +### **HResult** + +```csharp +public int HResult { get; set; } +``` + +#### Property Value + +[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
+ +### **StackTrace** + +```csharp +public string StackTrace { get; } +``` + +#### Property Value + +[String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
diff --git a/docs/xmldocs/llama.exceptions.grammarexpectedprevious.md b/docs/xmldocs/llama.exceptions.grammarexpectedprevious.md new file mode 100644 index 00000000..890e6bdc --- /dev/null +++ b/docs/xmldocs/llama.exceptions.grammarexpectedprevious.md @@ -0,0 +1,94 @@ +# GrammarExpectedPrevious + +Namespace: LLama.Exceptions + +A specified character was expected to precede another when parsing + +```csharp +public class GrammarExpectedPrevious : GrammarFormatException, System.Runtime.Serialization.ISerializable +``` + +Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) → [Exception](https://docs.microsoft.com/en-us/dotnet/api/system.exception) → [GrammarFormatException](./llama.exceptions.grammarformatexception.md) → [GrammarExpectedPrevious](./llama.exceptions.grammarexpectedprevious.md)<br>
+Implements [ISerializable](https://docs.microsoft.com/en-us/dotnet/api/system.runtime.serialization.iserializable) + +## Properties + +### **TargetSite** + +```csharp +public MethodBase TargetSite { get; } +``` + +#### Property Value + +[MethodBase](https://docs.microsoft.com/en-us/dotnet/api/system.reflection.methodbase)
+ +### **Message** + +```csharp +public string Message { get; } +``` + +#### Property Value + +[String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
+ +### **Data** + +```csharp +public IDictionary Data { get; } +``` + +#### Property Value + +[IDictionary](https://docs.microsoft.com/en-us/dotnet/api/system.collections.idictionary)
+ +### **InnerException** + +```csharp +public Exception InnerException { get; } +``` + +#### Property Value + +[Exception](https://docs.microsoft.com/en-us/dotnet/api/system.exception)
+ +### **HelpLink** + +```csharp +public string HelpLink { get; set; } +``` + +#### Property Value + +[String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
+ +### **Source** + +```csharp +public string Source { get; set; } +``` + +#### Property Value + +[String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
+ +### **HResult** + +```csharp +public int HResult { get; set; } +``` + +#### Property Value + +[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
+ +### **StackTrace** + +```csharp +public string StackTrace { get; } +``` + +#### Property Value + +[String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
diff --git a/docs/xmldocs/llama.exceptions.grammarformatexception.md b/docs/xmldocs/llama.exceptions.grammarformatexception.md new file mode 100644 index 00000000..74a9d80c --- /dev/null +++ b/docs/xmldocs/llama.exceptions.grammarformatexception.md @@ -0,0 +1,94 @@ +# GrammarFormatException + +Namespace: LLama.Exceptions + +Base class for all grammar exceptions + +```csharp +public abstract class GrammarFormatException : System.Exception, System.Runtime.Serialization.ISerializable +``` + +Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) → [Exception](https://docs.microsoft.com/en-us/dotnet/api/system.exception) → [GrammarFormatException](./llama.exceptions.grammarformatexception.md)
+Implements [ISerializable](https://docs.microsoft.com/en-us/dotnet/api/system.runtime.serialization.iserializable) + +## Properties + +### **TargetSite** + +```csharp +public MethodBase TargetSite { get; } +``` + +#### Property Value + +[MethodBase](https://docs.microsoft.com/en-us/dotnet/api/system.reflection.methodbase)
+ +### **Message** + +```csharp +public string Message { get; } +``` + +#### Property Value + +[String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
+ +### **Data** + +```csharp +public IDictionary Data { get; } +``` + +#### Property Value + +[IDictionary](https://docs.microsoft.com/en-us/dotnet/api/system.collections.idictionary)
+ +### **InnerException** + +```csharp +public Exception InnerException { get; } +``` + +#### Property Value + +[Exception](https://docs.microsoft.com/en-us/dotnet/api/system.exception)
+ +### **HelpLink** + +```csharp +public string HelpLink { get; set; } +``` + +#### Property Value + +[String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
+ +### **Source** + +```csharp +public string Source { get; set; } +``` + +#### Property Value + +[String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
+ +### **HResult** + +```csharp +public int HResult { get; set; } +``` + +#### Property Value + +[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
+ +### **StackTrace** + +```csharp +public string StackTrace { get; } +``` + +#### Property Value + +[String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
diff --git a/docs/xmldocs/llama.exceptions.grammarunexpectedcharaltelement.md b/docs/xmldocs/llama.exceptions.grammarunexpectedcharaltelement.md new file mode 100644 index 00000000..ddaf1a51 --- /dev/null +++ b/docs/xmldocs/llama.exceptions.grammarunexpectedcharaltelement.md @@ -0,0 +1,94 @@ +# GrammarUnexpectedCharAltElement + +Namespace: LLama.Exceptions + +A CHAR_ALT was created without a preceding CHAR element + +```csharp +public class GrammarUnexpectedCharAltElement : GrammarFormatException, System.Runtime.Serialization.ISerializable +``` + +Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) → [Exception](https://docs.microsoft.com/en-us/dotnet/api/system.exception) → [GrammarFormatException](./llama.exceptions.grammarformatexception.md) → [GrammarUnexpectedCharAltElement](./llama.exceptions.grammarunexpectedcharaltelement.md)
+Implements [ISerializable](https://docs.microsoft.com/en-us/dotnet/api/system.runtime.serialization.iserializable) + +## Properties + +### **TargetSite** + +```csharp +public MethodBase TargetSite { get; } +``` + +#### Property Value + +[MethodBase](https://docs.microsoft.com/en-us/dotnet/api/system.reflection.methodbase)
+ +### **Message** + +```csharp +public string Message { get; } +``` + +#### Property Value + +[String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
+ +### **Data** + +```csharp +public IDictionary Data { get; } +``` + +#### Property Value + +[IDictionary](https://docs.microsoft.com/en-us/dotnet/api/system.collections.idictionary)
+ +### **InnerException** + +```csharp +public Exception InnerException { get; } +``` + +#### Property Value + +[Exception](https://docs.microsoft.com/en-us/dotnet/api/system.exception)
+ +### **HelpLink** + +```csharp +public string HelpLink { get; set; } +``` + +#### Property Value + +[String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
+ +### **Source** + +```csharp +public string Source { get; set; } +``` + +#### Property Value + +[String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
+ +### **HResult** + +```csharp +public int HResult { get; set; } +``` + +#### Property Value + +[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
+ +### **StackTrace** + +```csharp +public string StackTrace { get; } +``` + +#### Property Value + +[String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
diff --git a/docs/xmldocs/llama.exceptions.grammarunexpectedcharrngelement.md b/docs/xmldocs/llama.exceptions.grammarunexpectedcharrngelement.md new file mode 100644 index 00000000..882ba31e --- /dev/null +++ b/docs/xmldocs/llama.exceptions.grammarunexpectedcharrngelement.md @@ -0,0 +1,94 @@ +# GrammarUnexpectedCharRngElement + +Namespace: LLama.Exceptions + +A CHAR_RNG was created without a preceding CHAR element + +```csharp +public class GrammarUnexpectedCharRngElement : GrammarFormatException, System.Runtime.Serialization.ISerializable +``` + +Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) → [Exception](https://docs.microsoft.com/en-us/dotnet/api/system.exception) → [GrammarFormatException](./llama.exceptions.grammarformatexception.md) → [GrammarUnexpectedCharRngElement](./llama.exceptions.grammarunexpectedcharrngelement.md)
+Implements [ISerializable](https://docs.microsoft.com/en-us/dotnet/api/system.runtime.serialization.iserializable) + +## Properties + +### **TargetSite** + +```csharp +public MethodBase TargetSite { get; } +``` + +#### Property Value + +[MethodBase](https://docs.microsoft.com/en-us/dotnet/api/system.reflection.methodbase)
+ +### **Message** + +```csharp +public string Message { get; } +``` + +#### Property Value + +[String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
+ +### **Data** + +```csharp +public IDictionary Data { get; } +``` + +#### Property Value + +[IDictionary](https://docs.microsoft.com/en-us/dotnet/api/system.collections.idictionary)
+ +### **InnerException** + +```csharp +public Exception InnerException { get; } +``` + +#### Property Value + +[Exception](https://docs.microsoft.com/en-us/dotnet/api/system.exception)
+ +### **HelpLink** + +```csharp +public string HelpLink { get; set; } +``` + +#### Property Value + +[String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
+ +### **Source** + +```csharp +public string Source { get; set; } +``` + +#### Property Value + +[String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
+ +### **HResult** + +```csharp +public int HResult { get; set; } +``` + +#### Property Value + +[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
+ +### **StackTrace** + +```csharp +public string StackTrace { get; } +``` + +#### Property Value + +[String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
diff --git a/docs/xmldocs/llama.exceptions.grammarunexpectedendelement.md b/docs/xmldocs/llama.exceptions.grammarunexpectedendelement.md new file mode 100644 index 00000000..af98be6c --- /dev/null +++ b/docs/xmldocs/llama.exceptions.grammarunexpectedendelement.md @@ -0,0 +1,94 @@ +# GrammarUnexpectedEndElement + +Namespace: LLama.Exceptions + +An END was encountered before the last element + +```csharp +public class GrammarUnexpectedEndElement : GrammarFormatException, System.Runtime.Serialization.ISerializable +``` + +Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) → [Exception](https://docs.microsoft.com/en-us/dotnet/api/system.exception) → [GrammarFormatException](./llama.exceptions.grammarformatexception.md) → [GrammarUnexpectedEndElement](./llama.exceptions.grammarunexpectedendelement.md)
+Implements [ISerializable](https://docs.microsoft.com/en-us/dotnet/api/system.runtime.serialization.iserializable) + +## Properties + +### **TargetSite** + +```csharp +public MethodBase TargetSite { get; } +``` + +#### Property Value + +[MethodBase](https://docs.microsoft.com/en-us/dotnet/api/system.reflection.methodbase)
+ +### **Message** + +```csharp +public string Message { get; } +``` + +#### Property Value + +[String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
+ +### **Data** + +```csharp +public IDictionary Data { get; } +``` + +#### Property Value + +[IDictionary](https://docs.microsoft.com/en-us/dotnet/api/system.collections.idictionary)
+ +### **InnerException** + +```csharp +public Exception InnerException { get; } +``` + +#### Property Value + +[Exception](https://docs.microsoft.com/en-us/dotnet/api/system.exception)
+ +### **HelpLink** + +```csharp +public string HelpLink { get; set; } +``` + +#### Property Value + +[String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
+ +### **Source** + +```csharp +public string Source { get; set; } +``` + +#### Property Value + +[String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
+ +### **HResult** + +```csharp +public int HResult { get; set; } +``` + +#### Property Value + +[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
+ +### **StackTrace** + +```csharp +public string StackTrace { get; } +``` + +#### Property Value + +[String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
diff --git a/docs/xmldocs/llama.exceptions.grammarunexpectedendofinput.md b/docs/xmldocs/llama.exceptions.grammarunexpectedendofinput.md new file mode 100644 index 00000000..1d1f1133 --- /dev/null +++ b/docs/xmldocs/llama.exceptions.grammarunexpectedendofinput.md @@ -0,0 +1,94 @@ +# GrammarUnexpectedEndOfInput + +Namespace: LLama.Exceptions + +End-of-file was encountered while parsing + +```csharp +public class GrammarUnexpectedEndOfInput : GrammarFormatException, System.Runtime.Serialization.ISerializable +``` + +Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) → [Exception](https://docs.microsoft.com/en-us/dotnet/api/system.exception) → [GrammarFormatException](./llama.exceptions.grammarformatexception.md) → [GrammarUnexpectedEndOfInput](./llama.exceptions.grammarunexpectedendofinput.md)
+Implements [ISerializable](https://docs.microsoft.com/en-us/dotnet/api/system.runtime.serialization.iserializable) + +## Properties + +### **TargetSite** + +```csharp +public MethodBase TargetSite { get; } +``` + +#### Property Value + +[MethodBase](https://docs.microsoft.com/en-us/dotnet/api/system.reflection.methodbase)
+ +### **Message** + +```csharp +public string Message { get; } +``` + +#### Property Value + +[String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
+ +### **Data** + +```csharp +public IDictionary Data { get; } +``` + +#### Property Value + +[IDictionary](https://docs.microsoft.com/en-us/dotnet/api/system.collections.idictionary)
+ +### **InnerException** + +```csharp +public Exception InnerException { get; } +``` + +#### Property Value + +[Exception](https://docs.microsoft.com/en-us/dotnet/api/system.exception)
+ +### **HelpLink** + +```csharp +public string HelpLink { get; set; } +``` + +#### Property Value + +[String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
+ +### **Source** + +```csharp +public string Source { get; set; } +``` + +#### Property Value + +[String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
+ +### **HResult** + +```csharp +public int HResult { get; set; } +``` + +#### Property Value + +[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
+ +### **StackTrace** + +```csharp +public string StackTrace { get; } +``` + +#### Property Value + +[String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
diff --git a/docs/xmldocs/llama.exceptions.grammarunexpectedhexcharscount.md b/docs/xmldocs/llama.exceptions.grammarunexpectedhexcharscount.md new file mode 100644 index 00000000..f699939f --- /dev/null +++ b/docs/xmldocs/llama.exceptions.grammarunexpectedhexcharscount.md @@ -0,0 +1,94 @@ +# GrammarUnexpectedHexCharsCount + +Namespace: LLama.Exceptions + +An incorrect number of characters were encountered while parsing a hex literal + +```csharp +public class GrammarUnexpectedHexCharsCount : GrammarFormatException, System.Runtime.Serialization.ISerializable +``` + +Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) → [Exception](https://docs.microsoft.com/en-us/dotnet/api/system.exception) → [GrammarFormatException](./llama.exceptions.grammarformatexception.md) → [GrammarUnexpectedHexCharsCount](./llama.exceptions.grammarunexpectedhexcharscount.md)
+Implements [ISerializable](https://docs.microsoft.com/en-us/dotnet/api/system.runtime.serialization.iserializable) + +## Properties + +### **TargetSite** + +```csharp +public MethodBase TargetSite { get; } +``` + +#### Property Value + +[MethodBase](https://docs.microsoft.com/en-us/dotnet/api/system.reflection.methodbase)
+ +### **Message** + +```csharp +public string Message { get; } +``` + +#### Property Value + +[String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
+ +### **Data** + +```csharp +public IDictionary Data { get; } +``` + +#### Property Value + +[IDictionary](https://docs.microsoft.com/en-us/dotnet/api/system.collections.idictionary)
+ +### **InnerException** + +```csharp +public Exception InnerException { get; } +``` + +#### Property Value + +[Exception](https://docs.microsoft.com/en-us/dotnet/api/system.exception)
+ +### **HelpLink** + +```csharp +public string HelpLink { get; set; } +``` + +#### Property Value + +[String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
+ +### **Source** + +```csharp +public string Source { get; set; } +``` + +#### Property Value + +[String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
+ +### **HResult** + +```csharp +public int HResult { get; set; } +``` + +#### Property Value + +[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
+ +### **StackTrace** + +```csharp +public string StackTrace { get; } +``` + +#### Property Value + +[String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
diff --git a/docs/xmldocs/llama.exceptions.grammarunknownescapecharacter.md b/docs/xmldocs/llama.exceptions.grammarunknownescapecharacter.md new file mode 100644 index 00000000..009a5bf8 --- /dev/null +++ b/docs/xmldocs/llama.exceptions.grammarunknownescapecharacter.md @@ -0,0 +1,94 @@ +# GrammarUnknownEscapeCharacter + +Namespace: LLama.Exceptions + +An unexpected character was encountered after an escape sequence + +```csharp +public class GrammarUnknownEscapeCharacter : GrammarFormatException, System.Runtime.Serialization.ISerializable +``` + +Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) → [Exception](https://docs.microsoft.com/en-us/dotnet/api/system.exception) → [GrammarFormatException](./llama.exceptions.grammarformatexception.md) → [GrammarUnknownEscapeCharacter](./llama.exceptions.grammarunknownescapecharacter.md)
+Implements [ISerializable](https://docs.microsoft.com/en-us/dotnet/api/system.runtime.serialization.iserializable) + +## Properties + +### **TargetSite** + +```csharp +public MethodBase TargetSite { get; } +``` + +#### Property Value + +[MethodBase](https://docs.microsoft.com/en-us/dotnet/api/system.reflection.methodbase)
+ +### **Message** + +```csharp +public string Message { get; } +``` + +#### Property Value + +[String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
+ +### **Data** + +```csharp +public IDictionary Data { get; } +``` + +#### Property Value + +[IDictionary](https://docs.microsoft.com/en-us/dotnet/api/system.collections.idictionary)
+ +### **InnerException** + +```csharp +public Exception InnerException { get; } +``` + +#### Property Value + +[Exception](https://docs.microsoft.com/en-us/dotnet/api/system.exception)
+ +### **HelpLink** + +```csharp +public string HelpLink { get; set; } +``` + +#### Property Value + +[String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
+ +### **Source** + +```csharp +public string Source { get; set; } +``` + +#### Property Value + +[String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
+ +### **HResult** + +```csharp +public int HResult { get; set; } +``` + +#### Property Value + +[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
+ +### **StackTrace** + +```csharp +public string StackTrace { get; } +``` + +#### Property Value + +[String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
diff --git a/docs/xmldocs/llama.extensions.dictionaryextension.md b/docs/xmldocs/llama.extensions.dictionaryextension.md deleted file mode 100644 index 5c013c46..00000000 --- a/docs/xmldocs/llama.extensions.dictionaryextension.md +++ /dev/null @@ -1,73 +0,0 @@ -# DictionaryExtension - -Namespace: LLama.Extensions - -```csharp -public static class DictionaryExtension -``` - -Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) → [DictionaryExtension](./llama.extensions.dictionaryextension.md) - -## Methods - -### **Deconstruct<T1, T2>(KeyValuePair<T1, T2>, T1&, T2&)** - -```csharp -public static void Deconstruct(KeyValuePair pair, T1& first, T2& second) -``` - -#### Type Parameters - -`T1`
- -`T2`
- -#### Parameters - -`pair` KeyValuePair<T1, T2>
- -`first` T1&
- -`second` T2&
- -### **Update<T1, T2>(Dictionary<T1, T2>, IDictionary<T1, T2>)** - -```csharp -public static void Update(Dictionary dic, IDictionary other) -``` - -#### Type Parameters - -`T1`
- -`T2`
- -#### Parameters - -`dic` Dictionary<T1, T2>
- -`other` IDictionary<T1, T2>
- -### **GetOrDefault<T1, T2>(Dictionary<T1, T2>, T1, T2)** - -```csharp -public static T2 GetOrDefault(Dictionary dic, T1 key, T2 defaultValue) -``` - -#### Type Parameters - -`T1`
- -`T2`
- -#### Parameters - -`dic` Dictionary<T1, T2>
- -`key` T1
- -`defaultValue` T2
- -#### Returns - -T2
diff --git a/docs/xmldocs/llama.extensions.imodelparamsextensions.md b/docs/xmldocs/llama.extensions.imodelparamsextensions.md
new file mode 100644
index 00000000..460be8f8
--- /dev/null
+++ b/docs/xmldocs/llama.extensions.imodelparamsextensions.md
@@ -0,0 +1,37 @@
+# IModelParamsExtensions
+
+Namespace: LLama.Extensions
+
+Extension methods to the IModelParams interface
+
+```csharp
+public static class IModelParamsExtensions
+```
+
+Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) → [IModelParamsExtensions](./llama.extensions.imodelparamsextensions.md)
+
+## Methods
+
+### **ToLlamaContextParams(IModelParams, LLamaContextParams&)**
+
+Convert the given `IModelParams` into a `LLamaContextParams`
+
+```csharp
+public static MemoryHandle ToLlamaContextParams(IModelParams params, LLamaContextParams& result)
+```
+
+#### Parameters
+
+`params` [IModelParams](./llama.abstractions.imodelparams.md)
+ +`result` [LLamaContextParams&](./llama.native.llamacontextparams&.md)
+ +#### Returns + +[MemoryHandle](https://docs.microsoft.com/en-us/dotnet/api/system.buffers.memoryhandle)
+ +#### Exceptions + +[FileNotFoundException](https://docs.microsoft.com/en-us/dotnet/api/system.io.filenotfoundexception)
+ +[ArgumentException](https://docs.microsoft.com/en-us/dotnet/api/system.argumentexception)
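+**Example:**
+
+A minimal sketch of the pinning pattern this method implies, assuming the by-ref parameter is an `out` and that `LLama.Common.ModelParams` takes the model path in its constructor (the path is a placeholder):
+
+```csharp
+using LLama.Abstractions;
+using LLama.Common;
+using LLama.Extensions;
+using LLama.Native;
+
+IModelParams modelParams = new ModelParams("path/to/model.gguf");
+
+// The returned MemoryHandle keeps managed memory referenced by the
+// native struct pinned, so hold it for as long as the struct is in use.
+using (modelParams.ToLlamaContextParams(out LLamaContextParams native))
+{
+    // pass `native` to native APIs here; it is only valid inside this block
+}
+```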
diff --git a/docs/xmldocs/llama.extensions.keyvaluepairextensions.md b/docs/xmldocs/llama.extensions.keyvaluepairextensions.md
new file mode 100644
index 00000000..c72e1c7e
--- /dev/null
+++ b/docs/xmldocs/llama.extensions.keyvaluepairextensions.md
@@ -0,0 +1,40 @@
+# KeyValuePairExtensions
+
+Namespace: LLama.Extensions
+
+Extensions to the KeyValuePair struct
+
+```csharp
+public static class KeyValuePairExtensions
+```
+
+Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) → [KeyValuePairExtensions](./llama.extensions.keyvaluepairextensions.md)
+
+## Methods
+
+### **Deconstruct<TKey, TValue>(KeyValuePair<TKey, TValue>, TKey&, TValue&)**
+
+Deconstruct a KeyValuePair into its constituent parts.
+
+```csharp
+public static void Deconstruct(KeyValuePair pair, TKey& first, TValue& second)
+```
+
+#### Type Parameters
+
+`TKey`
+Type of the Key + +`TValue`
+Type of the Value + +#### Parameters + +`pair` KeyValuePair<TKey, TValue>
+The KeyValuePair to deconstruct + +`first` TKey&
+First element, the Key + +`second` TValue&
+Second element, the Value diff --git a/docs/xmldocs/llama.grammars.grammar.md b/docs/xmldocs/llama.grammars.grammar.md new file mode 100644 index 00000000..3b794f45 --- /dev/null +++ b/docs/xmldocs/llama.grammars.grammar.md @@ -0,0 +1,110 @@ +# Grammar + +Namespace: LLama.Grammars + +A grammar is a set of [GrammarRule](./llama.grammars.grammarrule.md)s for deciding which characters are valid next. Can be used to constrain + output to certain formats - e.g. force the model to output JSON + +```csharp +public sealed class Grammar +``` + +Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) → [Grammar](./llama.grammars.grammar.md) + +## Properties + +### **StartRuleIndex** + +Index of the initial rule to start from + +```csharp +public ulong StartRuleIndex { get; set; } +``` + +#### Property Value + +[UInt64](https://docs.microsoft.com/en-us/dotnet/api/system.uint64)
+ +### **Rules** + +The rules which make up this grammar + +```csharp +public IReadOnlyList Rules { get; } +``` + +#### Property Value + +[IReadOnlyList<GrammarRule>](https://docs.microsoft.com/en-us/dotnet/api/system.collections.generic.ireadonlylist-1)
+ +## Constructors + +### **Grammar(IReadOnlyList<GrammarRule>, UInt64)** + +Create a new grammar from a set of rules + +```csharp +public Grammar(IReadOnlyList rules, ulong startRuleIndex) +``` + +#### Parameters + +`rules` [IReadOnlyList<GrammarRule>](https://docs.microsoft.com/en-us/dotnet/api/system.collections.generic.ireadonlylist-1)
+The rules which make up this grammar + +`startRuleIndex` [UInt64](https://docs.microsoft.com/en-us/dotnet/api/system.uint64)
+Index of the initial rule to start from + +#### Exceptions + +[ArgumentOutOfRangeException](https://docs.microsoft.com/en-us/dotnet/api/system.argumentoutofrangeexception)
+ +## Methods + +### **CreateInstance()** + +Create a `SafeLLamaGrammarHandle` instance to use for parsing + +```csharp +public SafeLLamaGrammarHandle CreateInstance() +``` + +#### Returns + +[SafeLLamaGrammarHandle](./llama.native.safellamagrammarhandle.md)
+ +### **Parse(String, String)** + +Parse a string of GGML BNF into a Grammar + +```csharp +public static Grammar Parse(string gbnf, string startRule) +``` + +#### Parameters + +`gbnf` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
+The string to parse + +`startRule` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
+Name of the start rule of this grammar + +#### Returns + +[Grammar](./llama.grammars.grammar.md)
+A Grammar which can be converted into a SafeLLamaGrammarHandle for sampling + +#### Exceptions + +[GrammarFormatException](./llama.exceptions.grammarformatexception.md)
+Thrown if input is malformed + +### **ToString()** + +```csharp +public string ToString() +``` + +#### Returns + +[String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
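+**Example:**
+
+A minimal usage sketch; the GBNF text is illustrative:
+
+```csharp
+using LLama.Grammars;
+
+// A tiny grammar: the root rule only accepts "yes" or "no".
+const string gbnf = "root ::= (\"yes\" | \"no\")";
+
+var grammar = Grammar.Parse(gbnf, "root");
+
+// Convert into the native handle which the samplers consume.
+using var handle = grammar.CreateInstance();
+```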
diff --git a/docs/xmldocs/llama.grammars.grammarrule.md b/docs/xmldocs/llama.grammars.grammarrule.md new file mode 100644 index 00000000..3cac47c8 --- /dev/null +++ b/docs/xmldocs/llama.grammars.grammarrule.md @@ -0,0 +1,118 @@ +# GrammarRule + +Namespace: LLama.Grammars + +A single rule in a [Grammar](./llama.grammars.grammar.md) + +```csharp +public sealed class GrammarRule : System.IEquatable`1[[LLama.Grammars.GrammarRule, LLamaSharp, Version=0.5.0.0, Culture=neutral, PublicKeyToken=null]] +``` + +Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) → [GrammarRule](./llama.grammars.grammarrule.md)
+Implements [IEquatable<GrammarRule>](https://docs.microsoft.com/en-us/dotnet/api/system.iequatable-1) + +## Properties + +### **Name** + +Name of this rule + +```csharp +public string Name { get; } +``` + +#### Property Value + +[String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
+ +### **Elements** + +The elements of this grammar rule + +```csharp +public IReadOnlyList Elements { get; } +``` + +#### Property Value + +[IReadOnlyList<LLamaGrammarElement>](https://docs.microsoft.com/en-us/dotnet/api/system.collections.generic.ireadonlylist-1)
+ +## Constructors + +### **GrammarRule(String, IReadOnlyList<LLamaGrammarElement>)** + +Create a new GrammarRule containing the given elements + +```csharp +public GrammarRule(string name, IReadOnlyList elements) +``` + +#### Parameters + +`name` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
+ +`elements` [IReadOnlyList<LLamaGrammarElement>](https://docs.microsoft.com/en-us/dotnet/api/system.collections.generic.ireadonlylist-1)
+ +#### Exceptions + +[ArgumentException](https://docs.microsoft.com/en-us/dotnet/api/system.argumentexception)
+ +## Methods + +### **ToString()** + +```csharp +public string ToString() +``` + +#### Returns + +[String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
+ +### **GetHashCode()** + +```csharp +public int GetHashCode() +``` + +#### Returns + +[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
+ +### **Equals(Object)** + +```csharp +public bool Equals(object obj) +``` + +#### Parameters + +`obj` [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object)
+ +#### Returns + +[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
+ +### **Equals(GrammarRule)** + +```csharp +public bool Equals(GrammarRule other) +``` + +#### Parameters + +`other` [GrammarRule](./llama.grammars.grammarrule.md)
+ +#### Returns + +[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
+ +### **<Clone>$()** + +```csharp +public GrammarRule $() +``` + +#### Returns + +[GrammarRule](./llama.grammars.grammarrule.md)
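+**Example:**
+
+A sketch of building a rule by hand, equivalent to the GBNF rule `root ::= "hi"`; `Grammar.Parse` produces the same structure from text:
+
+```csharp
+using LLama.Grammars;
+using LLama.Native;
+
+// Two CHAR elements followed by the END marker that closes the rule.
+var root = new GrammarRule("root", new[]
+{
+    new LLamaGrammarElement(LLamaGrammarElementType.CHAR, 'h'),
+    new LLamaGrammarElement(LLamaGrammarElementType.CHAR, 'i'),
+    new LLamaGrammarElement(LLamaGrammarElementType.END, 0),
+});
+
+var grammar = new Grammar(new[] { root }, 0); // start from rule index 0
+```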
diff --git a/docs/xmldocs/llama.instructexecutor.md b/docs/xmldocs/llama.instructexecutor.md index 3d10bbd6..95a018eb 100644 --- a/docs/xmldocs/llama.instructexecutor.md +++ b/docs/xmldocs/llama.instructexecutor.md @@ -13,31 +13,31 @@ Implements [ILLamaExecutor](./llama.abstractions.illamaexecutor.md) ## Properties -### **Model** +### **Context** -The mode used by the executor. +The context used by the executor. ```csharp -public LLamaModel Model { get; } +public LLamaContext Context { get; } ``` #### Property Value -[LLamaModel](./llama.llamamodel.md)
+[LLamaContext](./llama.llamacontext.md)
## Constructors -### **InstructExecutor(LLamaModel, String, String)** +### **InstructExecutor(LLamaContext, String, String)** ```csharp -public InstructExecutor(LLamaModel model, string instructionPrefix, string instructionSuffix) +public InstructExecutor(LLamaContext context, string instructionPrefix, string instructionSuffix) ``` #### Parameters -`model` [LLamaModel](./llama.llamamodel.md)
+`context` [LLamaContext](./llama.llamacontext.md)
`instructionPrefix` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
@@ -111,15 +111,15 @@ protected void PreprocessInputs(string text, InferStateArgs args) `args` [InferStateArgs](./llama.statefulexecutorbase.inferstateargs.md)
-### **PostProcess(InferenceParams, InferStateArgs, IEnumerable`1&)** +### **PostProcess(IInferenceParams, InferStateArgs, IEnumerable`1&)** ```csharp -protected bool PostProcess(InferenceParams inferenceParams, InferStateArgs args, IEnumerable`1& extraOutputs) +protected bool PostProcess(IInferenceParams inferenceParams, InferStateArgs args, IEnumerable`1& extraOutputs) ``` #### Parameters -`inferenceParams` [InferenceParams](./llama.common.inferenceparams.md)
+`inferenceParams` [IInferenceParams](./llama.abstractions.iinferenceparams.md)
`args` [InferStateArgs](./llama.statefulexecutorbase.inferstateargs.md)
@@ -129,14 +129,14 @@ protected bool PostProcess(InferenceParams inferenceParams, InferStateArgs args, [Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
-### **InferInternal(InferenceParams, InferStateArgs)** +### **InferInternal(IInferenceParams, InferStateArgs)** ```csharp -protected void InferInternal(InferenceParams inferenceParams, InferStateArgs args) +protected void InferInternal(IInferenceParams inferenceParams, InferStateArgs args) ``` #### Parameters -`inferenceParams` [InferenceParams](./llama.common.inferenceparams.md)
+`inferenceParams` [IInferenceParams](./llama.abstractions.iinferenceparams.md)
`args` [InferStateArgs](./llama.statefulexecutorbase.inferstateargs.md)
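+**Example:**
+
+A minimal sketch, assuming the `Infer` method from `ILLamaExecutor`; the model path, instruction prefix/suffix and prompt are placeholders:
+
+```csharp
+using System;
+using LLama;
+using LLama.Common;
+
+var parameters = new ModelParams("path/to/model.gguf");
+using var model = LLamaWeights.LoadFromFile(parameters);
+using var context = model.CreateContext(parameters);
+
+var executor = new InstructExecutor(context, "### Instruction:\n", "### Response:\n");
+foreach (var token in executor.Infer("Write a haiku about spring."))
+    Console.Write(token);
+```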
diff --git a/docs/xmldocs/llama.interactiveexecutor.md b/docs/xmldocs/llama.interactiveexecutor.md index b8953138..38134c40 100644 --- a/docs/xmldocs/llama.interactiveexecutor.md +++ b/docs/xmldocs/llama.interactiveexecutor.md @@ -13,31 +13,31 @@ Implements [ILLamaExecutor](./llama.abstractions.illamaexecutor.md) ## Properties -### **Model** +### **Context** -The mode used by the executor. +The context used by the executor. ```csharp -public LLamaModel Model { get; } +public LLamaContext Context { get; } ``` #### Property Value -[LLamaModel](./llama.llamamodel.md)
+[LLamaContext](./llama.llamacontext.md)
## Constructors -### **InteractiveExecutor(LLamaModel)** +### **InteractiveExecutor(LLamaContext)** ```csharp -public InteractiveExecutor(LLamaModel model) +public InteractiveExecutor(LLamaContext context) ``` #### Parameters -`model` [LLamaModel](./llama.llamamodel.md)
+`context` [LLamaContext](./llama.llamacontext.md)
## Methods @@ -109,17 +109,17 @@ protected void PreprocessInputs(string text, InferStateArgs args) `args` [InferStateArgs](./llama.statefulexecutorbase.inferstateargs.md)
-### **PostProcess(InferenceParams, InferStateArgs, IEnumerable`1&)** +### **PostProcess(IInferenceParams, InferStateArgs, IEnumerable`1&)** Return whether to break the generation. ```csharp -protected bool PostProcess(InferenceParams inferenceParams, InferStateArgs args, IEnumerable`1& extraOutputs) +protected bool PostProcess(IInferenceParams inferenceParams, InferStateArgs args, IEnumerable`1& extraOutputs) ``` #### Parameters -`inferenceParams` [InferenceParams](./llama.common.inferenceparams.md)
+`inferenceParams` [IInferenceParams](./llama.abstractions.iinferenceparams.md)
`args` [InferStateArgs](./llama.statefulexecutorbase.inferstateargs.md)
@@ -129,14 +129,14 @@ protected bool PostProcess(InferenceParams inferenceParams, InferStateArgs args, [Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
-### **InferInternal(InferenceParams, InferStateArgs)** +### **InferInternal(IInferenceParams, InferStateArgs)** ```csharp -protected void InferInternal(InferenceParams inferenceParams, InferStateArgs args) +protected void InferInternal(IInferenceParams inferenceParams, InferStateArgs args) ``` #### Parameters -`inferenceParams` [InferenceParams](./llama.common.inferenceparams.md)
+`inferenceParams` [IInferenceParams](./llama.abstractions.iinferenceparams.md)
`args` [InferStateArgs](./llama.statefulexecutorbase.inferstateargs.md)
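+**Example:**
+
+A minimal sketch of an interactive loop, assuming the `Infer` method from `ILLamaExecutor` and the `AntiPrompts` property of `InferenceParams`; the model path and prompt are placeholders:
+
+```csharp
+using System;
+using System.Collections.Generic;
+using LLama;
+using LLama.Common;
+
+var parameters = new ModelParams("path/to/model.gguf");
+using var model = LLamaWeights.LoadFromFile(parameters);
+using var context = model.CreateContext(parameters);
+
+var executor = new InteractiveExecutor(context);
+var inferenceParams = new InferenceParams { AntiPrompts = new List<string> { "User:" } };
+
+// Generation stops when the model emits the anti-prompt "User:".
+foreach (var token in executor.Infer("User: Hello!\nBot:", inferenceParams))
+    Console.Write(token);
+```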
diff --git a/docs/xmldocs/llama.llamamodel.md b/docs/xmldocs/llama.llamacontext.md similarity index 53% rename from docs/xmldocs/llama.llamamodel.md rename to docs/xmldocs/llama.llamacontext.md index 4e54b371..59494aee 100644 --- a/docs/xmldocs/llama.llamamodel.md +++ b/docs/xmldocs/llama.llamacontext.md @@ -1,21 +1,33 @@ -# LLamaModel +# LLamaContext Namespace: LLama -The abstraction of a LLama model, which holds the context in the native library. +A llama_context, which holds all the context required to interact with a model ```csharp -public class LLamaModel : System.IDisposable +public sealed class LLamaContext : System.IDisposable ``` -Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) → [LLamaModel](./llama.llamamodel.md)
+Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) → [LLamaContext](./llama.llamacontext.md)
Implements [IDisposable](https://docs.microsoft.com/en-us/dotnet/api/system.idisposable) ## Properties +### **VocabCount** + +Total number of tokens in vocabulary of this model + +```csharp +public int VocabCount { get; } +``` + +#### Property Value + +[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
+ ### **ContextSize** -The context size. +Total number of tokens in the context ```csharp public int ContextSize { get; } @@ -25,22 +37,33 @@ public int ContextSize { get; } [Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
+### **EmbeddingSize** + +Dimension of embedding vectors + +```csharp +public int EmbeddingSize { get; } +``` + +#### Property Value + +[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
+ ### **Params** The model params set for this model. ```csharp -public ModelParams Params { get; set; } +public IModelParams Params { get; set; } ``` #### Property Value -[ModelParams](./llama.common.modelparams.md)
+[IModelParams](./llama.abstractions.imodelparams.md)
### **NativeHandle** -The native handle, which is used to be passed to the native APIs. Please avoid using it - unless you know what is the usage of the Native API. +The native handle, which is used to be passed to the native APIs ```csharp public SafeLLamaContextHandle NativeHandle { get; } @@ -50,6 +73,10 @@ public SafeLLamaContextHandle NativeHandle { get; } [SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)
+**Remarks:** + +Be careful how you use this! + ### **Encoding** The encoding set for this model to deal with text input. @@ -62,35 +89,82 @@ public Encoding Encoding { get; } [Encoding](https://docs.microsoft.com/en-us/dotnet/api/system.text.encoding)
+### **EmbeddingLength** + +The embedding length of the model, also known as `n_embed` + +```csharp +public int EmbeddingLength { get; } +``` + +#### Property Value + +[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
+ ## Constructors -### **LLamaModel(ModelParams, String, ILLamaLogger)** +### **LLamaContext(IModelParams, ILLamaLogger)** + +#### Caution + +Use the LLamaWeights.CreateContext instead + +--- ```csharp -public LLamaModel(ModelParams Params, string encoding, ILLamaLogger logger) +public LLamaContext(IModelParams params, ILLamaLogger logger) ``` #### Parameters -`Params` [ModelParams](./llama.common.modelparams.md)
+`params` [IModelParams](./llama.abstractions.imodelparams.md)
Model params. -`encoding` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
-Encoding to deal with text input. - `logger` [ILLamaLogger](./llama.common.illamalogger.md)
The logger. +### **LLamaContext(LLamaWeights, IModelParams, ILLamaLogger)** + +Create a new LLamaContext for the given LLamaWeights + +```csharp +public LLamaContext(LLamaWeights model, IModelParams params, ILLamaLogger logger) +``` + +#### Parameters + +`model` [LLamaWeights](./llama.llamaweights.md)
+ +`params` [IModelParams](./llama.abstractions.imodelparams.md)
+ +`logger` [ILLamaLogger](./llama.common.illamalogger.md)
+ +#### Exceptions + +[ObjectDisposedException](https://docs.microsoft.com/en-us/dotnet/api/system.objectdisposedexception)
+ ## Methods +### **Clone()** + +Create a copy of the current state of this context + +```csharp +public LLamaContext Clone() +``` + +#### Returns + +[LLamaContext](./llama.llamacontext.md)
+ ### **Tokenize(String, Boolean)** Tokenize a string. ```csharp -public IEnumerable Tokenize(string text, bool addBos) +public Int32[] Tokenize(string text, bool addBos) ``` #### Parameters @@ -102,7 +176,7 @@ Whether to add a bos to the text. #### Returns -[IEnumerable<Int32>](https://docs.microsoft.com/en-us/dotnet/api/system.collections.generic.ienumerable-1)
+[Int32[]](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
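+**Example:**
+
+A round-trip sketch, assuming an existing `context`:
+
+```csharp
+// Tokenize with a BOS token prepended, then convert back to text.
+int[] tokens = context.Tokenize("Hello, world!", true);
+string text = context.DeTokenize(tokens);
+```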
### **DeTokenize(IEnumerable<Int32>)** @@ -134,6 +208,12 @@ public void SaveState(string filename) ### **GetStateData()** +#### Caution + +Use `GetState` instead, this supports larger states (over 2GB) + +--- + Get the state data as a byte array. ```csharp @@ -144,6 +224,18 @@ public Byte[] GetStateData() [Byte[]](https://docs.microsoft.com/en-us/dotnet/api/system.byte)
+### **GetState()** + +Get the state data as an opaque handle + +```csharp +public State GetState() +``` + +#### Returns + +[State](./llama.llamacontext.state.md)
+ ### **LoadState(String)** Load the state from specified path. @@ -176,21 +268,39 @@ public void LoadState(Byte[] stateData) [RuntimeError](./llama.exceptions.runtimeerror.md)
-### **Sample(LLamaTokenDataArray, Single, MiroStateType, Single, Single, Int32, Single, Single, Single)** +### **LoadState(State)** + +Load the state from memory. + +```csharp +public void LoadState(State state) +``` + +#### Parameters + +`state` [State](./llama.llamacontext.state.md)
+ +#### Exceptions + +[RuntimeError](./llama.exceptions.runtimeerror.md)
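+**Example:**
+
+A sketch of snapshotting and restoring a context, assuming an existing `context`; the filename is a placeholder:
+
+```csharp
+// In-memory snapshot (preferred over GetStateData for large states).
+var snapshot = context.GetState();
+// ... evaluate tokens, sample, etc. ...
+context.LoadState(snapshot);
+
+// Or persist the state to disk and restore it later.
+context.SaveState("context-state.bin");
+context.LoadState("context-state.bin");
+```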
+ +### **Sample(LLamaTokenDataArray, Nullable`1&, Single, MirostatType, Single, Single, Int32, Single, Single, Single, SafeLLamaGrammarHandle)** Perform the sampling. Please don't use it unless you fully know what it does. ```csharp -public int Sample(LLamaTokenDataArray candidates, float temperature, MiroStateType mirostat, float mirostatTau, float mirostatEta, int topK, float topP, float tfsZ, float typicalP) +public int Sample(LLamaTokenDataArray candidates, Nullable`1& mirostat_mu, float temperature, MirostatType mirostat, float mirostatTau, float mirostatEta, int topK, float topP, float tfsZ, float typicalP, SafeLLamaGrammarHandle grammar) ``` #### Parameters `candidates` [LLamaTokenDataArray](./llama.native.llamatokendataarray.md)
+`mirostat_mu` [Nullable`1&](https://docs.microsoft.com/en-us/dotnet/api/system.nullable-1&)
+ `temperature` [Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)
-`mirostat` [MiroStateType](./llama.common.mirostatetype.md)
+`mirostat` [MirostatType](./llama.common.mirostattype.md)
`mirostatTau` [Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)
@@ -204,6 +314,8 @@ public int Sample(LLamaTokenDataArray candidates, float temperature, MiroStateTy `typicalP` [Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)
+`grammar` [SafeLLamaGrammarHandle](./llama.native.safellamagrammarhandle.md)
+ #### Returns [Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
@@ -259,6 +371,75 @@ The updated `pastTokensCount`. [RuntimeError](./llama.exceptions.runtimeerror.md)
+### **Eval(List<Int32>, Int32)** + + + +```csharp +public int Eval(List tokens, int pastTokensCount) +``` + +#### Parameters + +`tokens` [List<Int32>](https://docs.microsoft.com/en-us/dotnet/api/system.collections.generic.list-1)
+ +`pastTokensCount` [Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
+ +#### Returns + +[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
+The updated `pastTokensCount`. + +#### Exceptions + +[RuntimeError](./llama.exceptions.runtimeerror.md)
+ +### **Eval(ReadOnlyMemory<Int32>, Int32)** + + + +```csharp +public int Eval(ReadOnlyMemory tokens, int pastTokensCount) +``` + +#### Parameters + +`tokens` [ReadOnlyMemory<Int32>](https://docs.microsoft.com/en-us/dotnet/api/system.readonlymemory-1)
+ +`pastTokensCount` [Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
+ +#### Returns + +[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
+The updated `pastTokensCount`. + +#### Exceptions + +[RuntimeError](./llama.exceptions.runtimeerror.md)
+ +### **Eval(ReadOnlySpan<Int32>, Int32)** + + + +```csharp +public int Eval(ReadOnlySpan tokens, int pastTokensCount) +``` + +#### Parameters + +`tokens` [ReadOnlySpan<Int32>](https://docs.microsoft.com/en-us/dotnet/api/system.readonlyspan-1)
+ +`pastTokensCount` [Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
+ +#### Returns + +[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
+The updated `pastTokensCount`. + +#### Exceptions + +[RuntimeError](./llama.exceptions.runtimeerror.md)
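+**Example:**
+
+A sketch of evaluating a prompt, assuming an existing `context`; the returned count is the position to pass back into the next call:
+
+```csharp
+// Evaluate a prompt from position 0; keep the returned count so the
+// next Eval call continues where this one left off.
+int[] prompt = context.Tokenize("Once upon a time", true);
+int pastTokensCount = context.Eval(prompt.AsSpan(), 0);
+```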
+ ### **GenerateResult(IEnumerable<Int32>)** ```csharp @@ -273,9 +454,23 @@ internal IEnumerable GenerateResult(IEnumerable ids) [IEnumerable<String>](https://docs.microsoft.com/en-us/dotnet/api/system.collections.generic.ienumerable-1)
-### **Dispose()** +### **TokenToString(Int32)** +Convert a token into a string +```csharp +public string TokenToString(int token) +``` + +#### Parameters + +`token` [Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
+ +#### Returns + +[String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
+ +### **Dispose()** ```csharp public void Dispose() diff --git a/docs/xmldocs/llama.llamaembedder.md b/docs/xmldocs/llama.llamaembedder.md index 60c36b63..77057207 100644 --- a/docs/xmldocs/llama.llamaembedder.md +++ b/docs/xmldocs/llama.llamaembedder.md @@ -5,30 +5,62 @@ Namespace: LLama The embedder for LLama, which supports getting embeddings from text. ```csharp -public class LLamaEmbedder : System.IDisposable +public sealed class LLamaEmbedder : System.IDisposable ``` Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) → [LLamaEmbedder](./llama.llamaembedder.md)
Implements [IDisposable](https://docs.microsoft.com/en-us/dotnet/api/system.idisposable) +## Properties + +### **EmbeddingSize** + +Dimension of embedding vectors + +```csharp +public int EmbeddingSize { get; } +``` + +#### Property Value + +[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
+ ## Constructors -### **LLamaEmbedder(ModelParams)** +### **LLamaEmbedder(IModelParams)** + + + +```csharp +public LLamaEmbedder(IModelParams params) +``` + +#### Parameters +`params` [IModelParams](./llama.abstractions.imodelparams.md)
+### **LLamaEmbedder(LLamaWeights, IModelParams)** ```csharp -public LLamaEmbedder(ModelParams params) +public LLamaEmbedder(LLamaWeights weights, IModelParams params) ``` #### Parameters -`params` [ModelParams](./llama.common.modelparams.md)
+`weights` [LLamaWeights](./llama.llamaweights.md)
+ +`params` [IModelParams](./llama.abstractions.imodelparams.md)
## Methods ### **GetEmbeddings(String, Int32, Boolean, String)** +#### Caution + +'threads' and 'encoding' parameters are no longer used + +--- + Get the embeddings of the text. ```csharp @@ -40,12 +72,56 @@ public Single[] GetEmbeddings(string text, int threads, bool addBos, string enco `text` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
`threads` [Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
-Threads used for inference. +unused `addBos` [Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
Add bos to the text. `encoding` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
+unused + +#### Returns + +[Single[]](https://docs.microsoft.com/en-us/dotnet/api/system.single)
+ +#### Exceptions + +[RuntimeError](./llama.exceptions.runtimeerror.md)
+ +### **GetEmbeddings(String)** + +Get the embeddings of the text. + +```csharp +public Single[] GetEmbeddings(string text) +``` + +#### Parameters + +`text` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
+ +#### Returns + +[Single[]](https://docs.microsoft.com/en-us/dotnet/api/system.single)
+ +#### Exceptions + +[RuntimeError](./llama.exceptions.runtimeerror.md)
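+**Example:**
+
+A minimal sketch; the model path is a placeholder:
+
+```csharp
+using LLama;
+using LLama.Common;
+
+using var embedder = new LLamaEmbedder(new ModelParams("path/to/model.gguf"));
+
+float[] embedding = embedder.GetEmbeddings("Hello, world!");
+// embedding.Length == embedder.EmbeddingSize
+```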
+ +### **GetEmbeddings(String, Boolean)** + +Get the embeddings of the text. + +```csharp +public Single[] GetEmbeddings(string text, bool addBos) +``` + +#### Parameters + +`text` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
+ +`addBos` [Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
+Add bos to the text. #### Returns diff --git a/docs/xmldocs/llama.llamaquantizer.md b/docs/xmldocs/llama.llamaquantizer.md index ce0349bb..977185d7 100644 --- a/docs/xmldocs/llama.llamaquantizer.md +++ b/docs/xmldocs/llama.llamaquantizer.md @@ -12,12 +12,12 @@ Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) ## Methods -### **Quantize(String, String, LLamaFtype, Int32)** +### **Quantize(String, String, LLamaFtype, Int32, Boolean, Boolean)** Quantize the model. ```csharp -public static bool Quantize(string srcFileName, string dstFilename, LLamaFtype ftype, int nthread) +public static bool Quantize(string srcFileName, string dstFilename, LLamaFtype ftype, int nthread, bool allowRequantize, bool quantizeOutputTensor) ``` #### Parameters @@ -34,6 +34,10 @@ The type of quantization. `nthread` [Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
Thread to be used during the quantization. By default it's the physical core number. +`allowRequantize` [Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
+ +`quantizeOutputTensor` [Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
+ #### Returns [Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
@@ -43,12 +47,12 @@ Whether the quantization is successful. [ArgumentException](https://docs.microsoft.com/en-us/dotnet/api/system.argumentexception)
-### **Quantize(String, String, String, Int32)** +### **Quantize(String, String, String, Int32, Boolean, Boolean)** Quantize the model. ```csharp -public static bool Quantize(string srcFileName, string dstFilename, string ftype, int nthread) +public static bool Quantize(string srcFileName, string dstFilename, string ftype, int nthread, bool allowRequantize, bool quantizeOutputTensor) ``` #### Parameters @@ -65,6 +69,10 @@ The type of quantization. `nthread` [Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
Thread to be used during the quantization. By default it's the physical core number. +`allowRequantize` [Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
+ +`quantizeOutputTensor` [Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
+ #### Returns [Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
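+**Example:**
+
+A sketch of re-quantizing a model file; the paths and the `"q4_0"` type string are placeholders:
+
+```csharp
+using LLama;
+
+bool ok = LLamaQuantizer.Quantize(
+    "model-f16.gguf",
+    "model-q4_0.gguf",
+    "q4_0",
+    0,      // nthread: non-positive -> use the physical core count
+    false,  // allowRequantize
+    false); // quantizeOutputTensor
+```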
diff --git a/docs/xmldocs/llama.llamaweights.md b/docs/xmldocs/llama.llamaweights.md new file mode 100644 index 00000000..3b448c62 --- /dev/null +++ b/docs/xmldocs/llama.llamaweights.md @@ -0,0 +1,118 @@ +# LLamaWeights + +Namespace: LLama + +A set of model weights, loaded into memory. + +```csharp +public sealed class LLamaWeights : System.IDisposable +``` + +Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) → [LLamaWeights](./llama.llamaweights.md)
+Implements [IDisposable](https://docs.microsoft.com/en-us/dotnet/api/system.idisposable) + +## Properties + +### **NativeHandle** + +The native handle, which is used in the native APIs + +```csharp +public SafeLlamaModelHandle NativeHandle { get; } +``` + +#### Property Value + +[SafeLlamaModelHandle](./llama.native.safellamamodelhandle.md)
+ +**Remarks:** + +Be careful how you use this! + +### **Encoding** + +Encoding to use to convert text into bytes for the model + +```csharp +public Encoding Encoding { get; } +``` + +#### Property Value + +[Encoding](https://docs.microsoft.com/en-us/dotnet/api/system.text.encoding)
+ +### **VocabCount** + +Total number of tokens in vocabulary of this model + +```csharp +public int VocabCount { get; } +``` + +#### Property Value + +[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
+ +### **ContextSize** + +Total number of tokens in the context + +```csharp +public int ContextSize { get; } +``` + +#### Property Value + +[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
+ +### **EmbeddingSize** + +Dimension of embedding vectors + +```csharp +public int EmbeddingSize { get; } +``` + +#### Property Value + +[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
+ +## Methods + +### **LoadFromFile(IModelParams)** + +Load weights into memory + +```csharp +public static LLamaWeights LoadFromFile(IModelParams params) +``` + +#### Parameters + +`params` [IModelParams](./llama.abstractions.imodelparams.md)
+ +#### Returns + +[LLamaWeights](./llama.llamaweights.md)
+ +### **Dispose()** + +```csharp +public void Dispose() +``` + +### **CreateContext(IModelParams)** + +Create a llama_context using this model + +```csharp +public LLamaContext CreateContext(IModelParams params) +``` + +#### Parameters + +`params` [IModelParams](./llama.abstractions.imodelparams.md)
+ +#### Returns + +[LLamaContext](./llama.llamacontext.md)
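+**Example:**
+
+A minimal sketch of the intended pattern, loading the weights once and creating a context from them; the model path is a placeholder:
+
+```csharp
+using LLama;
+using LLama.Common;
+
+var parameters = new ModelParams("path/to/model.gguf");
+using var weights = LLamaWeights.LoadFromFile(parameters);
+using var context = weights.CreateContext(parameters);
+```
+
+Separating `LLamaWeights` from `LLamaContext` means further contexts can be created from the same loaded weights without re-reading the model file.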
diff --git a/docs/xmldocs/llama.native.llamacontextparams.md b/docs/xmldocs/llama.native.llamacontextparams.md index e47b2bb6..0b9ba61e 100644 --- a/docs/xmldocs/llama.native.llamacontextparams.md +++ b/docs/xmldocs/llama.native.llamacontextparams.md @@ -2,6 +2,8 @@ Namespace: LLama.Native +A C# representation of the llama.cpp `llama_context_params` struct + ```csharp public struct LLamaContextParams ``` @@ -10,6 +12,14 @@ Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) ## Fields +### **seed** + +RNG seed, -1 for random + +```csharp +public int seed; +``` + ### **n_ctx** text context @@ -18,6 +28,14 @@ text context public int n_ctx; ``` +### **n_batch** + +prompt processing batch size + +```csharp +public int n_batch; +``` + ### **n_gpu_layers** number of layers to store in VRAM @@ -26,74 +44,150 @@ number of layers to store in VRAM public int n_gpu_layers; ``` -### **seed** +### **main_gpu** -RNG seed, -1 for random +the GPU that is used for scratch and small tensors ```csharp -public int seed; +public int main_gpu; +``` + +### **tensor_split** + +how to split layers across multiple GPUs + +```csharp +public IntPtr tensor_split; ``` +### **rope_freq_base** + +ref: https://github.com/ggerganov/llama.cpp/pull/2054 + RoPE base frequency + +```csharp +public float rope_freq_base; +``` + +### **rope_freq_scale** + +ref: https://github.com/ggerganov/llama.cpp/pull/2054 + RoPE frequency scaling factor + +```csharp +public float rope_freq_scale; +``` + +### **progress_callback** + +called with a progress value between 0 and 1, pass NULL to disable + +```csharp +public IntPtr progress_callback; +``` + +### **progress_callback_user_data** + +context pointer passed to the progress callback + +```csharp +public IntPtr progress_callback_user_data; +``` + +## Properties + +### **low_vram** + +if true, reduce VRAM usage at the cost of performance + +```csharp +public bool low_vram { get; set; } +``` + +#### Property Value + +[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
+ +### **mul_mat_q** + +if true, use experimental mul_mat_q kernels + +```csharp +public bool mul_mat_q { get; set; } +``` + +#### Property Value + +[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
+ ### **f16_kv** use fp16 for KV cache ```csharp -public bool f16_kv; +public bool f16_kv { get; set; } ``` +#### Property Value + +[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
+ ### **logits_all** the llama_eval() call computes all logits, not just the last one ```csharp -public bool logits_all; +public bool logits_all { get; set; } ``` +#### Property Value + +[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
+ ### **vocab_only** only load the vocabulary, no weights ```csharp -public bool vocab_only; +public bool vocab_only { get; set; } ``` +#### Property Value + +[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
+ ### **use_mmap** use mmap if possible ```csharp -public bool use_mmap; +public bool use_mmap { get; set; } ``` +#### Property Value + +[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
+ ### **use_mlock** force system to keep model in RAM ```csharp -public bool use_mlock; +public bool use_mlock { get; set; } ``` -### **embedding** +#### Property Value -embedding mode only +[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
-```csharp -public bool embedding; -``` - -### **progress_callback** +### **embedding** -called with a progress value between 0 and 1, pass NULL to disable +embedding mode only ```csharp -public IntPtr progress_callback; +public bool embedding { get; set; } ``` -### **progress_callback_user_data** - -context pointer passed to the progress callback +#### Property Value -```csharp -public IntPtr progress_callback_user_data; -``` +[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
diff --git a/docs/xmldocs/llama.native.llamaftype.md b/docs/xmldocs/llama.native.llamaftype.md index 2c76c9e1..7b98173d 100644 --- a/docs/xmldocs/llama.native.llamaftype.md +++ b/docs/xmldocs/llama.native.llamaftype.md @@ -2,6 +2,8 @@ Namespace: LLama.Native +Supported model file types + ```csharp public enum LLamaFtype ``` @@ -13,3 +15,21 @@ Implements [IComparable](https://docs.microsoft.com/en-us/dotnet/api/system.icom | Name | Value | Description | | --- | --: | --- | +| LLAMA_FTYPE_ALL_F32 | 0 | All f32 | +| LLAMA_FTYPE_MOSTLY_F16 | 1 | Mostly f16 | +| LLAMA_FTYPE_MOSTLY_Q8_0 | 7 | Mostly 8 bit | +| LLAMA_FTYPE_MOSTLY_Q4_0 | 2 | Mostly 4 bit | +| LLAMA_FTYPE_MOSTLY_Q4_1 | 3 | Mostly 4 bit | +| LLAMA_FTYPE_MOSTLY_Q4_1_SOME_F16 | 4 | Mostly 4 bit, tok_embeddings.weight and output.weight are f16 | +| LLAMA_FTYPE_MOSTLY_Q5_0 | 8 | Mostly 5 bit | +| LLAMA_FTYPE_MOSTLY_Q5_1 | 9 | Mostly 5 bit | +| LLAMA_FTYPE_MOSTLY_Q2_K | 10 | K-Quant 2 bit | +| LLAMA_FTYPE_MOSTLY_Q3_K_S | 11 | K-Quant 3 bit (Small) | +| LLAMA_FTYPE_MOSTLY_Q3_K_M | 12 | K-Quant 3 bit (Medium) | +| LLAMA_FTYPE_MOSTLY_Q3_K_L | 13 | K-Quant 3 bit (Large) | +| LLAMA_FTYPE_MOSTLY_Q4_K_S | 14 | K-Quant 4 bit (Small) | +| LLAMA_FTYPE_MOSTLY_Q4_K_M | 15 | K-Quant 4 bit (Medium) | +| LLAMA_FTYPE_MOSTLY_Q5_K_S | 16 | K-Quant 5 bit (Small) | +| LLAMA_FTYPE_MOSTLY_Q5_K_M | 17 | K-Quant 5 bit (Medium) | +| LLAMA_FTYPE_MOSTLY_Q6_K | 18 | K-Quant 6 bit | +| LLAMA_FTYPE_GUESSED | 1024 | File type was not specified | diff --git a/docs/xmldocs/llama.native.llamagrammarelement.md b/docs/xmldocs/llama.native.llamagrammarelement.md new file mode 100644 index 00000000..c836c3cf --- /dev/null +++ b/docs/xmldocs/llama.native.llamagrammarelement.md @@ -0,0 +1,96 @@ +# LLamaGrammarElement + +Namespace: LLama.Native + +An element of a grammar + +```csharp +public struct LLamaGrammarElement +``` + +Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) → [ValueType](https://docs.microsoft.com/en-us/dotnet/api/system.valuetype) → [LLamaGrammarElement](./llama.native.llamagrammarelement.md)
+Implements [IEquatable<LLamaGrammarElement>](https://docs.microsoft.com/en-us/dotnet/api/system.iequatable-1) + +## Fields + +### **Type** + +The type of this element + +```csharp +public LLamaGrammarElementType Type; +``` + +### **Value** + +Unicode code point or rule ID + +```csharp +public uint Value; +``` + +## Constructors + +### **LLamaGrammarElement(LLamaGrammarElementType, UInt32)** + +Construct a new LLamaGrammarElement + +```csharp +LLamaGrammarElement(LLamaGrammarElementType type, uint value) +``` + +#### Parameters + +`type` [LLamaGrammarElementType](./llama.native.llamagrammarelementtype.md)
+ +`value` [UInt32](https://docs.microsoft.com/en-us/dotnet/api/system.uint32)
+ +## Methods + +### **Equals(LLamaGrammarElement)** + +```csharp +bool Equals(LLamaGrammarElement other) +``` + +#### Parameters + +`other` [LLamaGrammarElement](./llama.native.llamagrammarelement.md)
+ +#### Returns + +[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
+ +### **Equals(Object)** + +```csharp +bool Equals(object obj) +``` + +#### Parameters + +`obj` [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object)
+ +#### Returns + +[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
+ +### **GetHashCode()** + +```csharp +int GetHashCode() +``` + +#### Returns + +[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
+ +### **IsCharElement()** + +```csharp +bool IsCharElement() +``` + +#### Returns + +[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
diff --git a/docs/xmldocs/llama.native.llamagrammarelementtype.md b/docs/xmldocs/llama.native.llamagrammarelementtype.md new file mode 100644 index 00000000..bf69e5a7 --- /dev/null +++ b/docs/xmldocs/llama.native.llamagrammarelementtype.md @@ -0,0 +1,24 @@ +# LLamaGrammarElementType + +Namespace: LLama.Native + +grammar element type + +```csharp +public enum LLamaGrammarElementType +``` + +Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) → [ValueType](https://docs.microsoft.com/en-us/dotnet/api/system.valuetype) → [Enum](https://docs.microsoft.com/en-us/dotnet/api/system.enum) → [LLamaGrammarElementType](./llama.native.llamagrammarelementtype.md)
+Implements [IComparable](https://docs.microsoft.com/en-us/dotnet/api/system.icomparable), [IFormattable](https://docs.microsoft.com/en-us/dotnet/api/system.iformattable), [IConvertible](https://docs.microsoft.com/en-us/dotnet/api/system.iconvertible) + +## Fields + +| Name | Value | Description | +| --- | --: | --- | +| END | 0 | end of rule definition | +| ALT | 1 | start of alternate definition for rule | +| RULE_REF | 2 | non-terminal element: reference to rule | +| CHAR | 3 | terminal element: character (code point) | +| CHAR_NOT | 4 | inverse char(s) ([^a], [^a-b] [^abc]) | +| CHAR_RNG_UPPER | 5 | modifies a preceding CHAR or CHAR_ALT to be an inclusive range ([a-z]) | +| CHAR_ALT | 6 | modifies a preceding CHAR or CHAR_RNG_UPPER to add an alternate char to match ([ab], [a-zA]) | diff --git a/docs/xmldocs/llama.native.llamamodelquantizeparams.md b/docs/xmldocs/llama.native.llamamodelquantizeparams.md new file mode 100644 index 00000000..03d6f630 --- /dev/null +++ b/docs/xmldocs/llama.native.llamamodelquantizeparams.md @@ -0,0 +1,55 @@ +# LLamaModelQuantizeParams + +Namespace: LLama.Native + +Quantizer parameters used in the native API + +```csharp +public struct LLamaModelQuantizeParams +``` + +Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) → [ValueType](https://docs.microsoft.com/en-us/dotnet/api/system.valuetype) → [LLamaModelQuantizeParams](./llama.native.llamamodelquantizeparams.md) + +## Fields + +### **nthread** + +number of threads to use for quantizing, if <=0 will use std::thread::hardware_concurrency() + +```csharp +public int nthread; +``` + +### **ftype** + +quantize to this llama_ftype + +```csharp +public LLamaFtype ftype; +``` + +## Properties + +### **allow_requantize** + +allow quantizing non-f32/f16 tensors + +```csharp +public bool allow_requantize { get; set; } +``` + +#### Property Value + +[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
+ +### **quantize_output_tensor** + +quantize output.weight + +```csharp +public bool quantize_output_tensor { get; set; } +``` + +#### Property Value + +[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
diff --git a/docs/xmldocs/llama.native.llamatokendataarray.md b/docs/xmldocs/llama.native.llamatokendataarray.md index e9a05e53..b5ba8e5a 100644 --- a/docs/xmldocs/llama.native.llamatokendataarray.md +++ b/docs/xmldocs/llama.native.llamatokendataarray.md @@ -2,6 +2,8 @@ Namespace: LLama.Native +Contains an array of LLamaTokenData, potentially sorted. + ```csharp public struct LLamaTokenDataArray ``` @@ -12,34 +14,50 @@ Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) ### **data** +The LLamaTokenData + ```csharp public Memory data; ``` -### **size** +### **sorted** + +Indicates if `data` is sorted by logits in descending order. If this is false the token data is in _no particular order_. ```csharp -public ulong size; +public bool sorted; ``` -### **sorted** +## Constructors + +### **LLamaTokenDataArray(Memory<LLamaTokenData>, Boolean)** + +Create a new LLamaTokenDataArray ```csharp -public bool sorted; +LLamaTokenDataArray(Memory tokens, bool isSorted) ``` -## Constructors +#### Parameters + +`tokens` [Memory<LLamaTokenData>](https://docs.microsoft.com/en-us/dotnet/api/system.memory-1)
+ +`isSorted` [Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
+ +## Methods + +### **Create(ReadOnlySpan<Single>)** -### **LLamaTokenDataArray(LLamaTokenData[], UInt64, Boolean)** +Create a new LLamaTokenDataArray, copying the data from the given logits ```csharp -LLamaTokenDataArray(LLamaTokenData[] data, ulong size, bool sorted) +LLamaTokenDataArray Create(ReadOnlySpan logits) ``` #### Parameters -`data` [LLamaTokenData[]](./llama.native.llamatokendata.md)
+`logits` [ReadOnlySpan<Single>](https://docs.microsoft.com/en-us/dotnet/api/system.readonlyspan-1)
-`size` [UInt64](https://docs.microsoft.com/en-us/dotnet/api/system.uint64)
+#### Returns -`sorted` [Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
+[LLamaTokenDataArray](./llama.native.llamatokendataarray.md)
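+**Example:**
+
+A sketch of wrapping a logits buffer; the vocabulary size is illustrative:
+
+```csharp
+using LLama.Native;
+
+// One logit per vocabulary token; the samplers consume this array.
+float[] logits = new float[32000];
+LLamaTokenDataArray candidates = LLamaTokenDataArray.Create(logits);
+```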
diff --git a/docs/xmldocs/llama.native.llamatokendataarraynative.md b/docs/xmldocs/llama.native.llamatokendataarraynative.md index 1838d3a5..8a557cf2 100644 --- a/docs/xmldocs/llama.native.llamatokendataarraynative.md +++ b/docs/xmldocs/llama.native.llamatokendataarraynative.md @@ -2,6 +2,8 @@ Namespace: LLama.Native +Contains a pointer to an array of LLamaTokenData which is pinned in memory. + ```csharp public struct LLamaTokenDataArrayNative ``` @@ -12,18 +14,57 @@ Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) ### **data** +A pointer to an array of LlamaTokenData + ```csharp public IntPtr data; ``` +**Remarks:** + +Memory must be pinned in place for all the time this LLamaTokenDataArrayNative is in use + ### **size** +Number of LLamaTokenData in the array + ```csharp public ulong size; ``` +## Properties + ### **sorted** +Indicates if the items in the array are sorted + +```csharp +public bool sorted { get; set; } +``` + +#### Property Value + +[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
+ +## Methods + +### **Create(LLamaTokenDataArray, LLamaTokenDataArrayNative&)** + +Create a new LLamaTokenDataArrayNative around the data in the LLamaTokenDataArray + ```csharp -public bool sorted; +MemoryHandle Create(LLamaTokenDataArray array, LLamaTokenDataArrayNative& native) ``` + +#### Parameters + +`array` [LLamaTokenDataArray](./llama.native.llamatokendataarray.md)
+Data source + +`native` [LLamaTokenDataArrayNative&](./llama.native.llamatokendataarraynative&.md)
+Created native array + +#### Returns + +[MemoryHandle](https://docs.microsoft.com/en-us/dotnet/api/system.buffers.memoryhandle)
+A memory handle, pinning the data in place until disposed diff --git a/docs/xmldocs/llama.native.nativeapi.md b/docs/xmldocs/llama.native.nativeapi.md index 787529da..764a9ff8 100644 --- a/docs/xmldocs/llama.native.nativeapi.md +++ b/docs/xmldocs/llama.native.nativeapi.md @@ -2,6 +2,8 @@ Namespace: LLama.Native +Direct translation of the llama.cpp API + ```csharp public class NativeApi ``` @@ -18,8 +20,174 @@ public NativeApi() ## Methods +### **llama_sample_token_mirostat(SafeLLamaContextHandle, LLamaTokenDataArrayNative&, Single, Single, Int32, Single&)** + +Mirostat 1.0 algorithm described in the paper https://arxiv.org/abs/2007.14966. Uses tokens instead of words. + +```csharp +public static int llama_sample_token_mirostat(SafeLLamaContextHandle ctx, LLamaTokenDataArrayNative& candidates, float tau, float eta, int m, Single& mu) +``` + +#### Parameters + +`ctx` [SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)
+ +`candidates` [LLamaTokenDataArrayNative&](./llama.native.llamatokendataarraynative&.md)
+A vector of `llama_token_data` containing the candidate tokens, their probabilities (p), and log-odds (logit) for the current position in the generated text. + +`tau` [Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)
+The target cross-entropy (or surprise) value you want to achieve for the generated text. A higher value corresponds to more surprising or less predictable text, while a lower value corresponds to less surprising or more predictable text. + +`eta` [Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)
+The learning rate used to update `mu` based on the error between the target and observed surprisal of the sampled word. A larger learning rate will cause `mu` to be updated more quickly, while a smaller learning rate will result in slower updates. + +`m` [Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
+The number of tokens considered in the estimation of `s_hat`. This is an arbitrary value that is used to calculate `s_hat`, which in turn helps to calculate the value of `k`. In the paper, they use `m = 100`, but you can experiment with different values to see how it affects the performance of the algorithm. + +`mu` [Single&](https://docs.microsoft.com/en-us/dotnet/api/system.single&)
+Maximum cross-entropy. This value is initialized to be twice the target cross-entropy (`2 * tau`) and is updated in the algorithm based on the error between the target and observed surprisal. + +#### Returns + +[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
+ +### **llama_sample_token_mirostat_v2(SafeLLamaContextHandle, LLamaTokenDataArrayNative&, Single, Single, Single&)** + +Mirostat 2.0 algorithm described in the paper https://arxiv.org/abs/2007.14966. Uses tokens instead of words. + +```csharp +public static int llama_sample_token_mirostat_v2(SafeLLamaContextHandle ctx, LLamaTokenDataArrayNative& candidates, float tau, float eta, Single& mu) +``` + +#### Parameters + +`ctx` [SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)
+ +`candidates` [LLamaTokenDataArrayNative&](./llama.native.llamatokendataarraynative&.md)
+A vector of `llama_token_data` containing the candidate tokens, their probabilities (p), and log-odds (logit) for the current position in the generated text. + +`tau` [Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)
+The target cross-entropy (or surprise) value you want to achieve for the generated text. A higher value corresponds to more surprising or less predictable text, while a lower value corresponds to less surprising or more predictable text. + +`eta` [Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)
+The learning rate used to update `mu` based on the error between the target and observed surprisal of the sampled word. A larger learning rate will cause `mu` to be updated more quickly, while a smaller learning rate will result in slower updates. + +`mu` [Single&](https://docs.microsoft.com/en-us/dotnet/api/system.single&)
+Maximum cross-entropy. This value is initialized to be twice the target cross-entropy (`2 * tau`) and is updated in the algorithm based on the error between the target and observed surprisal. + +#### Returns + +[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
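+
+Mirostat 2.0 drops the `m` estimation parameter but is otherwise driven the same way; a sketch under the same assumptions as above:
+
+```csharp
+float mu = 2.0f * 5.0f;  // again initialised to 2 * tau
+int token = NativeApi.llama_sample_token_mirostat_v2(ctx, ref candidates, 5.0f, 0.1f, ref mu);
+```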
+ +### **llama_sample_token_greedy(SafeLLamaContextHandle, LLamaTokenDataArrayNative&)** + +Selects the token with the highest probability. + +```csharp +public static int llama_sample_token_greedy(SafeLLamaContextHandle ctx, LLamaTokenDataArrayNative& candidates) +``` + +#### Parameters + +`ctx` [SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)
+ +`candidates` [LLamaTokenDataArrayNative&](./llama.native.llamatokendataarraynative&.md)
+Pointer to LLamaTokenDataArray + +#### Returns + +[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
+ +### **llama_sample_token(SafeLLamaContextHandle, LLamaTokenDataArrayNative&)** + +Randomly selects a token from the candidates based on their probabilities. + +```csharp +public static int llama_sample_token(SafeLLamaContextHandle ctx, LLamaTokenDataArrayNative& candidates) +``` + +#### Parameters + +`ctx` [SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)
+ +`candidates` [LLamaTokenDataArrayNative&](./llama.native.llamatokendataarraynative&.md)
+Pointer to LLamaTokenDataArray + +#### Returns + +[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
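+
+A sketch contrasting the two selection modes above (same assumed `ctx` and `candidates` as in the earlier examples):
+
+```csharp
+// Deterministic: always returns the most probable candidate.
+int argmax = NativeApi.llama_sample_token_greedy(ctx, ref candidates);
+
+// Stochastic: draws a token from the candidate distribution instead.
+int sampled = NativeApi.llama_sample_token(ctx, ref candidates);
+```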
+ +### **llama_token_to_str(SafeLLamaContextHandle, Int32)** + +Token Id -> String. Uses the vocabulary in the provided context + +```csharp +public static IntPtr llama_token_to_str(SafeLLamaContextHandle ctx, int token) +``` + +#### Parameters + +`ctx` [SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)
+ +`token` [Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
+ +#### Returns + +[IntPtr](https://docs.microsoft.com/en-us/dotnet/api/system.intptr)
+Pointer to a string. + +### **llama_token_bos(SafeLLamaContextHandle)** + +Get the "Beginning of sentence" token + +```csharp +public static int llama_token_bos(SafeLLamaContextHandle ctx) +``` + +#### Parameters + +`ctx` [SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)
+ +#### Returns + +[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
+ +### **llama_token_eos(SafeLLamaContextHandle)** + +Get the "End of sentence" token + +```csharp +public static int llama_token_eos(SafeLLamaContextHandle ctx) +``` + +#### Parameters + +`ctx` [SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)
+ +#### Returns + +[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
+ +### **llama_token_nl(SafeLLamaContextHandle)** + +Get the "new line" token + +```csharp +public static int llama_token_nl(SafeLLamaContextHandle ctx) +``` + +#### Parameters + +`ctx` [SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)
+ +#### Returns + +[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
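+
+The special tokens above are typically used to terminate or shape generation. A hedged sketch, where `SampleNext` and `Emit` are hypothetical helpers standing in for the eval and sampling calls documented earlier:
+
+```csharp
+int eos = NativeApi.llama_token_eos(ctx);
+while (true)
+{
+    int token = SampleNext(ctx);  // hypothetical: runs eval + the sampling chain
+    if (token == eos)
+        break;                    // the model emitted "End of sentence"
+    Emit(token);                  // hypothetical output sink
+}
+```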
+ ### **llama_print_timings(SafeLLamaContextHandle)** +Print out timing information for this context + ```csharp public static void llama_print_timings(SafeLLamaContextHandle ctx) ``` @@ -30,6 +198,8 @@ public static void llama_print_timings(SafeLLamaContextHandle ctx) ### **llama_reset_timings(SafeLLamaContextHandle)** +Reset all collected timing information for this context + ```csharp public static void llama_reset_timings(SafeLLamaContextHandle ctx) ``` @@ -50,274 +220,403 @@ public static IntPtr llama_print_system_info() [IntPtr](https://docs.microsoft.com/en-us/dotnet/api/system.intptr)
-### **llama_model_quantize(String, String, LLamaFtype, Int32)** +### **llama_model_n_vocab(SafeLlamaModelHandle)** + +Get the number of tokens in the model vocabulary ```csharp -public static int llama_model_quantize(string fname_inp, string fname_out, LLamaFtype ftype, int nthread) +public static int llama_model_n_vocab(SafeLlamaModelHandle model) ``` #### Parameters -`fname_inp` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
+`model` [SafeLlamaModelHandle](./llama.native.safellamamodelhandle.md)
-`fname_out` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
+#### Returns -`ftype` [LLamaFtype](./llama.native.llamaftype.md)
+[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
+ +### **llama_model_n_ctx(SafeLlamaModelHandle)** + +Get the size of the context window for the model + +```csharp +public static int llama_model_n_ctx(SafeLlamaModelHandle model) +``` + +#### Parameters -`nthread` [Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
+`model` [SafeLlamaModelHandle](./llama.native.safellamamodelhandle.md)
#### Returns [Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
-### **llama_sample_repetition_penalty(SafeLLamaContextHandle, IntPtr, Int32[], UInt64, Single)** +### **llama_model_n_embd(SafeLlamaModelHandle)** -Repetition penalty described in CTRL academic paper https://arxiv.org/abs/1909.05858, with negative logit fix. +Get the dimension of embedding vectors from this model ```csharp -public static void llama_sample_repetition_penalty(SafeLLamaContextHandle ctx, IntPtr candidates, Int32[] last_tokens, ulong last_tokens_size, float penalty) +public static int llama_model_n_embd(SafeLlamaModelHandle model) ``` #### Parameters -`ctx` [SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)
+`model` [SafeLlamaModelHandle](./llama.native.safellamamodelhandle.md)
-`candidates` [IntPtr](https://docs.microsoft.com/en-us/dotnet/api/system.intptr)
-Pointer to LLamaTokenDataArray +#### Returns -`last_tokens` [Int32[]](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
+[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
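+
+A small sketch pulling the three model properties above from an already-loaded `SafeLlamaModelHandle`:
+
+```csharp
+int nVocab = NativeApi.llama_model_n_vocab(model);  // vocabulary size
+int nCtx   = NativeApi.llama_model_n_ctx(model);    // context window length
+int nEmbd  = NativeApi.llama_model_n_embd(model);   // embedding dimension
+```
+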
-`last_tokens_size` [UInt64](https://docs.microsoft.com/en-us/dotnet/api/system.uint64)
+### **llama_token_to_piece_with_model(SafeLlamaModelHandle, Int32, Byte*, Int32)** -`penalty` [Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)
+Convert a single token into text -### **llama_sample_frequency_and_presence_penalties(SafeLLamaContextHandle, IntPtr, Int32[], UInt64, Single, Single)** +```csharp +public static int llama_token_to_piece_with_model(SafeLlamaModelHandle model, int llamaToken, Byte* buffer, int length) +``` -Frequency and presence penalties described in OpenAI API https://platform.openai.com/docs/api-reference/parameter-details. +#### Parameters + +`model` [SafeLlamaModelHandle](./llama.native.safellamamodelhandle.md)
+ +`llamaToken` [Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
+ +`buffer` [Byte*](https://docs.microsoft.com/en-us/dotnet/api/system.byte*)
+buffer to write string into + +`length` [Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
+size of the buffer + +#### Returns + +[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
+The length written, or if the buffer is too small a negative number that indicates the length required
+
+### **llama_tokenize_with_model(SafeLlamaModelHandle, Byte*, Int32*, Int32, Boolean)**
+
+Convert text into tokens
 
 ```csharp
-public static void llama_sample_frequency_and_presence_penalties(SafeLLamaContextHandle ctx, IntPtr candidates, Int32[] last_tokens, ulong last_tokens_size, float alpha_frequency, float alpha_presence)
+public static int llama_tokenize_with_model(SafeLlamaModelHandle model, Byte* text, Int32* tokens, int n_max_tokens, bool add_bos)
 ```
 
 #### Parameters
 
-`ctx` [SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)<br>
+`model` [SafeLlamaModelHandle](./llama.native.safellamamodelhandle.md)
-`candidates` [IntPtr](https://docs.microsoft.com/en-us/dotnet/api/system.intptr)
-Pointer to LLamaTokenDataArray +`text` [Byte*](https://docs.microsoft.com/en-us/dotnet/api/system.byte*)
-`last_tokens` [Int32[]](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
+`tokens` [Int32*](https://docs.microsoft.com/en-us/dotnet/api/system.int32*)
-`last_tokens_size` [UInt64](https://docs.microsoft.com/en-us/dotnet/api/system.uint64)
+`n_max_tokens` [Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
-`alpha_frequency` [Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)
+`add_bos` [Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
-`alpha_presence` [Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)
+#### Returns -### **llama_sample_softmax(SafeLLamaContextHandle, IntPtr)** +[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
+Returns the number of tokens on success, no more than n_max_tokens. + Returns a negative number on failure - the number of tokens that would have been returned -Sorts candidate tokens by their logits in descending order and calculate probabilities based on logits. +### **llama_log_set(LLamaLogCallback)** + +Register a callback to receive llama log messages + +```csharp +public static void llama_log_set(LLamaLogCallback logCallback) +``` + +#### Parameters + +`logCallback` [LLamaLogCallback](./llama.native.llamalogcallback.md)
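+
+A sketch of wiring the callback up; the delegate shape `(level, message)` is assumed here from the llama.cpp log handler rather than taken from these docs:
+
+```csharp
+// Forward native llama.cpp log output to the console.
+NativeApi.llama_log_set((level, message) => Console.Write($"[{level}] {message}"));
+```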
+ +### **llama_grammar_init(LLamaGrammarElement**, UInt64, UInt64)** + +Create a new grammar from the given set of grammar rules ```csharp -public static void llama_sample_softmax(SafeLLamaContextHandle ctx, IntPtr candidates) +public static IntPtr llama_grammar_init(LLamaGrammarElement** rules, ulong n_rules, ulong start_rule_index) +``` + +#### Parameters + +`rules` [LLamaGrammarElement**](./llama.native.llamagrammarelement**.md)
+ +`n_rules` [UInt64](https://docs.microsoft.com/en-us/dotnet/api/system.uint64)
+ +`start_rule_index` [UInt64](https://docs.microsoft.com/en-us/dotnet/api/system.uint64)
+ +#### Returns + +[IntPtr](https://docs.microsoft.com/en-us/dotnet/api/system.intptr)
+ +### **llama_grammar_free(IntPtr)** + +Free all memory from the given SafeLLamaGrammarHandle + +```csharp +public static void llama_grammar_free(IntPtr grammar) +``` + +#### Parameters + +`grammar` [IntPtr](https://docs.microsoft.com/en-us/dotnet/api/system.intptr)
+ +### **llama_sample_grammar(SafeLLamaContextHandle, LLamaTokenDataArrayNative&, SafeLLamaGrammarHandle)** + +Apply constraints from grammar + +```csharp +public static void llama_sample_grammar(SafeLLamaContextHandle ctx, LLamaTokenDataArrayNative& candidates, SafeLLamaGrammarHandle grammar) ``` #### Parameters `ctx` [SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)
-`candidates` [IntPtr](https://docs.microsoft.com/en-us/dotnet/api/system.intptr)
-Pointer to LLamaTokenDataArray +`candidates` [LLamaTokenDataArrayNative&](./llama.native.llamatokendataarraynative&.md)
-### **llama_sample_top_k(SafeLLamaContextHandle, IntPtr, Int32, UInt64)** +`grammar` [SafeLLamaGrammarHandle](./llama.native.safellamagrammarhandle.md)
-Top-K sampling described in academic paper "The Curious Case of Neural Text Degeneration" https://arxiv.org/abs/1904.09751 +### **llama_grammar_accept_token(SafeLLamaContextHandle, SafeLLamaGrammarHandle, Int32)** + +Accepts the sampled token into the grammar ```csharp -public static void llama_sample_top_k(SafeLLamaContextHandle ctx, IntPtr candidates, int k, ulong min_keep) +public static void llama_grammar_accept_token(SafeLLamaContextHandle ctx, SafeLLamaGrammarHandle grammar, int token) ``` #### Parameters `ctx` [SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)
-`candidates` [IntPtr](https://docs.microsoft.com/en-us/dotnet/api/system.intptr)
-Pointer to LLamaTokenDataArray +`grammar` [SafeLLamaGrammarHandle](./llama.native.safellamagrammarhandle.md)
-`k` [Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
+`token` [Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
-`min_keep` [UInt64](https://docs.microsoft.com/en-us/dotnet/api/system.uint64)
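+
+The three grammar entry points above combine into a constrain-sample-accept loop. A sketch, assuming `ctx`, `candidates` and a `SafeLLamaGrammarHandle grammar` already exist:
+
+```csharp
+// Zero out candidates the grammar forbids, then sample from what remains.
+NativeApi.llama_sample_grammar(ctx, ref candidates, grammar);
+int token = NativeApi.llama_sample_token(ctx, ref candidates);
+
+// Advance the grammar state so the next step is constrained correctly.
+NativeApi.llama_grammar_accept_token(ctx, grammar, token);
+```
+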
+### **llama_model_quantize(String, String, LLamaModelQuantizeParams*)** -### **llama_sample_top_p(SafeLLamaContextHandle, IntPtr, Single, UInt64)** +Returns 0 on success -Nucleus sampling described in academic paper "The Curious Case of Neural Text Degeneration" https://arxiv.org/abs/1904.09751 +```csharp +public static int llama_model_quantize(string fname_inp, string fname_out, LLamaModelQuantizeParams* param) +``` + +#### Parameters + +`fname_inp` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
+ +`fname_out` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
+ +`param` [LLamaModelQuantizeParams*](./llama.native.llamamodelquantizeparams*.md)
+ +#### Returns + +[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
+Returns 0 on success + +**Remarks:** + +not great API - very likely to change + +### **llama_sample_classifier_free_guidance(SafeLLamaContextHandle, LLamaTokenDataArrayNative, SafeLLamaContextHandle, Single)** + +Apply classifier-free guidance to the logits as described in academic paper "Stay on topic with Classifier-Free Guidance" https://arxiv.org/abs/2306.17806 ```csharp -public static void llama_sample_top_p(SafeLLamaContextHandle ctx, IntPtr candidates, float p, ulong min_keep) +public static void llama_sample_classifier_free_guidance(SafeLLamaContextHandle ctx, LLamaTokenDataArrayNative candidates, SafeLLamaContextHandle guidanceCtx, float scale) ``` #### Parameters `ctx` [SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)
-`candidates` [IntPtr](https://docs.microsoft.com/en-us/dotnet/api/system.intptr)
-Pointer to LLamaTokenDataArray +`candidates` [LLamaTokenDataArrayNative](./llama.native.llamatokendataarraynative.md)
+A vector of `llama_token_data` containing the candidate tokens, the logits must be directly extracted from the original generation context without being sorted. -`p` [Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)
+`guidanceCtx` [SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)
+A separate context from the same model. Other than a negative prompt at the beginning, it should have all generated and user input tokens copied from the main context. -`min_keep` [UInt64](https://docs.microsoft.com/en-us/dotnet/api/system.uint64)
+`scale` [Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)
+Guidance strength. 1.0f means no guidance. Higher values mean stronger guidance. -### **llama_sample_tail_free(SafeLLamaContextHandle, IntPtr, Single, UInt64)** +### **llama_sample_repetition_penalty(SafeLLamaContextHandle, LLamaTokenDataArrayNative&, Int32*, UInt64, Single)** -Tail Free Sampling described in https://www.trentonbricken.com/Tail-Free-Sampling/. +Repetition penalty described in CTRL academic paper https://arxiv.org/abs/1909.05858, with negative logit fix. ```csharp -public static void llama_sample_tail_free(SafeLLamaContextHandle ctx, IntPtr candidates, float z, ulong min_keep) +public static void llama_sample_repetition_penalty(SafeLLamaContextHandle ctx, LLamaTokenDataArrayNative& candidates, Int32* last_tokens, ulong last_tokens_size, float penalty) ``` #### Parameters `ctx` [SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)
-`candidates` [IntPtr](https://docs.microsoft.com/en-us/dotnet/api/system.intptr)
+`candidates` [LLamaTokenDataArrayNative&](./llama.native.llamatokendataarraynative&.md)
Pointer to LLamaTokenDataArray -`z` [Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)
+`last_tokens` [Int32*](https://docs.microsoft.com/en-us/dotnet/api/system.int32*)
-`min_keep` [UInt64](https://docs.microsoft.com/en-us/dotnet/api/system.uint64)
+`last_tokens_size` [UInt64](https://docs.microsoft.com/en-us/dotnet/api/system.uint64)
-### **llama_sample_typical(SafeLLamaContextHandle, IntPtr, Single, UInt64)** +`penalty` [Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)
-Locally Typical Sampling implementation described in the paper https://arxiv.org/abs/2202.00666. +### **llama_sample_frequency_and_presence_penalties(SafeLLamaContextHandle, LLamaTokenDataArrayNative&, Int32*, UInt64, Single, Single)** + +Frequency and presence penalties described in OpenAI API https://platform.openai.com/docs/api-reference/parameter-details. ```csharp -public static void llama_sample_typical(SafeLLamaContextHandle ctx, IntPtr candidates, float p, ulong min_keep) +public static void llama_sample_frequency_and_presence_penalties(SafeLLamaContextHandle ctx, LLamaTokenDataArrayNative& candidates, Int32* last_tokens, ulong last_tokens_size, float alpha_frequency, float alpha_presence) ``` #### Parameters `ctx` [SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)
-`candidates` [IntPtr](https://docs.microsoft.com/en-us/dotnet/api/system.intptr)
+`candidates` [LLamaTokenDataArrayNative&](./llama.native.llamatokendataarraynative&.md)
Pointer to LLamaTokenDataArray -`p` [Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)
+`last_tokens` [Int32*](https://docs.microsoft.com/en-us/dotnet/api/system.int32*)
-`min_keep` [UInt64](https://docs.microsoft.com/en-us/dotnet/api/system.uint64)
+`last_tokens_size` [UInt64](https://docs.microsoft.com/en-us/dotnet/api/system.uint64)
+ +`alpha_frequency` [Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)
+ +`alpha_presence` [Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)
-### **llama_sample_temperature(SafeLLamaContextHandle, IntPtr, Single)** +### **llama_sample_classifier_free_guidance(SafeLLamaContextHandle, LLamaTokenDataArrayNative&, SafeLLamaContextHandle, Single)** + +Apply classifier-free guidance to the logits as described in academic paper "Stay on topic with Classifier-Free Guidance" https://arxiv.org/abs/2306.17806 ```csharp -public static void llama_sample_temperature(SafeLLamaContextHandle ctx, IntPtr candidates, float temp) +public static void llama_sample_classifier_free_guidance(SafeLLamaContextHandle ctx, LLamaTokenDataArrayNative& candidates, SafeLLamaContextHandle guidance_ctx, float scale) ``` #### Parameters `ctx` [SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)
-`candidates` [IntPtr](https://docs.microsoft.com/en-us/dotnet/api/system.intptr)
+`candidates` [LLamaTokenDataArrayNative&](./llama.native.llamatokendataarraynative&.md)
+A vector of `llama_token_data` containing the candidate tokens, the logits must be directly extracted from the original generation context without being sorted. -`temp` [Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)
+`guidance_ctx` [SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)
+A separate context from the same model. Other than a negative prompt at the beginning, it should have all generated and user input tokens copied from the main context. -### **llama_sample_token_mirostat(SafeLLamaContextHandle, IntPtr, Single, Single, Int32, Single*)** +`scale` [Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)
+Guidance strength. 1.0f means no guidance. Higher values mean stronger guidance. -Mirostat 1.0 algorithm described in the paper https://arxiv.org/abs/2007.14966. Uses tokens instead of words. +### **llama_sample_softmax(SafeLLamaContextHandle, LLamaTokenDataArrayNative&)** + +Sorts candidate tokens by their logits in descending order and calculate probabilities based on logits. ```csharp -public static int llama_sample_token_mirostat(SafeLLamaContextHandle ctx, IntPtr candidates, float tau, float eta, int m, Single* mu) +public static void llama_sample_softmax(SafeLLamaContextHandle ctx, LLamaTokenDataArrayNative& candidates) ``` #### Parameters `ctx` [SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)
-`candidates` [IntPtr](https://docs.microsoft.com/en-us/dotnet/api/system.intptr)
-A vector of `llama_token_data` containing the candidate tokens, their probabilities (p), and log-odds (logit) for the current position in the generated text. +`candidates` [LLamaTokenDataArrayNative&](./llama.native.llamatokendataarraynative&.md)
+Pointer to LLamaTokenDataArray -`tau` [Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)
-The target cross-entropy (or surprise) value you want to achieve for the generated text. A higher value corresponds to more surprising or less predictable text, while a lower value corresponds to less surprising or more predictable text. +### **llama_sample_top_k(SafeLLamaContextHandle, LLamaTokenDataArrayNative&, Int32, UInt64)** -`eta` [Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)
-The learning rate used to update `mu` based on the error between the target and observed surprisal of the sampled word. A larger learning rate will cause `mu` to be updated more quickly, while a smaller learning rate will result in slower updates. +Top-K sampling described in academic paper "The Curious Case of Neural Text Degeneration" https://arxiv.org/abs/1904.09751 -`m` [Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
-The number of tokens considered in the estimation of `s_hat`. This is an arbitrary value that is used to calculate `s_hat`, which in turn helps to calculate the value of `k`. In the paper, they use `m = 100`, but you can experiment with different values to see how it affects the performance of the algorithm. +```csharp +public static void llama_sample_top_k(SafeLLamaContextHandle ctx, LLamaTokenDataArrayNative& candidates, int k, ulong min_keep) +``` -`mu` [Single*](https://docs.microsoft.com/en-us/dotnet/api/system.single*)
-Maximum cross-entropy. This value is initialized to be twice the target cross-entropy (`2 * tau`) and is updated in the algorithm based on the error between the target and observed surprisal. +#### Parameters -#### Returns +`ctx` [SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)
-[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
+`candidates` [LLamaTokenDataArrayNative&](./llama.native.llamatokendataarraynative&.md)
+Pointer to LLamaTokenDataArray -### **llama_sample_token_mirostat_v2(SafeLLamaContextHandle, IntPtr, Single, Single, Single*)** +`k` [Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
-Mirostat 2.0 algorithm described in the paper https://arxiv.org/abs/2007.14966. Uses tokens instead of words. +`min_keep` [UInt64](https://docs.microsoft.com/en-us/dotnet/api/system.uint64)
+ +### **llama_sample_top_p(SafeLLamaContextHandle, LLamaTokenDataArrayNative&, Single, UInt64)** + +Nucleus sampling described in academic paper "The Curious Case of Neural Text Degeneration" https://arxiv.org/abs/1904.09751 ```csharp -public static int llama_sample_token_mirostat_v2(SafeLLamaContextHandle ctx, IntPtr candidates, float tau, float eta, Single* mu) +public static void llama_sample_top_p(SafeLLamaContextHandle ctx, LLamaTokenDataArrayNative& candidates, float p, ulong min_keep) ``` #### Parameters `ctx` [SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)
-`candidates` [IntPtr](https://docs.microsoft.com/en-us/dotnet/api/system.intptr)
-A vector of `llama_token_data` containing the candidate tokens, their probabilities (p), and log-odds (logit) for the current position in the generated text. +`candidates` [LLamaTokenDataArrayNative&](./llama.native.llamatokendataarraynative&.md)
+Pointer to LLamaTokenDataArray -`tau` [Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)
-The target cross-entropy (or surprise) value you want to achieve for the generated text. A higher value corresponds to more surprising or less predictable text, while a lower value corresponds to less surprising or more predictable text. +`p` [Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)
-`eta` [Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)
-The learning rate used to update `mu` based on the error between the target and observed surprisal of the sampled word. A larger learning rate will cause `mu` to be updated more quickly, while a smaller learning rate will result in slower updates. +`min_keep` [UInt64](https://docs.microsoft.com/en-us/dotnet/api/system.uint64)
-`mu` [Single*](https://docs.microsoft.com/en-us/dotnet/api/system.single*)
-Maximum cross-entropy. This value is initialized to be twice the target cross-entropy (`2 * tau`) and is updated in the algorithm based on the error between the target and observed surprisal. +### **llama_sample_tail_free(SafeLLamaContextHandle, LLamaTokenDataArrayNative&, Single, UInt64)** -#### Returns +Tail Free Sampling described in https://www.trentonbricken.com/Tail-Free-Sampling/. -[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
+```csharp +public static void llama_sample_tail_free(SafeLLamaContextHandle ctx, LLamaTokenDataArrayNative& candidates, float z, ulong min_keep) +``` -### **llama_sample_token_greedy(SafeLLamaContextHandle, IntPtr)** +#### Parameters -Selects the token with the highest probability. +`ctx` [SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)
+ +`candidates` [LLamaTokenDataArrayNative&](./llama.native.llamatokendataarraynative&.md)
+Pointer to LLamaTokenDataArray + +`z` [Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)
+ +`min_keep` [UInt64](https://docs.microsoft.com/en-us/dotnet/api/system.uint64)
+ +### **llama_sample_typical(SafeLLamaContextHandle, LLamaTokenDataArrayNative&, Single, UInt64)** + +Locally Typical Sampling implementation described in the paper https://arxiv.org/abs/2202.00666. ```csharp -public static int llama_sample_token_greedy(SafeLLamaContextHandle ctx, IntPtr candidates) +public static void llama_sample_typical(SafeLLamaContextHandle ctx, LLamaTokenDataArrayNative& candidates, float p, ulong min_keep) ``` #### Parameters `ctx` [SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)
-`candidates` [IntPtr](https://docs.microsoft.com/en-us/dotnet/api/system.intptr)
+`candidates` [LLamaTokenDataArrayNative&](./llama.native.llamatokendataarraynative&.md)
Pointer to LLamaTokenDataArray -#### Returns +`p` [Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)
-[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
+`min_keep` [UInt64](https://docs.microsoft.com/en-us/dotnet/api/system.uint64)
-### **llama_sample_token(SafeLLamaContextHandle, IntPtr)** +### **llama_sample_temperature(SafeLLamaContextHandle, LLamaTokenDataArrayNative&, Single)** -Randomly selects a token from the candidates based on their probabilities. +Modify logits by temperature ```csharp -public static int llama_sample_token(SafeLLamaContextHandle ctx, IntPtr candidates) +public static void llama_sample_temperature(SafeLLamaContextHandle ctx, LLamaTokenDataArrayNative& candidates, float temp) ``` #### Parameters `ctx` [SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)
-`candidates` [IntPtr](https://docs.microsoft.com/en-us/dotnet/api/system.intptr)
-Pointer to LLamaTokenDataArray - -#### Returns +`candidates` [LLamaTokenDataArrayNative&](./llama.native.llamatokendataarraynative&.md)
-[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
+`temp` [Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)
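+
+Taken together, the filters above are normally chained before a final selection. A sketch with illustrative parameter values, assuming `ctx`, `candidates` and an `int[] lastTokens` history buffer:
+
+```csharp
+unsafe
+{
+    fixed (int* lastTokensPtr = lastTokens)
+    {
+        // Penalise recently generated tokens, then progressively narrow the pool.
+        NativeApi.llama_sample_repetition_penalty(ctx, ref candidates, lastTokensPtr, (ulong)lastTokens.Length, 1.1f);
+        NativeApi.llama_sample_top_k(ctx, ref candidates, 40, 1);
+        NativeApi.llama_sample_top_p(ctx, ref candidates, 0.95f, 1);
+        NativeApi.llama_sample_temperature(ctx, ref candidates, 0.8f);
+        int token = NativeApi.llama_sample_token(ctx, ref candidates);
+    }
+}
+```
+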
### **llama_empty_call()**
 
+A method that does nothing. This is a native method; calling it will force the llama native dependencies to be loaded.
+
 ```csharp
 public static bool llama_empty_call()
 ```
@@ -328,6 +627,8 @@ public static bool llama_empty_call()
 ### **llama_context_default_params()**
 
+Create a LLamaContextParams with default values
+
 ```csharp
 public static LLamaContextParams llama_context_default_params()
 ```
@@ -336,8 +637,22 @@ public static LLamaContextParams llama_context_default_params()
 [LLamaContextParams](./llama.native.llamacontextparams.md)<br>
+### **llama_model_quantize_default_params()** + +Create a LLamaModelQuantizeParams with default values + +```csharp +public static LLamaModelQuantizeParams llama_model_quantize_default_params() +``` + +#### Returns + +[LLamaModelQuantizeParams](./llama.native.llamamodelquantizeparams.md)
+
 ### **llama_mmap_supported()**
 
+Check if memory mapping is supported
+
 ```csharp
 public static bool llama_mmap_supported()
 ```
@@ -348,6 +663,8 @@ public static bool llama_mmap_supported()
 ### **llama_mlock_supported()**
 
+Check if memory locking is supported
+
 ```csharp
 public static bool llama_mlock_supported()
 ```
@@ -356,39 +673,83 @@ public static bool llama_mlock_supported()
 [Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)<br>
-### **llama_init_from_file(String, LLamaContextParams)**
+### **llama_eval_export(SafeLLamaContextHandle, String)**
+
+Export a static computation graph for a context of 511 and a batch size of 1
+ NOTE: since this functionality is mostly for debugging and demonstration purposes, we hardcode these
+ parameters here to keep things simple
+ IMPORTANT: do not use for anything other than debugging and testing!
+
+```csharp
+public static int llama_eval_export(SafeLLamaContextHandle ctx, string fname)
+```
+
+#### Parameters
+
+`ctx` [SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)<br>
+ +`fname` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
+ +#### Returns + +[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
+ +### **llama_load_model_from_file(String, LLamaContextParams)** Various functions for loading a ggml llama model. Allocate (almost) all memory needed for the model. Return NULL on failure ```csharp -public static IntPtr llama_init_from_file(string path_model, LLamaContextParams params_) +public static IntPtr llama_load_model_from_file(string path_model, LLamaContextParams params) ``` #### Parameters `path_model` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
-`params_` [LLamaContextParams](./llama.native.llamacontextparams.md)
+`params` [LLamaContextParams](./llama.native.llamacontextparams.md)
+ +#### Returns + +[IntPtr](https://docs.microsoft.com/en-us/dotnet/api/system.intptr)
+ +### **llama_new_context_with_model(SafeLlamaModelHandle, LLamaContextParams)** + +Create a new llama_context with the given model. + Return value should always be wrapped in SafeLLamaContextHandle! + +```csharp +public static IntPtr llama_new_context_with_model(SafeLlamaModelHandle model, LLamaContextParams params) +``` + +#### Parameters + +`model` [SafeLlamaModelHandle](./llama.native.safellamamodelhandle.md)
+ +`params` [LLamaContextParams](./llama.native.llamacontextparams.md)
#### Returns [IntPtr](https://docs.microsoft.com/en-us/dotnet/api/system.intptr)
-### **llama_init_backend()** +### **llama_backend_init(Boolean)** not great API - very likely to change. Initialize the llama + ggml backend Call once at the start of the program ```csharp -public static void llama_init_backend() +public static void llama_backend_init(bool numa) ``` +#### Parameters + +`numa` [Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
+ ### **llama_free(IntPtr)** -Frees all allocated memory +Frees all allocated memory in the given llama_context ```csharp public static void llama_free(IntPtr ctx) @@ -398,7 +759,19 @@ public static void llama_free(IntPtr ctx) `ctx` [IntPtr](https://docs.microsoft.com/en-us/dotnet/api/system.intptr)
-### **llama_apply_lora_from_file(SafeLLamaContextHandle, String, String, Int32)** +### **llama_free_model(IntPtr)** + +Frees all allocated memory associated with a model + +```csharp +public static void llama_free_model(IntPtr model) +``` + +#### Parameters + +`model` [IntPtr](https://docs.microsoft.com/en-us/dotnet/api/system.intptr)
+ +### **llama_model_apply_lora_from_file(SafeLlamaModelHandle, String, String, Int32)** Apply a LoRA adapter to a loaded model path_base_model is the path to a higher quality model to use as a base for @@ -407,12 +780,12 @@ Apply a LoRA adapter to a loaded model will be applied on top of the previous one ```csharp -public static int llama_apply_lora_from_file(SafeLLamaContextHandle ctx, string path_lora, string path_base_model, int n_threads) +public static int llama_model_apply_lora_from_file(SafeLlamaModelHandle model_ptr, string path_lora, string path_base_model, int n_threads) ``` #### Parameters -`ctx` [SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)
+`model_ptr` [SafeLlamaModelHandle](./llama.native.safellamamodelhandle.md)
`path_lora` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
@@ -472,11 +845,30 @@ public static ulong llama_get_state_size(SafeLLamaContextHandle ctx) [UInt64](https://docs.microsoft.com/en-us/dotnet/api/system.uint64)
-### **llama_copy_state_data(SafeLLamaContextHandle, Byte[])** +### **llama_copy_state_data(SafeLLamaContextHandle, Byte*)** Copies the state to the specified destination address. Destination needs to have allocated enough memory. - Returns the number of bytes copied + +```csharp +public static ulong llama_copy_state_data(SafeLLamaContextHandle ctx, Byte* dest) +``` + +#### Parameters + +`ctx` [SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)
+ +`dest` [Byte*](https://docs.microsoft.com/en-us/dotnet/api/system.byte*)
+ +#### Returns + +[UInt64](https://docs.microsoft.com/en-us/dotnet/api/system.uint64)
+the number of bytes copied + +### **llama_copy_state_data(SafeLLamaContextHandle, Byte[])** + +Copies the state to the specified destination address. + Destination needs to have allocated enough memory (see llama_get_state_size) ```csharp public static ulong llama_copy_state_data(SafeLLamaContextHandle ctx, Byte[] dest) @@ -491,11 +883,30 @@ public static ulong llama_copy_state_data(SafeLLamaContextHandle ctx, Byte[] des #### Returns [UInt64](https://docs.microsoft.com/en-us/dotnet/api/system.uint64)
+the number of bytes copied + +### **llama_set_state_data(SafeLLamaContextHandle, Byte*)** + +Set the state reading from the specified address + +```csharp +public static ulong llama_set_state_data(SafeLLamaContextHandle ctx, Byte* src) +``` + +#### Parameters + +`ctx` [SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)
+ +`src` [Byte*](https://docs.microsoft.com/en-us/dotnet/api/system.byte*)
+ +#### Returns + +[UInt64](https://docs.microsoft.com/en-us/dotnet/api/system.uint64)
+the number of bytes read ### **llama_set_state_data(SafeLLamaContextHandle, Byte[])** Set the state reading from the specified address - Returns the number of bytes read ```csharp public static ulong llama_set_state_data(SafeLLamaContextHandle ctx, Byte[] src) @@ -510,6 +921,7 @@ public static ulong llama_set_state_data(SafeLLamaContextHandle ctx, Byte[] src) #### Returns [UInt64](https://docs.microsoft.com/en-us/dotnet/api/system.uint64)
+the number of bytes read ### **llama_load_session_file(SafeLLamaContextHandle, String, Int32[], UInt64, UInt64*)** @@ -586,6 +998,10 @@ Returns 0 on success ### **llama_eval_with_pointer(SafeLLamaContextHandle, Int32*, Int32, Int32, Int32)** +Run the llama inference to obtain the logits and probabilities for the next token. + tokens + n_tokens is the provided batch of new tokens to process + n_past is the number of tokens to use from previous eval calls + ```csharp public static int llama_eval_with_pointer(SafeLLamaContextHandle ctx, Int32* tokens, int n_tokens, int n_past, int n_threads) ``` @@ -605,13 +1021,11 @@ public static int llama_eval_with_pointer(SafeLLamaContextHandle ctx, Int32* tok #### Returns [Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
+Returns 0 on success ### **llama_tokenize(SafeLLamaContextHandle, String, Encoding, Int32[], Int32, Boolean)** Convert the provided text into tokens. - The tokens pointer must be large enough to hold the resulting tokens. - Returns the number of tokens on success, no more than n_max_tokens - Returns a negative number on failure - the number of tokens that would have been returned ```csharp public static int llama_tokenize(SafeLLamaContextHandle ctx, string text, Encoding encoding, Int32[] tokens, int n_max_tokens, bool add_bos) @@ -634,20 +1048,24 @@ public static int llama_tokenize(SafeLLamaContextHandle ctx, string text, Encodi #### Returns [Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
+Returns the number of tokens on success, no more than n_max_tokens. + Returns a negative number on failure - the number of tokens that would have been returned -### **llama_tokenize_native(SafeLLamaContextHandle, SByte[], Int32[], Int32, Boolean)** +### **llama_tokenize_native(SafeLLamaContextHandle, Byte*, Int32*, Int32, Boolean)** + +Convert the provided text into tokens. ```csharp -public static int llama_tokenize_native(SafeLLamaContextHandle ctx, SByte[] text, Int32[] tokens, int n_max_tokens, bool add_bos) +public static int llama_tokenize_native(SafeLLamaContextHandle ctx, Byte* text, Int32* tokens, int n_max_tokens, bool add_bos) ``` #### Parameters `ctx` [SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)
-`text` [SByte[]](https://docs.microsoft.com/en-us/dotnet/api/system.sbyte)
+`text` [Byte*](https://docs.microsoft.com/en-us/dotnet/api/system.byte*)
-`tokens` [Int32[]](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
+`tokens` [Int32*](https://docs.microsoft.com/en-us/dotnet/api/system.int32*)
`n_max_tokens` [Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
@@ -656,9 +1074,13 @@ public static int llama_tokenize_native(SafeLLamaContextHandle ctx, SByte[] text #### Returns [Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
+Returns the number of tokens on success, no more than n_max_tokens. + Returns a negative number on failure - the number of tokens that would have been returned ### **llama_n_vocab(SafeLLamaContextHandle)** +Get the number of tokens in the model vocabulary for this context + ```csharp public static int llama_n_vocab(SafeLLamaContextHandle ctx) ``` @@ -673,6 +1095,8 @@ public static int llama_n_vocab(SafeLLamaContextHandle ctx) ### **llama_n_ctx(SafeLLamaContextHandle)** +Get the size of the context window for the model for this context + ```csharp public static int llama_n_ctx(SafeLLamaContextHandle ctx) ``` @@ -687,6 +1111,8 @@ public static int llama_n_ctx(SafeLLamaContextHandle ctx) ### **llama_n_embd(SafeLLamaContextHandle)** +Get the dimension of embedding vectors from the model for this context + ```csharp public static int llama_n_embd(SafeLLamaContextHandle ctx) ``` @@ -703,8 +1129,8 @@ public static int llama_n_embd(SafeLLamaContextHandle ctx) Token logits obtained from the last call to llama_eval() The logits for the last token are stored in the last row - Can be mutated in order to change the probabilities of the next token - Rows: n_tokens + Can be mutated in order to change the probabilities of the next token.
+ Rows: n_tokens
Cols: n_vocab ```csharp @@ -735,52 +1161,3 @@ public static Single* llama_get_embeddings(SafeLLamaContextHandle ctx) #### Returns [Single*](https://docs.microsoft.com/en-us/dotnet/api/system.single*)
- -### **llama_token_to_str(SafeLLamaContextHandle, Int32)** - -Token Id -> String. Uses the vocabulary in the provided context - -```csharp -public static IntPtr llama_token_to_str(SafeLLamaContextHandle ctx, int token) -``` - -#### Parameters - -`ctx` [SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)
- -`token` [Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
- -#### Returns - -[IntPtr](https://docs.microsoft.com/en-us/dotnet/api/system.intptr)
-Pointer to a string. - -### **llama_token_bos()** - -```csharp -public static int llama_token_bos() -``` - -#### Returns - -[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
- -### **llama_token_eos()** - -```csharp -public static int llama_token_eos() -``` - -#### Returns - -[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
- -### **llama_token_nl()** - -```csharp -public static int llama_token_nl() -``` - -#### Returns - -[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
diff --git a/docs/xmldocs/llama.native.safellamacontexthandle.md b/docs/xmldocs/llama.native.safellamacontexthandle.md index ea713984..0fe73571 100644 --- a/docs/xmldocs/llama.native.safellamacontexthandle.md +++ b/docs/xmldocs/llama.native.safellamacontexthandle.md @@ -2,8 +2,10 @@ Namespace: LLama.Native +A safe wrapper around a llama_context + ```csharp -public class SafeLLamaContextHandle : SafeLLamaHandleBase, System.IDisposable +public sealed class SafeLLamaContextHandle : SafeLLamaHandleBase, System.IDisposable ``` Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) → [CriticalFinalizerObject](https://docs.microsoft.com/en-us/dotnet/api/system.runtime.constrainedexecution.criticalfinalizerobject) → [SafeHandle](https://docs.microsoft.com/en-us/dotnet/api/system.runtime.interopservices.safehandle) → [SafeLLamaHandleBase](./llama.native.safellamahandlebase.md) → [SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)
@@ -11,6 +13,54 @@ Implements [IDisposable](https://docs.microsoft.com/en-us/dotnet/api/system.idis ## Properties +### **VocabCount** + +Total number of tokens in vocabulary of this model + +```csharp +public int VocabCount { get; } +``` + +#### Property Value + +[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
+ +### **ContextSize** + +Total number of tokens in the context + +```csharp +public int ContextSize { get; } +``` + +#### Property Value + +[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
+ +### **EmbeddingSize** + +Dimension of embedding vectors + +```csharp +public int EmbeddingSize { get; } +``` + +#### Property Value + +[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
+ +### **ModelHandle** + +Get the model which this context is using + +```csharp +public SafeLlamaModelHandle ModelHandle { get; } +``` + +#### Property Value + +[SafeLlamaModelHandle](./llama.native.safellamamodelhandle.md)
+ ### **IsInvalid** ```csharp @@ -33,15 +83,21 @@ public bool IsClosed { get; } ## Constructors -### **SafeLLamaContextHandle(IntPtr)** +### **SafeLLamaContextHandle(IntPtr, SafeLlamaModelHandle)** + +Create a new SafeLLamaContextHandle ```csharp -public SafeLLamaContextHandle(IntPtr handle) +public SafeLLamaContextHandle(IntPtr handle, SafeLlamaModelHandle model) ``` #### Parameters `handle` [IntPtr](https://docs.microsoft.com/en-us/dotnet/api/system.intptr)
+pointer to an allocated llama_context + +`model` [SafeLlamaModelHandle](./llama.native.safellamamodelhandle.md)
+the model which this context was created from ## Methods @@ -54,3 +110,265 @@ protected bool ReleaseHandle() #### Returns [Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
+
+### **Create(SafeLlamaModelHandle, LLamaContextParams)**
+
+Create a new llama_context for the given model
+
+```csharp
+public static SafeLLamaContextHandle Create(SafeLlamaModelHandle model, LLamaContextParams lparams)
+```
+
+#### Parameters
+
+`model` [SafeLlamaModelHandle](./llama.native.safellamamodelhandle.md)<br>
+ +`lparams` [LLamaContextParams](./llama.native.llamacontextparams.md)
+ +#### Returns + +[SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)
+ +#### Exceptions + +[RuntimeError](./llama.exceptions.runtimeerror.md)
+ +### **Clone(LLamaContextParams)** + +Create a new llama context with a clone of the current llama context state + +```csharp +public SafeLLamaContextHandle Clone(LLamaContextParams lparams) +``` + +#### Parameters + +`lparams` [LLamaContextParams](./llama.native.llamacontextparams.md)
+ +#### Returns + +[SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)
+ +### **Tokenize(String, Boolean, Encoding)** + +Convert the given text into tokens + +```csharp +public Int32[] Tokenize(string text, bool add_bos, Encoding encoding) +``` + +#### Parameters + +`text` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
+The text to tokenize + +`add_bos` [Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
+Whether the "BOS" token should be added + +`encoding` [Encoding](https://docs.microsoft.com/en-us/dotnet/api/system.text.encoding)
+Encoding to use for the text + +#### Returns + +[Int32[]](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
+ +#### Exceptions + +[RuntimeError](./llama.exceptions.runtimeerror.md)
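+
+A usage sketch (assuming `using System.Text;` and a valid handle):
+
+```csharp
+// Tokenize a prompt, prepending the BOS token.
+Int32[] tokens = ctx.Tokenize("Hello, world!", true, Encoding.UTF8);
+```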
+ +### **GetLogits()** + +Token logits obtained from the last call to llama_eval() + The logits for the last token are stored in the last row + Can be mutated in order to change the probabilities of the next token.
+ Rows: n_tokens
+ Cols: n_vocab
+
+```csharp
+public Span<Single> GetLogits()
+```
+
+#### Returns
+
+[Span<Single>](https://docs.microsoft.com/en-us/dotnet/api/system.span-1)<br>
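+
+Because the returned span is writable, it can be used for simple logit biasing before sampling; a sketch where `bannedToken` is a hypothetical token id:
+
+```csharp
+Span<float> logits = ctx.GetLogits();
+logits[bannedToken] = float.MinValue;  // effectively forbid this token
+```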
+ +### **TokenToString(Int32, Encoding)** + +Convert a token into a string + +```csharp +public string TokenToString(int token, Encoding encoding) +``` + +#### Parameters + +`token` [Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
+Token to decode into a string + +`encoding` [Encoding](https://docs.microsoft.com/en-us/dotnet/api/system.text.encoding)
+ +#### Returns + +[String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
+ +### **TokenToString(Int32, Encoding, StringBuilder)** + +Append a single llama token to a string builder + +```csharp +public void TokenToString(int token, Encoding encoding, StringBuilder dest) +``` + +#### Parameters + +`token` [Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
+Token to decode + +`encoding` [Encoding](https://docs.microsoft.com/en-us/dotnet/api/system.text.encoding)
+ +`dest` [StringBuilder](https://docs.microsoft.com/en-us/dotnet/api/system.text.stringbuilder)
+string builder to append the result to
+
+### **TokenToSpan(Int32, Span<Byte>)**
+
+Convert a single llama token into bytes
+
+```csharp
+public int TokenToSpan(int token, Span<Byte> dest)
+```
+
+#### Parameters
+
+`token` [Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)<br>
+Token to decode + +`dest` [Span<Byte>](https://docs.microsoft.com/en-us/dotnet/api/system.span-1)
+A span to attempt to write into. If this is too small nothing will be written + +#### Returns + +[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
+The size of this token. **nothing will be written** if this is larger than `dest`
+
+### **Eval(ReadOnlySpan<Int32>, Int32, Int32)**
+
+Run the llama inference to obtain the logits and probabilities for the next token.
+
+```csharp
+public bool Eval(ReadOnlySpan<Int32> tokens, int n_past, int n_threads)
+```
+
+#### Parameters
+
+`tokens` [ReadOnlySpan<Int32>](https://docs.microsoft.com/en-us/dotnet/api/system.readonlyspan-1)<br>
+The provided batch of new tokens to process + +`n_past` [Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
+the number of tokens to use from previous eval calls + +`n_threads` [Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
+ +#### Returns + +[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
+Returns true on success + +### **GetStateSize()** + +Get the size of the state, when saved as bytes + +```csharp +public ulong GetStateSize() +``` + +#### Returns + +[UInt64](https://docs.microsoft.com/en-us/dotnet/api/system.uint64)
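+
+Together with the `GetState`/`SetState` entries below, this enables snapshot-and-rollback of a context. A sketch:
+
+```csharp
+ulong size = ctx.GetStateSize();
+byte[] buffer = new byte[size];
+unsafe
+{
+    fixed (byte* ptr = buffer)
+    {
+        ctx.GetState(ptr, size);  // snapshot the current state
+        // ... run some speculative inference ...
+        ctx.SetState(ptr);        // roll back to the snapshot
+    }
+}
+```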
+ +### **GetState(Byte*, UInt64)** + +Get the raw state of this context, encoded as bytes. Data is written into the `dest` pointer. + +```csharp +public ulong GetState(Byte* dest, ulong size) +``` + +#### Parameters + +`dest` [Byte*](https://docs.microsoft.com/en-us/dotnet/api/system.byte*)
+Destination to write to + +`size` [UInt64](https://docs.microsoft.com/en-us/dotnet/api/system.uint64)
+Number of bytes available to write to in dest (check required size with `GetStateSize()`) + +#### Returns + +[UInt64](https://docs.microsoft.com/en-us/dotnet/api/system.uint64)
+The number of bytes written to dest + +#### Exceptions + +[ArgumentOutOfRangeException](https://docs.microsoft.com/en-us/dotnet/api/system.argumentoutofrangeexception)
+Thrown if dest is too small + +### **GetState(IntPtr, UInt64)** + +Get the raw state of this context, encoded as bytes. Data is written into the `dest` pointer. + +```csharp +public ulong GetState(IntPtr dest, ulong size) +``` + +#### Parameters + +`dest` [IntPtr](https://docs.microsoft.com/en-us/dotnet/api/system.intptr)
+Destination to write to + +`size` [UInt64](https://docs.microsoft.com/en-us/dotnet/api/system.uint64)
+Number of bytes available to write to in dest (check required size with `GetStateSize()`) + +#### Returns + +[UInt64](https://docs.microsoft.com/en-us/dotnet/api/system.uint64)
+The number of bytes written to dest + +#### Exceptions + +[ArgumentOutOfRangeException](https://docs.microsoft.com/en-us/dotnet/api/system.argumentoutofrangeexception)
+Thrown if dest is too small + +### **SetState(Byte*)** + +Set the raw state of this context + +```csharp +public ulong SetState(Byte* src) +``` + +#### Parameters + +`src` [Byte*](https://docs.microsoft.com/en-us/dotnet/api/system.byte*)
+The pointer to read the state from + +#### Returns + +[UInt64](https://docs.microsoft.com/en-us/dotnet/api/system.uint64)
+Number of bytes read from the src pointer + +### **SetState(IntPtr)** + +Set the raw state of this context + +```csharp +public ulong SetState(IntPtr src) +``` + +#### Parameters + +`src` [IntPtr](https://docs.microsoft.com/en-us/dotnet/api/system.intptr)
+The pointer to read the state from + +#### Returns + +[UInt64](https://docs.microsoft.com/en-us/dotnet/api/system.uint64)
+Number of bytes read from the src pointer diff --git a/docs/xmldocs/llama.native.safellamagrammarhandle.md b/docs/xmldocs/llama.native.safellamagrammarhandle.md new file mode 100644 index 00000000..653f0a36 --- /dev/null +++ b/docs/xmldocs/llama.native.safellamagrammarhandle.md @@ -0,0 +1,97 @@ +# SafeLLamaGrammarHandle + +Namespace: LLama.Native + +A safe reference to a `llama_grammar` + +```csharp +public class SafeLLamaGrammarHandle : SafeLLamaHandleBase, System.IDisposable +``` + +Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) → [CriticalFinalizerObject](https://docs.microsoft.com/en-us/dotnet/api/system.runtime.constrainedexecution.criticalfinalizerobject) → [SafeHandle](https://docs.microsoft.com/en-us/dotnet/api/system.runtime.interopservices.safehandle) → [SafeLLamaHandleBase](./llama.native.safellamahandlebase.md) → [SafeLLamaGrammarHandle](./llama.native.safellamagrammarhandle.md)
+Implements [IDisposable](https://docs.microsoft.com/en-us/dotnet/api/system.idisposable) + +## Properties + +### **IsInvalid** + +```csharp +public bool IsInvalid { get; } +``` + +#### Property Value + +[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
+ +### **IsClosed** + +```csharp +public bool IsClosed { get; } +``` + +#### Property Value + +[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
+ +## Methods + +### **ReleaseHandle()** + +```csharp +protected bool ReleaseHandle() +``` + +#### Returns + +[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
+
+### **Create(IReadOnlyList<GrammarRule>, UInt64)**
+
+Create a new llama_grammar
+
+```csharp
+public static SafeLLamaGrammarHandle Create(IReadOnlyList<GrammarRule> rules, ulong start_rule_index)
+```
+
+#### Parameters
+
+`rules` [IReadOnlyList<GrammarRule>](https://docs.microsoft.com/en-us/dotnet/api/system.collections.generic.ireadonlylist-1)<br>
+A list of list of elements, each inner list makes up one grammar rule + +`start_rule_index` [UInt64](https://docs.microsoft.com/en-us/dotnet/api/system.uint64)
+The index (in the outer list) of the start rule + +#### Returns + +[SafeLLamaGrammarHandle](./llama.native.safellamagrammarhandle.md)
+ +#### Exceptions + +[RuntimeError](./llama.exceptions.runtimeerror.md)
+ +### **Create(LLamaGrammarElement**, UInt64, UInt64)** + +Create a new llama_grammar + +```csharp +public static SafeLLamaGrammarHandle Create(LLamaGrammarElement** rules, ulong nrules, ulong start_rule_index) +``` + +#### Parameters + +`rules` [LLamaGrammarElement**](./llama.native.llamagrammarelement**.md)
+rules list, each rule is a list of rule elements (terminated by a LLamaGrammarElementType.END element) + +`nrules` [UInt64](https://docs.microsoft.com/en-us/dotnet/api/system.uint64)
+total number of rules + +`start_rule_index` [UInt64](https://docs.microsoft.com/en-us/dotnet/api/system.uint64)
+index of the start rule of the grammar + +#### Returns + +[SafeLLamaGrammarHandle](./llama.native.safellamagrammarhandle.md)
+ +#### Exceptions + +[RuntimeError](./llama.exceptions.runtimeerror.md)
diff --git a/docs/xmldocs/llama.native.safellamahandlebase.md b/docs/xmldocs/llama.native.safellamahandlebase.md index 1c9f8ef8..eccbff03 100644 --- a/docs/xmldocs/llama.native.safellamahandlebase.md +++ b/docs/xmldocs/llama.native.safellamahandlebase.md @@ -2,6 +2,8 @@ Namespace: LLama.Native +Base class for all llama handles to native resources + ```csharp public abstract class SafeLLamaHandleBase : System.Runtime.InteropServices.SafeHandle, System.IDisposable ``` diff --git a/docs/xmldocs/llama.native.safellamamodelhandle.md b/docs/xmldocs/llama.native.safellamamodelhandle.md new file mode 100644 index 00000000..831ab0c4 --- /dev/null +++ b/docs/xmldocs/llama.native.safellamamodelhandle.md @@ -0,0 +1,220 @@ +# SafeLlamaModelHandle + +Namespace: LLama.Native + +A reference to a set of llama model weights + +```csharp +public sealed class SafeLlamaModelHandle : SafeLLamaHandleBase, System.IDisposable +``` + +Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) → [CriticalFinalizerObject](https://docs.microsoft.com/en-us/dotnet/api/system.runtime.constrainedexecution.criticalfinalizerobject) → [SafeHandle](https://docs.microsoft.com/en-us/dotnet/api/system.runtime.interopservices.safehandle) → [SafeLLamaHandleBase](./llama.native.safellamahandlebase.md) → [SafeLlamaModelHandle](./llama.native.safellamamodelhandle.md)
+Implements [IDisposable](https://docs.microsoft.com/en-us/dotnet/api/system.idisposable) + +## Properties + +### **VocabCount** + +Total number of tokens in vocabulary of this model + +```csharp +public int VocabCount { get; } +``` + +#### Property Value + +[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
+ +### **ContextSize** + +Total number of tokens in the context + +```csharp +public int ContextSize { get; } +``` + +#### Property Value + +[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
+ +### **EmbeddingSize** + +Dimension of embedding vectors + +```csharp +public int EmbeddingSize { get; } +``` + +#### Property Value + +[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
+ +### **IsInvalid** + +```csharp +public bool IsInvalid { get; } +``` + +#### Property Value + +[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
+ +### **IsClosed** + +```csharp +public bool IsClosed { get; } +``` + +#### Property Value + +[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
+ +## Methods + +### **ReleaseHandle()** + +```csharp +protected bool ReleaseHandle() +``` + +#### Returns + +[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
+ +### **LoadFromFile(String, LLamaContextParams)** + +Load a model from the given file path into memory + +```csharp +public static SafeLlamaModelHandle LoadFromFile(string modelPath, LLamaContextParams lparams) +``` + +#### Parameters + +`modelPath` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
+ +`lparams` [LLamaContextParams](./llama.native.llamacontextparams.md)
+ +#### Returns + +[SafeLlamaModelHandle](./llama.native.safellamamodelhandle.md)
+ +#### Exceptions + +[RuntimeError](./llama.exceptions.runtimeerror.md)
+ +### **ApplyLoraFromFile(String, String, Int32)** + +Apply a LoRA adapter to a loaded model + +```csharp +public void ApplyLoraFromFile(string lora, string modelBase, int threads) +``` + +#### Parameters + +`lora` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
+ +`modelBase` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
+A path to a higher quality model to use as a base for the layers modified by the + adapter. Can be NULL to use the current loaded model. + +`threads` [Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
+ +#### Exceptions + +[RuntimeError](./llama.exceptions.runtimeerror.md)
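+
+#### Example
+
+A sketch of applying an adapter to the model loaded above (editor's addition; the adapter file name is a placeholder):
+
+```csharp
+// Passing null for modelBase applies the adapter over the currently loaded weights.
+model.ApplyLoraFromFile("adapter.bin", null, 4);
+```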
+ +### **TokenToSpan(Int32, Span<Byte>)** + +Convert a single llama token into bytes + +```csharp +public int TokenToSpan(int llama_token, Span dest) +``` + +#### Parameters + +`llama_token` [Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
+Token to decode + +`dest` [Span<Byte>](https://docs.microsoft.com/en-us/dotnet/api/system.span-1)
+A span to attempt to write into. If this is too small, nothing will be written
+
+#### Returns
+
+[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
+The size of this token. **Nothing will be written** if this is larger than `dest`
+
+### **TokenToString(Int32, Encoding)**
+
+Convert a single llama token into a string
+
+```csharp
+public string TokenToString(int llama_token, Encoding encoding)
+```
+
+#### Parameters
+
+`llama_token` [Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
+ +`encoding` [Encoding](https://docs.microsoft.com/en-us/dotnet/api/system.text.encoding)
+Encoding to use to decode the bytes into a string + +#### Returns + +[String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
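+
+#### Example
+
+For instance (editor's sketch; `42` stands in for a real token id from the model's vocabulary):
+
+```csharp
+using System.Text;
+
+// Decode a single token id into its text piece.
+string piece = model.TokenToString(42, Encoding.UTF8);
+```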
+ +### **TokenToString(Int32, Encoding, StringBuilder)** + +Append a single llama token to a string builder + +```csharp +public void TokenToString(int llama_token, Encoding encoding, StringBuilder dest) +``` + +#### Parameters + +`llama_token` [Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
+Token to decode + +`encoding` [Encoding](https://docs.microsoft.com/en-us/dotnet/api/system.text.encoding)
+ +`dest` [StringBuilder](https://docs.microsoft.com/en-us/dotnet/api/system.text.stringbuilder)
+String builder to append the result to
+
+### **Tokenize(String, Boolean, Encoding)**
+
+Convert a string of text into tokens
+
+```csharp
+public Int32[] Tokenize(string text, bool add_bos, Encoding encoding)
+```
+
+#### Parameters
+
+`text` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
+ +`add_bos` [Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
+ +`encoding` [Encoding](https://docs.microsoft.com/en-us/dotnet/api/system.text.encoding)
+ +#### Returns + +[Int32[]](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
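+
+#### Example
+
+A round-trip sketch combining `Tokenize` with the `TokenToString` overload above (editor's addition):
+
+```csharp
+using System.Text;
+
+// Tokenize a prompt, prepending the BOS token.
+int[] tokens = model.Tokenize("Hello, world!", true, Encoding.UTF8);
+
+// Decode the tokens back into text.
+var builder = new StringBuilder();
+foreach (var token in tokens)
+    model.TokenToString(token, Encoding.UTF8, builder);
+```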
+ +### **CreateContext(LLamaContextParams)** + +Create a new context for this model + +```csharp +public SafeLLamaContextHandle CreateContext(LLamaContextParams params) +``` + +#### Parameters + +`params` [LLamaContextParams](./llama.native.llamacontextparams.md)
+ +#### Returns + +[SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)
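+
+#### Example
+
+A set of weights can back several contexts; a sketch (editor's addition), reusing the `lparams` from the loading example:
+
+```csharp
+// Create an inference context over the loaded weights.
+using var ctx = model.CreateContext(lparams);
+```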
diff --git a/docs/xmldocs/llama.native.samplingapi.md b/docs/xmldocs/llama.native.samplingapi.md new file mode 100644 index 00000000..db074c67 --- /dev/null +++ b/docs/xmldocs/llama.native.samplingapi.md @@ -0,0 +1,338 @@ +# SamplingApi + +Namespace: LLama.Native + +Direct translation of the llama.cpp sampling API + +```csharp +public class SamplingApi +``` + +Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) → [SamplingApi](./llama.native.samplingapi.md) + +## Constructors + +### **SamplingApi()** + +```csharp +public SamplingApi() +``` + +## Methods + +### **llama_sample_grammar(SafeLLamaContextHandle, LLamaTokenDataArray, SafeLLamaGrammarHandle)** + +Apply grammar rules to candidate tokens + +```csharp +public static void llama_sample_grammar(SafeLLamaContextHandle ctx, LLamaTokenDataArray candidates, SafeLLamaGrammarHandle grammar) +``` + +#### Parameters + +`ctx` [SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)
+ +`candidates` [LLamaTokenDataArray](./llama.native.llamatokendataarray.md)
+ +`grammar` [SafeLLamaGrammarHandle](./llama.native.safellamagrammarhandle.md)
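+
+#### Example
+
+A sketch (editor's addition) assuming `ctx`, a `candidates` array built from the current logits, and a [SafeLLamaGrammarHandle](./llama.native.safellamagrammarhandle.md) have already been created:
+
+```csharp
+// Filter the candidates down to tokens the grammar can currently accept.
+SamplingApi.llama_sample_grammar(ctx, candidates, grammar);
+```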
+ +### **llama_sample_repetition_penalty(SafeLLamaContextHandle, LLamaTokenDataArray, Memory<Int32>, UInt64, Single)** + +#### Caution + +last_tokens_size parameter is no longer needed + +--- + +Repetition penalty described in CTRL academic paper https://arxiv.org/abs/1909.05858, with negative logit fix. + +```csharp +public static void llama_sample_repetition_penalty(SafeLLamaContextHandle ctx, LLamaTokenDataArray candidates, Memory last_tokens, ulong last_tokens_size, float penalty) +``` + +#### Parameters + +`ctx` [SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)
+ +`candidates` [LLamaTokenDataArray](./llama.native.llamatokendataarray.md)
+Pointer to LLamaTokenDataArray + +`last_tokens` [Memory<Int32>](https://docs.microsoft.com/en-us/dotnet/api/system.memory-1)
+ +`last_tokens_size` [UInt64](https://docs.microsoft.com/en-us/dotnet/api/system.uint64)
+ +`penalty` [Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)
+ +### **llama_sample_repetition_penalty(SafeLLamaContextHandle, LLamaTokenDataArray, Memory<Int32>, Single)** + +Repetition penalty described in CTRL academic paper https://arxiv.org/abs/1909.05858, with negative logit fix. + +```csharp +public static void llama_sample_repetition_penalty(SafeLLamaContextHandle ctx, LLamaTokenDataArray candidates, Memory last_tokens, float penalty) +``` + +#### Parameters + +`ctx` [SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)
+ +`candidates` [LLamaTokenDataArray](./llama.native.llamatokendataarray.md)
+Pointer to LLamaTokenDataArray + +`last_tokens` [Memory<Int32>](https://docs.microsoft.com/en-us/dotnet/api/system.memory-1)
+ +`penalty` [Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)
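+
+#### Example
+
+A sketch (editor's addition) where `lastTokens` is a `Memory<int>` holding recently generated token ids:
+
+```csharp
+// Values greater than 1.0f penalise tokens that already occur in lastTokens.
+SamplingApi.llama_sample_repetition_penalty(ctx, candidates, lastTokens, 1.1f);
+```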
+ +### **llama_sample_frequency_and_presence_penalties(SafeLLamaContextHandle, LLamaTokenDataArray, Memory<Int32>, UInt64, Single, Single)** + +#### Caution + +last_tokens_size parameter is no longer needed + +--- + +Frequency and presence penalties described in OpenAI API https://platform.openai.com/docs/api-reference/parameter-details. + +```csharp +public static void llama_sample_frequency_and_presence_penalties(SafeLLamaContextHandle ctx, LLamaTokenDataArray candidates, Memory last_tokens, ulong last_tokens_size, float alpha_frequency, float alpha_presence) +``` + +#### Parameters + +`ctx` [SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)
+ +`candidates` [LLamaTokenDataArray](./llama.native.llamatokendataarray.md)
+Pointer to LLamaTokenDataArray + +`last_tokens` [Memory<Int32>](https://docs.microsoft.com/en-us/dotnet/api/system.memory-1)
+ +`last_tokens_size` [UInt64](https://docs.microsoft.com/en-us/dotnet/api/system.uint64)
+ +`alpha_frequency` [Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)
+ +`alpha_presence` [Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)
+ +### **llama_sample_frequency_and_presence_penalties(SafeLLamaContextHandle, LLamaTokenDataArray, Memory<Int32>, Single, Single)** + +Frequency and presence penalties described in OpenAI API https://platform.openai.com/docs/api-reference/parameter-details. + +```csharp +public static void llama_sample_frequency_and_presence_penalties(SafeLLamaContextHandle ctx, LLamaTokenDataArray candidates, Memory last_tokens, float alpha_frequency, float alpha_presence) +``` + +#### Parameters + +`ctx` [SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)
+ +`candidates` [LLamaTokenDataArray](./llama.native.llamatokendataarray.md)
+Pointer to LLamaTokenDataArray + +`last_tokens` [Memory<Int32>](https://docs.microsoft.com/en-us/dotnet/api/system.memory-1)
+ +`alpha_frequency` [Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)
+ +`alpha_presence` [Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)
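+
+#### Example
+
+A sketch (editor's addition) using the same `lastTokens` window as the repetition penalty example:
+
+```csharp
+// alpha_frequency scales with how often a token has occurred; alpha_presence
+// applies a flat penalty to any token that has occurred at all.
+SamplingApi.llama_sample_frequency_and_presence_penalties(ctx, candidates, lastTokens, 0.1f, 0.1f);
+```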
+
+### **llama_sample_softmax(SafeLLamaContextHandle, LLamaTokenDataArray)**
+
+Sorts candidate tokens by their logits in descending order and calculates probabilities based on the logits.
+
+```csharp
+public static void llama_sample_softmax(SafeLLamaContextHandle ctx, LLamaTokenDataArray candidates)
+```
+
+#### Parameters
+
+`ctx` [SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)
+ +`candidates` [LLamaTokenDataArray](./llama.native.llamatokendataarray.md)
+Pointer to LLamaTokenDataArray + +### **llama_sample_top_k(SafeLLamaContextHandle, LLamaTokenDataArray, Int32, UInt64)** + +Top-K sampling described in academic paper "The Curious Case of Neural Text Degeneration" https://arxiv.org/abs/1904.09751 + +```csharp +public static void llama_sample_top_k(SafeLLamaContextHandle ctx, LLamaTokenDataArray candidates, int k, ulong min_keep) +``` + +#### Parameters + +`ctx` [SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)
+ +`candidates` [LLamaTokenDataArray](./llama.native.llamatokendataarray.md)
+Pointer to LLamaTokenDataArray + +`k` [Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
+ +`min_keep` [UInt64](https://docs.microsoft.com/en-us/dotnet/api/system.uint64)
+ +### **llama_sample_top_p(SafeLLamaContextHandle, LLamaTokenDataArray, Single, UInt64)** + +Nucleus sampling described in academic paper "The Curious Case of Neural Text Degeneration" https://arxiv.org/abs/1904.09751 + +```csharp +public static void llama_sample_top_p(SafeLLamaContextHandle ctx, LLamaTokenDataArray candidates, float p, ulong min_keep) +``` + +#### Parameters + +`ctx` [SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)
+ +`candidates` [LLamaTokenDataArray](./llama.native.llamatokendataarray.md)
+Pointer to LLamaTokenDataArray + +`p` [Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)
+ +`min_keep` [UInt64](https://docs.microsoft.com/en-us/dotnet/api/system.uint64)
+ +### **llama_sample_tail_free(SafeLLamaContextHandle, LLamaTokenDataArray, Single, UInt64)** + +Tail Free Sampling described in https://www.trentonbricken.com/Tail-Free-Sampling/. + +```csharp +public static void llama_sample_tail_free(SafeLLamaContextHandle ctx, LLamaTokenDataArray candidates, float z, ulong min_keep) +``` + +#### Parameters + +`ctx` [SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)
+ +`candidates` [LLamaTokenDataArray](./llama.native.llamatokendataarray.md)
+Pointer to LLamaTokenDataArray + +`z` [Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)
+ +`min_keep` [UInt64](https://docs.microsoft.com/en-us/dotnet/api/system.uint64)
+ +### **llama_sample_typical(SafeLLamaContextHandle, LLamaTokenDataArray, Single, UInt64)** + +Locally Typical Sampling implementation described in the paper https://arxiv.org/abs/2202.00666. + +```csharp +public static void llama_sample_typical(SafeLLamaContextHandle ctx, LLamaTokenDataArray candidates, float p, ulong min_keep) +``` + +#### Parameters + +`ctx` [SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)
+ +`candidates` [LLamaTokenDataArray](./llama.native.llamatokendataarray.md)
+Pointer to LLamaTokenDataArray + +`p` [Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)
+ +`min_keep` [UInt64](https://docs.microsoft.com/en-us/dotnet/api/system.uint64)
+
+### **llama_sample_temperature(SafeLLamaContextHandle, LLamaTokenDataArray, Single)**
+
+Sample with temperature.
+ As temperature increases, the predictions become more diverse, but also more vulnerable to hallucination -- generating tokens that sound plausible but are not factual.
+
+```csharp
+public static void llama_sample_temperature(SafeLLamaContextHandle ctx, LLamaTokenDataArray candidates, float temp)
+```
+
+#### Parameters
+
+`ctx` [SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)
+ +`candidates` [LLamaTokenDataArray](./llama.native.llamatokendataarray.md)
+ +`temp` [Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)
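+
+#### Example
+
+The filters above are typically chained before a final token is drawn. A sketch (editor's addition; the values are common defaults, not prescriptions):
+
+```csharp
+// Narrow the candidate distribution, then sample a token id from what remains.
+SamplingApi.llama_sample_top_k(ctx, candidates, 40, 1);
+SamplingApi.llama_sample_top_p(ctx, candidates, 0.95f, 1);
+SamplingApi.llama_sample_temperature(ctx, candidates, 0.8f);
+int token = SamplingApi.llama_sample_token(ctx, candidates);
+```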
+ +### **llama_sample_token_mirostat(SafeLLamaContextHandle, LLamaTokenDataArray, Single, Single, Int32, Single&)** + +Mirostat 1.0 algorithm described in the paper https://arxiv.org/abs/2007.14966. Uses tokens instead of words. + +```csharp +public static int llama_sample_token_mirostat(SafeLLamaContextHandle ctx, LLamaTokenDataArray candidates, float tau, float eta, int m, Single& mu) +``` + +#### Parameters + +`ctx` [SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)
+ +`candidates` [LLamaTokenDataArray](./llama.native.llamatokendataarray.md)
+A vector of `LLamaTokenData` containing the candidate tokens, their probabilities (p), and log-odds (logit) for the current position in the generated text. + +`tau` [Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)
+The target cross-entropy (or surprise) value you want to achieve for the generated text. A higher value corresponds to more surprising or less predictable text, while a lower value corresponds to less surprising or more predictable text. + +`eta` [Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)
+The learning rate used to update `mu` based on the error between the target and observed surprisal of the sampled word. A larger learning rate will cause `mu` to be updated more quickly, while a smaller learning rate will result in slower updates. + +`m` [Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
+The number of tokens considered in the estimation of `s_hat`. This is an arbitrary value that is used to calculate `s_hat`, which in turn helps to calculate the value of `k`. In the paper, they use `m = 100`, but you can experiment with different values to see how it affects the performance of the algorithm. + +`mu` [Single&](https://docs.microsoft.com/en-us/dotnet/api/system.single&)
+Maximum cross-entropy. This value is initialized to be twice the target cross-entropy (`2 * tau`) and is updated in the algorithm based on the error between the target and observed surprisal. + +#### Returns + +[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
+ +### **llama_sample_token_mirostat_v2(SafeLLamaContextHandle, LLamaTokenDataArray, Single, Single, Single&)** + +Mirostat 2.0 algorithm described in the paper https://arxiv.org/abs/2007.14966. Uses tokens instead of words. + +```csharp +public static int llama_sample_token_mirostat_v2(SafeLLamaContextHandle ctx, LLamaTokenDataArray candidates, float tau, float eta, Single& mu) +``` + +#### Parameters + +`ctx` [SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)
+ +`candidates` [LLamaTokenDataArray](./llama.native.llamatokendataarray.md)
+A vector of `LLamaTokenData` containing the candidate tokens, their probabilities (p), and log-odds (logit) for the current position in the generated text. + +`tau` [Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)
+The target cross-entropy (or surprise) value you want to achieve for the generated text. A higher value corresponds to more surprising or less predictable text, while a lower value corresponds to less surprising or more predictable text. + +`eta` [Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)
+The learning rate used to update `mu` based on the error between the target and observed surprisal of the sampled word. A larger learning rate will cause `mu` to be updated more quickly, while a smaller learning rate will result in slower updates. + +`mu` [Single&](https://docs.microsoft.com/en-us/dotnet/api/system.single&)
+Maximum cross-entropy. This value is initialized to be twice the target cross-entropy (`2 * tau`) and is updated in the algorithm based on the error between the target and observed surprisal. + +#### Returns + +[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
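+
+#### Example
+
+A sketch (editor's addition; it assumes the `Single&` parameter maps to a C# `ref float`). `mu` must persist across calls, so keep it outside the sampling loop:
+
+```csharp
+const float tau = 5.0f;
+const float eta = 0.1f;
+float mu = 2 * tau; // initialised to twice the target cross-entropy
+
+int token = SamplingApi.llama_sample_token_mirostat_v2(ctx, candidates, tau, eta, ref mu);
+```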
+ +### **llama_sample_token_greedy(SafeLLamaContextHandle, LLamaTokenDataArray)** + +Selects the token with the highest probability. + +```csharp +public static int llama_sample_token_greedy(SafeLLamaContextHandle ctx, LLamaTokenDataArray candidates) +``` + +#### Parameters + +`ctx` [SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)
+ +`candidates` [LLamaTokenDataArray](./llama.native.llamatokendataarray.md)
+Pointer to LLamaTokenDataArray + +#### Returns + +[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
+ +### **llama_sample_token(SafeLLamaContextHandle, LLamaTokenDataArray)** + +Randomly selects a token from the candidates based on their probabilities. + +```csharp +public static int llama_sample_token(SafeLLamaContextHandle ctx, LLamaTokenDataArray candidates) +``` + +#### Parameters + +`ctx` [SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)
+ +`candidates` [LLamaTokenDataArray](./llama.native.llamatokendataarray.md)
+Pointer to LLamaTokenDataArray + +#### Returns + +[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
diff --git a/docs/xmldocs/llama.oldversion.chatcompletion.md b/docs/xmldocs/llama.oldversion.chatcompletion.md index af1dd253..a1169efa 100644 --- a/docs/xmldocs/llama.oldversion.chatcompletion.md +++ b/docs/xmldocs/llama.oldversion.chatcompletion.md @@ -2,8 +2,14 @@ Namespace: LLama.OldVersion +#### Caution + +The entire LLama.OldVersion namespace will be removed + +--- + ```csharp -public class ChatCompletion : System.IEquatable`1[[LLama.OldVersion.ChatCompletion, LLamaSharp, Version=0.4.0.0, Culture=neutral, PublicKeyToken=null]] +public class ChatCompletion : System.IEquatable`1[[LLama.OldVersion.ChatCompletion, LLamaSharp, Version=0.5.0.0, Culture=neutral, PublicKeyToken=null]] ``` Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) → [ChatCompletion](./llama.oldversion.chatcompletion.md)
diff --git a/docs/xmldocs/llama.oldversion.chatcompletionchoice.md b/docs/xmldocs/llama.oldversion.chatcompletionchoice.md index c5f80d7b..ec1329f9 100644 --- a/docs/xmldocs/llama.oldversion.chatcompletionchoice.md +++ b/docs/xmldocs/llama.oldversion.chatcompletionchoice.md @@ -2,8 +2,14 @@ Namespace: LLama.OldVersion +#### Caution + +The entire LLama.OldVersion namespace will be removed + +--- + ```csharp -public class ChatCompletionChoice : System.IEquatable`1[[LLama.OldVersion.ChatCompletionChoice, LLamaSharp, Version=0.4.0.0, Culture=neutral, PublicKeyToken=null]] +public class ChatCompletionChoice : System.IEquatable`1[[LLama.OldVersion.ChatCompletionChoice, LLamaSharp, Version=0.5.0.0, Culture=neutral, PublicKeyToken=null]] ``` Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) → [ChatCompletionChoice](./llama.oldversion.chatcompletionchoice.md)
diff --git a/docs/xmldocs/llama.oldversion.chatcompletionchunk.md b/docs/xmldocs/llama.oldversion.chatcompletionchunk.md index a15a033e..5280c3bc 100644 --- a/docs/xmldocs/llama.oldversion.chatcompletionchunk.md +++ b/docs/xmldocs/llama.oldversion.chatcompletionchunk.md @@ -2,8 +2,14 @@ Namespace: LLama.OldVersion +#### Caution + +The entire LLama.OldVersion namespace will be removed + +--- + ```csharp -public class ChatCompletionChunk : System.IEquatable`1[[LLama.OldVersion.ChatCompletionChunk, LLamaSharp, Version=0.4.0.0, Culture=neutral, PublicKeyToken=null]] +public class ChatCompletionChunk : System.IEquatable`1[[LLama.OldVersion.ChatCompletionChunk, LLamaSharp, Version=0.5.0.0, Culture=neutral, PublicKeyToken=null]] ``` Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) → [ChatCompletionChunk](./llama.oldversion.chatcompletionchunk.md)
diff --git a/docs/xmldocs/llama.oldversion.chatcompletionchunkchoice.md b/docs/xmldocs/llama.oldversion.chatcompletionchunkchoice.md index 16e2954e..17406848 100644 --- a/docs/xmldocs/llama.oldversion.chatcompletionchunkchoice.md +++ b/docs/xmldocs/llama.oldversion.chatcompletionchunkchoice.md @@ -2,8 +2,14 @@ Namespace: LLama.OldVersion +#### Caution + +The entire LLama.OldVersion namespace will be removed + +--- + ```csharp -public class ChatCompletionChunkChoice : System.IEquatable`1[[LLama.OldVersion.ChatCompletionChunkChoice, LLamaSharp, Version=0.4.0.0, Culture=neutral, PublicKeyToken=null]] +public class ChatCompletionChunkChoice : System.IEquatable`1[[LLama.OldVersion.ChatCompletionChunkChoice, LLamaSharp, Version=0.5.0.0, Culture=neutral, PublicKeyToken=null]] ``` Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) → [ChatCompletionChunkChoice](./llama.oldversion.chatcompletionchunkchoice.md)
diff --git a/docs/xmldocs/llama.oldversion.chatcompletionchunkdelta.md b/docs/xmldocs/llama.oldversion.chatcompletionchunkdelta.md index a924879d..465e23d7 100644 --- a/docs/xmldocs/llama.oldversion.chatcompletionchunkdelta.md +++ b/docs/xmldocs/llama.oldversion.chatcompletionchunkdelta.md @@ -2,8 +2,14 @@ Namespace: LLama.OldVersion +#### Caution + +The entire LLama.OldVersion namespace will be removed + +--- + ```csharp -public class ChatCompletionChunkDelta : System.IEquatable`1[[LLama.OldVersion.ChatCompletionChunkDelta, LLamaSharp, Version=0.4.0.0, Culture=neutral, PublicKeyToken=null]] +public class ChatCompletionChunkDelta : System.IEquatable`1[[LLama.OldVersion.ChatCompletionChunkDelta, LLamaSharp, Version=0.5.0.0, Culture=neutral, PublicKeyToken=null]] ``` Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) → [ChatCompletionChunkDelta](./llama.oldversion.chatcompletionchunkdelta.md)
diff --git a/docs/xmldocs/llama.oldversion.chatcompletionmessage.md b/docs/xmldocs/llama.oldversion.chatcompletionmessage.md index 2856c180..af844e56 100644 --- a/docs/xmldocs/llama.oldversion.chatcompletionmessage.md +++ b/docs/xmldocs/llama.oldversion.chatcompletionmessage.md @@ -2,8 +2,14 @@ Namespace: LLama.OldVersion +#### Caution + +The entire LLama.OldVersion namespace will be removed + +--- + ```csharp -public class ChatCompletionMessage : System.IEquatable`1[[LLama.OldVersion.ChatCompletionMessage, LLamaSharp, Version=0.4.0.0, Culture=neutral, PublicKeyToken=null]] +public class ChatCompletionMessage : System.IEquatable`1[[LLama.OldVersion.ChatCompletionMessage, LLamaSharp, Version=0.5.0.0, Culture=neutral, PublicKeyToken=null]] ``` Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) → [ChatCompletionMessage](./llama.oldversion.chatcompletionmessage.md)
diff --git a/docs/xmldocs/llama.oldversion.chatmessagerecord.md b/docs/xmldocs/llama.oldversion.chatmessagerecord.md index 8722f4bd..253ccbc7 100644 --- a/docs/xmldocs/llama.oldversion.chatmessagerecord.md +++ b/docs/xmldocs/llama.oldversion.chatmessagerecord.md @@ -2,8 +2,14 @@ Namespace: LLama.OldVersion +#### Caution + +The entire LLama.OldVersion namespace will be removed + +--- + ```csharp -public class ChatMessageRecord : System.IEquatable`1[[LLama.OldVersion.ChatMessageRecord, LLamaSharp, Version=0.4.0.0, Culture=neutral, PublicKeyToken=null]] +public class ChatMessageRecord : System.IEquatable`1[[LLama.OldVersion.ChatMessageRecord, LLamaSharp, Version=0.5.0.0, Culture=neutral, PublicKeyToken=null]] ``` Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) → [ChatMessageRecord](./llama.oldversion.chatmessagerecord.md)
diff --git a/docs/xmldocs/llama.oldversion.chatsession-1.md b/docs/xmldocs/llama.oldversion.chatsession-1.md
index 4fcbeebf..1c68d554 100644
--- a/docs/xmldocs/llama.oldversion.chatsession-1.md
+++ b/docs/xmldocs/llama.oldversion.chatsession-1.md
@@ -2,6 +2,12 @@
 
 Namespace: LLama.OldVersion
 
+#### Caution
+
+The entire LLama.OldVersion namespace will be removed
+
+---
+
 ```csharp
 public class ChatSession
 ```
@@ -78,7 +84,7 @@ public ChatSession WithPromptFile(string promptFilename, string encoding)
 
 ### **WithAntiprompt(String[])**
 
-Set the keyword to split the return value of chat AI.
+Set the keywords used to split the return value of the chat AI.
 
 ```csharp
 public ChatSession WithAntiprompt(String[] antiprompt)
diff --git a/docs/xmldocs/llama.oldversion.completion.md b/docs/xmldocs/llama.oldversion.completion.md
index 39765402..1e93e449 100644
--- a/docs/xmldocs/llama.oldversion.completion.md
+++ b/docs/xmldocs/llama.oldversion.completion.md
@@ -2,8 +2,14 @@
 
 Namespace: LLama.OldVersion
 
+#### Caution
+
+The entire LLama.OldVersion namespace will be removed
+
+---
+
 ```csharp
-public class Completion : System.IEquatable`1[[LLama.OldVersion.Completion, LLamaSharp, Version=0.4.0.0, Culture=neutral, PublicKeyToken=null]]
+public class Completion : System.IEquatable`1[[LLama.OldVersion.Completion, LLamaSharp, Version=0.5.0.0, Culture=neutral, PublicKeyToken=null]]
 ```
 
 Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) → [Completion](./llama.oldversion.completion.md)
diff --git a/docs/xmldocs/llama.oldversion.completionchoice.md b/docs/xmldocs/llama.oldversion.completionchoice.md index e09df723..0c28f8f4 100644 --- a/docs/xmldocs/llama.oldversion.completionchoice.md +++ b/docs/xmldocs/llama.oldversion.completionchoice.md @@ -2,8 +2,14 @@ Namespace: LLama.OldVersion +#### Caution + +The entire LLama.OldVersion namespace will be removed + +--- + ```csharp -public class CompletionChoice : System.IEquatable`1[[LLama.OldVersion.CompletionChoice, LLamaSharp, Version=0.4.0.0, Culture=neutral, PublicKeyToken=null]] +public class CompletionChoice : System.IEquatable`1[[LLama.OldVersion.CompletionChoice, LLamaSharp, Version=0.5.0.0, Culture=neutral, PublicKeyToken=null]] ``` Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) → [CompletionChoice](./llama.oldversion.completionchoice.md)
diff --git a/docs/xmldocs/llama.oldversion.completionchunk.md b/docs/xmldocs/llama.oldversion.completionchunk.md index cc2ccec8..d1851c0b 100644 --- a/docs/xmldocs/llama.oldversion.completionchunk.md +++ b/docs/xmldocs/llama.oldversion.completionchunk.md @@ -2,8 +2,14 @@ Namespace: LLama.OldVersion +#### Caution + +The entire LLama.OldVersion namespace will be removed + +--- + ```csharp -public class CompletionChunk : System.IEquatable`1[[LLama.OldVersion.CompletionChunk, LLamaSharp, Version=0.4.0.0, Culture=neutral, PublicKeyToken=null]] +public class CompletionChunk : System.IEquatable`1[[LLama.OldVersion.CompletionChunk, LLamaSharp, Version=0.5.0.0, Culture=neutral, PublicKeyToken=null]] ``` Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) → [CompletionChunk](./llama.oldversion.completionchunk.md)
diff --git a/docs/xmldocs/llama.oldversion.completionlogprobs.md b/docs/xmldocs/llama.oldversion.completionlogprobs.md index 8c20201e..9b6829ed 100644 --- a/docs/xmldocs/llama.oldversion.completionlogprobs.md +++ b/docs/xmldocs/llama.oldversion.completionlogprobs.md @@ -2,8 +2,14 @@ Namespace: LLama.OldVersion +#### Caution + +The entire LLama.OldVersion namespace will be removed + +--- + ```csharp -public class CompletionLogprobs : System.IEquatable`1[[LLama.OldVersion.CompletionLogprobs, LLamaSharp, Version=0.4.0.0, Culture=neutral, PublicKeyToken=null]] +public class CompletionLogprobs : System.IEquatable`1[[LLama.OldVersion.CompletionLogprobs, LLamaSharp, Version=0.5.0.0, Culture=neutral, PublicKeyToken=null]] ``` Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) → [CompletionLogprobs](./llama.oldversion.completionlogprobs.md)
diff --git a/docs/xmldocs/llama.oldversion.completionusage.md b/docs/xmldocs/llama.oldversion.completionusage.md index ec996c50..803f7d58 100644 --- a/docs/xmldocs/llama.oldversion.completionusage.md +++ b/docs/xmldocs/llama.oldversion.completionusage.md @@ -2,8 +2,14 @@ Namespace: LLama.OldVersion +#### Caution + +The entire LLama.OldVersion namespace will be removed + +--- + ```csharp -public class CompletionUsage : System.IEquatable`1[[LLama.OldVersion.CompletionUsage, LLamaSharp, Version=0.4.0.0, Culture=neutral, PublicKeyToken=null]] +public class CompletionUsage : System.IEquatable`1[[LLama.OldVersion.CompletionUsage, LLamaSharp, Version=0.5.0.0, Culture=neutral, PublicKeyToken=null]] ``` Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) → [CompletionUsage](./llama.oldversion.completionusage.md)
diff --git a/docs/xmldocs/llama.oldversion.embedding.md b/docs/xmldocs/llama.oldversion.embedding.md index e1fa7a89..426f4209 100644 --- a/docs/xmldocs/llama.oldversion.embedding.md +++ b/docs/xmldocs/llama.oldversion.embedding.md @@ -2,8 +2,14 @@ Namespace: LLama.OldVersion +#### Caution + +The entire LLama.OldVersion namespace will be removed + +--- + ```csharp -public class Embedding : System.IEquatable`1[[LLama.OldVersion.Embedding, LLamaSharp, Version=0.4.0.0, Culture=neutral, PublicKeyToken=null]] +public class Embedding : System.IEquatable`1[[LLama.OldVersion.Embedding, LLamaSharp, Version=0.5.0.0, Culture=neutral, PublicKeyToken=null]] ``` Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) → [Embedding](./llama.oldversion.embedding.md)
diff --git a/docs/xmldocs/llama.oldversion.embeddingdata.md b/docs/xmldocs/llama.oldversion.embeddingdata.md index 34f58e77..47932015 100644 --- a/docs/xmldocs/llama.oldversion.embeddingdata.md +++ b/docs/xmldocs/llama.oldversion.embeddingdata.md @@ -2,8 +2,14 @@ Namespace: LLama.OldVersion +#### Caution + +The entire LLama.OldVersion namespace will be removed + +--- + ```csharp -public class EmbeddingData : System.IEquatable`1[[LLama.OldVersion.EmbeddingData, LLamaSharp, Version=0.4.0.0, Culture=neutral, PublicKeyToken=null]] +public class EmbeddingData : System.IEquatable`1[[LLama.OldVersion.EmbeddingData, LLamaSharp, Version=0.5.0.0, Culture=neutral, PublicKeyToken=null]] ``` Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) → [EmbeddingData](./llama.oldversion.embeddingdata.md)
diff --git a/docs/xmldocs/llama.oldversion.embeddingusage.md b/docs/xmldocs/llama.oldversion.embeddingusage.md index f6d39441..206664a3 100644 --- a/docs/xmldocs/llama.oldversion.embeddingusage.md +++ b/docs/xmldocs/llama.oldversion.embeddingusage.md @@ -2,8 +2,14 @@ Namespace: LLama.OldVersion +#### Caution + +The entire LLama.OldVersion namespace will be removed + +--- + ```csharp -public class EmbeddingUsage : System.IEquatable`1[[LLama.OldVersion.EmbeddingUsage, LLamaSharp, Version=0.4.0.0, Culture=neutral, PublicKeyToken=null]] +public class EmbeddingUsage : System.IEquatable`1[[LLama.OldVersion.EmbeddingUsage, LLamaSharp, Version=0.5.0.0, Culture=neutral, PublicKeyToken=null]] ``` Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) → [EmbeddingUsage](./llama.oldversion.embeddingusage.md)
diff --git a/docs/xmldocs/llama.oldversion.ichatmodel.md b/docs/xmldocs/llama.oldversion.ichatmodel.md index ce7b7134..4d9a6d44 100644 --- a/docs/xmldocs/llama.oldversion.ichatmodel.md +++ b/docs/xmldocs/llama.oldversion.ichatmodel.md @@ -2,6 +2,12 @@ Namespace: LLama.OldVersion +#### Caution + +The entire LLama.OldVersion namespace will be removed + +--- + ```csharp public interface IChatModel ``` diff --git a/docs/xmldocs/llama.oldversion.llamaembedder.md b/docs/xmldocs/llama.oldversion.llamaembedder.md index 0259316d..5d80bfb3 100644 --- a/docs/xmldocs/llama.oldversion.llamaembedder.md +++ b/docs/xmldocs/llama.oldversion.llamaembedder.md @@ -2,6 +2,12 @@ Namespace: LLama.OldVersion +#### Caution + +The entire LLama.OldVersion namespace will be removed + +--- + ```csharp public class LLamaEmbedder : System.IDisposable ``` diff --git a/docs/xmldocs/llama.oldversion.llamamodel.md b/docs/xmldocs/llama.oldversion.llamamodel.md index 4f014907..e04cc398 100644 --- a/docs/xmldocs/llama.oldversion.llamamodel.md +++ b/docs/xmldocs/llama.oldversion.llamamodel.md @@ -2,6 +2,12 @@ Namespace: LLama.OldVersion +#### Caution + +The entire LLama.OldVersion namespace will be removed + +--- + ```csharp public class LLamaModel : IChatModel, System.IDisposable ``` diff --git a/docs/xmldocs/llama.oldversion.llamaparams.md b/docs/xmldocs/llama.oldversion.llamaparams.md index ce242f59..911fa2d8 100644 --- a/docs/xmldocs/llama.oldversion.llamaparams.md +++ b/docs/xmldocs/llama.oldversion.llamaparams.md @@ -2,6 +2,12 @@ Namespace: LLama.OldVersion +#### Caution + +The entire LLama.OldVersion namespace will be removed + +--- + ```csharp public struct LLamaParams ``` diff --git a/docs/xmldocs/llama.resettablellamamodel.md b/docs/xmldocs/llama.resettablellamamodel.md deleted file mode 100644 index b43646a3..00000000 --- a/docs/xmldocs/llama.resettablellamamodel.md +++ /dev/null @@ -1,101 +0,0 @@ -# ResettableLLamaModel - -Namespace: LLama - -A LLamaModel what could be reset. Note that using this class will consume about 10% more memories. - -```csharp -public class ResettableLLamaModel : LLamaModel, System.IDisposable -``` - -Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) → [LLamaModel](./llama.llamamodel.md) → [ResettableLLamaModel](./llama.resettablellamamodel.md)
-Implements [IDisposable](https://docs.microsoft.com/en-us/dotnet/api/system.idisposable) - -## Properties - -### **OriginalState** - -The initial state of the model - -```csharp -public Byte[] OriginalState { get; set; } -``` - -#### Property Value - -[Byte[]](https://docs.microsoft.com/en-us/dotnet/api/system.byte)
- -### **ContextSize** - -The context size. - -```csharp -public int ContextSize { get; } -``` - -#### Property Value - -[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
- -### **Params** - -The model params set for this model. - -```csharp -public ModelParams Params { get; set; } -``` - -#### Property Value - -[ModelParams](./llama.common.modelparams.md)
- -### **NativeHandle** - -The native handle, which is used to be passed to the native APIs. Please avoid using it - unless you know what is the usage of the Native API. - -```csharp -public SafeLLamaContextHandle NativeHandle { get; } -``` - -#### Property Value - -[SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)
- -### **Encoding** - -The encoding set for this model to deal with text input. - -```csharp -public Encoding Encoding { get; } -``` - -#### Property Value - -[Encoding](https://docs.microsoft.com/en-us/dotnet/api/system.text.encoding)
- -## Constructors - -### **ResettableLLamaModel(ModelParams, String)** - - - -```csharp -public ResettableLLamaModel(ModelParams Params, string encoding) -``` - -#### Parameters - -`Params` [ModelParams](./llama.common.modelparams.md)
- -`encoding` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
- -## Methods - -### **Reset()** - -Reset the state to the initial state. - -```csharp -public void Reset() -``` diff --git a/docs/xmldocs/llama.statefulexecutorbase.md b/docs/xmldocs/llama.statefulexecutorbase.md index 6cd169e1..428610e8 100644 --- a/docs/xmldocs/llama.statefulexecutorbase.md +++ b/docs/xmldocs/llama.statefulexecutorbase.md @@ -13,17 +13,17 @@ Implements [ILLamaExecutor](./llama.abstractions.illamaexecutor.md) ## Properties -### **Model** +### **Context** -The mode used by the executor. +The context used by the executor. ```csharp -public LLamaModel Model { get; } +public LLamaContext Context { get; } ``` #### Property Value -[LLamaModel](./llama.llamamodel.md)
+[LLamaContext](./llama.llamacontext.md)
## Methods @@ -111,17 +111,17 @@ protected abstract void PreprocessInputs(string text, InferStateArgs args) `args` [InferStateArgs](./llama.statefulexecutorbase.inferstateargs.md)
-### **PostProcess(InferenceParams, InferStateArgs, IEnumerable`1&)** +### **PostProcess(IInferenceParams, InferStateArgs, IEnumerable`1&)** Do some post processing after the inference. ```csharp -protected abstract bool PostProcess(InferenceParams inferenceParams, InferStateArgs args, IEnumerable`1& extraOutputs) +protected abstract bool PostProcess(IInferenceParams inferenceParams, InferStateArgs args, IEnumerable`1& extraOutputs) ``` #### Parameters -`inferenceParams` [InferenceParams](./llama.common.inferenceparams.md)
+`inferenceParams` [IInferenceParams](./llama.abstractions.iinferenceparams.md)
`args` [InferStateArgs](./llama.statefulexecutorbase.inferstateargs.md)
@@ -131,17 +131,17 @@ protected abstract bool PostProcess(InferenceParams inferenceParams, InferStateA [Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
-### **InferInternal(InferenceParams, InferStateArgs)** +### **InferInternal(IInferenceParams, InferStateArgs)** The core inference logic. ```csharp -protected abstract void InferInternal(InferenceParams inferenceParams, InferStateArgs args) +protected abstract void InferInternal(IInferenceParams inferenceParams, InferStateArgs args) ``` #### Parameters -`inferenceParams` [InferenceParams](./llama.common.inferenceparams.md)
+`inferenceParams` [IInferenceParams](./llama.abstractions.iinferenceparams.md)
`args` [InferStateArgs](./llama.statefulexecutorbase.inferstateargs.md)
@@ -193,19 +193,19 @@ public abstract void LoadState(string filename) `filename` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
-### **Infer(String, InferenceParams, CancellationToken)** +### **Infer(String, IInferenceParams, CancellationToken)** Execute the inference. ```csharp -public IEnumerable Infer(string text, InferenceParams inferenceParams, CancellationToken cancellationToken) +public IEnumerable Infer(string text, IInferenceParams inferenceParams, CancellationToken cancellationToken) ``` #### Parameters `text` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
-`inferenceParams` [InferenceParams](./llama.common.inferenceparams.md)
+`inferenceParams` [IInferenceParams](./llama.abstractions.iinferenceparams.md)
`cancellationToken` [CancellationToken](https://docs.microsoft.com/en-us/dotnet/api/system.threading.cancellationtoken)
@@ -213,19 +213,19 @@ public IEnumerable Infer(string text, InferenceParams inferenceParams, C [IEnumerable<String>](https://docs.microsoft.com/en-us/dotnet/api/system.collections.generic.ienumerable-1)
-### **InferAsync(String, InferenceParams, CancellationToken)** +### **InferAsync(String, IInferenceParams, CancellationToken)** Execute the inference asynchronously. ```csharp -public IAsyncEnumerable InferAsync(string text, InferenceParams inferenceParams, CancellationToken cancellationToken) +public IAsyncEnumerable InferAsync(string text, IInferenceParams inferenceParams, CancellationToken cancellationToken) ``` #### Parameters `text` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
-`inferenceParams` [InferenceParams](./llama.common.inferenceparams.md)
+`inferenceParams` [IInferenceParams](./llama.abstractions.iinferenceparams.md)
`cancellationToken` [CancellationToken](https://docs.microsoft.com/en-us/dotnet/api/system.threading.cancellationtoken)
diff --git a/docs/xmldocs/llama.statelessexecutor.md b/docs/xmldocs/llama.statelessexecutor.md index 60db8326..d6995ef2 100644 --- a/docs/xmldocs/llama.statelessexecutor.md +++ b/docs/xmldocs/llama.statelessexecutor.md @@ -14,46 +14,65 @@ Implements [ILLamaExecutor](./llama.abstractions.illamaexecutor.md) ## Properties -### **Model** +### **Context** -The mode used by the executor when running the inference. +The context used by the executor when running the inference. ```csharp -public LLamaModel Model { get; } +public LLamaContext Context { get; private set; } ``` #### Property Value -[LLamaModel](./llama.llamamodel.md)
+[LLamaContext](./llama.llamacontext.md)
## Constructors -### **StatelessExecutor(LLamaModel)** +### **StatelessExecutor(LLamaWeights, IModelParams)** +Create a new stateless executor which will use the given model +```csharp +public StatelessExecutor(LLamaWeights weights, IModelParams params) +``` + +#### Parameters + +`weights` [LLamaWeights](./llama.llamaweights.md)
+ +`params` [IModelParams](./llama.abstractions.imodelparams.md)
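+
+#### Example
+
+A sketch of constructing and using the executor (editor's addition; it assumes [ModelParams](./llama.common.modelparams.md) accepts the placeholder path used here, and should run inside an async method):
+
+```csharp
+using LLama;
+using LLama.Common;
+using System.Threading;
+
+var parameters = new ModelParams("model.gguf"); // placeholder path
+using var weights = LLamaWeights.LoadFromFile(parameters);
+
+var executor = new StatelessExecutor(weights, parameters);
+await foreach (var text in executor.InferAsync("Q: What is C#? A:", new InferenceParams(), CancellationToken.None))
+    Console.Write(text);
+```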
+ +### **StatelessExecutor(LLamaContext)** + +#### Caution + +Use the constructor which automatically creates contexts using the LLamaWeights + +--- + +Create a new stateless executor which will use the model used to create the given context ```csharp -public StatelessExecutor(LLamaModel model) +public StatelessExecutor(LLamaContext context) ``` #### Parameters -`model` [LLamaModel](./llama.llamamodel.md)
-The LLama model. +`context` [LLamaContext](./llama.llamacontext.md)
## Methods -### **Infer(String, InferenceParams, CancellationToken)** +### **Infer(String, IInferenceParams, CancellationToken)** ```csharp -public IEnumerable Infer(string text, InferenceParams inferenceParams, CancellationToken cancellationToken) +public IEnumerable Infer(string text, IInferenceParams inferenceParams, CancellationToken cancellationToken) ``` #### Parameters `text` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
-`inferenceParams` [InferenceParams](./llama.common.inferenceparams.md)
+`inferenceParams` [IInferenceParams](./llama.abstractions.iinferenceparams.md)
`cancellationToken` [CancellationToken](https://docs.microsoft.com/en-us/dotnet/api/system.threading.cancellationtoken)
@@ -61,19 +80,19 @@ public IEnumerable Infer(string text, InferenceParams inferenceParams, C [IEnumerable<String>](https://docs.microsoft.com/en-us/dotnet/api/system.collections.generic.ienumerable-1)
-### **InferAsync(String, InferenceParams, CancellationToken)** +### **InferAsync(String, IInferenceParams, CancellationToken)** ```csharp -public IAsyncEnumerable InferAsync(string text, InferenceParams inferenceParams, CancellationToken token) +public IAsyncEnumerable InferAsync(string text, IInferenceParams inferenceParams, CancellationToken cancellationToken) ``` #### Parameters `text` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
-`inferenceParams` [InferenceParams](./llama.common.inferenceparams.md)
+`inferenceParams` [IInferenceParams](./llama.abstractions.iinferenceparams.md)
-`token` [CancellationToken](https://docs.microsoft.com/en-us/dotnet/api/system.threading.cancellationtoken)
+`cancellationToken` [CancellationToken](https://docs.microsoft.com/en-us/dotnet/api/system.threading.cancellationtoken)
#### Returns diff --git a/docs/xmldocs/llama.utils.md b/docs/xmldocs/llama.utils.md new file mode 100644 index 00000000..38b6887d --- /dev/null +++ b/docs/xmldocs/llama.utils.md @@ -0,0 +1,157 @@ +# Utils + +Namespace: LLama + +Assorted llama utilities + +```csharp +public static class Utils +``` + +Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) → [Utils](./llama.utils.md) + +## Methods + +### **InitLLamaContextFromModelParams(IModelParams)** + +#### Caution + +Use LLamaWeights.LoadFromFile and LLamaWeights.CreateContext instead + +--- + +```csharp +public static SafeLLamaContextHandle InitLLamaContextFromModelParams(IModelParams params) +``` + +#### Parameters + +`params` [IModelParams](./llama.abstractions.imodelparams.md)
+ +#### Returns + +[SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)
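+
+#### Example
+
+The recommended replacement, as a sketch (editor's addition; it assumes both [LLamaWeights](./llama.llamaweights.md) methods accept an [IModelParams](./llama.abstractions.imodelparams.md) such as `ModelParams`):
+
+```csharp
+using LLama;
+using LLama.Common;
+
+// Load the weights once, then create (potentially many) contexts from them.
+var parameters = new ModelParams("model.gguf"); // placeholder path
+using var weights = LLamaWeights.LoadFromFile(parameters);
+using var context = weights.CreateContext(parameters);
+```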
+ +### **Tokenize(SafeLLamaContextHandle, String, Boolean, Encoding)** + +#### Caution + +Use SafeLLamaContextHandle Tokenize method instead + +--- + +```csharp +public static IEnumerable Tokenize(SafeLLamaContextHandle ctx, string text, bool add_bos, Encoding encoding) +``` + +#### Parameters + +`ctx` [SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)
+ +`text` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
+ +`add_bos` [Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)
+ +`encoding` [Encoding](https://docs.microsoft.com/en-us/dotnet/api/system.text.encoding)
+ +#### Returns + +[IEnumerable<Int32>](https://docs.microsoft.com/en-us/dotnet/api/system.collections.generic.ienumerable-1)
+ +### **GetLogits(SafeLLamaContextHandle, Int32)** + +#### Caution + +Use SafeLLamaContextHandle GetLogits method instead + +--- + +```csharp +public static Span GetLogits(SafeLLamaContextHandle ctx, int length) +``` + +#### Parameters + +`ctx` [SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)
+ +`length` [Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
+ +#### Returns + +[Span<Single>](https://docs.microsoft.com/en-us/dotnet/api/system.span-1)
+ +### **Eval(SafeLLamaContextHandle, Int32[], Int32, Int32, Int32, Int32)** + +#### Caution + +Use SafeLLamaContextHandle Eval method instead + +--- + +```csharp +public static int Eval(SafeLLamaContextHandle ctx, Int32[] tokens, int startIndex, int n_tokens, int n_past, int n_threads) +``` + +#### Parameters + +`ctx` [SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)
+ +`tokens` [Int32[]](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
+ +`startIndex` [Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
+ +`n_tokens` [Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
+ +`n_past` [Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
+ +`n_threads` [Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
+ +#### Returns + +[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
+ +### **TokenToString(Int32, SafeLLamaContextHandle, Encoding)** + +#### Caution + +Use SafeLLamaContextHandle TokenToString method instead + +--- + +```csharp +public static string TokenToString(int token, SafeLLamaContextHandle ctx, Encoding encoding) +``` + +#### Parameters + +`token` [Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)
+ +`ctx` [SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)
+ +`encoding` [Encoding](https://docs.microsoft.com/en-us/dotnet/api/system.text.encoding)
+ +#### Returns + +[String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
+ +### **PtrToString(IntPtr, Encoding)** + +#### Caution + +No longer used internally by LlamaSharp + +--- + +```csharp +public static string PtrToString(IntPtr ptr, Encoding encoding) +``` + +#### Parameters + +`ptr` [IntPtr](https://docs.microsoft.com/en-us/dotnet/api/system.intptr)
+ +`encoding` [Encoding](https://docs.microsoft.com/en-us/dotnet/api/system.text.encoding)
+ +#### Returns + +[String](https://docs.microsoft.com/en-us/dotnet/api/system.string)
diff --git a/mkdocs.yml b/mkdocs.yml index bdb3a44b..507e1229 100644 --- a/mkdocs.yml +++ b/mkdocs.yml @@ -5,12 +5,12 @@ nav: - Architecture: Architecture.md - Tricks for FAQ: Tricks.md - Contributing Guide: ContributingGuide.md - - LLamaModel: - - Model Parameters: LLamaModel/parameters.md - - Tokenization: LLamaModel/tokenization.md - - Get Embeddings: LLamaModel/embeddings.md - - Quantization: LLamaModel/quantization.md - - Save/Load State: LLamaModel/save-load-state.md + - LLamaContext: + - Context Parameters: LLamaContext/parameters.md + - Tokenization: LLamaContext/tokenization.md + - Get Embeddings: LLamaContext/embeddings.md + - Quantization: LLamaContext/quantization.md + - Save/Load State: LLamaContext/save-load-state.md - LLamaExecutors: - Inference Parameters: LLamaExecutors/parameters.md - Text-to-Text APIs: LLamaExecutors/text-to-text-apis.md @@ -24,6 +24,7 @@ nav: - Chinese: NonEnglishUsage/Chinese.md - High-level Applications: - BotSharp: HighLevelApps/bot-sharp.md + - semantic-kernel: HighLevelApps/semantic-kernel.md - More: - Logger: More/log.md - Examples: @@ -39,7 +40,9 @@ nav: - API Reference: - index: ./xmldocs/index.md - llama.abstractions.ihistorytransform: ./xmldocs/llama.abstractions.ihistorytransform.md + - llama.abstractions.iinferenceparams: ./xmldocs/llama.abstractions.iinferenceparams.md - llama.abstractions.illamaexecutor: ./xmldocs/llama.abstractions.illamaexecutor.md + - llama.abstractions.imodelparams: ./xmldocs/llama.abstractions.imodelparams.md - llama.abstractions.itextstreamtransform: ./xmldocs/llama.abstractions.itextstreamtransform.md - llama.abstractions.itexttransform: ./xmldocs/llama.abstractions.itexttransform.md - llama.chatsession: ./xmldocs/llama.chatsession.md @@ -49,24 +52,44 @@ nav: - llama.common.illamalogger: ./xmldocs/llama.common.illamalogger.md - llama.common.inferenceparams: ./xmldocs/llama.common.inferenceparams.md - llama.common.llamadefaultlogger: ./xmldocs/llama.common.llamadefaultlogger.md - - llama.common.mirostatetype: ./xmldocs/llama.common.mirostatetype.md + - llama.common.mirostattype: ./xmldocs/llama.common.mirostattype.md - llama.common.modelparams: ./xmldocs/llama.common.modelparams.md + - llama.exceptions.grammarexpectedname: ./xmldocs/llama.exceptions.grammarexpectedname.md + - llama.exceptions.grammarexpectednext: ./xmldocs/llama.exceptions.grammarexpectednext.md + - llama.exceptions.grammarexpectedprevious: ./xmldocs/llama.exceptions.grammarexpectedprevious.md + - llama.exceptions.grammarformatexception: ./xmldocs/llama.exceptions.grammarformatexception.md + - llama.exceptions.grammarunexpectedcharaltelement: ./xmldocs/llama.exceptions.grammarunexpectedcharaltelement.md + - llama.exceptions.grammarunexpectedcharrngelement: ./xmldocs/llama.exceptions.grammarunexpectedcharrngelement.md + - llama.exceptions.grammarunexpectedendelement: ./xmldocs/llama.exceptions.grammarunexpectedendelement.md + - llama.exceptions.grammarunexpectedendofinput: ./xmldocs/llama.exceptions.grammarunexpectedendofinput.md + - llama.exceptions.grammarunexpectedhexcharscount: ./xmldocs/llama.exceptions.grammarunexpectedhexcharscount.md + - llama.exceptions.grammarunknownescapecharacter: ./xmldocs/llama.exceptions.grammarunknownescapecharacter.md - llama.exceptions.runtimeerror: ./xmldocs/llama.exceptions.runtimeerror.md - - llama.extensions.dictionaryextension: ./xmldocs/llama.extensions.dictionaryextension.md + - llama.extensions.imodelparamsextensions: ./xmldocs/llama.extensions.imodelparamsextensions.md + - 
llama.extensions.keyvaluepairextensions: ./xmldocs/llama.extensions.keyvaluepairextensions.md + - llama.grammars.grammar: ./xmldocs/llama.grammars.grammar.md + - llama.grammars.grammarrule: ./xmldocs/llama.grammars.grammarrule.md - llama.instructexecutor: ./xmldocs/llama.instructexecutor.md - llama.interactiveexecutor: ./xmldocs/llama.interactiveexecutor.md + - llama.llamacontext: ./xmldocs/llama.llamacontext.md - llama.llamaembedder: ./xmldocs/llama.llamaembedder.md - - llama.llamamodel: ./xmldocs/llama.llamamodel.md - llama.llamaquantizer: ./xmldocs/llama.llamaquantizer.md - llama.llamatransforms: ./xmldocs/llama.llamatransforms.md + - llama.llamaweights: ./xmldocs/llama.llamaweights.md - llama.native.llamacontextparams: ./xmldocs/llama.native.llamacontextparams.md - llama.native.llamaftype: ./xmldocs/llama.native.llamaftype.md + - llama.native.llamagrammarelement: ./xmldocs/llama.native.llamagrammarelement.md + - llama.native.llamagrammarelementtype: ./xmldocs/llama.native.llamagrammarelementtype.md + - llama.native.llamamodelquantizeparams: ./xmldocs/llama.native.llamamodelquantizeparams.md - llama.native.llamatokendata: ./xmldocs/llama.native.llamatokendata.md - llama.native.llamatokendataarray: ./xmldocs/llama.native.llamatokendataarray.md - llama.native.llamatokendataarraynative: ./xmldocs/llama.native.llamatokendataarraynative.md - llama.native.nativeapi: ./xmldocs/llama.native.nativeapi.md - llama.native.safellamacontexthandle: ./xmldocs/llama.native.safellamacontexthandle.md + - llama.native.safellamagrammarhandle: ./xmldocs/llama.native.safellamagrammarhandle.md - llama.native.safellamahandlebase: ./xmldocs/llama.native.safellamahandlebase.md + - llama.native.safellamamodelhandle: ./xmldocs/llama.native.safellamamodelhandle.md + - llama.native.samplingapi: ./xmldocs/llama.native.samplingapi.md - llama.oldversion.chatcompletion: ./xmldocs/llama.oldversion.chatcompletion.md - llama.oldversion.chatcompletionchoice: ./xmldocs/llama.oldversion.chatcompletionchoice.md - llama.oldversion.chatcompletionchunk: ./xmldocs/llama.oldversion.chatcompletionchunk.md @@ -88,9 +111,9 @@ nav: - llama.oldversion.llamaembedder: ./xmldocs/llama.oldversion.llamaembedder.md - llama.oldversion.llamamodel: ./xmldocs/llama.oldversion.llamamodel.md - llama.oldversion.llamaparams: ./xmldocs/llama.oldversion.llamaparams.md - - llama.resettablellamamodel: ./xmldocs/llama.resettablellamamodel.md - llama.statefulexecutorbase: ./xmldocs/llama.statefulexecutorbase.md - llama.statelessexecutor: ./xmldocs/llama.statelessexecutor.md + - llama.utils: ./xmldocs/llama.utils.md theme: name: material