# NativeApi
Namespace: LLama.Native
Direct translation of the llama.cpp API
```csharp
public static class NativeApi
```
Inheritance [Object](https://docs.microsoft.com/en-us/dotnet/api/system.object) → [NativeApi](./llama.native.nativeapi.md)
## Methods
### **llama_sample_token_mirostat(SafeLLamaContextHandle, LLamaTokenDataArrayNative&, Single, Single, Int32, Single&)**
Mirostat 1.0 algorithm described in the paper https://arxiv.org/abs/2007.14966. Uses tokens instead of words.
```csharp
public static LLamaToken llama_sample_token_mirostat(SafeLLamaContextHandle ctx, LLamaTokenDataArrayNative& candidates, float tau, float eta, int m, Single& mu)
```
#### Parameters
`ctx` [SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)<br>
`candidates` [LLamaTokenDataArrayNative&](./llama.native.llamatokendataarraynative&.md)<br>
A vector of `llama_token_data` containing the candidate tokens, their probabilities (p), and log-odds (logit) for the current position in the generated text.
`tau` [Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)<br>
The target cross-entropy (or surprise) value you want to achieve for the generated text. A higher value corresponds to more surprising or less predictable text, while a lower value corresponds to less surprising or more predictable text.
`eta` [Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)<br>
The learning rate used to update `mu` based on the error between the target and observed surprisal of the sampled word. A larger learning rate will cause `mu` to be updated more quickly, while a smaller learning rate will result in slower updates.
`m` [Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)<br>
The number of tokens considered in the estimation of `s_hat`. This is an arbitrary value that is used to calculate `s_hat`, which in turn helps to calculate the value of `k`. In the paper, they use `m = 100`, but you can experiment with different values to see how it affects the performance of the algorithm.
`mu` [Single&](https://docs.microsoft.com/en-us/dotnet/api/system.single&)<br>
Maximum cross-entropy. This value is initialized to be twice the target cross-entropy (`2 * tau`) and is updated in the algorithm based on the error between the target and observed surprisal.
#### Returns
[LLamaToken](./llama.native.llamatoken.md)<br>
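#### Example
A minimal usage sketch. `ctx` (a loaded context) and `candidates` (a populated `LLamaTokenDataArrayNative`) are assumed to already exist and are not shown here:
```csharp
// Sketch only: `ctx` and `candidates` must already be set up by the caller.
float tau = 5.0f;        // target surprise
float eta = 0.1f;        // learning rate for mu
int m = 100;             // the value used in the Mirostat paper
float mu = 2.0f * tau;   // mu is initialized to twice the target cross-entropy
LLamaToken token = NativeApi.llama_sample_token_mirostat(ctx, ref candidates, tau, eta, m, ref mu);
// Keep `mu` alive across calls: the algorithm updates it after every sampled token.
```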
### **llama_sample_token_mirostat_v2(SafeLLamaContextHandle, LLamaTokenDataArrayNative&, Single, Single, Single&)**
Mirostat 2.0 algorithm described in the paper https://arxiv.org/abs/2007.14966. Uses tokens instead of words.
```csharp
public static LLamaToken llama_sample_token_mirostat_v2(SafeLLamaContextHandle ctx, LLamaTokenDataArrayNative& candidates, float tau, float eta, Single& mu)
```
#### Parameters
`ctx` [SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)<br>
`candidates` [LLamaTokenDataArrayNative&](./llama.native.llamatokendataarraynative&.md)<br>
A vector of `llama_token_data` containing the candidate tokens, their probabilities (p), and log-odds (logit) for the current position in the generated text.
`tau` [Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)<br>
The target cross-entropy (or surprise) value you want to achieve for the generated text. A higher value corresponds to more surprising or less predictable text, while a lower value corresponds to less surprising or more predictable text.
`eta` [Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)<br>
The learning rate used to update `mu` based on the error between the target and observed surprisal of the sampled word. A larger learning rate will cause `mu` to be updated more quickly, while a smaller learning rate will result in slower updates.
`mu` [Single&](https://docs.microsoft.com/en-us/dotnet/api/system.single&)<br>
Maximum cross-entropy. This value is initialized to be twice the target cross-entropy (`2 * tau`) and is updated in the algorithm based on the error between the target and observed surprisal.
#### Returns
[LLamaToken](./llama.native.llamatoken.md)<br>
### **llama_sample_token_greedy(SafeLLamaContextHandle, LLamaTokenDataArrayNative&)**
Selects the token with the highest probability.
```csharp
public static LLamaToken llama_sample_token_greedy(SafeLLamaContextHandle ctx, LLamaTokenDataArrayNative& candidates)
```
#### Parameters
`ctx` [SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)<br>
`candidates` [LLamaTokenDataArrayNative&](./llama.native.llamatokendataarraynative&.md)<br>
Pointer to LLamaTokenDataArray
#### Returns
[LLamaToken](./llama.native.llamatoken.md)<br>
### **llama_sample_token(SafeLLamaContextHandle, LLamaTokenDataArrayNative&)**
Randomly selects a token from the candidates based on their probabilities.
```csharp
public static LLamaToken llama_sample_token(SafeLLamaContextHandle ctx, LLamaTokenDataArrayNative& candidates)
```
#### Parameters
`ctx` [SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)<br>
`candidates` [LLamaTokenDataArrayNative&](./llama.native.llamatokendataarraynative&.md)<br>
Pointer to LLamaTokenDataArray
#### Returns
[LLamaToken](./llama.native.llamatoken.md)<br>
### **&lt;llama_get_embeddings&gt;g__llama_get_embeddings_native|30_0(SafeLLamaContextHandle)**
```csharp
internal static Single* <llama_get_embeddings>g__llama_get_embeddings_native|30_0(SafeLLamaContextHandle ctx)
```
#### Parameters
`ctx` [SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)<br>
#### Returns
[Single*](https://docs.microsoft.com/en-us/dotnet/api/system.single*)<br>
### **&lt;llama_token_to_piece&gt;g__llama_token_to_piece_native|44_0(SafeLlamaModelHandle, LLamaToken, Byte*, Int32)**
```csharp
internal static int <llama_token_to_piece>g__llama_token_to_piece_native|44_0(SafeLlamaModelHandle model, LLamaToken llamaToken, Byte* buffer, int length)
```
#### Parameters
`model` [SafeLlamaModelHandle](./llama.native.safellamamodelhandle.md)<br>
`llamaToken` [LLamaToken](./llama.native.llamatoken.md)<br>
`buffer` [Byte*](https://docs.microsoft.com/en-us/dotnet/api/system.byte*)<br>
`length` [Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)<br>
#### Returns
[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)<br>
### **&lt;TryLoadLibraries&gt;g__TryLoad|84_0(String)**
```csharp
internal static IntPtr <TryLoadLibraries>g__TryLoad|84_0(string path)
```
#### Parameters
`path` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)<br>
#### Returns
[IntPtr](https://docs.microsoft.com/en-us/dotnet/api/system.intptr)<br>
### **&lt;TryLoadLibraries&gt;g__TryFindPath|84_1(String, &lt;&gt;c__DisplayClass84_0&)**
```csharp
internal static string <TryLoadLibraries>g__TryFindPath|84_1(string filename, <>c__DisplayClass84_0& )
```
#### Parameters
`filename` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)<br>
`` [&lt;&gt;c__DisplayClass84_0&](./llama.native.nativeapi.<>c__displayclass84_0&.md)<br>
#### Returns
[String](https://docs.microsoft.com/en-us/dotnet/api/system.string)<br>
### **llama_set_n_threads(SafeLLamaContextHandle, UInt32, UInt32)**
Set the number of threads used for decoding
```csharp
public static void llama_set_n_threads(SafeLLamaContextHandle ctx, uint n_threads, uint n_threads_batch)
```
#### Parameters
`ctx` [SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)<br>
`n_threads` [UInt32](https://docs.microsoft.com/en-us/dotnet/api/system.uint32)<br>
n_threads is the number of threads used for generation (single token)
`n_threads_batch` [UInt32](https://docs.microsoft.com/en-us/dotnet/api/system.uint32)<br>
n_threads_batch is the number of threads used for prompt and batch processing (multiple tokens)
### **llama_vocab_type(SafeLlamaModelHandle)**
```csharp
public static LLamaVocabType llama_vocab_type(SafeLlamaModelHandle model)
```
#### Parameters
`model` [SafeLlamaModelHandle](./llama.native.safellamamodelhandle.md)<br>
#### Returns
[LLamaVocabType](./llama.native.llamavocabtype.md)<br>
### **llama_rope_type(SafeLlamaModelHandle)**
```csharp
public static LLamaRopeType llama_rope_type(SafeLlamaModelHandle model)
```
#### Parameters
`model` [SafeLlamaModelHandle](./llama.native.safellamamodelhandle.md)<br>
#### Returns
[LLamaRopeType](./llama.native.llamaropetype.md)<br>
### **llama_grammar_init(LLamaGrammarElement**, UInt64, UInt64)**
Create a new grammar from the given set of grammar rules
```csharp
public static IntPtr llama_grammar_init(LLamaGrammarElement** rules, ulong n_rules, ulong start_rule_index)
```
#### Parameters
`rules` [LLamaGrammarElement**](./llama.native.llamagrammarelement**.md)<br>
`n_rules` [UInt64](https://docs.microsoft.com/en-us/dotnet/api/system.uint64)<br>
`start_rule_index` [UInt64](https://docs.microsoft.com/en-us/dotnet/api/system.uint64)<br>
#### Returns
[IntPtr](https://docs.microsoft.com/en-us/dotnet/api/system.intptr)<br>
### **llama_grammar_free(IntPtr)**
Free all memory from the given SafeLLamaGrammarHandle
```csharp
public static void llama_grammar_free(IntPtr grammar)
```
#### Parameters
`grammar` [IntPtr](https://docs.microsoft.com/en-us/dotnet/api/system.intptr)<br>
### **llama_grammar_copy(SafeLLamaGrammarHandle)**
Create a copy of an existing grammar instance
```csharp
public static IntPtr llama_grammar_copy(SafeLLamaGrammarHandle grammar)
```
#### Parameters
`grammar` [SafeLLamaGrammarHandle](./llama.native.safellamagrammarhandle.md)<br>
#### Returns
[IntPtr](https://docs.microsoft.com/en-us/dotnet/api/system.intptr)<br>
### **llama_sample_grammar(SafeLLamaContextHandle, LLamaTokenDataArrayNative&, SafeLLamaGrammarHandle)**
Apply constraints from grammar
```csharp
public static void llama_sample_grammar(SafeLLamaContextHandle ctx, LLamaTokenDataArrayNative& candidates, SafeLLamaGrammarHandle grammar)
```
#### Parameters
`ctx` [SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)<br>
`candidates` [LLamaTokenDataArrayNative&](./llama.native.llamatokendataarraynative&.md)<br>
`grammar` [SafeLLamaGrammarHandle](./llama.native.safellamagrammarhandle.md)<br>
### **llama_grammar_accept_token(SafeLLamaContextHandle, SafeLLamaGrammarHandle, LLamaToken)**
Accepts the sampled token into the grammar
```csharp
public static void llama_grammar_accept_token(SafeLLamaContextHandle ctx, SafeLLamaGrammarHandle grammar, LLamaToken token)
```
#### Parameters
`ctx` [SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)<br>
`grammar` [SafeLLamaGrammarHandle](./llama.native.safellamagrammarhandle.md)<br>
`token` [LLamaToken](./llama.native.llamatoken.md)<br>
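#### Example
The grammar entries above form a lifecycle: init, constrain, accept, free. A sketch under stated assumptions: `rules`/`nRules`/`startRule` describe an already-built rule set, `grammar` is a `SafeLLamaGrammarHandle` wrapped around the returned pointer (the wrapping step is not shown on this page), and `ctx`/`candidates` are set up as in the sampling entries:
```csharp
// Sketch only: rule construction and the SafeLLamaGrammarHandle wrapper are assumed.
IntPtr grammarPtr = NativeApi.llama_grammar_init(rules, nRules, startRule);
// ... wrap grammarPtr into `grammar` (SafeLLamaGrammarHandle) ...
// Per sampling step: constrain the candidates, sample, then feed the token back in.
NativeApi.llama_sample_grammar(ctx, ref candidates, grammar);
LLamaToken token = NativeApi.llama_sample_token(ctx, ref candidates);
NativeApi.llama_grammar_accept_token(ctx, grammar, token);
// When finished (only if the pointer is not owned by a SafeHandle):
NativeApi.llama_grammar_free(grammarPtr);
```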
### **llava_validate_embed_size(SafeLLamaContextHandle, SafeLlavaModelHandle)**
Sanity check for clip &lt;-&gt; llava embed size match
```csharp
public static bool llava_validate_embed_size(SafeLLamaContextHandle ctxLlama, SafeLlavaModelHandle ctxClip)
```
#### Parameters
`ctxLlama` [SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)<br>
LLama Context
`ctxClip` [SafeLlavaModelHandle](./llama.native.safellavamodelhandle.md)<br>
Llava Model
#### Returns
[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)<br>
True if validated successfully
### **llava_image_embed_make_with_bytes(SafeLlavaModelHandle, Int32, Byte[], Int32)**
Build an image embed from image file bytes
```csharp
public static SafeLlavaImageEmbedHandle llava_image_embed_make_with_bytes(SafeLlavaModelHandle ctx_clip, int n_threads, Byte[] image_bytes, int image_bytes_length)
```
#### Parameters
`ctx_clip` [SafeLlavaModelHandle](./llama.native.safellavamodelhandle.md)<br>
SafeHandle to the Clip Model
`n_threads` [Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)<br>
Number of threads
`image_bytes` [Byte[]](https://docs.microsoft.com/en-us/dotnet/api/system.byte)<br>
Binary image in jpeg format
`image_bytes_length` [Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)<br>
Byte length of the image
#### Returns
[SafeLlavaImageEmbedHandle](./llama.native.safellavaimageembedhandle.md)<br>
SafeHandle to the Embeddings
### **llava_image_embed_make_with_filename(SafeLlavaModelHandle, Int32, String)**
Build an image embed from a path to an image filename
```csharp
public static SafeLlavaImageEmbedHandle llava_image_embed_make_with_filename(SafeLlavaModelHandle ctx_clip, int n_threads, string image_path)
```
#### Parameters
`ctx_clip` [SafeLlavaModelHandle](./llama.native.safellavamodelhandle.md)<br>
SafeHandle to the Clip Model
`n_threads` [Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)<br>
Number of threads
`image_path` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)<br>
Image filename (jpeg) to generate embeddings for
#### Returns
[SafeLlavaImageEmbedHandle](./llama.native.safellavaimageembedhandle.md)<br>
SafeHandle to the embeddings
### **llava_image_embed_free(IntPtr)**
Free an embedding made with llava_image_embed_make_*
```csharp
public static void llava_image_embed_free(IntPtr embed)
```
#### Parameters
`embed` [IntPtr](https://docs.microsoft.com/en-us/dotnet/api/system.intptr)<br>
Embeddings to release
### **llava_eval_image_embed(SafeLLamaContextHandle, SafeLlavaImageEmbedHandle, Int32, Int32&)**
Write the image represented by embed into the llama context with batch size n_batch, starting at context pos n_past. On completion, n_past points to the next position in the context after the image embed.
```csharp
public static bool llava_eval_image_embed(SafeLLamaContextHandle ctx_llama, SafeLlavaImageEmbedHandle embed, int n_batch, Int32& n_past)
```
#### Parameters
`ctx_llama` [SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)<br>
Llama Context
`embed` [SafeLlavaImageEmbedHandle](./llama.native.safellavaimageembedhandle.md)<br>
Embedding handle
`n_batch` [Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)<br>
`n_past` [Int32&](https://docs.microsoft.com/en-us/dotnet/api/system.int32&)<br>
#### Returns
[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)<br>
True on success
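#### Example
A sketch tying the llava entries together: validate the model pair, embed a JPEG, then write the embed into the context. `ctxLlama`, `clipModel` and `imageBytes` are assumed inputs and are hypothetical names:
```csharp
// Sketch only: handles and image bytes are assumed to be loaded elsewhere.
if (!NativeApi.llava_validate_embed_size(ctxLlama, clipModel))
    throw new InvalidOperationException("clip <-> llava embedding sizes do not match");
SafeLlavaImageEmbedHandle embed = NativeApi.llava_image_embed_make_with_bytes(
    clipModel, n_threads: 4, imageBytes, imageBytes.Length);
int n_past = 0;
bool ok = NativeApi.llava_eval_image_embed(ctxLlama, embed, n_batch: 512, ref n_past);
// On success, n_past now points just past the image tokens in the context.
// The SafeHandle is expected to call llava_image_embed_free when disposed.
```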
### **llama_model_quantize(String, String, LLamaModelQuantizeParams*)**
Returns 0 on success
```csharp
public static uint llama_model_quantize(string fname_inp, string fname_out, LLamaModelQuantizeParams* param)
```
#### Parameters
`fname_inp` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)<br>
`fname_out` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)<br>
`param` [LLamaModelQuantizeParams*](./llama.native.llamamodelquantizeparams*.md)<br>
#### Returns
[UInt32](https://docs.microsoft.com/en-us/dotnet/api/system.uint32)<br>
Returns 0 on success
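#### Example
A sketch of quantizing a model file with default parameters (file names are illustrative placeholders):
```csharp
// Sketch only: input/output paths are placeholders.
unsafe
{
    LLamaModelQuantizeParams p = NativeApi.llama_model_quantize_default_params();
    uint rc = NativeApi.llama_model_quantize("model-f16.gguf", "model-q4.gguf", &p);
    if (rc != 0)
        throw new InvalidOperationException($"quantization failed ({rc})");
}
```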
### **llama_sample_repetition_penalties(SafeLLamaContextHandle, LLamaTokenDataArrayNative&, LLamaToken*, UInt64, Single, Single, Single)**
Repetition penalty described in CTRL academic paper https://arxiv.org/abs/1909.05858, with negative logit fix.
Frequency and presence penalties described in OpenAI API https://platform.openai.com/docs/api-reference/parameter-details.
```csharp
public static void llama_sample_repetition_penalties(SafeLLamaContextHandle ctx, LLamaTokenDataArrayNative& candidates, LLamaToken* last_tokens, ulong last_tokens_size, float penalty_repeat, float penalty_freq, float penalty_present)
```
#### Parameters
`ctx` [SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)<br>
`candidates` [LLamaTokenDataArrayNative&](./llama.native.llamatokendataarraynative&.md)<br>
Pointer to LLamaTokenDataArray
`last_tokens` [LLamaToken*](./llama.native.llamatoken*.md)<br>
`last_tokens_size` [UInt64](https://docs.microsoft.com/en-us/dotnet/api/system.uint64)<br>
`penalty_repeat` [Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)<br>
Repetition penalty described in CTRL academic paper https://arxiv.org/abs/1909.05858, with negative logit fix.
`penalty_freq` [Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)<br>
Frequency and presence penalties described in OpenAI API https://platform.openai.com/docs/api-reference/parameter-details.
`penalty_present` [Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)<br>
Frequency and presence penalties described in OpenAI API https://platform.openai.com/docs/api-reference/parameter-details.
### **llama_sample_apply_guidance(SafeLLamaContextHandle, Span&lt;Single&gt;, ReadOnlySpan&lt;Single&gt;, Single)**
Apply classifier-free guidance to the logits as described in academic paper "Stay on topic with Classifier-Free Guidance" https://arxiv.org/abs/2306.17806
```csharp
public static void llama_sample_apply_guidance(SafeLLamaContextHandle ctx, Span<float> logits, ReadOnlySpan<float> logits_guidance, float scale)
```
#### Parameters
`ctx` [SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)<br>
`logits` [Span&lt;Single&gt;](https://docs.microsoft.com/en-us/dotnet/api/system.span-1)<br>
Logits extracted from the original generation context.
`logits_guidance` [ReadOnlySpan&lt;Single&gt;](https://docs.microsoft.com/en-us/dotnet/api/system.readonlyspan-1)<br>
Logits extracted from a separate context from the same model.
Other than a negative prompt at the beginning, it should have all generated and user input tokens copied from the main context.
`scale` [Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)<br>
Guidance strength. 1.0f means no guidance. Higher values mean stronger guidance.
### **llama_sample_apply_guidance(SafeLLamaContextHandle, Single*, Single*, Single)**
Apply classifier-free guidance to the logits as described in academic paper "Stay on topic with Classifier-Free Guidance" https://arxiv.org/abs/2306.17806
```csharp
public static void llama_sample_apply_guidance(SafeLLamaContextHandle ctx, Single* logits, Single* logits_guidance, float scale)
```
#### Parameters
`ctx` [SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)<br>
`logits` [Single*](https://docs.microsoft.com/en-us/dotnet/api/system.single*)<br>
Logits extracted from the original generation context.
`logits_guidance` [Single*](https://docs.microsoft.com/en-us/dotnet/api/system.single*)<br>
Logits extracted from a separate context from the same model.
Other than a negative prompt at the beginning, it should have all generated and user input tokens copied from the main context.
`scale` [Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)<br>
Guidance strength. 1.0f means no guidance. Higher values mean stronger guidance.
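#### Example
A sketch of the span overload. `mainLogits` and `guidanceLogits` are hypothetical names assumed to hold `n_vocab` floats read from the two contexts (e.g. via llama_get_logits):
```csharp
// Sketch only: the two logits buffers are assumed to come from two contexts of
// the same model, the second evaluated with the negative prompt at the start.
Span<float> logits = mainLogits;
ReadOnlySpan<float> logitsGuidance = guidanceLogits;
NativeApi.llama_sample_apply_guidance(ctx, logits, logitsGuidance, scale: 1.5f); // 1.0f = no guidance
```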
### **llama_sample_softmax(SafeLLamaContextHandle, LLamaTokenDataArrayNative&)**
Sorts candidate tokens by their logits in descending order and calculates probabilities based on the logits.
```csharp
public static void llama_sample_softmax(SafeLLamaContextHandle ctx, LLamaTokenDataArrayNative& candidates)
```
#### Parameters
`ctx` [SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)<br>
`candidates` [LLamaTokenDataArrayNative&](./llama.native.llamatokendataarraynative&.md)<br>
Pointer to LLamaTokenDataArray
### **llama_sample_top_k(SafeLLamaContextHandle, LLamaTokenDataArrayNative&, Int32, UInt64)**
Top-K sampling described in academic paper "The Curious Case of Neural Text Degeneration" https://arxiv.org/abs/1904.09751
```csharp
public static void llama_sample_top_k(SafeLLamaContextHandle ctx, LLamaTokenDataArrayNative& candidates, int k, ulong min_keep)
```
#### Parameters
`ctx` [SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)<br>
`candidates` [LLamaTokenDataArrayNative&](./llama.native.llamatokendataarraynative&.md)<br>
Pointer to LLamaTokenDataArray
`k` [Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)<br>
`min_keep` [UInt64](https://docs.microsoft.com/en-us/dotnet/api/system.uint64)<br>
### **llama_sample_top_p(SafeLLamaContextHandle, LLamaTokenDataArrayNative&, Single, UInt64)**
Nucleus sampling described in academic paper "The Curious Case of Neural Text Degeneration" https://arxiv.org/abs/1904.09751
```csharp
public static void llama_sample_top_p(SafeLLamaContextHandle ctx, LLamaTokenDataArrayNative& candidates, float p, ulong min_keep)
```
#### Parameters
`ctx` [SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)<br>
`candidates` [LLamaTokenDataArrayNative&](./llama.native.llamatokendataarraynative&.md)<br>
Pointer to LLamaTokenDataArray
`p` [Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)<br>
`min_keep` [UInt64](https://docs.microsoft.com/en-us/dotnet/api/system.uint64)<br>
### **llama_sample_min_p(SafeLLamaContextHandle, LLamaTokenDataArrayNative&, Single, UInt64)**
Minimum P sampling as described in https://github.com/ggerganov/llama.cpp/pull/3841
```csharp
public static void llama_sample_min_p(SafeLLamaContextHandle ctx, LLamaTokenDataArrayNative& candidates, float p, ulong min_keep)
```
#### Parameters
`ctx` [SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)<br>
`candidates` [LLamaTokenDataArrayNative&](./llama.native.llamatokendataarraynative&.md)<br>
Pointer to LLamaTokenDataArray
`p` [Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)<br>
`min_keep` [UInt64](https://docs.microsoft.com/en-us/dotnet/api/system.uint64)<br>
### **llama_sample_tail_free(SafeLLamaContextHandle, LLamaTokenDataArrayNative&, Single, UInt64)**
Tail Free Sampling described in https://www.trentonbricken.com/Tail-Free-Sampling/.
```csharp
public static void llama_sample_tail_free(SafeLLamaContextHandle ctx, LLamaTokenDataArrayNative& candidates, float z, ulong min_keep)
```
#### Parameters
`ctx` [SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)<br>
`candidates` [LLamaTokenDataArrayNative&](./llama.native.llamatokendataarraynative&.md)<br>
Pointer to LLamaTokenDataArray
`z` [Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)<br>
`min_keep` [UInt64](https://docs.microsoft.com/en-us/dotnet/api/system.uint64)<br>
### **llama_sample_typical(SafeLLamaContextHandle, LLamaTokenDataArrayNative&, Single, UInt64)**
Locally Typical Sampling implementation described in the paper https://arxiv.org/abs/2202.00666.
```csharp
public static void llama_sample_typical(SafeLLamaContextHandle ctx, LLamaTokenDataArrayNative& candidates, float p, ulong min_keep)
```
#### Parameters
`ctx` [SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)<br>
`candidates` [LLamaTokenDataArrayNative&](./llama.native.llamatokendataarraynative&.md)<br>
Pointer to LLamaTokenDataArray
`p` [Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)<br>
`min_keep` [UInt64](https://docs.microsoft.com/en-us/dotnet/api/system.uint64)<br>
### **llama_sample_typical(SafeLLamaContextHandle, LLamaTokenDataArrayNative&, Single, Single, Single)**
Dynamic temperature implementation described in the paper https://arxiv.org/abs/2309.02772.
```csharp
public static void llama_sample_typical(SafeLLamaContextHandle ctx, LLamaTokenDataArrayNative& candidates, float min_temp, float max_temp, float exponent_val)
```
#### Parameters
`ctx` [SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)<br>
`candidates` [LLamaTokenDataArrayNative&](./llama.native.llamatokendataarraynative&.md)<br>
Pointer to LLamaTokenDataArray
`min_temp` [Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)<br>
`max_temp` [Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)<br>
`exponent_val` [Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)<br>
### **llama_sample_temp(SafeLLamaContextHandle, LLamaTokenDataArrayNative&, Single)**
Modify logits by temperature
```csharp
public static void llama_sample_temp(SafeLLamaContextHandle ctx, LLamaTokenDataArrayNative& candidates, float temp)
```
#### Parameters
`ctx` [SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)<br>
`candidates` [LLamaTokenDataArrayNative&](./llama.native.llamatokendataarraynative&.md)<br>
`temp` [Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)<br>
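#### Example
The samplers above are designed to be chained over one candidates array before a final token pick. A sketch of a conventional chain (cutoff values are illustrative; `ctx` and `candidates` are assumed as before):
```csharp
// Sketch of a typical sampling chain; each filter narrows `candidates` in place.
NativeApi.llama_sample_top_k(ctx, ref candidates, k: 40, min_keep: 1);
NativeApi.llama_sample_tail_free(ctx, ref candidates, z: 1.0f, min_keep: 1);  // 1.0f disables TFS
NativeApi.llama_sample_typical(ctx, ref candidates, p: 1.0f, min_keep: 1);    // 1.0f disables typical sampling
NativeApi.llama_sample_top_p(ctx, ref candidates, p: 0.95f, min_keep: 1);
NativeApi.llama_sample_min_p(ctx, ref candidates, p: 0.05f, min_keep: 1);
NativeApi.llama_sample_temp(ctx, ref candidates, temp: 0.8f);
LLamaToken next = NativeApi.llama_sample_token(ctx, ref candidates);
```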
### **llama_get_embeddings(SafeLLamaContextHandle)**
Get the embeddings for the input
```csharp
public static Span<float> llama_get_embeddings(SafeLLamaContextHandle ctx)
```
#### Parameters
`ctx` [SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)<br>
#### Returns
[Span&lt;Single&gt;](https://docs.microsoft.com/en-us/dotnet/api/system.span-1)<br>
### **llama_chat_apply_template(SafeLlamaModelHandle, Char*, LLamaChatMessage*, IntPtr, Boolean, Char*, Int32)**
Apply chat template. Inspired by hf apply_chat_template() in Python.
Both "model" and "custom_template" are optional, but at least one is required. "custom_template" has higher precedence than "model".
NOTE: This function does not use a jinja parser. It only supports a pre-defined list of templates. See more: https://github.com/ggerganov/llama.cpp/wiki/Templates-supported-by-llama_chat_apply_template
```csharp
public static int llama_chat_apply_template(SafeLlamaModelHandle model, Char* tmpl, LLamaChatMessage* chat, IntPtr n_msg, bool add_ass, Char* buf, int length)
```
#### Parameters
`model` [SafeLlamaModelHandle](./llama.native.safellamamodelhandle.md)<br>
`tmpl` [Char*](https://docs.microsoft.com/en-us/dotnet/api/system.char*)<br>
A Jinja template to use for this chat. If this is nullptr, the model’s default chat template will be used instead.
`chat` [LLamaChatMessage*](./llama.native.llamachatmessage*.md)<br>
Pointer to a list of multiple llama_chat_message
`n_msg` [IntPtr](https://docs.microsoft.com/en-us/dotnet/api/system.intptr)<br>
Number of llama_chat_message in this chat
`add_ass` [Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)<br>
Whether to end the prompt with the token(s) that indicate the start of an assistant message.
`buf` [Char*](https://docs.microsoft.com/en-us/dotnet/api/system.char*)<br>
A buffer to hold the output formatted prompt. The recommended alloc size is 2 * (total number of characters of all messages)
`length` [Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)<br>
The size of the allocated buffer
#### Returns
[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)<br>
The total number of bytes of the formatted prompt. If it is larger than the size of the buffer, you may need to re-alloc it and then re-apply the template.
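#### Example
The return value implies a grow-and-retry pattern. A sketch, assuming `model`, a native `chat` array of `nMsg` messages, and `totalChars` (the summed character count of all messages) are available; all three names are hypothetical and message marshalling is not shown:
```csharp
// Sketch only: chat message marshalling is omitted.
unsafe
{
    var buffer = new char[2 * totalChars];   // recommended starting allocation
    fixed (char* buf = buffer)
    {
        int needed = NativeApi.llama_chat_apply_template(
            model, null, chat, (IntPtr)nMsg, true, buf, buffer.Length);
        if (needed > buffer.Length)
        {
            // Buffer too small: re-alloc to `needed` and apply the template again.
        }
    }
}
```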
### **llama_token_bos(SafeLlamaModelHandle)**
Get the "Beginning of sentence" token
```csharp
public static LLamaToken llama_token_bos(SafeLlamaModelHandle model)
```
#### Parameters
`model` [SafeLlamaModelHandle](./llama.native.safellamamodelhandle.md)<br>
#### Returns
[LLamaToken](./llama.native.llamatoken.md)<br>
### **llama_token_eos(SafeLlamaModelHandle)**
Get the "End of sentence" token
```csharp
public static LLamaToken llama_token_eos(SafeLlamaModelHandle model)
```
#### Parameters
`model` [SafeLlamaModelHandle](./llama.native.safellamamodelhandle.md)<br>
#### Returns
[LLamaToken](./llama.native.llamatoken.md)<br>
### **llama_token_nl(SafeLlamaModelHandle)**
Get the "new line" token
```csharp
public static LLamaToken llama_token_nl(SafeLlamaModelHandle model)
```
#### Parameters
`model` [SafeLlamaModelHandle](./llama.native.safellamamodelhandle.md)<br>
#### Returns
[LLamaToken](./llama.native.llamatoken.md)<br>
### **llama_add_bos_token(SafeLlamaModelHandle)**
Returns -1 if unknown, 1 for true or 0 for false.
```csharp
public static int llama_add_bos_token(SafeLlamaModelHandle model)
```
#### Parameters
`model` [SafeLlamaModelHandle](./llama.native.safellamamodelhandle.md)<br>
#### Returns
[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)<br>
### **llama_add_eos_token(SafeLlamaModelHandle)**
Returns -1 if unknown, 1 for true or 0 for false.
```csharp
public static int llama_add_eos_token(SafeLlamaModelHandle model)
```
#### Parameters
`model` [SafeLlamaModelHandle](./llama.native.safellamamodelhandle.md)<br>
#### Returns
[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)<br>
### **llama_token_prefix(SafeLlamaModelHandle)**
Codellama infill tokens: beginning of the infill prefix
```csharp
public static int llama_token_prefix(SafeLlamaModelHandle model)
```
#### Parameters
`model` [SafeLlamaModelHandle](./llama.native.safellamamodelhandle.md)<br>
#### Returns
[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)<br>
### **llama_token_middle(SafeLlamaModelHandle)**
Codellama infill tokens: beginning of the infill middle
```csharp
public static int llama_token_middle(SafeLlamaModelHandle model)
```
#### Parameters
`model` [SafeLlamaModelHandle](./llama.native.safellamamodelhandle.md)<br>
#### Returns
[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)<br>
### **llama_token_suffix(SafeLlamaModelHandle)**
Codellama infill tokens: beginning of the infill suffix
```csharp
public static int llama_token_suffix(SafeLlamaModelHandle model)
```
#### Parameters
`model` [SafeLlamaModelHandle](./llama.native.safellamamodelhandle.md)<br>
#### Returns
[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)<br>
### **llama_token_eot(SafeLlamaModelHandle)**
Codellama infill tokens: end of the infill middle
```csharp
public static int llama_token_eot(SafeLlamaModelHandle model)
```
#### Parameters
`model` [SafeLlamaModelHandle](./llama.native.safellamamodelhandle.md)<br>
#### Returns
[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)<br>
### **llama_print_timings(SafeLLamaContextHandle)**
Print out timing information for this context
```csharp
public static void llama_print_timings(SafeLLamaContextHandle ctx)
```
#### Parameters
`ctx` [SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)<br>
### **llama_reset_timings(SafeLLamaContextHandle)**
Reset all collected timing information for this context
```csharp
public static void llama_reset_timings(SafeLLamaContextHandle ctx)
```
#### Parameters
`ctx` [SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)<br>
### **llama_print_system_info()**
Print system information
```csharp
public static IntPtr llama_print_system_info()
```
#### Returns
[IntPtr](https://docs.microsoft.com/en-us/dotnet/api/system.intptr)<br>
### **llama_token_to_piece(SafeLlamaModelHandle, LLamaToken, Span&lt;Byte&gt;)**
Convert a single token into text
```csharp
public static int llama_token_to_piece(SafeLlamaModelHandle model, LLamaToken llamaToken, Span<byte> buffer)
```
#### Parameters
`model` [SafeLlamaModelHandle](./llama.native.safellamamodelhandle.md)<br>
`llamaToken` [LLamaToken](./llama.native.llamatoken.md)<br>
`buffer` [Span&lt;Byte&gt;](https://docs.microsoft.com/en-us/dotnet/api/system.span-1)<br>
buffer to write string into
#### Returns
[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)<br>
The length written, or if the buffer is too small a negative value that indicates the length required
### **llama_tokenize(SafeLlamaModelHandle, Byte*, Int32, LLamaToken*, Int32, Boolean, Boolean)**
Convert text into tokens
```csharp
public static int llama_tokenize(SafeLlamaModelHandle model, Byte* text, int text_len, LLamaToken* tokens, int n_max_tokens, bool add_bos, bool special)
```
#### Parameters
`model` [SafeLlamaModelHandle](./llama.native.safellamamodelhandle.md)<br>
`text` [Byte*](https://docs.microsoft.com/en-us/dotnet/api/system.byte*)<br>
`text_len` [Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)<br>
`tokens` [LLamaToken*](./llama.native.llamatoken*.md)<br>
`n_max_tokens` [Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)<br>
`add_bos` [Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)<br>
`special` [Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)<br>
Allow tokenizing special and/or control tokens which otherwise are not exposed and treated as plaintext. Does not insert a leading space.
#### Returns
[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)<br>
Returns the number of tokens on success, no more than n_max_tokens.
Returns a negative number on failure - the number of tokens that would have been returned
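#### Example
A sketch of the grow-on-negative-return convention described above. `model` is assumed loaded, `prompt` is an assumed input string, and pinning `LLamaToken` with `fixed` assumes it is an unmanaged struct:
```csharp
// Sketch only: the first guess at the token count is deliberately generous.
byte[] utf8 = System.Text.Encoding.UTF8.GetBytes(prompt);
var tokens = new LLamaToken[utf8.Length + 8];
unsafe
{
    fixed (byte* textPtr = utf8)
    fixed (LLamaToken* tokPtr = tokens)
    {
        int n = NativeApi.llama_tokenize(model, textPtr, utf8.Length, tokPtr, tokens.Length,
                                         add_bos: true, special: false);
        if (n < 0)
        {
            // -n tokens would have been written: resize the array to -n and call again.
        }
    }
}
```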
### **llama_log_set(LLamaLogCallback)**
Register a callback to receive llama log messages
```csharp
public static void llama_log_set(LLamaLogCallback logCallback)
```
#### Parameters
`logCallback` [LLamaLogCallback](./llama.native.llamalogcallback.md)<br>
### **llama_kv_cache_clear(SafeLLamaContextHandle)**
Clear the KV cache
```csharp
public static void llama_kv_cache_clear(SafeLLamaContextHandle ctx)
```
#### Parameters
`ctx` [SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)<br>
### **llama_kv_cache_seq_rm(SafeLLamaContextHandle, LLamaSeqId, LLamaPos, LLamaPos)**
Removes all tokens that belong to the specified sequence and have positions in [p0, p1)
```csharp
public static void llama_kv_cache_seq_rm(SafeLLamaContextHandle ctx, LLamaSeqId seq, LLamaPos p0, LLamaPos p1)
```
#### Parameters
`ctx` [SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)<br>
`seq` [LLamaSeqId](./llama.native.llamaseqid.md)<br>
`p0` [LLamaPos](./llama.native.llamapos.md)<br>
`p1` [LLamaPos](./llama.native.llamapos.md)<br>
### **llama_kv_cache_seq_cp(SafeLLamaContextHandle, LLamaSeqId, LLamaSeqId, LLamaPos, LLamaPos)**
Copy all tokens that belong to the specified sequence to another sequence
Note that this does not allocate extra KV cache memory - it simply assigns the tokens to the new sequence
```csharp
public static void llama_kv_cache_seq_cp(SafeLLamaContextHandle ctx, LLamaSeqId src, LLamaSeqId dest, LLamaPos p0, LLamaPos p1)
```
#### Parameters
`ctx` [SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)<br>
`src` [LLamaSeqId](./llama.native.llamaseqid.md)<br>
`dest` [LLamaSeqId](./llama.native.llamaseqid.md)<br>
`p0` [LLamaPos](./llama.native.llamapos.md)<br>
`p1` [LLamaPos](./llama.native.llamapos.md)<br>
### **llama_kv_cache_seq_keep(SafeLLamaContextHandle, LLamaSeqId)**
Removes all tokens that do not belong to the specified sequence
```csharp
public static void llama_kv_cache_seq_keep(SafeLLamaContextHandle ctx, LLamaSeqId seq)
```
#### Parameters
`ctx` [SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)<br>
`seq` [LLamaSeqId](./llama.native.llamaseqid.md)<br>
### **llama_kv_cache_seq_add(SafeLLamaContextHandle, LLamaSeqId, LLamaPos, LLamaPos, Int32)**
Adds relative position "delta" to all tokens that belong to the specified sequence and have positions in [p0, p1)
If the KV cache is RoPEd, the KV data is updated accordingly:
- lazily on next llama_decode()
- explicitly with llama_kv_cache_update()
```csharp
public static void llama_kv_cache_seq_add(SafeLLamaContextHandle ctx, LLamaSeqId seq, LLamaPos p0, LLamaPos p1, int delta)
```
#### Parameters
`ctx` [SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)<br>
`seq` [LLamaSeqId](./llama.native.llamaseqid.md)<br>
`p0` [LLamaPos](./llama.native.llamapos.md)<br>
`p1` [LLamaPos](./llama.native.llamapos.md)<br>
`delta` [Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)<br>
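#### Example
`llama_kv_cache_seq_rm` and `llama_kv_cache_seq_add` combine into the usual context-shifting pattern: drop the oldest tokens of a sequence and slide the remainder left. A sketch that assumes `LLamaSeqId` and `LLamaPos` convert implicitly from `int` (illustrative only):
```csharp
// Sketch: free up room in sequence 0 while preserving the first nKeep tokens.
int nKeep = 4;        // e.g. the system prompt
int nDiscard = 256;   // how many tokens to drop
NativeApi.llama_kv_cache_seq_rm(ctx, 0, nKeep, nKeep + nDiscard);
NativeApi.llama_kv_cache_seq_add(ctx, 0, nKeep + nDiscard, -1, -nDiscard); // p1 < 0: to the end
// The shift is applied lazily on the next llama_decode(), or eagerly with:
NativeApi.llama_kv_cache_update(ctx);
```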
### **llama_kv_cache_seq_div(SafeLLamaContextHandle, LLamaSeqId, LLamaPos, LLamaPos, Int32)**
Integer division of the positions by factor of `d &gt; 1`
If the KV cache is RoPEd, the KV data is updated accordingly:
- lazily on next llama_decode()
- explicitly with llama_kv_cache_update()
<br>
p0 &lt; 0 : [0, p1]
<br>
p1 &lt; 0 : [p0, inf)
```csharp
public static void llama_kv_cache_seq_div(SafeLLamaContextHandle ctx, LLamaSeqId seq, LLamaPos p0, LLamaPos p1, int d)
```
#### Parameters
`ctx` [SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)<br>
`seq` [LLamaSeqId](./llama.native.llamaseqid.md)<br>
`p0` [LLamaPos](./llama.native.llamapos.md)<br>
`p1` [LLamaPos](./llama.native.llamapos.md)<br>
`d` [Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)<br>
### **llama_kv_cache_seq_pos_max(SafeLLamaContextHandle, LLamaSeqId)**
Returns the largest position present in the KV cache for the specified sequence
```csharp
public static LLamaPos llama_kv_cache_seq_pos_max(SafeLLamaContextHandle ctx, LLamaSeqId seq)
```
#### Parameters
`ctx` [SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)<br>
`seq` [LLamaSeqId](./llama.native.llamaseqid.md)<br>
#### Returns
[LLamaPos](./llama.native.llamapos.md)<br>
### **llama_kv_cache_defrag(SafeLLamaContextHandle)**
Defragment the KV cache. This will be applied:
- lazily on next llama_decode()
- explicitly with llama_kv_cache_update()
```csharp
public static LLamaPos llama_kv_cache_defrag(SafeLLamaContextHandle ctx)
```
#### Parameters
`ctx` [SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)<br>
#### Returns
[LLamaPos](./llama.native.llamapos.md)<br>
### **llama_kv_cache_update(SafeLLamaContextHandle)**
Apply the KV cache updates (such as K-shifts, defragmentation, etc.)
```csharp
public static void llama_kv_cache_update(SafeLLamaContextHandle ctx)
```
#### Parameters
`ctx` [SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)<br>
### **llama_batch_init(Int32, Int32, Int32)**
Allocates a batch of tokens on the heap
Each token can be assigned up to n_seq_max sequence ids
The batch has to be freed with llama_batch_free()
If embd != 0, llama_batch.embd will be allocated with size of n_tokens * embd * sizeof(float)
Otherwise, llama_batch.token will be allocated to store n_tokens llama_token
The rest of the llama_batch members are allocated with size n_tokens
All members are left uninitialized
```csharp
public static LLamaNativeBatch llama_batch_init(int n_tokens, int embd, int n_seq_max)
```
#### Parameters
`n_tokens` [Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)<br>
`embd` [Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)<br>
`n_seq_max` [Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)<br>
Each token can be assigned up to n_seq_max sequence ids
#### Returns
[LLamaNativeBatch](./llama.native.llamanativebatch.md)<br>
### **llama_batch_free(LLamaNativeBatch)**
Frees a batch of tokens allocated with llama_batch_init()
```csharp
public static void llama_batch_free(LLamaNativeBatch batch)
```
#### Parameters
`batch` [LLamaNativeBatch](./llama.native.llamanativebatch.md)<br>
### **llama_decode(SafeLLamaContextHandle, LLamaNativeBatch)**
```csharp
public static int llama_decode(SafeLLamaContextHandle ctx, LLamaNativeBatch batch)
```
#### Parameters
`ctx` [SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)<br>
`batch` [LLamaNativeBatch](./llama.native.llamanativebatch.md)<br>
#### Returns
[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)<br>
Positive return values do not mean a fatal error, but rather a warning:<br>
- 0: success<br>
- 1: could not find a KV slot for the batch (try reducing the size of the batch or increase the context)<br>
- &lt; 0: error<br>
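#### Example
A sketch of the batch lifecycle around `llama_decode`, interpreting the return codes listed above. Filling the batch members is elided, since `llama_batch_init` leaves them uninitialized:
```csharp
// Sketch only: token/pos/seq_id population is omitted.
LLamaNativeBatch batch = NativeApi.llama_batch_init(n_tokens: 512, embd: 0, n_seq_max: 1);
try
{
    // ... fill the batch members for the tokens to evaluate ...
    int rc = NativeApi.llama_decode(ctx, batch);
    if (rc == 1)
    {
        // No KV slot found: reduce the batch size or increase the context.
    }
    else if (rc < 0)
    {
        throw new InvalidOperationException($"llama_decode failed: {rc}");
    }
}
finally
{
    NativeApi.llama_batch_free(batch);
}
```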
### **llama_kv_cache_view_init(SafeLLamaContextHandle, Int32)**
Create an empty KV cache view. (use only for debugging purposes)
```csharp
public static LLamaKvCacheView llama_kv_cache_view_init(SafeLLamaContextHandle ctx, int n_max_seq)
```
#### Parameters
`ctx` [SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)<br>
`n_max_seq` [Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)<br>
#### Returns
[LLamaKvCacheView](./llama.native.llamakvcacheview.md)<br>
### **llama_kv_cache_view_free(LLamaKvCacheView&)**
Free a KV cache view. (use only for debugging purposes)
```csharp
public static void llama_kv_cache_view_free(LLamaKvCacheView& view)
```
#### Parameters
`view` [LLamaKvCacheView&](./llama.native.llamakvcacheview&.md)<br>
### **llama_kv_cache_view_update(SafeLLamaContextHandle, LLamaKvCacheView&)**
Update the KV cache view structure with the current state of the KV cache. (use only for debugging purposes)
```csharp
public static void llama_kv_cache_view_update(SafeLLamaContextHandle ctx, LLamaKvCacheView& view)
```
#### Parameters
`ctx` [SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)<br>
`view` [LLamaKvCacheView&](./llama.native.llamakvcacheview&.md)<br>
### **llama_get_kv_cache_token_count(SafeLLamaContextHandle)**
Returns the number of tokens in the KV cache (slow, use only for debug)
If a KV cell has multiple sequences assigned to it, it will be counted multiple times
```csharp
public static int llama_get_kv_cache_token_count(SafeLLamaContextHandle ctx)
```
#### Parameters
`ctx` [SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)<br>
#### Returns
[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)<br>
### **llama_get_kv_cache_used_cells(SafeLLamaContextHandle)**
Returns the number of used KV cells (i.e. have at least one sequence assigned to them)
```csharp
public static int llama_get_kv_cache_used_cells(SafeLLamaContextHandle ctx)
```
#### Parameters
`ctx` [SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)<br>
#### Returns
[Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)<br>
### **llama_beam_search(SafeLLamaContextHandle, LLamaBeamSearchCallback, IntPtr, UInt64, Int32, Int32, Int32)**
Deterministically returns the entire sentence constructed by a beam search.
```csharp
public static void llama_beam_search(SafeLLamaContextHandle ctx, LLamaBeamSearchCallback callback, IntPtr callback_data, ulong n_beams, int n_past, int n_predict, int n_threads)
```
#### Parameters
`ctx` [SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)<br>
Pointer to the llama_context.
`callback` [LLamaBeamSearchCallback](./llama.native.nativeapi.llamabeamsearchcallback.md)<br>
Invoked for each iteration of the beam_search loop, passing in beams_state.
`callback_data` [IntPtr](https://docs.microsoft.com/en-us/dotnet/api/system.intptr)<br>
A pointer that is simply passed back to the callback.
`n_beams` [UInt64](https://docs.microsoft.com/en-us/dotnet/api/system.uint64)<br>
Number of beams to use.
`n_past` [Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)<br>
Number of tokens already evaluated.
`n_predict` [Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)<br>
Maximum number of tokens to predict. EOS may occur earlier.
`n_threads` [Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)<br>
Number of threads.
### **llama_empty_call()**
A method that does nothing. This is a native method; calling it will force the llama native dependencies to be loaded.
```csharp
public static void llama_empty_call()
```
### **llama_max_devices()**
Get the maximum number of devices supported by llama.cpp
```csharp
public static long llama_max_devices()
```
#### Returns
[Int64](https://docs.microsoft.com/en-us/dotnet/api/system.int64)<br>
### **llama_model_default_params()**
Create a LLamaModelParams with default values
```csharp
public static LLamaModelParams llama_model_default_params()
```
#### Returns
[LLamaModelParams](./llama.native.llamamodelparams.md)<br>
### **llama_context_default_params()**
Create a LLamaContextParams with default values
```csharp
public static LLamaContextParams llama_context_default_params()
```
#### Returns
[LLamaContextParams](./llama.native.llamacontextparams.md)<br>
### **llama_model_quantize_default_params()**
Create a LLamaModelQuantizeParams with default values
```csharp
public static LLamaModelQuantizeParams llama_model_quantize_default_params()
```
#### Returns
[LLamaModelQuantizeParams](./llama.native.llamamodelquantizeparams.md)<br>
### **llama_supports_mmap()**
Check if memory mapping is supported
```csharp
public static bool llama_supports_mmap()
```
#### Returns
[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)<br>
### **llama_supports_mlock()**
Check if memory locking is supported
```csharp
public static bool llama_supports_mlock()
```
#### Returns
[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)<br>
### **llama_supports_gpu_offload()**
Check if GPU offload is supported
```csharp
public static bool llama_supports_gpu_offload()
```
#### Returns
[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)<br>
### **llama_set_rng_seed(SafeLLamaContextHandle, UInt32)**
Sets the current rng seed.
```csharp
public static void llama_set_rng_seed(SafeLLamaContextHandle ctx, uint seed)
```
#### Parameters
`ctx` [SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)<br>
`seed` [UInt32](https://docs.microsoft.com/en-us/dotnet/api/system.uint32)<br>
### **llama_get_state_size(SafeLLamaContextHandle)**
Returns the maximum size in bytes of the state (rng, logits, embedding and kv_cache) - will often be smaller after compacting tokens
```csharp
public static ulong llama_get_state_size(SafeLLamaContextHandle ctx)
```
#### Parameters
`ctx` [SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)<br>
#### Returns
[UInt64](https://docs.microsoft.com/en-us/dotnet/api/system.uint64)<br>
### **llama_copy_state_data(SafeLLamaContextHandle, Byte*)**
Copies the state to the specified destination address.
Destination needs to have allocated enough memory.
```csharp
public static ulong llama_copy_state_data(SafeLLamaContextHandle ctx, Byte* dest)
```
#### Parameters
`ctx` [SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)<br>
`dest` [Byte*](https://docs.microsoft.com/en-us/dotnet/api/system.byte*)<br>
#### Returns
[UInt64](https://docs.microsoft.com/en-us/dotnet/api/system.uint64)<br>
the number of bytes copied
### **llama_set_state_data(SafeLLamaContextHandle, Byte*)**
Set the state reading from the specified address
```csharp
public static ulong llama_set_state_data(SafeLLamaContextHandle ctx, Byte* src)
```
#### Parameters
`ctx` [SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)<br>
`src` [Byte*](https://docs.microsoft.com/en-us/dotnet/api/system.byte*)<br>
#### Returns
[UInt64](https://docs.microsoft.com/en-us/dotnet/api/system.uint64)<br>
the number of bytes read
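#### Example
A sketch of a snapshot/restore round trip built from the three state entries above:
```csharp
// Sketch: snapshot the context state into managed memory, restore it later.
ulong size = NativeApi.llama_get_state_size(ctx);   // maximum size, not exact
var state = new byte[(long)size];
unsafe
{
    fixed (byte* p = state)
    {
        ulong copied = NativeApi.llama_copy_state_data(ctx, p); // bytes actually written
        // ... later, on a context created with identical parameters ...
        NativeApi.llama_set_state_data(ctx, p);
    }
}
```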
### **llama_load_session_file(SafeLLamaContextHandle, String, LLamaToken[], UInt64, UInt64&)**
Load session file
```csharp
public static bool llama_load_session_file(SafeLLamaContextHandle ctx, string path_session, LLamaToken[] tokens_out, ulong n_token_capacity, UInt64& n_token_count_out)
```
#### Parameters
`ctx` [SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)<br>
`path_session` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)<br>
`tokens_out` [LLamaToken[]](./llama.native.llamatoken.md)<br>
`n_token_capacity` [UInt64](https://docs.microsoft.com/en-us/dotnet/api/system.uint64)<br>
`n_token_count_out` [UInt64&](https://docs.microsoft.com/en-us/dotnet/api/system.uint64&)<br>
#### Returns
[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)<br>
### **llama_save_session_file(SafeLLamaContextHandle, String, LLamaToken[], UInt64)**
Save session file
```csharp
public static bool llama_save_session_file(SafeLLamaContextHandle ctx, string path_session, LLamaToken[] tokens, ulong n_token_count)
```
#### Parameters
`ctx` [SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)<br>
`path_session` [String](https://docs.microsoft.com/en-us/dotnet/api/system.string)<br>
`tokens` [LLamaToken[]](./llama.native.llamatoken.md)<br>
`n_token_count` [UInt64](https://docs.microsoft.com/en-us/dotnet/api/system.uint64)<br>
#### Returns
[Boolean](https://docs.microsoft.com/en-us/dotnet/api/system.boolean)<br>
### **llama_token_get_text(SafeLlamaModelHandle, LLamaToken)**
```csharp
public static Byte* llama_token_get_text(SafeLlamaModelHandle model, LLamaToken token)
```
#### Parameters
`model` [SafeLlamaModelHandle](./llama.native.safellamamodelhandle.md)<br>
`token` [LLamaToken](./llama.native.llamatoken.md)<br>
#### Returns
[Byte*](https://docs.microsoft.com/en-us/dotnet/api/system.byte*)<br>
### **llama_token_get_score(SafeLlamaModelHandle, LLamaToken)**
```csharp
public static float llama_token_get_score(SafeLlamaModelHandle model, LLamaToken token)
```
#### Parameters
`model` [SafeLlamaModelHandle](./llama.native.safellamamodelhandle.md)<br>
`token` [LLamaToken](./llama.native.llamatoken.md)<br>
#### Returns
[Single](https://docs.microsoft.com/en-us/dotnet/api/system.single)<br>
### **llama_token_get_type(SafeLlamaModelHandle, LLamaToken)**
```csharp
public static LLamaTokenType llama_token_get_type(SafeLlamaModelHandle model, LLamaToken token)
```
#### Parameters
`model` [SafeLlamaModelHandle](./llama.native.safellamamodelhandle.md)<br>
`token` [LLamaToken](./llama.native.llamatoken.md)<br>
#### Returns
[LLamaTokenType](./llama.native.llamatokentype.md)<br>
### **llama_n_ctx(SafeLLamaContextHandle)**
Get the size of the context window for the model for this context
```csharp
public static uint llama_n_ctx(SafeLLamaContextHandle ctx)
```
#### Parameters
`ctx` [SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)<br>
#### Returns
[UInt32](https://docs.microsoft.com/en-us/dotnet/api/system.uint32)<br>
### **llama_n_batch(SafeLLamaContextHandle)**
Get the batch size for this context
```csharp
public static uint llama_n_batch(SafeLLamaContextHandle ctx)
```
#### Parameters
`ctx` [SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)<br>
#### Returns
[UInt32](https://docs.microsoft.com/en-us/dotnet/api/system.uint32)<br>
### **llama_get_logits(SafeLLamaContextHandle)**
Token logits obtained from the last call to llama_decode
The logits for the last token are stored in the last row
Can be mutated in order to change the probabilities of the next token.<br>
Rows: n_tokens<br>
Cols: n_vocab
```csharp
public static Single* llama_get_logits(SafeLLamaContextHandle ctx)
```
#### Parameters
`ctx` [SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)<br>
#### Returns
[Single*](https://docs.microsoft.com/en-us/dotnet/api/system.single*)<br>
### **llama_get_logits_ith(SafeLLamaContextHandle, Int32)**
Logits for the ith token. Equivalent to: llama_get_logits(ctx) + i*n_vocab
```csharp
public static Single* llama_get_logits_ith(SafeLLamaContextHandle ctx, int i)
```
#### Parameters
`ctx` [SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)<br>
`i` [Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)<br>
#### Returns
[Single*](https://docs.microsoft.com/en-us/dotnet/api/system.single*)<br>
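#### Example
A sketch of reading one row of the logits matrix into a managed span; `i` (the row index) and `n_vocab` (the model's vocabulary size) are assumed to be known:
```csharp
// Sketch: llama_get_logits_ith returns row i of the n_tokens x n_vocab matrix.
unsafe
{
    float* row = NativeApi.llama_get_logits_ith(ctx, i);
    var logits = new Span<float>(row, n_vocab);
    logits[0] += 1.0f;   // logits may be mutated to bias the next token
}
```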
### **llama_get_embeddings_ith(SafeLLamaContextHandle, Int32)**
Get the embeddings for the ith sequence. Equivalent to: llama_get_embeddings(ctx) + i*n_embd
```csharp
public static Single* llama_get_embeddings_ith(SafeLLamaContextHandle ctx, int i)
```
#### Parameters
`ctx` [SafeLLamaContextHandle](./llama.native.safellamacontexthandle.md)<br>
`i` [Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)<br>
#### Returns
[Single*](https://docs.microsoft.com/en-us/dotnet/api/system.single*)<br>