
LLamaKvCacheView.cs 5.2 kB

April 2024 Binary Update (#662)

* Updated binaries, using [this build](https://github.com/SciSharp/LLamaSharp/actions/runs/8654672719/job/23733195669) for llama.cpp commit `f7001ccc5aa359fcf41bba19d1c99c3d25c9bcc7`.
  - Added all new functions.
  - Moved some functions (e.g. `SafeLlamaModelHandle` specific functions) into `SafeLlamaModelHandle.cs`.
  - Exposed tokens on `SafeLlamaModelHandle` and `LLamaWeights` through a `Tokens` property. As new special tokens are added in the future they can be added here.
  - Changed all token properties to return nullable tokens, to handle models which lack some tokens.
  - Fixed `DefaultSamplingPipeline` to handle models with no newline token.
* Moved native methods to more specific locations.
  - Context specific things have been moved into `SafeLLamaContextHandle.cs` and made private; they are already exposed through C# properties and methods.
  - Added a check that the GPU layer count is zero if GPU offload is not supported.
  - Moved the methods for creating default structs (`llama_model_quantize_default_params` and `llama_context_default_params`) into the relevant structs.
* Removed the exception thrown when `GpuLayerCount > 0` and GPU is not supported.
* Added low level wrapper methods for the new per-sequence state load/save in `SafeLLamaContextHandle`.
  - Added high level wrapper methods (save/load with a `State` object or a memory mapped file) in `LLamaContext`.
  - Moved the native methods for per-sequence state load/save into `SafeLLamaContextHandle`.
* Added update and defrag methods for the KV cache in `SafeLLamaContextHandle`.
* Updated the submodule to `f7001ccc5aa359fcf41bba19d1c99c3d25c9bcc7`.
* Pass the sequence ID when saving a single sequence state.
1 year ago
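The per-sequence state methods mentioned above can be used roughly as follows. This is a minimal sketch assuming an already-constructed `LLamaContext` named `context`; the whole-context `SaveState`/`LoadState` file methods already existed, while the per-sequence overloads taking a `LLamaSeqId` are inferred from the commit notes rather than verified against this exact revision.

    using LLama;
    using LLama.Native;

    // `context` is assumed to be an existing LLamaContext.
    // Whole-context state: saved to / loaded from a (memory mapped) file.
    context.SaveState("state.bin");
    context.LoadState("state.bin");

    // Per-sequence state: pass the sequence ID to save/load a single sequence.
    // (Overload shape assumed from the commit notes above, not verified.)
    var seq = (LLamaSeqId)0;
    context.SaveState("seq0.bin", seq);
    context.LoadState("seq0.bin", seq);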
using System;
using System.Runtime.InteropServices;

namespace LLama.Native;

/// <summary>
/// Information associated with an individual cell in the KV cache view (llama_kv_cache_view_cell)
/// </summary>
[StructLayout(LayoutKind.Sequential)]
public struct LLamaKvCacheViewCell
{
    /// <summary>
    /// The position for this cell. Takes KV cache shifts into account.
    /// May be negative if the cell is not populated.
    /// </summary>
    public LLamaPos pos;
}

/// <summary>
/// An updateable view of the KV cache (llama_kv_cache_view)
/// </summary>
[StructLayout(LayoutKind.Sequential)]
public unsafe struct LLamaKvCacheView
{
    // Number of KV cache cells. This will be the same as the context size.
    int n_cells;

    // Maximum number of sequences that can exist in a cell. It's not an error
    // if there are more sequences in a cell than this value, however they will
    // not be visible in the view cells_sequences.
    int n_seq_max;

    // Number of tokens in the cache. For example, if there are two populated
    // cells, the first with 1 sequence id in it and the second with 2 sequence
    // ids then you'll have 3 tokens.
    int token_count;

    // Number of populated cache cells.
    int used_cells;

    // Maximum contiguous empty slots in the cache.
    int max_contiguous;

    // Index to the start of the max_contiguous slot range. Can be negative
    // when cache is full.
    int max_contiguous_idx;

    // Information for an individual cell.
    LLamaKvCacheViewCell* cells;

    // The sequences for each cell. There will be n_seq_max items per cell.
    LLamaSeqId* cells_sequences;
}

/// <summary>
/// A safe handle for a LLamaKvCacheView
/// </summary>
public class LLamaKvCacheViewSafeHandle
    : SafeLLamaHandleBase
{
    private readonly SafeLLamaContextHandle _ctx;
    private LLamaKvCacheView _view;

    /// <summary>
    /// Initialize a LLamaKvCacheViewSafeHandle which will call `llama_kv_cache_view_free` when disposed
    /// </summary>
    /// <param name="ctx"></param>
    /// <param name="view"></param>
    public LLamaKvCacheViewSafeHandle(SafeLLamaContextHandle ctx, LLamaKvCacheView view)
        : base((IntPtr)1, true) // The view struct is stored by value in _view, so there is no real
                                // pointer to wrap; a dummy non-zero handle keeps the SafeHandle
                                // "valid" so that ReleaseHandle runs on dispose.
    {
        _ctx = ctx;
        _view = view;
    }

    /// <summary>
    /// Allocate a new KV cache view which can be used to inspect the KV cache
    /// </summary>
    /// <param name="ctx"></param>
    /// <param name="maxSequences">The maximum number of sequences visible in this view per cell</param>
    /// <returns></returns>
    public static LLamaKvCacheViewSafeHandle Allocate(SafeLLamaContextHandle ctx, int maxSequences)
    {
        var result = NativeApi.llama_kv_cache_view_init(ctx, maxSequences);
        return new LLamaKvCacheViewSafeHandle(ctx, result);
    }

    /// <inheritdoc />
    protected override bool ReleaseHandle()
    {
        NativeApi.llama_kv_cache_view_free(ref _view);
        SetHandle(IntPtr.Zero);
        return true;
    }

    /// <summary>
    /// Update this view
    /// </summary>
    public void Update()
    {
        NativeApi.llama_kv_cache_view_update(_ctx, ref _view);
    }

    /// <summary>
    /// Get the raw KV cache view
    /// </summary>
    /// <returns></returns>
    public ref LLamaKvCacheView GetView()
    {
        return ref _view;
    }
}

public static partial class NativeApi
{
    /// <summary>
    /// Create an empty KV cache view. (use only for debugging purposes)
    /// </summary>
    /// <param name="ctx"></param>
    /// <param name="n_seq_max"></param>
    /// <returns></returns>
    [DllImport(libraryName, CallingConvention = CallingConvention.Cdecl)]
    public static extern LLamaKvCacheView llama_kv_cache_view_init(SafeLLamaContextHandle ctx, int n_seq_max);

    /// <summary>
    /// Free a KV cache view. (use only for debugging purposes)
    /// </summary>
    [DllImport(libraryName, CallingConvention = CallingConvention.Cdecl)]
    public static extern void llama_kv_cache_view_free(ref LLamaKvCacheView view);

    /// <summary>
    /// Update the KV cache view structure with the current state of the KV cache. (use only for debugging purposes)
    /// </summary>
    /// <param name="ctx"></param>
    /// <param name="view"></param>
    [DllImport(libraryName, CallingConvention = CallingConvention.Cdecl)]
    public static extern void llama_kv_cache_view_update(SafeLLamaContextHandle ctx, ref LLamaKvCacheView view);

    /// <summary>
    /// Returns the number of tokens in the KV cache (slow, use only for debug)
    /// If a KV cell has multiple sequences assigned to it, it will be counted multiple times
    /// </summary>
    /// <param name="ctx"></param>
    /// <returns></returns>
    [DllImport(libraryName, CallingConvention = CallingConvention.Cdecl)]
    public static extern int llama_get_kv_cache_token_count(SafeLLamaContextHandle ctx);

    /// <summary>
    /// Returns the number of used KV cells (i.e. have at least one sequence assigned to them)
    /// </summary>
    /// <param name="ctx"></param>
    /// <returns></returns>
    [DllImport(libraryName, CallingConvention = CallingConvention.Cdecl)]
    public static extern int llama_get_kv_cache_used_cells(SafeLLamaContextHandle ctx);
}
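For reference, a minimal debugging sketch using the handle defined above; `ctx` is assumed to be an existing `SafeLLamaContextHandle` and is the only name not taken from this file. Note that the fields of `LLamaKvCacheView` are private in this revision, so the view is mostly driven through `Update`, with the slow debug counters read via `NativeApi`.

    using System;
    using LLama.Native;

    // `ctx` is assumed to be an already-created SafeLLamaContextHandle.
    // Allocate a view that records up to 4 sequence ids per cell.
    using (var view = LLamaKvCacheViewSafeHandle.Allocate(ctx, 4))
    {
        // Refresh the view with the current state of the KV cache.
        view.Update();

        // The raw struct is reachable, although its fields are not public here.
        ref var raw = ref view.GetView();

        // The debug counters are exposed directly on NativeApi.
        var tokens = NativeApi.llama_get_kv_cache_token_count(ctx);
        var cells = NativeApi.llama_get_kv_cache_used_cells(ctx);
        Console.WriteLine($"KV cache: {tokens} tokens in {cells} used cells");
    }
    // Disposing the handle calls llama_kv_cache_view_free on the wrapped view.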