
LLamaModelParams.cs 3.2 kB

April 2024 Binary Update (#662)

* Updated binaries, using [this build](https://github.com/SciSharp/LLamaSharp/actions/runs/8654672719/job/23733195669) for llama.cpp commit `f7001ccc5aa359fcf41bba19d1c99c3d25c9bcc7`.
  - Added all new functions.
  - Moved some functions (e.g. `SafeLlamaModelHandle`-specific functions) into `SafeLlamaModelHandle.cs`.
  - Exposed tokens on `SafeLlamaModelHandle` and `LLamaWeights` through a `Tokens` property. As new special tokens are added in the future they can be added here.
  - Changed all token properties to return nullable tokens, to handle models that do not have some tokens.
  - Fixed `DefaultSamplingPipeline` to handle models with no newline token.
* Moved native methods to more specific locations.
  - Context-specific methods have been moved into `SafeLLamaContextHandle.cs` and made private; they are already exposed through C# properties and methods.
  - Added a check that the GPU layer count is zero if GPU offload is not supported.
  - Moved methods for creating default structs (`llama_model_quantize_default_params` and `llama_context_default_params`) into the relevant structs.
* Removed the exception thrown when `GpuLayerCount > 0` and GPU is not supported.
* Per-sequence state load/save (see the sketch below):
  - Added low-level wrapper methods for the new per-sequence state load/save in `SafeLLamaContextHandle`.
  - Added high-level wrapper methods (save/load with a `State` object or memory-mapped file) in `LLamaContext`.
  - Moved native methods for per-sequence state load/save into `SafeLLamaContextHandle`.
* Added update and defrag methods for the KV cache in `SafeLLamaContextHandle`.
* Updated submodule to `f7001ccc5aa359fcf41bba19d1c99c3d25c9bcc7`.
* Passing the sequence ID when saving a single sequence state.
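Based on the notes above, here is a minimal sketch of how the new per-sequence state save/load might be used from the high-level `LLamaContext` API. The per-sequence `SaveState`/`LoadState` overloads and the `LLamaSeqId` cast shown here are assumptions inferred from the commit description, not confirmed signatures; check the `LLamaContext` source for the exact API.

```csharp
using LLama;
using LLama.Common;
using LLama.Native;

// Assumed usage sketch of the per-sequence state save/load described in the commit notes.
var parameters = new ModelParams("model.gguf") { ContextSize = 2048, GpuLayerCount = 16 };
using var weights = LLamaWeights.LoadFromFile(parameters);
using var context = weights.CreateContext(parameters);

// Save the state of the whole context to a file.
context.SaveState("full_state.bin");

// Per the commit notes, a single sequence's state can also be saved/loaded,
// passing the sequence ID. The overloads and the cast are assumptions.
context.SaveState("sequence0_state.bin", (LLamaSeqId)0);
context.LoadState("sequence0_state.bin", (LLamaSeqId)0);
```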
using System;
using System.Runtime.InteropServices;

namespace LLama.Native
{
    /// <summary>
    /// A C# representation of the llama.cpp `llama_model_params` struct
    /// </summary>
    [StructLayout(LayoutKind.Sequential)]
    public unsafe struct LLamaModelParams
    {
        /// <summary>
        /// number of layers to store in VRAM
        /// </summary>
        public int n_gpu_layers;

        /// <summary>
        /// how to split the model across multiple GPUs
        /// </summary>
        public GPUSplitMode split_mode;

        /// <summary>
        /// the GPU that is used for scratch and small tensors
        /// </summary>
        public int main_gpu;

        /// <summary>
        /// how to split layers across multiple GPUs (size: <see cref="NativeApi.llama_max_devices"/>)
        /// </summary>
        public float* tensor_split;

        /// <summary>
        /// called with a progress value between 0 and 1, pass NULL to disable. If the provided progress_callback
        /// returns true, model loading continues. If it returns false, model loading is immediately aborted.
        /// </summary>
#if NETSTANDARD2_0
        // this code is intended to be used when running LLamaSharp on .NET Framework 4.8 (.NET Standard 2.0),
        // as .NET Framework 4.8 does not play nice with the LlamaProgressCallback type
        public IntPtr progress_callback;
#else
        public LlamaProgressCallback? progress_callback;
#endif

        /// <summary>
        /// context pointer passed to the progress callback
        /// </summary>
        public void* progress_callback_user_data;

        /// <summary>
        /// override key-value pairs of the model meta data
        /// </summary>
        public LLamaModelMetadataOverride* kv_overrides;

        /// <summary>
        /// only load the vocabulary, no weights
        /// </summary>
        public bool vocab_only
        {
            readonly get => Convert.ToBoolean(_vocab_only);
            set => _vocab_only = Convert.ToSByte(value);
        }
        private sbyte _vocab_only;

        /// <summary>
        /// use mmap if possible
        /// </summary>
        public bool use_mmap
        {
            readonly get => Convert.ToBoolean(_use_mmap);
            set => _use_mmap = Convert.ToSByte(value);
        }
        private sbyte _use_mmap;

        /// <summary>
        /// force system to keep model in RAM
        /// </summary>
        public bool use_mlock
        {
            readonly get => Convert.ToBoolean(_use_mlock);
            set => _use_mlock = Convert.ToSByte(value);
        }
        private sbyte _use_mlock;

        /// <summary>
        /// Create a LLamaModelParams with default values
        /// </summary>
        /// <returns></returns>
        public static LLamaModelParams Default()
        {
            return llama_model_default_params();

            [DllImport(NativeApi.libraryName, CallingConvention = CallingConvention.Cdecl)]
            static extern LLamaModelParams llama_model_default_params();
        }
    }
}
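For reference, a minimal sketch of how this struct is typically consumed at the native layer: obtain the llama.cpp defaults via `Default()`, adjust a few fields, then load a model through the low-level handle. `SafeLlamaModelHandle.LoadFromFile` is assumed here as the loading entry point; in application code the high-level `ModelParams`/`LLamaWeights` API is the usual route.

```csharp
using System;
using LLama.Native;

// Assumed usage sketch, not a confirmed recipe.
var mparams = LLamaModelParams.Default();   // wraps llama_model_default_params()
mparams.n_gpu_layers = 32;   // request 32 layers in VRAM (ignored when no GPU backend is available)
mparams.use_mmap = true;     // map the model file instead of reading it fully into memory
mparams.use_mlock = false;   // do not force the system to keep the model resident in RAM

// SafeLlamaModelHandle.LoadFromFile(path, params) is assumed to exist with this shape.
using var model = SafeLlamaModelHandle.LoadFromFile("model.gguf", mparams);
Console.WriteLine($"Model loaded; {mparams.n_gpu_layers} GPU layers requested.");
```

The `bool` properties (`vocab_only`, `use_mmap`, `use_mlock`) are backed by private `sbyte` fields so that the struct layout matches the single-byte booleans in the native `llama_model_params` struct while still exposing ordinary C# booleans to callers.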