
LLamaModelQuantizeParams.cs 2.8 kB

April 2024 Binary Update (#662)

* Updated binaries, using [this build](https://github.com/SciSharp/LLamaSharp/actions/runs/8654672719/job/23733195669) for llama.cpp commit `f7001ccc5aa359fcf41bba19d1c99c3d25c9bcc7`.
  - Added all new functions.
  - Moved some functions (e.g. `SafeLlamaModelHandle` specific functions) into `SafeLlamaModelHandle.cs`.
  - Exposed tokens on `SafeLlamaModelHandle` and `LLamaWeights` through a `Tokens` property. As new special tokens are added in the future they can be added here.
  - Changed all token properties to return nullable tokens, to handle models that lack some tokens.
  - Fixed `DefaultSamplingPipeline` to handle models with no newline token.
* Moved native methods to more specific locations.
  - Context-specific methods have been moved into `SafeLLamaContextHandle.cs` and made private; they are already exposed through C# properties and methods.
  - Added a check that the GPU layer count is zero if GPU offload is not supported.
  - Moved methods for creating default structs (`llama_model_quantize_default_params` and `llama_context_default_params`) into the relevant structs.
* Removed the exception thrown when `GpuLayerCount > 0` and the GPU is not supported.
* Added low-level wrapper methods for the new per-sequence state load/save in `SafeLLamaContextHandle`.
  - Added high-level wrapper methods (save/load with a `State` object or memory-mapped file) in `LLamaContext`.
  - Moved native methods for per-sequence state load/save into `SafeLLamaContextHandle`.
* Added update and defrag methods for the KV cache in `SafeLLamaContextHandle`.
* Updated submodule to `f7001ccc5aa359fcf41bba19d1c99c3d25c9bcc7`.
* Pass the sequence ID when saving a single sequence state.
1 year ago
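The per-sequence state save/load added in this update can be used roughly as follows. This is a minimal sketch only: the shape of the `SaveState`/`LoadState` overloads taking a sequence ID is an assumption inferred from the commit description above, not a verified signature.

```csharp
using LLama;
using LLama.Native;

// Sketch only: the overloads taking a sequence ID are assumed from the
// commit text ("passing the sequence ID when saving a single sequence state").
public static class SequenceStateSketch
{
    public static void RoundTrip(LLamaContext context, string path)
    {
        var seq = (LLamaSeqId)0;

        // Save a single sequence's state to a file.
        context.SaveState(path, seq);

        // Later, restore that sequence's state into the context.
        context.LoadState(path, seq);
    }
}
```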
```csharp
using System;
using System.Runtime.InteropServices;

namespace LLama.Native
{
    /// <summary>
    /// Quantizer parameters used in the native API
    /// </summary>
    /// <remarks>llama_model_quantize_params</remarks>
    [StructLayout(LayoutKind.Sequential)]
    public struct LLamaModelQuantizeParams
    {
        /// <summary>
        /// number of threads to use for quantizing, if &lt;=0 will use std::thread::hardware_concurrency()
        /// </summary>
        public int nthread;

        /// <summary>
        /// quantize to this llama_ftype
        /// </summary>
        public LLamaFtype ftype;

        /// <summary>
        /// output tensor type
        /// </summary>
        public GGMLType output_tensor_type;

        /// <summary>
        /// token embeddings tensor type
        /// </summary>
        public GGMLType token_embedding_type;

        /// <summary>
        /// allow quantizing non-f32/f16 tensors
        /// </summary>
        public bool allow_requantize
        {
            get => Convert.ToBoolean(_allow_requantize);
            set => _allow_requantize = Convert.ToSByte(value);
        }
        private sbyte _allow_requantize;

        /// <summary>
        /// quantize output.weight
        /// </summary>
        public bool quantize_output_tensor
        {
            get => Convert.ToBoolean(_quantize_output_tensor);
            set => _quantize_output_tensor = Convert.ToSByte(value);
        }
        private sbyte _quantize_output_tensor;

        /// <summary>
        /// only copy tensors - ftype, allow_requantize and quantize_output_tensor are ignored
        /// </summary>
        public bool only_copy
        {
            get => Convert.ToBoolean(_only_copy);
            set => _only_copy = Convert.ToSByte(value);
        }
        private sbyte _only_copy;

        /// <summary>
        /// quantize all tensors to the default type
        /// </summary>
        public bool pure
        {
            get => Convert.ToBoolean(_pure);
            set => _pure = Convert.ToSByte(value);
        }
        private sbyte _pure;

        /// <summary>
        /// pointer to importance matrix data
        /// </summary>
        public IntPtr imatrix;

        /// <summary>
        /// pointer to vector containing overrides
        /// </summary>
        public IntPtr kv_overrides;

        /// <summary>
        /// Create a LLamaModelQuantizeParams with default values
        /// </summary>
        /// <returns></returns>
        public static LLamaModelQuantizeParams Default()
        {
            return llama_model_quantize_default_params();

            [DllImport(NativeApi.libraryName, CallingConvention = CallingConvention.Cdecl)]
            static extern LLamaModelQuantizeParams llama_model_quantize_default_params();
        }
    }
}
```
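Note that the `bool` properties are backed by private `sbyte` fields: a C# `bool` field in a sequentially laid out struct does not marshal as the single byte the native `llama_model_quantize_params` expects, so the one-byte backing field keeps the struct blittable. A minimal usage sketch follows; the specific `LLamaFtype` member name is an assumption and may differ between versions.

```csharp
// Start from the native library's defaults, then override selected fields.
var p = LLamaModelQuantizeParams.Default();

p.nthread = 0;                    // <= 0: use std::thread::hardware_concurrency()
p.ftype = LLamaFtype.MOSTLY_Q4_0; // target quantization type (member name assumed)
p.allow_requantize = false;       // don't re-quantize already quantized tensors
p.quantize_output_tensor = true;  // also quantize output.weight
p.only_copy = false;              // actually quantize rather than just copying
```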