
LLamaBeamsState.cs 1.3 kB

April 2024 Binary Update (#662)

* Updated binaries, using [this build](https://github.com/SciSharp/LLamaSharp/actions/runs/8654672719/job/23733195669) for llama.cpp commit `f7001ccc5aa359fcf41bba19d1c99c3d25c9bcc7`.
  - Added all new functions.
  - Moved some functions (e.g. `SafeLlamaModelHandle`-specific functions) into `SafeLlamaModelHandle.cs`.
  - Exposed tokens on `SafeLlamaModelHandle` and `LLamaWeights` through a `Tokens` property. As new special tokens are added in the future they can be added here.
  - Changed all token properties to return nullable tokens, to handle some models not having some tokens.
  - Fixed `DefaultSamplingPipeline` to handle models with no newline token.
* Moved native methods to more specific locations.
  - Context-specific things have been moved into `SafeLLamaContextHandle.cs` and made private; they are already exposed through C# properties and methods.
  - Checking that the GPU layer count is zero if GPU offload is not supported.
  - Moved methods for creating default structs (`llama_model_quantize_default_params` and `llama_context_default_params`) into the relevant structs.
* Removed the exception thrown when `GpuLayerCount > 0` and GPU is not supported.
* Per-sequence state load/save:
  - Added low-level wrapper methods for the new per-sequence state load/save in `SafeLLamaContextHandle`.
  - Added high-level wrapper methods (save/load with a `State` object or a memory-mapped file) in `LLamaContext`.
  - Moved native methods for per-sequence state load/save into `SafeLLamaContextHandle`.
* Added update and defrag methods for the KV cache in `SafeLLamaContextHandle`.
* Updated the submodule to `f7001ccc5aa359fcf41bba19d1c99c3d25c9bcc7`.
* Passing the sequence ID when saving a single sequence state.
1 year ago
using System;
using System.Runtime.InteropServices;

namespace LLama.Native;

/// <summary>
/// Passed to beam_search_callback function.
/// Whenever 0 &lt; common_prefix_length, this number of tokens should be copied from any of the beams
/// (e.g. beams[0]) as they will be removed (shifted) from all beams in all subsequent callbacks.
/// </summary>
[StructLayout(LayoutKind.Sequential)]
public struct LLamaBeamsState
{
    /// <summary>
    /// The state of each individual beam
    /// </summary>
    private unsafe LLamaBeamView* beam_views;

    /// <summary>
    /// Number of elements in beam_views
    /// </summary>
    private nuint n_beams;

    /// <summary>
    /// Current max length of prefix tokens shared by all beams.
    /// </summary>
    public ulong CommonPrefixLength;

    /// <summary>
    /// True iff this is the last callback invocation.
    /// </summary>
    public bool LastCall;

    /// <summary>
    /// The current state of each beam
    /// </summary>
    public Span<LLamaBeamView> Beams
    {
        get
        {
            unsafe
            {
                if (n_beams > int.MaxValue)
                    throw new InvalidOperationException("More than 2147483647 beams is not supported");
                return new Span<LLamaBeamView>(beam_views, (int)n_beams);
            }
        }
    }
}
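The `Beams` property above wraps an unmanaged pointer plus an element count in a `Span<T>`, guarding the narrowing `nuint` → `int` cast first. The standalone sketch below (hypothetical, not part of LLamaSharp; `SpanFromPointerDemo` and `AsSpan` are invented names) illustrates the same pattern with a plain `int*` buffer so it can be compiled on its own with `/unsafe`:

```csharp
using System;

// Hypothetical demo of the pointer + count -> Span<T> pattern used by
// LLamaBeamsState.Beams, including the int.MaxValue guard before the
// narrowing cast from nuint to int.
public static class SpanFromPointerDemo
{
    public static unsafe Span<int> AsSpan(int* ptr, nuint count)
    {
        // Span<T> lengths are 32-bit, so reject counts that would overflow.
        if (count > int.MaxValue)
            throw new InvalidOperationException("More than 2147483647 elements is not supported");
        return new Span<int>(ptr, (int)count);
    }

    public static void Main()
    {
        unsafe
        {
            // Stack-allocated buffer standing in for the native beam_views array.
            int* buffer = stackalloc int[4] { 1, 2, 3, 4 };
            Span<int> span = SpanFromPointerDemo.AsSpan(buffer, 4);
            Console.WriteLine(span[2]); // prints 3
        }
    }
}
```

Exposing the native array as a `Span<T>` rather than copying it keeps the property allocation-free while still giving callers bounds-checked access; the span is only valid for the lifetime of the callback that received the struct.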