Rinne
495177fd0f
fix: typos.
1 year ago
Martin Evans
c325ac9127
April 2024 Binary Update (#662)
* Updated binaries, using [this build](https://github.com/SciSharp/LLamaSharp/actions/runs/8654672719/job/23733195669) for llama.cpp commit `f7001ccc5aa359fcf41bba19d1c99c3d25c9bcc7`.
- Added all new functions.
- Moved some functions (e.g. `SafeLlamaModelHandle`-specific functions) into `SafeLlamaModelHandle.cs`
- Exposed tokens on `SafeLlamaModelHandle` and `LLamaWeights` through a `Tokens` property. As new special tokens are added in the future, they can be added here.
- Changed all token properties to return nullable tokens, to handle models that lack certain tokens.
- Fixed `DefaultSamplingPipeline` to handle models with no newline token.
* Moved native methods to more specific locations.
- Context-specific things have been moved into `SafeLLamaContextHandle.cs` and made private - they're exposed through C# properties and methods already.
- Checking that GPU layer count is zero if GPU offload is not supported.
- Moved methods for creating default structs (`llama_model_quantize_default_params` and `llama_context_default_params`) into relevant structs.
* Removed exception if `GpuLayerCount > 0` when GPU is not supported.
* Added low-level wrapper methods for new per-sequence state load/save in `SafeLLamaContextHandle`
- Added high-level wrapper methods (save/load with `State` object or memory-mapped file) in `LLamaContext`
- Moved native methods for per-sequence state load/save into `SafeLLamaContextHandle`
* Added update and defrag methods for KV cache in `SafeLLamaContextHandle`
* Updated submodule to `f7001ccc5aa359fcf41bba19d1c99c3d25c9bcc7`
* Passing the sequence ID when saving a single sequence state
1 year ago
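The per-sequence state save/load described in the entry above can be pictured with a toy model. This is a conceptual sketch only, with made-up names; it is not LLamaSharp's API: instead of serialising the whole context, only the entries belonging to one sequence ID are written out, and they can later be restored into a fresh context under any sequence ID.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Conceptual sketch only: a toy "context" whose state is keyed by sequence id,
// standing in for the real KV-cache state handled by SafeLLamaContextHandle.
public sealed class ToyContext
{
    private readonly List<(int SeqId, int Token)> _state = new();

    public void Append(int seqId, int token) => _state.Add((seqId, token));

    // Save only the entries that belong to a single sequence.
    public int[] SaveSequence(int seqId) =>
        _state.Where(e => e.SeqId == seqId).Select(e => e.Token).ToArray();

    // Restore a previously saved sequence into this context under a given id.
    public void LoadSequence(int seqId, int[] tokens)
    {
        foreach (var t in tokens)
            _state.Add((seqId, t));
    }
}

public static class PerSequenceDemo
{
    public static void Main()
    {
        var ctx = new ToyContext();
        ctx.Append(seqId: 0, token: 10);
        ctx.Append(seqId: 1, token: 20);

        var saved = ctx.SaveSequence(1);   // only sequence 1 is serialised

        var fresh = new ToyContext();
        fresh.LoadSequence(5, saved);      // restored under a new sequence id
        Console.WriteLine($"Restored {saved.Length} token(s) into sequence 5");
    }
}
```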
Martin Evans
a8ba9f05b3
March Binary Update (#565)
* Updated binaries to llama.cpp `3ab8b3a92ede46df88bc5a2dfca3777de4a2b2b6` (build run: https://github.com/SciSharp/LLamaSharp/actions/runs/8118890586 )
* Added abort callback
* Added properties to get/set thread count on `LLamaContext`
* Fixed LLamaLogLevel numbering
1 year ago
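The abort callback added in the commit above follows llama.cpp's general pattern of polling a user-supplied predicate during evaluation. A self-contained sketch of that pattern (all names here are illustrative, not the library's):

```csharp
using System;
using System.Threading;

// Illustrative sketch of the abort-callback pattern: a generation loop polls a
// user-supplied predicate between steps and stops early when it returns true.
public static class AbortDemo
{
    public static int Generate(int maxTokens, Func<bool> shouldAbort)
    {
        var produced = 0;
        for (var i = 0; i < maxTokens; i++)
        {
            if (shouldAbort())      // checked between evaluation steps
                break;

            Thread.Sleep(10);       // stand-in for one decode step
            produced++;
        }
        return produced;
    }

    public static void Main()
    {
        using var cts = new CancellationTokenSource(millisecondsDelay: 50);
        var count = Generate(1000, () => cts.IsCancellationRequested);
        Console.WriteLine($"Generated {count} tokens before the abort callback fired.");
    }
}
```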
Martin Evans
15a98b36d8
Updated everything to work with llama.cpp ce32060198
1 year ago
Martin Evans
42be9b136d
Switched from using raw integers to a `LLamaToken` struct
1 year ago
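Wrapping the raw token id in a struct gives type safety at no runtime cost. A minimal sketch of the idea (the real `LLamaToken` lives in LLamaSharp and may differ in detail):

```csharp
// Minimal sketch of a typed token id. The real LLamaToken may differ; the
// point is that a distinct struct stops token ids being mixed up with
// ordinary integers (lengths, counts, sequence ids, ...).
public readonly record struct Token(int Value)
{
    public static explicit operator int(Token t) => t.Value;   // conversion must be explicit
}

public static class TokenDemo
{
    public static void Accept(Token t) { /* consume a token */ }

    public static void Main()
    {
        var t = new Token(42);
        Accept(t);          // fine
        // Accept(42);      // no longer compiles: raw ints are not tokens
        System.Console.WriteLine((int)t);
    }
}
```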
Martin Evans
b868b056f7
Added metadata overrides to `IModelParams`
1 year ago
Martin Evans
439d14a061
Updated binaries:
- build run: https://github.com/SciSharp/LLamaSharp/actions/runs/7196891440
- commit: 9fb13f9584
1 year ago
Martin Evans
b34f72a883
- Added `SamplingPipeline` to inference params which overrides all other options with an entirely custom pipeline.
- Added a `Sample` method to `LLamaContext` which uses a custom pipeline
- Modified all executors to use the custom pipeline if it exists
1 year ago
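The behaviour described in the entry above (a custom pipeline that, when present, takes over from the individual sampling options) can be sketched as follows. `ISamplingPipeline` and the other types here are simplified stand-ins, not LLamaSharp's actual interfaces:

```csharp
using System;
using System.Linq;

// Simplified stand-ins: when SamplingPipeline is set, it overrides the
// individual sampling options entirely.
public interface ISamplingPipeline
{
    int Sample(ReadOnlySpan<float> logits);
}

public sealed class GreedyPipeline : ISamplingPipeline
{
    public int Sample(ReadOnlySpan<float> logits)
    {
        // Pick the highest-logit token (greedy sampling).
        var best = 0;
        for (var i = 1; i < logits.Length; i++)
            if (logits[i] > logits[best])
                best = i;
        return best;
    }
}

public sealed class InferenceOptions
{
    public float Temperature { get; init; } = 0.8f;
    public ISamplingPipeline? SamplingPipeline { get; init; }   // null => use Temperature etc.
}

public static class SamplingDemo
{
    public static int SampleNext(float[] logits, InferenceOptions opts)
    {
        if (opts.SamplingPipeline is { } pipeline)
            return pipeline.Sample(logits);     // custom pipeline wins outright

        // Fallback: the option-driven path (elided in this sketch).
        return Array.IndexOf(logits, logits.Max());
    }

    public static void Main()
    {
        var logits = new[] { 0.1f, 2.5f, 0.3f };
        var opts = new InferenceOptions { SamplingPipeline = new GreedyPipeline() };
        Console.WriteLine(SampleNext(logits, opts));   // prints 1
    }
}
```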
Martin Evans
89fef05362
This commit (5fe721bdbe) accidentally removed a lot of content that it shouldn't have. Fixed that.
Originally from these PRs:
- https://github.com/SciSharp/LLamaSharp/pull/263
- https://github.com/SciSharp/LLamaSharp/pull/259
2 years ago
SignalRT
46ace3ddd7
Add targets in Web project
This allows the binaries to be copied and makes the project work.
Update example model in appsettings
2 years ago
Martin Evans
e3468d04f0
Merge pull request #277 from martindevans/feature/min_p
MinP Sampler
2 years ago
Martin Evans
d743516070
- Added support for the MinP sampler
- Cleaned up comments in implementations of `IInferenceParams`
- Removed default values for all parameters in `LLamaContext.Sample` - they're never used and probably _shouldn't_ ever be used
2 years ago
SignalRT
97006a214f
Merge remote-tracking branch 'upstream/master' into RuntimeDetection
2 years ago
Martin Evans
31244ae691
Merge branch 'master' into YaRN_scaling_parameters
2 years ago
SignalRT
5fe721bdbe
Revert "Merge branch 'pr/268' into RuntimeDetection"
This reverts commit 091b8d58b3502a99b3bfbec9db457c92cc736beb, reversing
changes made to 9b2ca9cf8e.
2 years ago
Martin Evans
db1bc741b0
Modified `ContextSize` in parameters to be nullable. A null value means autodetect from the model.
2 years ago
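The nullable `ContextSize` described in the entry above resolves to the model's own value when left unset, roughly as below (illustrative names, not the real parameter classes):

```csharp
using System;

// Illustrative only: a nullable context size falls back to whatever the model
// itself was trained with when the caller does not specify one.
public sealed class ToyModelParams
{
    public uint? ContextSize { get; init; }   // null => autodetect from the model
}

public static class ContextSizeDemo
{
    public static uint Resolve(ToyModelParams p, uint trainedContextLength) =>
        p.ContextSize ?? trainedContextLength;

    public static void Main()
    {
        Console.WriteLine(Resolve(new ToyModelParams(), trainedContextLength: 4096));                      // 4096
        Console.WriteLine(Resolve(new ToyModelParams { ContextSize = 2048 }, trainedContextLength: 4096)); // 2048
    }
}
```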
Martin Evans
04ee64a6be
Exposed YaRN scaling parameters in IContextParams
2 years ago
Martin Evans
529b06b35b
- Fixed rope frequency/base to use the values in the model by default, instead of always overriding them!
2 years ago
Martin Evans
c786fb0ec8
Using `IReadOnlyList` instead of `IEnumerable` in `IInferenceParams`
2 years ago
Martin Evans
6a4cd506bd
Added a safe `TensorSplitsCollection` to the params which prevents incorrectly setting the `tensor_splits` collection
2 years ago
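One plausible reading of a "safe" tensor-splits collection (an assumption about the design, not the actual implementation) is a fixed-length wrapper that cannot be resized or handed a wrong-sized array the way a raw `float[]` could:

```csharp
using System;

// Assumed design sketch: a fixed-length collection of per-device split weights.
// The length is pinned at construction, so callers cannot assign an array of
// the wrong size or a negative weight.
public sealed class ToyTensorSplits
{
    private readonly float[] _splits;

    public ToyTensorSplits(int deviceCount) => _splits = new float[deviceCount];

    public int Length => _splits.Length;

    public float this[int device]
    {
        get => _splits[device];
        set
        {
            if (value < 0)
                throw new ArgumentOutOfRangeException(nameof(value), "Split weights must be non-negative");
            _splits[device] = value;
        }
    }
}

public static class SplitsDemo
{
    public static void Main()
    {
        var splits = new ToyTensorSplits(deviceCount: 2);
        splits[0] = 0.7f;
        splits[1] = 0.3f;
        Console.WriteLine($"{splits.Length} devices, first split = {splits[0]}");
    }
}
```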
Martin Evans
18b15184ea
Added logger parameter to LLama.Web context creation
2 years ago
sa_ddam213
952e77f97b
Remove old parameter
2 years ago
sa_ddam213
b4b4000342
Merge branch 'master' into upstream_master
# Conflicts:
# LLama.Web/Common/ModelOptions.cs
# LLama.Web/Services/ConnectionSessionService.cs
# LLama/LLamaStatelessExecutor.cs
# LLama/LLamaWeights.cs
2 years ago
Martin Evans
2a38808bca
- Added threads to context params, replaced all thread args with `uint?`
- Replaced all binaries
2 years ago
sa_ddam213
a8a498dc12
Fix up issues found during testing
2 years ago
sa_ddam213
9b8de007dc
Propagate ILogger
2 years ago
sa_ddam213
e2a17d6b6f
Refactor conflicting object name SessionOptions
2 years ago
sa_ddam213
44f1b91c29
Update Web to support version 0.5.1
2 years ago
sa_ddam213
c9108f8311
Add service for managing Models and Model Contexts
2 years ago
Martin Evans
0f03e8f1a3
Added workaround to LLama.Web and LLama.WebAPI
2 years ago
Martin Evans
669ae47ef7
- Split parameters into two interfaces
- params contains a list of LoRAs, instead of just one
2 years ago
Martin Evans
bca55eace0
Initial changes to match the llama.cpp changes
2 years ago
Martin Evans
2056078aef
Initial changes required for GGUF support
2 years ago
Martin Evans
93f24f8a51
Switched to properly typed `Encoding` property
2 years ago
Martin Evans
759ae26f36
Merge branch 'master' into grammar_basics
2 years ago
Martin Evans
a9e6f21ab8
- Creating and destroying contexts in the stateless executor, saving memory. It now uses zero memory when not inferring!
- Passing encoding in the `IModelParams`, which reduces how often encoding needs to be passed around
2 years ago
Martin Evans
64416ca23c
- Created a slightly nicer way to create grammar (from `IReadOnlyList<IReadOnlyList<LLamaGrammarElement>>`)
- Integrated grammar into sampling
- Added a test for the grammar sampling
2 years ago
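The grammar shape described in the entry above (a grammar is a list of rules, each rule a list of elements) can be sketched with stand-in types; `LLamaGrammarElement`'s real layout mirrors llama.cpp's `llama_grammar_element` and is not reproduced here:

```csharp
using System;
using System.Collections.Generic;

// Stand-in element type with toy semantics, not real GBNF element encoding.
public enum ToyElementType { Char, RuleRef, End }
public readonly record struct ToyGrammarElement(ToyElementType Type, uint Value);

public static class GrammarDemo
{
    // A grammar is "a list of rules, each rule a list of elements" - the same
    // nested-list shape the commit above describes.
    public static IReadOnlyList<IReadOnlyList<ToyGrammarElement>> BuildYesNoGrammar()
    {
        var rule0 = new List<ToyGrammarElement>
        {
            new(ToyElementType.Char, 'y'),   // toy encoding of: root matches "y"
            new(ToyElementType.Char, 'n'),   // ... or "n"
            new(ToyElementType.End, 0),
        };
        return new List<IReadOnlyList<ToyGrammarElement>> { rule0 };
    }

    public static void Main()
    {
        var grammar = BuildYesNoGrammar();
        Console.WriteLine($"{grammar.Count} rule(s), {grammar[0].Count} element(s) in rule 0");
    }
}
```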
Martin Evans
f3511e390f
WIP demonstrating changes to support multi-context. You can see this in use in `TalkToYourself`, along with notes on what still needs improving.
The biggest single change is renaming `LLamaModel` to `LLamaContext`
2 years ago
Martin Evans
2c933c57a1
Fixed ModelOptions in Web project
2 years ago
sa_ddam213
bac9cba01a
InferenceParams abstractions
2 years ago
sa_ddam213
2a04e31b7d
ModelParams abstraction
2 years ago
sa_ddam213
3fec7a63c7
Add Instruct and Stateless support
2 years ago
sa_ddam213
a32a5e4ffe
Decouple connectionId from ModelSession
2 years ago
sa_ddam213
d9fbd56f10
Strongly type connection status
2 years ago
sa_ddam213
ef8cf0b283
Add RequestVerificationToken logic for ajax prefilter, tidy up js cancel logic
2 years ago
sa_ddam213
e574d89a40
Send prompt on Enter key
2 years ago
sa_ddam213
a139423581
Move session management to service, use ILLamaExecutor in session to make it more versatile, fix scroll bug
2 years ago
sa_ddam213
1ec59e120a
Move session management to service, add inference cancel support
2 years ago
sa_ddam213
fd215dce84
Update Readme
2 years ago
sa_ddam213
21b685649f
Add Readme
2 years ago