SignalRT
e8732efadd
Example InteractiveExecutor
Add an Example and modifications to the interactive executor to enable Llava Models.
Just a preview / demo
1 year ago
Martin Evans
a8ba9f05b3
March Binary Update (#565)
* Updated binaries to llama.cpp `3ab8b3a92ede46df88bc5a2dfca3777de4a2b2b6` (build run: https://github.com/SciSharp/LLamaSharp/actions/runs/8118890586)
* Added abort callback
* Added properties to get/set thread count on `LLamaContext`
* Fixed LLamaLogLevel numbering
1 year ago
Martin Evans
8ac1634233
Removed `llama_eval`. It is going to be completely removed in the next version of llama.cpp (#553)
1 year ago
Martin Evans
a690db5d3e
Fixed build error caused by extra unnecessary parameter
1 year ago
Martin Evans
a2e29d393c
Swapped `StatelessExecutor` to use `llama_decode`!
- Added `logits_i` argument to `Context.ApplyPenalty`
- Added a new exception type for `llama_decode` return code
1 year ago
Martin Evans
f160fbd6d1
Added a check for EOS token in LLamaStatelessExecutor
1 year ago
Martin Evans
2eb52b1630
Made casts to/from int explicit and fixed the affected places
1 year ago
Martin Evans
42be9b136d
Switched from using raw integers to a `LLamaToken` struct
1 year ago
Martin Evans
82d84afaea
Resetting the custom sampling pipeline in the stateless executor
1 year ago
Martin Evans
b34f72a883
- Added `SamplingPipeline` to inference params which overrides all other options with an entirely custom pipeline.
- Added a `Sample` method to `LLamaContext` which uses a custom pipeline
- Modified all executors to use the custom pipeline if it exists
1 year ago
Martin Evans
d743516070
- Added support for the MinP sampler
- Cleaned up comments in implementations of `IInferenceParams`
- Removed default values for all parameters in `LLamaContext.Sample` - they're never used and probably _shouldn't_ ever be used
2 years ago
Martin Evans
7e3cde4c13
Moved helper methods into `LLamaBatchSafeHandle`
2 years ago
Martin Evans
ccb8afae46
Cleaned up stateless executor as preparation for changing it to use the new batched decoding system.
2 years ago
Martin Evans
a03fe003de
Fixed decoding of text "accumulating" over time (never properly clearing buffer)
2 years ago
Martin Evans
51d4411a58
Added two new classes for detokenization tasks:
- `AntipromptProcessor` accepts chunks of text and returns a value indicating if any antiprompt has been detected.
- `StreamingTokenDecoder` decodes tokens into text, maintaining some internal state to handle single characters which are encoded as multiple tokens.
Added tests for these classes and updated StatelessExecutor to use them.
Removed most DeTokenize methods, marked the rest as obsolete (should always use a `StreamingTokenDecoder`).
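
The reason a stateful decoder is needed can be illustrated with a minimal sketch. This is not LLamaSharp's `StreamingTokenDecoder`; it is a hypothetical Python stand-in that assumes each token arrives as a fragment of UTF-8 bytes. The class name `StreamingDecoder` and its `add`/`read` methods are invented for illustration.

```python
import codecs


class StreamingDecoder:
    """Hypothetical stand-in: each 'token' is a fragment of UTF-8 bytes."""

    def __init__(self):
        # The incremental decoder buffers incomplete byte sequences internally.
        self._decoder = codecs.getincrementaldecoder("utf-8")()
        self._chunks = []

    def add(self, token_bytes: bytes) -> None:
        # Returns "" while a multi-byte character is still incomplete.
        text = self._decoder.decode(token_bytes)
        if text:
            self._chunks.append(text)

    def read(self) -> str:
        # Hand back the completed text and clear the buffer so it cannot accumulate.
        text = "".join(self._chunks)
        self._chunks.clear()
        return text


decoder = StreamingDecoder()
emoji = "🙂".encode("utf-8")  # one character, four bytes

decoder.add(b"Hi ")
decoder.add(emoji[:2])   # first half of the character: nothing to emit yet
first = decoder.read()

decoder.add(emoji[2:])   # second half completes the character
second = decoder.read()
```

Decoding each fragment on its own would fail or produce replacement characters for the split emoji, which is the problem a per-token `TokenToString` style call runs into.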
2 years ago
Martin Evans
efdf3d630c
- Removed all `TokenToString` methods (it's never correct to use them, because sometimes one single character may be represented by multiple tokens).
- Built a new (hacky) `Detokenize` method which handles this
2 years ago
Martin Evans
f1e5a8f995
- Passing the `ILogger` through to every call of `CreateContext`
- Passing `ILogger` into executors
2 years ago
sa_ddam213
4ec9aed47a
Revert LLamaSharp project changes
2 years ago
sa_ddam213
b4b4000342
Merge branch 'master' into upstream_master
# Conflicts:
# LLama.Web/Common/ModelOptions.cs
# LLama.Web/Services/ConnectionSessionService.cs
# LLama/LLamaStatelessExecutor.cs
# LLama/LLamaWeights.cs
2 years ago
Martin Evans
d8434ea9d6
Merge pull request #185 from martindevans/wip_major_api_change
Major llama.cpp API Change
2 years ago
Martin Evans
efb0664df0
- Added new binaries
- Fixed stateless executor out-of-context handling
- Fixed token tests
2 years ago
sa_ddam213
9b8de007dc
Propagate ILogger
2 years ago
Martin Evans
669ae47ef7
- Split parameters into two interfaces
- params contains a list of loras, instead of just one
2 years ago
Martin Evans
0d40338692
Fixed out-of-context handling in stateless executor
2 years ago
Martin Evans
d58fcbbd13
Fixed antiprompt checking
2 years ago
Martin Evans
08f1615e60
- Converted LLamaStatelessExecutor to run `Exec` calls inside an awaited task. This unblocks async callers while the model is being evaluated.
- Added a "spinner" to the `StatelessModeExecute` demo, which spins while waiting for the next token (demonstrating that it's not blocked).
2 years ago
Martin Evans
3f80190f85
Minimal changes required to remove non-async inference.
2 years ago
Martin Evans
77bd090150
Simplified `LLamaInteractExecutor` antiprompt matching by using new extension method
2 years ago
Martin Evans
614ba40948
- Added a `TokensEndsWithAnyString` extension to `IReadOnlyList<int>` which efficiently checks if a set of tokens ends with one of a set of strings.
  - Minimal number of characters converted
  - Allocation free
- Added `TokensToSpan` to `SafeLlamaModelHandle` which converts as many tokens as possible into a character span
  - Allocation free
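
The idea of converting only as many characters as needed can be sketched as follows. This is a rough Python illustration, not the C# extension method: the function name `tokens_end_with_any` and the `vocab` dictionary (token id to text piece) are assumptions, and unlike the described implementation this sketch does allocate.

```python
def tokens_end_with_any(tokens, vocab, suffixes):
    """Check whether the decoded token sequence ends with any of `suffixes`.

    Walks the tokens backwards and stops as soon as enough characters have
    been collected to compare against the longest suffix.
    """
    max_len = max((len(s) for s in suffixes), default=0)
    if max_len == 0:
        return False

    pieces = []
    total = 0
    for tok in reversed(tokens):
        piece = vocab[tok]  # hypothetical lookup: token id -> text
        pieces.append(piece)
        total += len(piece)
        if total >= max_len:
            break  # earlier tokens cannot affect the comparison

    tail = "".join(reversed(pieces))
    # Skip empty suffixes, since every string ends with "".
    return any(s and tail.endswith(s) for s in suffixes)


vocab = {0: "Hello", 1: " ", 2: "User", 3: ":"}
tokens = [0, 1, 2, 3]

matched = tokens_end_with_any(tokens, vocab, ["User:"])
unmatched = tokens_end_with_any(tokens, vocab, ["Bot:"])
```

Here only the last two tokens are decoded before the comparison runs, regardless of how long the full sequence is.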
2 years ago
Martin Evans
93f24f8a51
Switched to properly typed `Encoding` property
2 years ago
Martin Evans
759ae26f36
Merge branch 'master' into grammar_basics
2 years ago
Martin Evans
a9e6f21ab8
- Creating and destroying contexts in the stateless executor, saving memory. It now uses zero memory when not inferring!
- Passing encoding in the `IModelParams`, which reduces how often encoding needs to be passed around
2 years ago
Martin Evans
e7b217f462
Fixed out of context logic
2 years ago
Martin Evans
4738c26299
- Reduced context size of test, to speed it up
- Removed some unnecessary `ToArray` calls
- Initial pass on LLamaStatelessExecutor, the context overflow management is broken but I think I found where it's ported from
2 years ago
Martin Evans
64416ca23c
- Created a slightly nicer way to create grammar (from `IReadOnlyList<IReadOnlyList<LLamaGrammarElement>>`)
- Integrated grammar into sampling
- Added a test for the grammar sampling
2 years ago
Martin Evans
f3511e390f
WIP demonstrating changes to support multi-context. You can see this in use in `TalkToYourself`, along with notes on what still needs improving.
The biggest single change is renaming `LLamaModel` to `LLamaContext`
2 years ago
Martin Evans
270c6d55ef
Merge pull request #88 from martindevans/fix_serialization_nan
Fix serialization error due to NaN
2 years ago
Martin Evans
be52737488
Using a nullable float instead of NaN, this should fix the serialization issue reported in #85
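
The general problem can be shown with a small sketch: strict JSON has no representation for NaN, while an absent value maps cleanly to `null`. This is a Python illustration only; the field name `setting` and the `serialize` helper are hypothetical, and the serializer involved in #85 may behave differently in detail.

```python
import json
import math
from typing import Optional


def serialize(value: Optional[float]) -> str:
    # allow_nan=False enforces strict JSON, which has no NaN literal.
    return json.dumps({"setting": value}, allow_nan=False)


try:
    serialize(math.nan)
    nan_ok = True
except ValueError:
    nan_ok = False  # NaN is rejected by a strict serializer

null_json = serialize(None)  # an optional value serializes as null
```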
2 years ago
Martin Evans
1fceeaf352
Applied fix from #84 (antiprompt does not work in stateless executor)
2 years ago
Yaohui Liu
d609b0e1d5
Merge branch 'master' of github.com:SciSharp/LLamaSharp into rinne-dev
2 years ago
Yaohui Liu
b60c8bd285
fix: antiprompt does not work in stateless executor.
2 years ago
Martin Evans
2b2d3af26b
Moved `Eval` out of `Utils` and into `SafeLLamaContextHandle`
2 years ago
Martin Evans
7fabcc1849
One last `TokenToString` case
2 years ago
Martin Evans
0e5e00e300
Moved `TokenToString` from Utils into `SafeLLamaContextHandle` (thin wrappers around the same method in `SafeLlamaModelHandle`)
2 years ago
sa_ddam213
bac9cba01a
InferenceParams abstractions
2 years ago
Martin Evans
c64507cb41
Correctly passing through mu value to mirostat instead of resetting it every time.
2 years ago
Martin Evans
ad28a5acdb
Merge branch 'master' into fix_multiple_enumeration
2 years ago
Rinne
4d7d4f2bfe
Merge pull request #59 from saddam213/master
Instruct & Stateless web example implemented
2 years ago
sa_ddam213
3fec7a63c7
Add Instruct and Stateless support
2 years ago
Martin Evans
f3fa73de2b
Implemented a new `LlamaModel.State` handle which internally stores the state as natively allocated memory. This allows it to exceed the 2GB limit on C# arrays.
2 years ago