Yaohui Liu
d03e1dbe30
feat: support cuda feature detection.
2 years ago
SignalRT
5fe721bdbe
Revert "Merge branch 'pr/268' into RuntimeDetection"
This reverts commit 091b8d58b3502a99b3bfbec9db457c92cc736beb, reversing
changes made to 9b2ca9cf8e.
2 years ago
SignalRT
200011e186
Revert "Merge feat: add detection template for cuda and avx. #268"
This reverts commit b4b3ea9d99.
2 years ago
SignalRT
b4b3ea9d99
Merge feat: add detection template for cuda and avx. #268
Just merge cuda and avx detection and change layout.
2 years ago
Yaohui Liu
b893c6f609
feat: add detection template for cuda and avx.
2 years ago
Martin Evans
c7fdb9712c
Added binaries, built from `6961c4bd0b`
2 years ago
Martin Evans
a024d2242e
It works!
had to update binary to `b1426`
2 years ago
Martin Evans
8cd81251b4
initial setup
2 years ago
Martin Evans
15db194c17
Added multi GPU support
2 years ago
Martin Evans
e89ca5cc17
Fixed a few minor warnings
2 years ago
Martin Evans
1f8c94e386
Added in the `special` parameter to the tokenizer (introduced in https://github.com/ggerganov/llama.cpp/pull/3538)
2 years ago
Martin Evans
0d40338692
Fixed out-of-context handling in stateless executor
2 years ago
Martin Evans
9e958e896b
safe handle for batch
2 years ago
Martin Evans
ce1fc51163
Added some more native methods
2 years ago
Martin Evans
bca55eace0
Initial changes to match the llama.cpp changes
2 years ago
Haiping
10678a83d6
Merge pull request #65 from martindevans/alternative_dependency_loading
CPU Feature Detection
2 years ago
sa_ddam213
09d8f434f2
Extract LLamaLogLevel, Remove Logger class
2 years ago
Martin Evans
8f58a40fb9
Added Linux dependency loading
2 years ago
Martin Evans
dd4957471f
Changed paths to match what the GitHub build action produces
2 years ago
Martin Evans
756a1ad0ba
Added a new way to load dependencies, performing CPU feature detection
2 years ago
Martin Evans
bcf06e2652
Added some comments on various native methods
2 years ago
Martin Evans
2022b82947
Added binaries generated by this action: https://github.com/SciSharp/LLamaSharp/actions/runs/6002797872/job/16279896150
Based on this version: 6b73ef1201
2 years ago
Martin Evans
0c98ae1955
Passing ctx to `llama_token_nl(_ctx)`
2 years ago
Martin Evans
6ffa28f964
Removed `LLAMA_MAX_DEVICES` (not used)
2 years ago
Martin Evans
2056078aef
Initial changes required for GGUF support
2 years ago
Martin Evans
829f32b27d
- Added `Obsolete` attributes to the entire `OldVersion` namespace, so they can be removed in the future
- Minor changes to cleanup some of the compiler warnings
2 years ago
Martin Evans
d7f971fc22
Improved `NativeApi` file a bit:
- Added some more comments
- Modified `llama_tokenize` to not allocate
- Modified `llama_tokenize_native` to take a pointer instead of an array, allowing use with no allocations
- Removed GgmlInitParams (not used)
2 years ago
sa_ddam213
726987b761
Add native logging output
2 years ago
Martin Evans
2b2d3af26b
Moved `Eval` out of `Utils` and into `SafeLLamaContextHandle`
2 years ago
Martin Evans
2d811b2603
- Moved `GetLogits` into `SafeLLamaContextHandle`
- Added disposal check into `SafeLLamaContextHandle`
2 years ago
Martin Evans
cd3cf2b77d
- Moved tokenization from `Utils.Tokenize` into `SafeLLamaContextHandle.Tokenize`, one less thing in `Utils`.
- Also refactored it to return an `int[]` instead of an `IEnumerable<int>`, solving the "multiple enumeration" problems at the source!
2 years ago
Yaohui Liu
bb46a990d0
fix: add bug info for native api.
2 years ago
Martin Evans
afb9d24f3a
Added model `Tokenize` method
2 years ago
Martin Evans
369c915afe
Added TokenToString conversion on model handle
2 years ago
Martin Evans
b721072aa5
Exposed some extra model properties on safe handle
2 years ago
Martin Evans
f16aa58e12
Updated to use the new loading system in llama (llama_state). This new system has split model weights and contexts into two separate things, allowing one set of weights to be shared between many contexts.
This change _only_ implements the low level API and makes no effort to update the LlamaSharp higher level abstraction.
It is built upon llama `b3f138d`, necessary DLLs are **not** included in this commit.
2 years ago
Rinne
c5e8b3eba2
Merge pull request #56 from martindevans/memory_mapped_save_loading_and_saving
Memory Mapped LoadState/SaveState
2 years ago
Rinne
1b0523f630
Merge branch 'master' into master
2 years ago
Martin Evans
4d72420a04
Replaced `SaveState` and `LoadState` implementations. These new implementations map the file into memory and then pass the pointer directly into the native API. This improves things in two ways:
- A C# array cannot exceed 2,147,483,591 bytes. In my own use of LlamaSharp I encountered this limit.
- This saves an extra copy of the entire state data into a C# `byte[]`, so it should be faster.
This does _not_ fix some other places where `GetStateData` is used. I'll look at those in a separate PR.
2 years ago
SignalRT
56a37a0d7d
Update to latest llama.cpp
Adapt the interface change in llama_backend_init
2 years ago
unknown
dba866ffcf
Update API method name
2 years ago
Yaohui Liu
1062fe1a7e
feat: upgrade the native libraries.
2 years ago
Yaohui Liu
9850417a12
feat: update quantize native params.
2 years ago
Yaohui Liu
3bf74ec9b9
feat: add chat session for refactored code.
2 years ago
Yaohui Liu
264fb9a706
refactor: LLamaModel and LLamaExecutor.
2 years ago
Yaohui Liu
3a62f087fe
fix: encoding error when using other languages.
2 years ago
Yaohui Liu
18c2ff2395
refactor: instruct mode and examples.
2 years ago
Yaohui Liu
55d5a8ae51
fix: quantization error with fp16.
2 years ago
Yaohui Liu
19979f664a
feat: support loading and saving state.
2 years ago
Yaohui Liu
4314f64b9c
feat: add check for backend package.
2 years ago