LLamaSharp


Author	SHA1	Message	Date
jlsantiago	3b2836eac4	Llava API (#563) * Add llava_binaries, update all binaries to make the test * Llava API + LlavaTest Preliminary * First prototype of Load + Unit Test * Temporarily run test on branch LlavaAPI * Disable Embed test to review the rest of the tests * Restore Embedding test * Use BatchThread to eval image embeddings. Test Threads default value to ensure it doesn't produce problems. * Rename test file * Update action versions * Test only one method, no release embeddings * Revert "Test only one method, no release embeddings" This reverts commit `264e176dcc`. * Correct API call * Only test llava related functionality * CUDA and CLBlast binaries * Restore build policy * Changes related to code review * Add SafeHandles * Set overwrite to upload-artifact@v4 * Revert to upload-artifact@v3 * Revert to upload-artifact@v3	1 year ago
Martin Evans	ce4de7d607	llama_decode lock (#595) * Added a lock object into `SafeLlamaModelHandle` which all calls to `llama_decode` (in the `SafeLLamaContextHandle`) lock first. This prevents two contexts from running inference on the same model at the same time, which seems to be unsafe in llama.cpp. * Modified the lock to be global over _all_ inferences. This seems to be necessary (at least with the CUDA backend).	1 year ago
Clovis Henrique Ribeiro	d0f79814e9	Added conditional compilation code to progress_callback (in LlamaModelParams struct) so the struct plays nice with legacy .NET Framework 4.8 (#593)	1 year ago
Martin Evans	f0b0bbcbb7	Mutable Logits (#586) Modified LLamaBatch to not share tokens with other sequences if logits is true. This ensures that the logit span at the end is used by exactly one sequence; therefore it's safe to mutate. This removes the need for copying _very_ large arrays (vocab size) and simplifies sampling pipelines.	1 year ago
Martin Evans	a8ba9f05b3	March Binary Update (#565) * Updated binaries to llama.cpp `3ab8b3a92ede46df88bc5a2dfca3777de4a2b2b6` (build run: https://github.com/SciSharp/LLamaSharp/actions/runs/8118890586) * Added abort callback * Added properties to get/set thread count on `LLamaContext` * Fixed LLamaLogLevel numbering	1 year ago
dependabot[bot]	4068a6f03b	build(deps): bump System.Text.Json from 8.0.1 to 8.0.2 Bumps [System.Text.Json](https://github.com/dotnet/runtime) from 8.0.1 to 8.0.2. - [Release notes](https://github.com/dotnet/runtime/releases) - [Commits](https://github.com/dotnet/runtime/compare/v8.0.1...v8.0.2) --- updated-dependencies: - dependency-name: System.Text.Json dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com>	1 year ago
Martin Evans	defac000ad	Added a `%(RecursiveDir)` element to the props file; this causes files to be copied along with the folder structure rather than dumped into the root. (#561)	1 year ago
Martin Evans	8ac1634233	Removed `llama_eval`. It is going to be completely removed in the next version of llama.cpp (#553)	1 year ago
Martin Evans	f0e7e7cc0a	Removed `SamplingApi`. It has been marked as Obsolete for a while, replaced by instance methods on `LLamaTokenDataArray` (#552)	1 year ago
Martin Evans	7d84625a67	Classifier Free Guidance (#536) * Added a `Guidance` method to `LLamaTokenDataArray` which applies classifier free guidance * Factored out a safer `llama_sample_apply_guidance` method based on spans * Created a guided sampling demo using the batched executor * fixed comment, "classifier free" not "context free" * Rebased onto master and fixed breakage due to changes in `BaseSamplingPipeline` * Asking user for guidance weight * Progress bar in batched fork demo * Improved fork example (using tree display) * Added proper disposal of resources in batched examples * Added some more comments in BatchedExecutorGuidance	1 year ago
Martin Evans	91a7967869	`ReadOnlySpan<float>` in ISamplingPipeline (#538) * - Modified ISamplingPipeline to accept `ReadOnlySpan<float>` of logits directly. This moves responsibility to copy the logits into the pipeline. - Added a flag to `BaseSamplingPipeline` indicating if a logit copy is necessary. Skipping it in most cases. * Fixed `RestoreProtectedTokens` not working if logit processing is skipped * - Implemented a new greedy sampling pipeline (always sample most likely token) - Moved `Grammar` into `BaseSamplingPipeline` - Removed "protected tokens" concept from `BaseSamplingPipeline`. Was introducing a lot of incidental complexity. - Implemented newline logit save/restore in `DefaultSamplingPipeline` (only place protected tokens was used) * Implemented pipelines for mirostat v1 and v2	1 year ago
Scott W Harden	a6394001a1	NativeLibraryConfig: WithLogs(LLamaLogLevel) (#529) Adds a NativeLibraryConfig.WithLogs() overload to let the user indicate the log level (with "info" as the default)	1 year ago
Scott W Harden	4c3077d0f0	ChatSession: improve exception message The original message contained the word "preceeded" which should be spelled as "preceded"	1 year ago
Martin Evans	c7d0dc915a	Assorted small changes to clean up some code warnings	1 year ago
Martin Evans	174f21a385	0.10.0	1 year ago
Martin Evans	d03c1a9201	Merge pull request #503 from martindevans/batched_executor_again Introduced a new `BatchedExecutor`	1 year ago
Martin Evans	d47b6afe4d	Normalizing embeddings in `LLamaEmbedder`. As is done in llama.cpp: `2891c8aa9a/examples/embedding/embedding.cpp (L92)`	1 year ago
Martin Evans	e9d9042576	Added `Divide` to `KvAccessor`	1 year ago
Martin Evans	1cc463b9b7	Added a finalizer to `BatchedExecutor`	1 year ago
Martin Evans	0c2cff0e1c	Added a Finalizer for `Conversation` in case it is not correctly disposed.	1 year ago
Martin Evans	949861a581	- Added a `Modify` method to `Conversation`. This grants temporary access to directly modify the KV cache. - Re-implemented `Rewind` as an extension method using `Modify` internally - Implemented `ShiftLeft`, which shifts everything over except for some starting tokens. This is the same as the `StatelessExecutor` out-of-context handling. - Started batch at epoch 1; this ensures that conversations (starting at zero) are below the current epoch. It also means `0` can always be used as a value guaranteed to be below the current epoch.	1 year ago
Martin Evans	b0acecf080	Created a new `BatchedExecutor` which processes multiple "Conversations" in one single inference batch. This is faster, even when the conversations are unrelated, and is much faster if the conversations share some overlap (e.g. a common system prompt prefix). Conversations can be "forked", to create a copy of a conversation at a given point. This allows e.g. prompting a conversation with a system prefix just once and then forking it again and again for each individual conversation. Conversations can also be "rewound" to an earlier state. Added two new examples, demonstrating forking and rewinding.	1 year ago
Martin Evans	90915c5a99	Added increment and decrement operators to `LLamaPos`	1 year ago
Martin Evans	82c471eac4	Merge pull request #500 from martindevans/improved_kv_cache_methods Small KV Cache Handling Improvements	1 year ago
Martin Evans	c5146bac23	- Exposed KV debug view through `SafeLLamaContextHandle` - Added `KvCacheSequenceDivide` - Moved count tokens/cells methods to `SafeLLamaContextHandle`	1 year ago
Martin Evans	744758f110	Using `AddRange` in `LLamaEmbedder`	1 year ago
Martin Evans	c7103e86e4	Added new file types to quantisation	1 year ago
Martin Evans	17385e12b6	Merge pull request #479 from martindevans/update_binaries_feb_2024 Update binaries feb 2024	1 year ago
Martin Evans	bac40a3b7a	Added new binaries, from this run: https://github.com/SciSharp/LLamaSharp/actions/runs/7792319886	1 year ago
Jason Couture	c963b051e2	Add nuspec for OpenCL (CLBLAST)	1 year ago
Martin Evans	765c697f77	Fixed number type	1 year ago
Martin Evans	b2e815d51e	Updated all binaries (from this run: https://github.com/SciSharp/LLamaSharp/actions/runs/7746303349 )	1 year ago
Martin Evans	15a98b36d8	Updated everything to work with llama.cpp `ce32060198`	1 year ago
Martin Evans	c9c8cd0d62	- Swapped embeddings generator to use `llama_decode` - Modified `GetEmbeddings` method to be async	1 year ago
Martin Evans	22aba9a671	Merge pull request #473 from martindevans/base_handle_removed_constructor Removed `SafeLLamaHandleBase` Constructor	1 year ago
Martin Evans	5da2a2f64b	- Removed one of the constructors of `SafeLLamaHandleBase`, which implicitly states that memory is owned. Better to be explicit about this kind of thing! - Also fixed `ToString()` in `SafeLLamaHandleBase`	1 year ago
Martin Evans	9b995510d6	Removed all setters in `IModelParams` and `IContextParams`, allowing implementations to be immutable.	1 year ago
Jason Couture	ec59c5bf9e	Fix missing library name prefix for cuda	1 year ago
Jason Couture	443ce4fff4	While the dllimport changes work, manual path searching needed to be updated	1 year ago
Jason Couture	db7e1e88f8	Use llama instead of libllama in `[DllImport]`. This means Windows users no longer need to rename the DLL, and allows native llama builds to be dropped in, even on Windows. I also took the time to update the documentation, removing references to renaming the files, since the names now match. Fixes #463	1 year ago
dependabot[bot]	d8eb817bf5	build(deps): bump System.Text.Json from 8.0.0 to 8.0.1 Bumps [System.Text.Json](https://github.com/dotnet/runtime) from 8.0.0 to 8.0.1. - [Release notes](https://github.com/dotnet/runtime/releases) - [Commits](https://github.com/dotnet/runtime/compare/v8.0.0...v8.0.1) --- updated-dependencies: - dependency-name: System.Text.Json dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com>	1 year ago
Martin Evans	92b9bbe779	Added methods to `SafeLLamaContextHandle` for KV cache manipulation	1 year ago
Martin Evans	a690db5d3e	Fixed build error caused by extra unnecessary parameter	1 year ago
Martin Evans	96c26c25f5	Merge pull request #445 from martindevans/stateless_executor_llama_decode Swapped `StatelessExecutor` to use `llama_decode`!	1 year ago
Martin Evans	9fe878ae1f	- Fixed example - Growing more than double, if necessary	1 year ago
Martin Evans	9ede1bedc2	Automatically growing batch n_seq_max when exceeded. This means no parameters need to be picked when the batch is created.	1 year ago
Martin Evans	a2e29d393c	Swapped `StatelessExecutor` to use `llama_decode`! - Added `logits_i` argument to `Context.ApplyPenalty` - Added a new exception type for `llama_decode` return code	1 year ago
Martin Evans	5b6e82a594	Improved the BatchedDecoding demo: - using less `NativeHandle` - Using `StreamingTokenDecoder` instead of obsolete detokenize method	1 year ago
Martin Evans	99969e538e	- Removed some unused `eval` methods. - Added a `DecodeAsync` overload which runs the work in a task - Replaced some `NativeHandle` usage in `BatchedDecoding` with higher level equivalents. - Made the `LLamaBatch` grow when token capacity is exceeded, removing the need to manage token capacity externally.	1 year ago
Martin Evans	36a9335588	Removed `LLamaBatchSafeHandle` (using unmanaged memory, created by llama.cpp) and replaced it with a fully managed `LLamaBatch`. Modified the `BatchedDecoding` example to use new managed batch.	1 year ago


509 Commits (experimental_cpp)