Martin Evans
|
1f8c94e386
|
Added the `special` parameter to the tokenizer (introduced in https://github.com/ggerganov/llama.cpp/pull/3538)
|
2 years ago |
Martin Evans
|
efb0664df0
|
- Added new binaries
- Fixed stateless executor out-of-context handling
- Fixed token tests
|
2 years ago |
Martin Evans
|
669ae47ef7
|
- Split parameters into two interfaces
- params contains a list of loras, instead of just one
|
2 years ago |
Martin Evans
|
ce1fc51163
|
Added some more native methods
|
2 years ago |
Martin Evans
|
bca55eace0
|
Initial changes to match the llama.cpp changes
|
2 years ago |
Martin Evans
|
daf09eae64
|
Skipping tokenization of empty strings (saves allocating an empty array every time)
|
2 years ago |
Martin Evans
|
bba801f4b7
|
Added a property to get the KV cache size from a context
|
2 years ago |
SignalRT
|
fb007e5921
|
Changes to compile in VS Mac + change model to llama2
This commit includes changes to compile in VS Mac + changes to use llama2 instead of codellama.
It includes macOS binaries in memory and metal
|
2 years ago |
Martin Evans
|
95dc12dd76
|
Switched to codellama-7b.gguf in tests (probably temporarily)
|
2 years ago |
Martin Evans
|
0c98ae1955
|
Passing ctx to `llama_token_nl(_ctx)`
|
2 years ago |
Martin Evans
|
2830e5755c
|
- Applied many minor R# code quality suggestions; removed lots of unnecessary imports.
- Deleted `NativeInfo` (internal class, not used anywhere)
|
2 years ago |
Martin Evans
|
a9e6f21ab8
|
- Creating and destroying contexts in the stateless executor, saving memory. It now uses zero memory when not inferring!
- Passing the encoding in `IModelParams`, which reduces how often the encoding needs to be passed around
|
2 years ago |
Martin Evans
|
1b35be2e0c
|
Added some additional basic tests
|
2 years ago |