Using CUDA while decoupling from the CUDA Toolkit as a hard-dependency
Possible solution for https://github.com/SciSharp/LLamaSharp/issues/350
Adding an alternative, fallback method of detection system-supported cuda version to make CUDA Toolkit installation optional. Technically, it uses output of the command line tool "nvidia-smi" (preinstalled with nvidia drivers), which also contains information about cuda version supported on system.
Can confirm it works only on Windows, but I suppose that similar approach can be utilized for Linux and MacOS as well. Didn't touch the code for these 2 platforms, nevertheless.
After that, cuda can be utilized simply by putting nvidia libraries from llama.cpp original repo, "bin-win-cublas-cu12.2.0-x64.zip" asset to the root folder of the built program. For example, to folder: "\LLama.Examples\bin\Debug\net8.0\".
- using `Lazy<T>` to initialize it automatically.
- Added in `AVX512` support for all dotnet versions (but not autodetected).
- Added in AVX version auto detection.