* Added a lock object to `SafeLlamaModelHandle`, which every call to `llama_decode` (in `SafeLLamaContextHandle`) acquires first. This prevents two contexts from running inference on the same model at the same time, which appears to be unsafe in llama.cpp.
* Modified the lock to be global across _all_ inferences, not just those sharing a model. This appears to be necessary, at least with the CUDA backend (see the sketch below).
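
A minimal sketch of the locking scheme described above, assuming hypothetical member names (`GlobalInferenceLock`, `Decode`) that are illustrative only and not the exact LLamaSharp declarations:

```csharp
// Illustrative sketch only: the real handles derive from SafeHandle and call
// into the native API, both omitted here.
public sealed class SafeLlamaModelHandle
{
    // A single lock shared across the whole process: the CUDA backend appears
    // to rely on global state, so inferences against different models must
    // not overlap either.
    internal static readonly object GlobalInferenceLock = new object();
}

public sealed class SafeLLamaContextHandle
{
    // Wrapper around the native decode call; batch parameters are omitted
    // in this sketch.
    public int Decode()
    {
        lock (SafeLlamaModelHandle.GlobalInferenceLock)
        {
            // The real implementation invokes llama_decode on the native
            // context here while holding the lock.
            return 0;
        }
    }
}
```
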
/// This object exists to ensure that only one inference is ever running at a time. This is a workaround for thread-safety issues in llama.cpp itself.
/// The most notable case is the CUDA backend, which appears to use global singleton resources and will crash if multiple inferences run concurrently (even against different models).
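
Continuing the sketch above, a hypothetical caller illustrates the effect: two contexts, even if they belong to different models, contend for the same global lock, so their decode calls are serialised rather than running concurrently.

```csharp
using System.Threading.Tasks;

public static class Example
{
    // Hypothetical usage: both Decode calls acquire GlobalInferenceLock,
    // so at most one of them is inside the native decode at any moment.
    public static Task RunBothAsync(SafeLLamaContextHandle contextA, SafeLLamaContextHandle contextB)
    {
        var first = Task.Run(() => contextA.Decode());
        var second = Task.Run(() => contextB.Decode());
        return Task.WhenAll(first, second);
    }
}
```
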