# Understand LLamaSharp context

`LLamaContext` is the most important component, acting as a link between the native APIs and the higher-level APIs. It contains the basic settings for model inference and holds the kv-cache, which can significantly accelerate model inference. Since `LLamaContext` is not coupled with `LLamaWeights`, it's possible to create multiple contexts based on one set of model weights, as shown in the first example at the end of this page.

Each `ILLamaExecutor` holds a `LLamaContext` instance, but it's possible to switch an executor to a different context. If your application has multiple sessions, please take care to manage the `LLamaContext` instances.

`LLamaContext` takes the following parameters as its settings. Note that the parameters cannot be changed once the context has been created.

```cs
public interface IContextParams
{
    /// <summary>
    /// Model context size (n_ctx)
    /// </summary>
    uint? ContextSize { get; }

    /// <summary>
    /// Batch size for prompt processing (must be >= 32 to use BLAS) (n_batch)
    /// </summary>
    uint BatchSize { get; }

    /// <summary>
    /// Seed for the random number generator (seed)
    /// </summary>
    uint Seed { get; }

    /// <summary>
    /// Whether to use embedding mode. (embedding) Note that if this is set to true,
    /// the LLamaModel won't produce text responses anymore.
    /// </summary>
    bool EmbeddingMode { get; }

    /// <summary>
    /// RoPE base frequency (null to fetch from the model)
    /// </summary>
    float? RopeFrequencyBase { get; }

    /// <summary>
    /// RoPE frequency scaling factor (null to fetch from the model)
    /// </summary>
    float? RopeFrequencyScale { get; }

    /// <summary>
    /// The encoding to use for models
    /// </summary>
    Encoding Encoding { get; }

    /// <summary>
    /// Number of threads (null = autodetect) (n_threads)
    /// </summary>
    uint? Threads { get; }

    /// <summary>
    /// Number of threads to use for batch processing (null = autodetect) (n_threads)
    /// </summary>
    uint? BatchThreads { get; }

    /// <summary>
    /// YaRN extrapolation mix factor (null = from model)
    /// </summary>
    float? YarnExtrapolationFactor { get; }

    /// <summary>
    /// YaRN magnitude scaling factor (null = from model)
    /// </summary>
    float? YarnAttentionFactor { get; }

    /// <summary>
    /// YaRN low correction dim (null = from model)
    /// </summary>
    float? YarnBetaFast { get; }

    /// <summary>
    /// YaRN high correction dim (null = from model)
    /// </summary>
    float? YarnBetaSlow { get; }

    /// <summary>
    /// YaRN original context length (null = from model)
    /// </summary>
    uint? YarnOriginalContext { get; }

    /// <summary>
    /// YaRN scaling method to use.
    /// </summary>
    RopeScalingType? YarnScalingType { get; }

    /// <summary>
    /// Override the type of the K cache
    /// </summary>
    GGMLType? TypeK { get; }

    /// <summary>
    /// Override the type of the V cache
    /// </summary>
    GGMLType? TypeV { get; }

    /// <summary>
    /// Whether to disable offloading the KQV cache to the GPU
    /// </summary>
    bool NoKqvOffload { get; }

    /// <summary>
    /// Defragment the KV cache if holes/size > defrag_threshold; set to < 0 to disable (default)
    /// </summary>
    float DefragThreshold { get; }

    /// <summary>
    /// Whether to pool (sum) embedding results by sequence id (ignored if no pooling layer)
    /// </summary>
    bool DoPooling { get; }
}
```

`LLamaContext` has its own state, which can be saved and loaded.

```cs
LLamaContext.SaveState(string filename)
LLamaContext.GetState()
```
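
For example, one set of weights can back several contexts. Below is a minimal sketch of this; the model path is a placeholder, and `ModelParams` is passed for the context settings because it implements `IContextParams`:

```cs
using LLama;
using LLama.Common;

// "path/to/model.gguf" is a placeholder; point it at a real GGUF file.
var parameters = new ModelParams("path/to/model.gguf")
{
    ContextSize = 4096, // n_ctx for the contexts created below
    Seed = 1337         // seed for the random number generator
};

// Load the weights once...
using var weights = LLamaWeights.LoadFromFile(parameters);

// ...then create as many contexts as needed from the same weights.
using var context1 = weights.CreateContext(parameters);
using var context2 = weights.CreateContext(parameters);
```

Loading the weights once and sharing them across contexts avoids paying the memory cost of the model for every session.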
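
And a sketch of saving and restoring that state, assuming contexts created as above and the matching `LoadState` counterparts of the two methods; the file name is a placeholder:

```cs
// Persist the context state to disk, e.g. at the end of a chat session...
context1.SaveState("session.bin");

// ...and restore it later into a context created from the same weights.
context2.LoadState("session.bin");

// Alternatively, snapshot the state in memory and apply it again later.
var state = context1.GetState();
context1.LoadState(state);
```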