Different from `LLamaModel`, when using an executor, `InferenceParams` is passed to the `Infer` method instead of the constructor. This is because executors only define how the model is run, so the settings can be changed for each individual inference run.
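For instance, a minimal sketch of per-call usage (class names such as `InteractiveExecutor`, the model path, and the prompt are illustrative and may differ between versions):

```cs
using System;
using System.Collections.Generic;
using LLama;
using LLama.Common;

// Load a model and wrap it in an executor (path is illustrative).
var model = new LLamaModel(new ModelParams("path/to/model.bin"));
var executor = new InteractiveExecutor(model);

// Settings are supplied per call, so each run can use different parameters.
var inferenceParams = new InferenceParams
{
    Temperature = 0.8f,
    MaxTokens = 128,
    AntiPrompts = new List<string> { "User:" }
};

foreach (var text in executor.Infer("Hello, ", inferenceParams))
{
    Console.Write(text);
}
```

A second call to `Infer` could pass a different `InferenceParams` instance, e.g. with a lower `Temperature`, without recreating the executor.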
Namespace: LLama.Common
public class InferenceParams
Inheritance Object → InferenceParams
number of tokens to keep from initial prompt
public int TokensKeep { get; set; }
how many new tokens to predict (n_predict); set to -1 to generate indefinitely
until the response is complete.
public int MaxTokens { get; set; }
logit bias for specific tokens
public Dictionary<int, float> LogitBias { get; set; }
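As an example, a token's logit can be pushed down or up by its id (the token ids below are purely illustrative; use the model's tokenizer to find real ids for your model):

```cs
using System.Collections.Generic;
using LLama.Common;

var inferenceParams = new InferenceParams
{
    LogitBias = new Dictionary<int, float>
    {
        // Negative infinity makes the token impossible to sample
        // (id 2 is often the EOS token, but this is model-dependent).
        { 2, float.NegativeInfinity },
        // A positive bias makes the token more likely.
        { 15043, 5.0f }
    }
};
```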
Sequences where the model will stop generating further tokens.
public IEnumerable<string> AntiPrompts { get; set; }
path to file for saving/loading model eval state
public string PathSession { get; set; }
string to suffix user inputs with
public string InputSuffix { get; set; }
string to prefix user inputs with
public string InputPrefix { get; set; }
0 or lower to use vocab size
public int TopK { get; set; }
1.0 = disabled
public float TopP { get; set; }
1.0 = disabled
public float TfsZ { get; set; }
1.0 = disabled
public float TypicalP { get; set; }
1.0 = disabled
public float Temperature { get; set; }
1.0 = disabled
public float RepeatPenalty { get; set; }
last n tokens to penalize (0 = disable penalty, -1 = context size) (repeat_last_n)
public int RepeatLastTokensCount { get; set; }
frequency penalty coefficient
0.0 = disabled
public float FrequencyPenalty { get; set; }
presence penalty coefficient
0.0 = disabled
public float PresencePenalty { get; set; }
Whether to use Mirostat sampling, the algorithm described in the paper https://arxiv.org/abs/2007.14966. Note that Mirostat operates on tokens instead of words.
0 = disabled, 1 = Mirostat, 2 = Mirostat 2.0
public MiroStateType Mirostat { get; set; }
target entropy
public float MirostatTau { get; set; }
learning rate
public float MirostatEta { get; set; }
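Putting the Mirostat settings together, a sketch of enabling Mirostat 2.0 might look like this (the enum member name is an assumption and may differ by version; the tau/eta values follow common llama.cpp defaults):

```cs
using LLama.Common;

var inferenceParams = new InferenceParams
{
    Mirostat = MiroStateType.Mirostat2, // enum member name may vary by version
    MirostatTau = 5.0f,                 // target entropy
    MirostatEta = 0.1f                  // learning rate
};
```

When Mirostat is enabled, it replaces the standard top-k/top-p sampling path in llama.cpp-style samplers, so `TopK` and `TopP` have no effect in that mode.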
consider newlines as a repeatable token (penalize_nl)
public bool PenalizeNL { get; set; }