# Tokenization/Detokenization

A pair of APIs for converting between text and tokens.

## Tokenization

The basic usage is to call `Tokenize` on the model after initializing it.
```cs
using System.Linq; // provides ToArray()

// Load the model, then convert the text into token IDs.
LLamaModel model = new LLamaModel(new ModelParams("<modelPath>"));
string text = "hello";
int[] tokens = model.Tokenize(text).ToArray();
```
The output depends on the model (more precisely, on its vocabulary), so the same text can produce different token IDs with different models.
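
To illustrate why the vocabulary matters, here is a minimal, self-contained sketch. It does not use the library's tokenizer: `ToyTokenizerDemo`, its greedy longest-match `Tokenize` helper, and both vocabularies are hypothetical and exist only for illustration. The same input maps to different IDs depending on which vocabulary is used.

```cs
using System;
using System.Collections.Generic;

public static class ToyTokenizerDemo
{
    // Hypothetical greedy longest-match tokenizer over a piece -> ID vocabulary.
    // This is an illustration only, not how the real model tokenizes text.
    public static int[] Tokenize(string text, IReadOnlyDictionary<string, int> vocab)
    {
        var ids = new List<int>();
        int pos = 0;
        while (pos < text.Length)
        {
            // Try the longest remaining substring first, then shrink it.
            int len = text.Length - pos;
            while (len > 0 && !vocab.ContainsKey(text.Substring(pos, len)))
            {
                len--;
            }
            if (len == 0)
            {
                throw new ArgumentException($"No vocabulary entry covers '{text[pos]}'.");
            }
            ids.Add(vocab[text.Substring(pos, len)]);
            pos += len;
        }
        return ids.ToArray();
    }

    public static void Main()
    {
        // Two made-up vocabularies that split "hello" differently.
        var vocabA = new Dictionary<string, int> { ["hello"] = 7 };
        var vocabB = new Dictionary<string, int> { ["he"] = 1, ["l"] = 2, ["lo"] = 3 };

        Console.WriteLine(string.Join(", ", Tokenize("hello", vocabA))); // 7
        Console.WriteLine(string.Join(", ", Tokenize("hello", vocabB))); // 1, 2, 3
    }
}
```

Because token IDs are only meaningful relative to one vocabulary, tokens produced with one model should not be assumed valid for another.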
## Detokenization

Detokenization works the same way: pass an `IEnumerable<int>` to the `Detokenize` method.
```cs
LLamaModel model = new LLamaModel(new ModelParams("<modelPath>"));
// Example token IDs; the text they decode to depends on the model's vocabulary.
int[] tokens = new int[] {125, 2568, 13245};
string text = model.Detokenize(tokens);
```
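
Conceptually, detokenization can be thought of as looking up the text piece for each token ID and concatenating the pieces in order. The sketch below shows that idea under stated assumptions: `ToyDetokenizerDemo`, its `Detokenize` helper, and the ID-to-piece table are hypothetical, and the library's actual implementation may differ.

```cs
using System;
using System.Collections.Generic;
using System.Text;

public static class ToyDetokenizerDemo
{
    // Hypothetical detokenizer: map each ID to its text piece and concatenate.
    public static string Detokenize(IEnumerable<int> tokens, IReadOnlyDictionary<int, string> pieces)
    {
        var sb = new StringBuilder();
        foreach (int id in tokens)
        {
            if (!pieces.TryGetValue(id, out var piece))
            {
                throw new ArgumentException($"Unknown token ID {id}.");
            }
            sb.Append(piece);
        }
        return sb.ToString();
    }

    public static void Main()
    {
        // Made-up ID -> piece table, for illustration only.
        var pieces = new Dictionary<int, string> { [1] = "he", [2] = "l", [3] = "lo" };
        Console.WriteLine(Detokenize(new[] { 1, 2, 3 }, pieces)); // hello
    }
}
```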