
stemmatization.py

```python
from nltk.stem import PorterStemmer
from nltk.tokenize import word_tokenize

# word_tokenize needs the NLTK Punkt tokenizer data
# (nltk.download("punkt"), or "punkt_tab" in recent NLTK releases).
# A single stemmer is reused instead of building one for every sentence.
_stemmer = PorterStemmer()


def stemmatize_sentence(sentence):
    """Tokenize a sentence and replace every token with its Porter stem."""
    word_list = word_tokenize(sentence)
    stemmatized_output = ' '.join(_stemmer.stem(w) for w in word_list)
    return stemmatized_output


def stemmatize(train_texts, test_texts=None):
    """Stem every training text and, if provided, every test text.

    Returns (stemmed_train, stemmed_test); stemmed_test is an empty list
    when test_texts is None.
    """
    stemmatized_texts_train = [stemmatize_sentence(text) for text in train_texts]
    stemmatized_texts_test = []
    if test_texts is not None:
        stemmatized_texts_test = [stemmatize_sentence(text) for text in test_texts]
    return stemmatized_texts_train, stemmatized_texts_test
```

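In information security, vulnerability assessment and management is one of the key tasks. This work explores how pretrained large language models can be used to assess and determine the severity level of vulnerabilities, based on the Common Vulnerability Scoring System (CVSS). Traditional vulnerability scoring relies on manual analysis and expert review. Large NLP text models, by contrast, use their deep-learning capabilities to automatically process and analyze large volumes of security-related text, which can improve both the efficiency and the accuracy of vulnerability assessment. Preprocessing the text with stemming and lemmatization helps these models make fuller use of their predictive capability and accuracy.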
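The project grades vulnerability severity on CVSS, but the section does not say which CVSS version is used or whether the model outputs a numeric score or the underlying metrics. As a minimal sketch, assuming CVSS v3.1 labels, the following computes a base score from a vector string using the metric weights and Roundup function defined in the FIRST CVSS v3.1 specification. It then maps any score onto the v3.x qualitative severity bands (None, Low, Medium, High, Critical). The function names `base_score` and `severity` are hypothetical and are not part of this repository.

```python
# Hypothetical helpers, assuming CVSS v3.1 labels; not part of this repository.
# Base-metric weights from the FIRST CVSS v3.1 specification.
AV = {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.20}
AC = {"L": 0.77, "H": 0.44}
PR_UNCHANGED = {"N": 0.85, "L": 0.62, "H": 0.27}
PR_CHANGED = {"N": 0.85, "L": 0.68, "H": 0.50}  # PR weights when Scope is Changed
UI = {"N": 0.85, "R": 0.62}
CIA = {"H": 0.56, "L": 0.22, "N": 0.0}


def roundup(value):
    """CVSS v3.1 Roundup: smallest one-decimal number >= value (float-safe)."""
    int_input = round(value * 100000)
    if int_input % 10000 == 0:
        return int_input / 100000.0
    return (int_input // 10000 + 1) / 10.0


def base_score(vector):
    """Compute a CVSS v3.1 base score from e.g. 'CVSS:3.1/AV:N/AC:L/...'."""
    # Skip the 'CVSS:3.1' prefix and parse 'KEY:VALUE' pairs.
    metrics = dict(part.split(":") for part in vector.split("/")[1:])
    changed = metrics["S"] == "C"
    # Impact Sub-Score from the three impact metrics.
    iss = 1 - (1 - CIA[metrics["C"]]) * (1 - CIA[metrics["I"]]) * (1 - CIA[metrics["A"]])
    if changed:
        impact = 7.52 * (iss - 0.029) - 3.25 * (iss - 0.02) ** 15
    else:
        impact = 6.42 * iss
    pr = (PR_CHANGED if changed else PR_UNCHANGED)[metrics["PR"]]
    exploitability = 8.22 * AV[metrics["AV"]] * AC[metrics["AC"]] * pr * UI[metrics["UI"]]
    if impact <= 0:
        return 0.0
    if changed:
        return roundup(min(1.08 * (impact + exploitability), 10))
    return roundup(min(impact + exploitability, 10))


def severity(score):
    """Map a CVSS v3.x base score (0.0 to 10.0) to its qualitative rating."""
    if not 0.0 <= score <= 10.0:
        raise ValueError(f"CVSS score out of range: {score}")
    if score == 0.0:
        return "None"
    if score < 4.0:
        return "Low"
    if score < 7.0:
        return "Medium"
    if score < 9.0:
        return "High"
    return "Critical"


if __name__ == "__main__":
    vec = "CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H"
    s = base_score(vec)
    print(vec, s, severity(s))
```

A model that regresses a numeric score could pass its prediction straight to `severity`; one that predicts the eight base metrics could assemble a vector string and call `base_score` first. Temporal and environmental metrics are not handled. CVSS v4.0 also scores differently, so this sketch does not cover it.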