
remove_stop_words.py 771 B

from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize


def remove_stop_words_from_sentence(sentence):
    # Tokenize the sentence and drop NLTK's English stop words.
    # The comparison is case-sensitive, so capitalized tokens such as "The" are kept.
    stop_words = set(stopwords.words('english'))
    word_tokens = word_tokenize(sentence)
    filtered_output = ' '.join([w for w in word_tokens if w not in stop_words])
    return filtered_output


def remove_stop_words(train_texts, test_texts=None):
    # Remove stop words from every sentence in the train and (optional) test sets.
    filtered_texts_train = []
    filtered_texts_test = []
    for text in train_texts:
        filtered_texts_train.append(remove_stop_words_from_sentence(text))
    if test_texts is not None:
        for text in test_texts:
            filtered_texts_test.append(remove_stop_words_from_sentence(text))
    # filtered_texts_test stays an empty list when no test set is given.
    return filtered_texts_train, filtered_texts_test

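In information security, vulnerability assessment and management is one of the key tasks. This project explores how to use pretrained large language models to assess and judge the severity level of vulnerabilities, based specifically on the Common Vulnerability Scoring System (CVSS). Traditional vulnerability scoring relies on manual analysis and expert review. Large text models built on natural language processing can instead use their deep learning capability to process and analyze large volumes of security-related text automatically, which improves the efficiency and accuracy of vulnerability assessment. Combining them with stemming and lemmatization helps make better use of their predictive capability and accuracy.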
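As a minimal sketch of what a "severity level" means here, the snippet below maps a numeric base score to a label, assuming the CVSS v3.x qualitative severity rating scale (None, Low, Medium, High, Critical). The function name `cvss_v3_severity` is hypothetical and is not part of this repository. The description does not say which CVSS version or label set the project uses.

```python
def cvss_v3_severity(score: float) -> str:
    """Map a CVSS v3.x base score (0.0 to 10.0) to its qualitative severity label.

    Assumption: the CVSS v3.x rating scale; the project may use another version.
    """
    if not 0.0 <= score <= 10.0:
        raise ValueError(f"CVSS score must be between 0.0 and 10.0, got {score}")
    if score == 0.0:
        return "None"
    if score < 4.0:    # 0.1 to 3.9
        return "Low"
    if score < 7.0:    # 4.0 to 6.9
        return "Medium"
    if score < 9.0:    # 7.0 to 8.9
        return "High"
    return "Critical"  # 9.0 to 10.0


if __name__ == "__main__":
    for s in (0.0, 3.9, 5.5, 7.0, 9.8):
        print(s, cvss_v3_severity(s))
```

Labels of this kind are one plausible classification target for a text model that reads vulnerability descriptions. Whether the project predicts labels or raw scores is not stated in the description.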