Changelog for release 0.4.0 (tag: v0.4.0):

* CRF: add support for BMESO-type tags; add comments to Vocabulary
* BucketSampler: add an error check
* Fix a bug in ClipGradientCallback; remove the print in LRSchedulerCallback (a pbar should later be passed in for printing); add comments to MLP
* Update the MLP module
* Add metric comments; fix a bug in the Trainer save process
* Update README.md to fix the tutorial link
* Add ENAS (Efficient Neural Architecture Search)
* Add ignore_type in DataSet.add_field
* AutoPadder will not pad when dtype is None; add ignore_type in DataSet.apply
* Fix a potential padder bug in FieldArray
* Fix a typo in CRF, as well as a possible source of numerical instability
* Fix a possible bug in CRF
* Change two default init arguments of Trainer into None
* Changes to Callbacks: add several read-only properties to Callback, set through the manager; optimize the code to lighten the load on @transfer
* Move the ENAS-related code into the automl directory; fix a bug in fast_param_mapping
* Trainer now creates the save directory automatically
* Vocabulary now prints its contents
* Add an iteration method to Vocabulary; fix a bug where CRF scores could be negative
* Add a SQuAD metric
* Add a sigmoid activation function in MLP
* Add the Star-Transformer model; add ConllLoader for all kinds of CoNLL-format files; add JsonLoader for JSON-format files; add SSTLoader for SST-2 & SST-5; change the Callback interface; fix batch multi-processing when killed; add a README listing models and their performance
* Fix tests; fix callback & tests; update README
* Fix several bugs; adjust callbacks
* Prepare the 0.4.0 release
* Update the readme
* Support parallel loss; prevent incorrect loss computation in the multi-GPU case
* Update the advance_tutorial Jupyter notebook
* embedding_loader: add load_with_vocab() and load_without_vocab(), which no longer require embed_dim and detect automatically whether the file is word2vec or GloVe format; Vocabulary: add from_dataset() and index_dataset() to avoid multi-line dataset-indexing code; utils: add a cache_results() decorator for caching function return values; Callback: add an update_every property
* DataSet.apply() now reports the offending index on error; Vocabulary.from_dataset() and index_dataset() report the vocab order on error; EmbedLoader skips malformed lines when reading embeddings
* Update attention; doc tools; fix some doc errors
* Switch to Chinese comments; add a Viterbi decoding method
* Sample version
* Add pad-sequence for LSTM; add CSV, CoNLL, and JSON file readers; update the dataloader and remove useless ones; fix Trainer loss printing; fix tests; fix test_tutorial
* Add comments; test the documentation; several local work-in-progress commits; reorder the documentation
* Add documents; update pooling; update bert; update the documents in MLP and snli
* Merge the self-attention module into attention.py
* Update the documents on losses.py, DataSet, and metrics
* Remove the print in LSTM; change use_cuda of Trainer and Tester to device; extend the Trainer documentation
* Add comments for Trainer
* Improve the trainer and callback documentation; rename parts of the code so it is hidden from the docs
* Update the char-level encoder and the documents on embedding.py
* Add comments and revise some code; add get_embeddings
* Adjust the documentation configuration; change embedding initialization to init_embed
* Add multi-GPU support to Trainer and Tester
* Add tests; fix JsonLoader
* Remove the commented tutorial
* Add get_field_names to DataSet
* Fix bugs; add Const; revise some comments
* Add a model runner and model tests for easier model testing
* Rework the docs configuration and architecture; revise a large part of the core documentation (TODO: finish the trainer and tester docs; investigate doc examples and tests)
* Finish checking the core comments; revise the io comments
* Switch everything to relative imports; small change
* Remove api/automl from the install files; fix a seq_len bug in metrics; fix a naming error in sampler
* Fix a bug for compatibility with CPU-only PyTorch (TODO: similar bugs may exist elsewhere); fix references in the docs
* Replace tqdm.autonotebook with tqdm.auto
* Fix batch & vocab
* Upload the *.rst documentation files and several TODOs
* Discuss and consolidate several modules; core tests and small fixes; remove some redundant docs
* Update the init and const files
* Add CNN tests; fix a little bug
* Update attention; fix and improve the tests
* Finish the quick-start tutorial
* Rename the sequence_modeling docs to sequence_labeling and re-run apidoc to clean up after the rename; fix the documentation format
* Unify the scattered seq_len_to_mask implementations into core.utils.seq_len_to_mask
* Add a hint line; show dataset_loader in the docs; note that Dataset.read_csv will be replaced by CSVLoader
* Finish the documentation linking Callback and Trainer; partially update the index
* Remove redundant prints; remove the word-segmentation metric because it could cause errors
* Fix Chinese names in the docs; finish the detailed-introduction document; add the tutorial ipynb files; revise some introductory docs
* Revise the models and modules landing pages; add the titlesonly setting; revise the titles shown in the module docs; revise the opening introductions of core, io, modules, and models
* Use .. todo:: to hide TODO comments that might be pulled into the docs; revise some comments
* Delete an old metric in the tests; revise the tutorials test file
* Move features not yet ready for release into the legacy folder; delete tests that cannot run; revise the callback test file; delete outdated tutorials and test files
* Change the cache_results parameters; revise the io test files and delete some outdated tests
* Fix bugs; fix the failing tests in test_utils.py
* Fix compatibility with pad_sequence in PyTorch 1.1; revise Trainer's pbar
* Fix a bug in metrics; add metric tests
* Add a model summary; add aliases; remove nested layers in encoder
* Reorder the imports and __all__ exports of core, models, and modules; rename some files
* Fix var runn
* Add a clear method to Vocabulary; minor PEP 8 tweaks; update the cache_results example
* Warn when indices in a callback may be None; DataSet supports indexing by List
* Fix a typo; revise README.md
* Update the documents on bert and encoder/bert
* Add a fitlog callback for recording experiments with fitlog; fix a typo; update dataset_loader
* Add a link to the fitlog documentation; add DataSet Loader documentation
* Add the Star-Transformer reproduction
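The changelog mentions unifying the scattered implementations into a single core.utils.seq_len_to_mask. Its semantics can be sketched in plain Python; the real helper operates on numpy arrays and torch tensors, so the list-based stand-in below is for illustration only.

```python
def seq_len_to_mask(seq_len, max_len=None):
    """Turn per-sequence lengths into a boolean padding mask.

    mask[i][j] is True while position j is inside sequence i,
    and False on the padded tail.
    """
    if max_len is None:
        max_len = max(seq_len)
    return [[j < length for j in range(max_len)] for length in seq_len]

mask = seq_len_to_mask([2, 3])
print(mask)  # [[True, True, False], [True, True, True]]
```

Downstream code (losses, metrics) can then multiply or index by this mask to ignore padding positions.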
@@ -0,0 +1,7 @@ | |||
include requirements.txt | |||
include LICENSE | |||
include README.md | |||
prune test/ | |||
prune reproduction/ | |||
prune fastNLP/api | |||
prune fastNLP/automl |
@@ -6,87 +6,108 @@ | |||
![Hex.pm](https://img.shields.io/hexpm/l/plug.svg) | |||
[![Documentation Status](https://readthedocs.org/projects/fastnlp/badge/?version=latest)](http://fastnlp.readthedocs.io/?badge=latest) | |||
FastNLP is a modular Natural Language Processing system based on PyTorch, built for fast development of NLP models. | |||
fastNLP is a lightweight NLP toolkit. You can use it to quickly build a named-entity recognition (NER), Chinese word segmentation, or text classification system, or use it to construct complex network models for research. Its features include:
- A unified tabular data container that keeps preprocessing simple and clear, with built-in DataSet Loaders for many datasets that remove the need for preprocessing code.
- Handy NLP utilities, such as pretrained-embedding loading and intermediate-result caching.
- Thorough Chinese documentation.
- Many high-level modules, such as Variational LSTM, Transformer, and CRF.
- Packaged models such as CNNText and Biaffine, ready for direct use.
- A convenient and extensible Trainer, with many built-in callbacks for experiment logging, exception catching, and more.
## Installation Guide
fastNLP depends on the following packages:
+ numpy | |||
+ torch>=0.4.0 | |||
+ tqdm | |||
+ nltk | |||
The installation of torch may depend on your operating system and CUDA version; see the PyTorch website for details.
Once the dependencies are installed, run the following command to install fastNLP:
```shell | |||
pip install fastNLP | |||
``` | |||
## Built-in Components
Most neural networks used for NLP tasks can be viewed as a composition of three kinds of modules: encoder, aggregator, and decoder.
![](./docs/source/figures/text_classification.png) | |||
fastNLP ships many components of these three kinds in its modules package, helping users quickly assemble the network they need. The functions and common examples of the three kinds are:
A deep learning NLP model is the composition of three types of modules: | |||
<table> | |||
<tr> | |||
<td><b> module type </b></td> | |||
<td><b> functionality </b></td> | |||
<td><b> example </b></td> | |||
<td><b> Type </b></td>
<td><b> Functionality </b></td>
<td><b> Example </b></td>
</tr> | |||
<tr> | |||
<td> encoder </td> | |||
<td> encode the input into some abstract representation </td> | |||
<td> encode the input into a vector with representational power </td>
<td> embedding, RNN, CNN, transformer </td>
</tr> | |||
<tr> | |||
<td> aggregator </td> | |||
<td> aggregate and reduce information </td> | |||
<td> aggregate information from multiple vectors </td>
<td> self-attention, max-pooling </td> | |||
</tr> | |||
<tr> | |||
<td> decoder </td> | |||
<td> decode the representation into the output </td> | |||
<td> decode a representation vector into the desired output form </td>
<td> MLP, CRF </td> | |||
</tr> | |||
</table> | |||
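The encoder/aggregator/decoder composition in the table above can be sketched without any framework. The classes below are hypothetical stand-ins, not fastNLP APIs (real fastNLP components are PyTorch nn.Modules); they only illustrate how a text classifier arises from chaining the three module kinds.

```python
class Encoder:
    """Encoder: map each token id to a (toy) vector representation."""
    def __init__(self, vocab_size, dim):
        # Deterministic toy embedding table instead of learned weights.
        self.table = [[(i + j) % 7 / 7.0 for j in range(dim)]
                      for i in range(vocab_size)]
    def __call__(self, token_ids):
        return [self.table[t] for t in token_ids]

class MaxPoolAggregator:
    """Aggregator: reduce a sequence of vectors to one vector (max over time)."""
    def __call__(self, vectors):
        return [max(col) for col in zip(*vectors)]

class MLPDecoder:
    """Decoder: turn the pooled vector into per-class scores."""
    def __init__(self, weights):
        self.weights = weights  # one weight row per output class
    def __call__(self, vec):
        return [sum(w * x for w, x in zip(row, vec)) for row in self.weights]

def classify(token_ids, encoder, aggregator, decoder):
    # The whole model is just the composition decoder(aggregator(encoder(x))).
    return decoder(aggregator(encoder(token_ids)))

enc = Encoder(vocab_size=100, dim=4)
agg = MaxPoolAggregator()
dec = MLPDecoder(weights=[[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]])
scores = classify([3, 14, 15], enc, agg, dec)
print(len(scores))  # one score per class → 2
```

Swapping any single stage (e.g. a CNN encoder for the embedding table, or a CRF decoder for the MLP) changes the task without touching the other two stages, which is the point of the modular split.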
For example: | |||
![](docs/source/figures/text_classification.png) | |||
## Requirements | |||
- Python>=3.6 | |||
- numpy>=1.14.2 | |||
- torch>=0.4.0 | |||
- tensorboardX | |||
- tqdm>=4.28.1 | |||
## Complete Models
fastNLP implements many complete models for different NLP tasks, all of them trained and tested.
## Resources | |||
You can find the details in the following two places:
- [Introduction](reproduction/)
- [Source code](fastNLP/models/)
- [Tutorials](https://github.com/fastnlp/fastNLP/tree/master/tutorials) | |||
- [Documentation](https://fastnlp.readthedocs.io/en/latest/) | |||
- [Source Code](https://github.com/fastnlp/fastNLP) | |||
## Installation | |||
Run the following commands to install fastNLP package. | |||
```shell | |||
pip install fastNLP | |||
``` | |||
## Project Structure
![](./docs/source/figures/workflow.png) | |||
## Project Structure | |||
The overall fastNLP workflow is shown in the figure above, and the project structure is as follows:
<table> | |||
<tr> | |||
<td><b> fastNLP </b></td> | |||
<td> an open-source NLP library </td> | |||
</tr> | |||
<tr> | |||
<td><b> fastNLP.api </b></td> | |||
<td> APIs for end-to-end prediction </td> | |||
<td> an open-source natural language processing library </td>
</tr> | |||
<tr> | |||
<td><b> fastNLP.core </b></td> | |||
<td> data representation & train/test procedure </td> | |||
<td> implements the core functionality, including data-processing components, the trainer, the tester, etc. </td>
</tr> | |||
<tr> | |||
<td><b> fastNLP.models </b></td> | |||
<td> a collection of NLP models </td> | |||
<td> implements a number of complete neural network models </td>
</tr> | |||
<tr> | |||
<td><b> fastNLP.modules </b></td> | |||
<td> a collection of PyTorch sub-models/components/wheels </td> | |||
<td> implements many components for building neural network models </td>
</tr> | |||
<tr> | |||
<td><b> fastNLP.io </b></td> | |||
<td> readers & savers </td> | |||
<td> implements reading and writing, including data loading and model I/O </td>
</tr> | |||
</table> | |||
## Reference Resources
- [Tutorials](https://github.com/fastnlp/fastNLP/tree/master/tutorials)
- [Documentation](https://fastnlp.readthedocs.io/en/latest/)
- [Source code](https://github.com/fastnlp/fastNLP)
*In memory of @FengZiYjun. May his soul rest in peace. We will miss you very very much!* |
@@ -3,6 +3,7 @@ | |||
# You can set these variables from the command line. | |||
SPHINXOPTS = | |||
SPHINXAPIDOC = sphinx-apidoc | |||
SPHINXBUILD = sphinx-build | |||
SPHINXPROJ = fastNLP | |||
SOURCEDIR = source | |||
@@ -12,6 +13,12 @@ BUILDDIR = build | |||
help: | |||
@$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O) | |||
apidoc: | |||
$(SPHINXAPIDOC) -efM -o source ../$(SPHINXPROJ) | |||
server: | |||
cd build/html && python -m http.server | |||
.PHONY: help Makefile | |||
# Catch-all target: route all unknown targets to Sphinx using the new | |||
@@ -14,6 +14,7 @@ | |||
# | |||
import os | |||
import sys | |||
sys.path.insert(0, os.path.abspath('../../')) | |||
# -- Project information ----------------------------------------------------- | |||
@@ -23,10 +24,9 @@ copyright = '2018, xpqiu' | |||
author = 'xpqiu' | |||
# The short X.Y version | |||
version = '0.2' | |||
version = '0.4' | |||
# The full version, including alpha/beta/rc tags | |||
release = '0.2' | |||
release = '0.4' | |||
# -- General configuration --------------------------------------------------- | |||
@@ -42,9 +42,15 @@ extensions = [ | |||
'sphinx.ext.viewcode', | |||
'sphinx.ext.autosummary', | |||
'sphinx.ext.mathjax', | |||
'sphinx.ext.todo' | |||
] | |||
autodoc_default_options = { | |||
'member-order': 'bysource', | |||
'special-members': '__init__', | |||
'undoc-members': True, | |||
} | |||
# Add any paths that contain templates here, relative to this directory. | |||
templates_path = ['_templates'] | |||
@@ -62,17 +68,16 @@ master_doc = 'index' | |||
# | |||
# This is also used if you do content translation via gettext catalogs. | |||
# Usually you set "language" from the command line for these cases. | |||
language = None | |||
language = "zh_CN" | |||
# List of patterns, relative to source directory, that match files and | |||
# directories to ignore when looking for source files. | |||
# This pattern also affects html_static_path and html_extra_path . | |||
exclude_patterns = [] | |||
exclude_patterns = ['modules.rst'] | |||
# The name of the Pygments (syntax highlighting) style to use. | |||
pygments_style = 'sphinx' | |||
# -- Options for HTML output ------------------------------------------------- | |||
# The theme to use for HTML and HTML Help pages. See the documentation for | |||
@@ -84,7 +89,10 @@ html_theme = 'sphinx_rtd_theme' | |||
# further. For a list of options available for each theme, see the | |||
# documentation. | |||
# | |||
# html_theme_options = {} | |||
html_theme_options = { | |||
'collapse_navigation': False, | |||
'titles_only': True | |||
} | |||
# Add any paths that contain custom static files (such as style sheets) here, | |||
# relative to this directory. They are copied after the builtin static files, | |||
@@ -107,22 +115,21 @@ html_static_path = ['_static'] | |||
# Output file base name for HTML help builder. | |||
htmlhelp_basename = 'fastNLPdoc' | |||
# -- Options for LaTeX output ------------------------------------------------ | |||
latex_elements = { | |||
# The paper size ('letterpaper' or 'a4paper'). | |||
# | |||
# 'papersize': 'letterpaper', | |||
# The font size ('10pt', '11pt' or '12pt'). | |||
# | |||
# 'pointsize': '10pt', | |||
# Additional stuff for the LaTeX preamble. | |||
# | |||
# 'preamble': '', | |||
# Latex figure (float) alignment | |||
# | |||
# 'figure_align': 'htbp', | |||
@@ -136,7 +143,6 @@ latex_documents = [ | |||
'xpqiu', 'manual'), | |||
] | |||
# -- Options for manual page output ------------------------------------------ | |||
# One entry per manual page. List of tuples | |||
@@ -146,7 +152,6 @@ man_pages = [ | |||
[author], 1) | |||
] | |||
# -- Options for Texinfo output ---------------------------------------------- | |||
# Grouping the document tree into Texinfo files. List of tuples | |||
@@ -159,4 +164,14 @@ texinfo_documents = [ | |||
] | |||
# -- Extension configuration ------------------------------------------------- | |||
# -- Extension configuration ------------------------------------------------- | |||
def maybe_skip_member(app, what, name, obj, skip, options): | |||
if name.startswith("_"): | |||
return True | |||
if obj.__doc__ is None: | |||
return True | |||
return False | |||
def setup(app): | |||
app.connect('autodoc-skip-member', maybe_skip_member) |
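The autodoc-skip-member hook added to conf.py above can be checked standalone: Sphinx calls it for every candidate member, and a True return hides that member from the generated docs. The sample objects below are hypothetical, for illustration only.

```python
# Same rule as the conf.py hook: hide private names and undocumented members.
def maybe_skip_member(app, what, name, obj, skip, options):
    if name.startswith("_"):
        return True
    if obj.__doc__ is None:
        return True
    return False

class Documented:
    """A public member with a docstring stays visible."""

def undocumented():
    pass

# hidden: leading underscore
assert maybe_skip_member(None, "method", "_private", Documented, False, None)
# hidden: no docstring
assert maybe_skip_member(None, "function", "undocumented", undocumented, False, None)
# kept: public and documented
assert not maybe_skip_member(None, "class", "Documented", Documented, False, None)
```

Note that, as written, the hook also returns True for names like `__init__`, which may override the `'special-members': '__init__'` entry in `autodoc_default_options` earlier in this file.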
@@ -1,36 +0,0 @@ | |||
fastNLP.api | |||
============ | |||
fastNLP.api.api | |||
---------------- | |||
.. automodule:: fastNLP.api.api | |||
:members: | |||
fastNLP.api.converter | |||
---------------------- | |||
.. automodule:: fastNLP.api.converter | |||
:members: | |||
fastNLP.api.model\_zoo | |||
----------------------- | |||
.. automodule:: fastNLP.api.model_zoo | |||
:members: | |||
fastNLP.api.pipeline | |||
--------------------- | |||
.. automodule:: fastNLP.api.pipeline | |||
:members: | |||
fastNLP.api.processor | |||
---------------------- | |||
.. automodule:: fastNLP.api.processor | |||
:members: | |||
.. automodule:: fastNLP.api | |||
:members: |
@@ -0,0 +1,7 @@ | |||
fastNLP.core.batch | |||
================== | |||
.. automodule:: fastNLP.core.batch | |||
:members: | |||
:undoc-members: | |||
:show-inheritance: |
@@ -0,0 +1,7 @@ | |||
fastNLP.core.callback | |||
===================== | |||
.. automodule:: fastNLP.core.callback | |||
:members: | |||
:undoc-members: | |||
:show-inheritance: |
@@ -0,0 +1,7 @@ | |||
fastNLP.core.const | |||
================== | |||
.. automodule:: fastNLP.core.const | |||
:members: | |||
:undoc-members: | |||
:show-inheritance: |
@@ -0,0 +1,7 @@ | |||
fastNLP.core.dataset | |||
==================== | |||
.. automodule:: fastNLP.core.dataset | |||
:members: | |||
:undoc-members: | |||
:show-inheritance: |
@@ -0,0 +1,7 @@ | |||
fastNLP.core.field | |||
================== | |||
.. automodule:: fastNLP.core.field | |||
:members: | |||
:undoc-members: | |||
:show-inheritance: |
@@ -0,0 +1,7 @@ | |||
fastNLP.core.instance | |||
===================== | |||
.. automodule:: fastNLP.core.instance | |||
:members: | |||
:undoc-members: | |||
:show-inheritance: |
@@ -0,0 +1,7 @@ | |||
fastNLP.core.losses | |||
=================== | |||
.. automodule:: fastNLP.core.losses | |||
:members: | |||
:undoc-members: | |||
:show-inheritance: |
@@ -0,0 +1,7 @@ | |||
fastNLP.core.metrics | |||
==================== | |||
.. automodule:: fastNLP.core.metrics | |||
:members: | |||
:undoc-members: | |||
:show-inheritance: |
@@ -0,0 +1,7 @@ | |||
fastNLP.core.optimizer | |||
====================== | |||
.. automodule:: fastNLP.core.optimizer | |||
:members: | |||
:undoc-members: | |||
:show-inheritance: |
@@ -1,84 +1,29 @@ | |||
fastNLP.core | |||
============= | |||
fastNLP.core.batch | |||
------------------- | |||
.. automodule:: fastNLP.core.batch | |||
:members: | |||
fastNLP.core.dataset | |||
--------------------- | |||
.. automodule:: fastNLP.core.dataset | |||
:members: | |||
fastNLP.core.fieldarray | |||
------------------------ | |||
.. automodule:: fastNLP.core.fieldarray | |||
:members: | |||
fastNLP.core.instance | |||
---------------------- | |||
.. automodule:: fastNLP.core.instance | |||
:members: | |||
fastNLP.core.losses | |||
-------------------- | |||
.. automodule:: fastNLP.core.losses | |||
:members: | |||
fastNLP.core.metrics | |||
--------------------- | |||
.. automodule:: fastNLP.core.metrics | |||
:members: | |||
fastNLP.core.optimizer | |||
----------------------- | |||
.. automodule:: fastNLP.core.optimizer | |||
:members: | |||
fastNLP.core.predictor | |||
----------------------- | |||
.. automodule:: fastNLP.core.predictor | |||
:members: | |||
fastNLP.core.sampler | |||
--------------------- | |||
.. automodule:: fastNLP.core.sampler | |||
:members: | |||
fastNLP.core.tester | |||
-------------------- | |||
.. automodule:: fastNLP.core.tester | |||
:members: | |||
fastNLP.core.trainer | |||
--------------------- | |||
.. automodule:: fastNLP.core.trainer | |||
:members: | |||
fastNLP.core.utils | |||
------------------- | |||
.. automodule:: fastNLP.core.utils | |||
:members: | |||
fastNLP.core.vocabulary | |||
------------------------ | |||
.. automodule:: fastNLP.core.vocabulary | |||
:members: | |||
fastNLP.core | |||
============ | |||
.. automodule:: fastNLP.core | |||
:members: | |||
:undoc-members: | |||
:show-inheritance: | |||
Submodules
---------- | |||
.. toctree:: | |||
:titlesonly: | |||
fastNLP.core.batch | |||
fastNLP.core.callback | |||
fastNLP.core.const | |||
fastNLP.core.dataset | |||
fastNLP.core.field | |||
fastNLP.core.instance | |||
fastNLP.core.losses | |||
fastNLP.core.metrics | |||
fastNLP.core.optimizer | |||
fastNLP.core.sampler | |||
fastNLP.core.tester | |||
fastNLP.core.trainer | |||
fastNLP.core.utils | |||
fastNLP.core.vocabulary | |||
@@ -0,0 +1,7 @@ | |||
fastNLP.core.sampler | |||
==================== | |||
.. automodule:: fastNLP.core.sampler | |||
:members: | |||
:undoc-members: | |||
:show-inheritance: |
@@ -0,0 +1,7 @@ | |||
fastNLP.core.tester | |||
=================== | |||
.. automodule:: fastNLP.core.tester | |||
:members: | |||
:undoc-members: | |||
:show-inheritance: |
@@ -0,0 +1,7 @@ | |||
fastNLP.core.trainer | |||
==================== | |||
.. automodule:: fastNLP.core.trainer | |||
:members: | |||
:undoc-members: | |||
:show-inheritance: |
@@ -0,0 +1,7 @@ | |||
fastNLP.core.utils | |||
================== | |||
.. automodule:: fastNLP.core.utils | |||
:members: | |||
:undoc-members: | |||
:show-inheritance: |
@@ -0,0 +1,7 @@ | |||
fastNLP.core.vocabulary | |||
======================= | |||
.. automodule:: fastNLP.core.vocabulary | |||
:members: | |||
:undoc-members: | |||
:show-inheritance: |
@@ -0,0 +1,7 @@ | |||
fastNLP.io.base\_loader | |||
======================= | |||
.. automodule:: fastNLP.io.base_loader | |||
:members: | |||
:undoc-members: | |||
:show-inheritance: |
@@ -0,0 +1,7 @@ | |||
fastNLP.io.dataset\_loader | |||
========================== | |||
.. automodule:: fastNLP.io.dataset_loader | |||
:members: | |||
:undoc-members: | |||
:show-inheritance: |
@@ -0,0 +1,7 @@ | |||
fastNLP.io.embed\_loader | |||
======================== | |||
.. automodule:: fastNLP.io.embed_loader | |||
:members: | |||
:undoc-members: | |||
:show-inheritance: |
@@ -0,0 +1,7 @@ | |||
fastNLP.io.model\_io | |||
==================== | |||
.. automodule:: fastNLP.io.model_io | |||
:members: | |||
:undoc-members: | |||
:show-inheritance: |
@@ -1,42 +1,19 @@ | |||
fastNLP.io | |||
=========== | |||
fastNLP.io | |||
========== | |||
fastNLP.io.base\_loader | |||
------------------------ | |||
.. automodule:: fastNLP.io.base_loader | |||
:members: | |||
fastNLP.io.config\_io | |||
---------------------- | |||
.. automodule:: fastNLP.io.config_io | |||
:members: | |||
fastNLP.io.dataset\_loader | |||
--------------------------- | |||
.. automodule:: fastNLP.io.dataset_loader | |||
:members: | |||
fastNLP.io.embed\_loader | |||
------------------------- | |||
.. automodule:: fastNLP.io.embed_loader | |||
:members: | |||
fastNLP.io.logger | |||
------------------ | |||
.. automodule:: fastNLP.io.logger | |||
.. automodule:: fastNLP.io | |||
:members: | |||
:undoc-members: | |||
:show-inheritance: | |||
fastNLP.io.model\_io | |||
--------------------- | |||
Submodules
---------- | |||
.. automodule:: fastNLP.io.model_io | |||
:members: | |||
.. toctree:: | |||
:titlesonly: | |||
fastNLP.io.base_loader | |||
fastNLP.io.dataset_loader | |||
fastNLP.io.embed_loader | |||
fastNLP.io.model_io | |||
.. automodule:: fastNLP.io | |||
:members: |
@@ -0,0 +1,7 @@ | |||
fastNLP.models.biaffine\_parser | |||
=============================== | |||
.. automodule:: fastNLP.models.biaffine_parser | |||
:members: | |||
:undoc-members: | |||
:show-inheritance: |
@@ -0,0 +1,7 @@ | |||
fastNLP.models.cnn\_text\_classification | |||
======================================== | |||
.. automodule:: fastNLP.models.cnn_text_classification | |||
:members: | |||
:undoc-members: | |||
:show-inheritance: |
@@ -1,42 +1,20 @@ | |||
fastNLP.models | |||
=============== | |||
fastNLP.models | |||
============== | |||
fastNLP.models.base\_model | |||
--------------------------- | |||
.. automodule:: fastNLP.models.base_model | |||
:members: | |||
fastNLP.models.biaffine\_parser | |||
-------------------------------- | |||
.. automodule:: fastNLP.models.biaffine_parser | |||
:members: | |||
fastNLP.models.char\_language\_model | |||
------------------------------------- | |||
.. automodule:: fastNLP.models.char_language_model | |||
:members: | |||
fastNLP.models.cnn\_text\_classification | |||
----------------------------------------- | |||
.. automodule:: fastNLP.models.cnn_text_classification | |||
:members: | |||
fastNLP.models.sequence\_modeling | |||
---------------------------------- | |||
.. automodule:: fastNLP.models.sequence_modeling | |||
.. automodule:: fastNLP.models | |||
:members: | |||
:undoc-members: | |||
:show-inheritance: | |||
fastNLP.models.snli | |||
-------------------- | |||
Submodules
---------- | |||
.. automodule:: fastNLP.models.snli | |||
:members: | |||
.. toctree:: | |||
:titlesonly: | |||
fastNLP.models.biaffine_parser | |||
fastNLP.models.cnn_text_classification | |||
fastNLP.models.sequence_labeling | |||
fastNLP.models.snli | |||
fastNLP.models.star_transformer | |||
.. automodule:: fastNLP.models | |||
:members: |
@@ -0,0 +1,7 @@ | |||
fastNLP.models.sequence\_labeling | |||
================================= | |||
.. automodule:: fastNLP.models.sequence_labeling | |||
:members: | |||
:undoc-members: | |||
:show-inheritance: |
@@ -0,0 +1,7 @@ | |||
fastNLP.models.snli | |||
=================== | |||
.. automodule:: fastNLP.models.snli | |||
:members: | |||
:undoc-members: | |||
:show-inheritance: |
@@ -0,0 +1,7 @@ | |||
fastNLP.models.star\_transformer | |||
================================ | |||
.. automodule:: fastNLP.models.star_transformer | |||
:members: | |||
:undoc-members: | |||
:show-inheritance: |
@@ -0,0 +1,7 @@ | |||
fastNLP.modules.aggregator.attention | |||
==================================== | |||
.. automodule:: fastNLP.modules.aggregator.attention | |||
:members: | |||
:undoc-members: | |||
:show-inheritance: |
@@ -0,0 +1,7 @@ | |||
fastNLP.modules.aggregator.pooling | |||
================================== | |||
.. automodule:: fastNLP.modules.aggregator.pooling | |||
:members: | |||
:undoc-members: | |||
:show-inheritance: |
@@ -1,36 +1,17 @@ | |||
fastNLP.modules.aggregator | |||
=========================== | |||
fastNLP.modules.aggregator | |||
========================== | |||
fastNLP.modules.aggregator.attention | |||
------------------------------------- | |||
.. automodule:: fastNLP.modules.aggregator.attention | |||
:members: | |||
fastNLP.modules.aggregator.avg\_pool | |||
------------------------------------- | |||
.. automodule:: fastNLP.modules.aggregator.avg_pool | |||
:members: | |||
fastNLP.modules.aggregator.kmax\_pool | |||
-------------------------------------- | |||
.. automodule:: fastNLP.modules.aggregator.kmax_pool | |||
.. automodule:: fastNLP.modules.aggregator | |||
:members: | |||
:undoc-members: | |||
:show-inheritance: | |||
fastNLP.modules.aggregator.max\_pool | |||
------------------------------------- | |||
.. automodule:: fastNLP.modules.aggregator.max_pool | |||
:members: | |||
Submodules
---------- | |||
fastNLP.modules.aggregator.self\_attention | |||
------------------------------------------- | |||
.. toctree:: | |||
:titlesonly: | |||
.. automodule:: fastNLP.modules.aggregator.self_attention | |||
:members: | |||
fastNLP.modules.aggregator.attention | |||
fastNLP.modules.aggregator.pooling | |||
.. automodule:: fastNLP.modules.aggregator | |||
:members: |
@@ -0,0 +1,7 @@ | |||
fastNLP.modules.decoder.CRF | |||
=========================== | |||
.. automodule:: fastNLP.modules.decoder.crf | |||
:members: | |||
:undoc-members: | |||
:show-inheritance: |
@@ -0,0 +1,7 @@ | |||
fastNLP.modules.decoder.MLP | |||
=========================== | |||
.. automodule:: fastNLP.modules.decoder.mlp | |||
:members: | |||
:undoc-members: | |||
:show-inheritance: |
@@ -1,18 +1,18 @@ | |||
fastNLP.modules.decoder | |||
======================== | |||
fastNLP.modules.decoder | |||
======================= | |||
fastNLP.modules.decoder.CRF | |||
---------------------------- | |||
.. automodule:: fastNLP.modules.decoder.CRF | |||
.. automodule:: fastNLP.modules.decoder | |||
:members: | |||
:undoc-members: | |||
:show-inheritance: | |||
fastNLP.modules.decoder.MLP | |||
---------------------------- | |||
Submodules
---------- | |||
.. automodule:: fastNLP.modules.decoder.MLP | |||
:members: | |||
.. toctree:: | |||
:titlesonly: | |||
fastNLP.modules.decoder.crf | |||
fastNLP.modules.decoder.mlp | |||
fastNLP.modules.decoder.utils | |||
.. automodule:: fastNLP.modules.decoder | |||
:members: |
@@ -0,0 +1,7 @@ | |||
fastNLP.modules.decoder.utils | |||
============================= | |||
.. automodule:: fastNLP.modules.decoder.utils | |||
:members: | |||
:undoc-members: | |||
:show-inheritance: |
@@ -0,0 +1,7 @@ | |||
fastNLP.modules.encoder.bert | |||
============================ | |||
.. automodule:: fastNLP.modules.encoder.bert | |||
:members: | |||
:undoc-members: | |||
:show-inheritance: |
@@ -0,0 +1,7 @@ | |||
fastNLP.modules.encoder.char\_encoder | |||
===================================== | |||
.. automodule:: fastNLP.modules.encoder.char_encoder | |||
:members: | |||
:undoc-members: | |||
:show-inheritance: |
@@ -0,0 +1,7 @@ | |||
fastNLP.modules.encoder.conv\_maxpool | |||
===================================== | |||
.. automodule:: fastNLP.modules.encoder.conv_maxpool | |||
:members: | |||
:undoc-members: | |||
:show-inheritance: |
@@ -0,0 +1,7 @@ | |||
fastNLP.modules.encoder.embedding | |||
================================= | |||
.. automodule:: fastNLP.modules.encoder.embedding | |||
:members: | |||
:undoc-members: | |||
:show-inheritance: |
@@ -0,0 +1,7 @@ | |||
fastNLP.modules.encoder.lstm | |||
============================ | |||
.. automodule:: fastNLP.modules.encoder.lstm | |||
:members: | |||
:undoc-members: | |||
:show-inheritance: |
@@ -1,60 +1,23 @@ | |||
fastNLP.modules.encoder | |||
======================== | |||
fastNLP.modules.encoder | |||
======================= | |||
fastNLP.modules.encoder.char\_embedding | |||
---------------------------------------- | |||
.. automodule:: fastNLP.modules.encoder.char_embedding | |||
:members: | |||
fastNLP.modules.encoder.conv | |||
----------------------------- | |||
.. automodule:: fastNLP.modules.encoder.conv | |||
:members: | |||
fastNLP.modules.encoder.conv\_maxpool | |||
-------------------------------------- | |||
.. automodule:: fastNLP.modules.encoder.conv_maxpool | |||
:members: | |||
fastNLP.modules.encoder.embedding | |||
---------------------------------- | |||
.. automodule:: fastNLP.modules.encoder.embedding | |||
:members: | |||
fastNLP.modules.encoder.linear | |||
------------------------------- | |||
.. automodule:: fastNLP.modules.encoder.linear | |||
:members: | |||
fastNLP.modules.encoder.lstm | |||
----------------------------- | |||
.. automodule:: fastNLP.modules.encoder.lstm | |||
:members: | |||
fastNLP.modules.encoder.masked\_rnn | |||
------------------------------------ | |||
.. automodule:: fastNLP.modules.encoder.masked_rnn | |||
:members: | |||
fastNLP.modules.encoder.transformer | |||
------------------------------------ | |||
.. automodule:: fastNLP.modules.encoder.transformer | |||
.. automodule:: fastNLP.modules.encoder | |||
:members: | |||
:undoc-members: | |||
:show-inheritance: | |||
fastNLP.modules.encoder.variational\_rnn | |||
----------------------------------------- | |||
Submodules
---------- | |||
.. automodule:: fastNLP.modules.encoder.variational_rnn | |||
:members: | |||
.. toctree:: | |||
:titlesonly: | |||
fastNLP.modules.encoder.bert | |||
fastNLP.modules.encoder.char_encoder | |||
fastNLP.modules.encoder.conv_maxpool | |||
fastNLP.modules.encoder.embedding | |||
fastNLP.modules.encoder.lstm | |||
fastNLP.modules.encoder.star_transformer | |||
fastNLP.modules.encoder.transformer | |||
fastNLP.modules.encoder.variational_rnn | |||
.. automodule:: fastNLP.modules.encoder | |||
:members: |
@@ -0,0 +1,7 @@ | |||
fastNLP.modules.encoder.star\_transformer | |||
========================================= | |||
.. automodule:: fastNLP.modules.encoder.star_transformer | |||
:members: | |||
:undoc-members: | |||
:show-inheritance: |
@@ -0,0 +1,7 @@ | |||
fastNLP.modules.encoder.transformer | |||
=================================== | |||
.. automodule:: fastNLP.modules.encoder.transformer | |||
:members: | |||
:undoc-members: | |||
:show-inheritance: |
@@ -0,0 +1,7 @@ | |||
fastNLP.modules.encoder.variational\_rnn | |||
======================================== | |||
.. automodule:: fastNLP.modules.encoder.variational_rnn | |||
:members: | |||
:undoc-members: | |||
:show-inheritance: |
@@ -1,30 +1,17 @@ | |||
fastNLP.modules | |||
================ | |||
fastNLP.modules | |||
=============== | |||
.. toctree:: | |||
fastNLP.modules.aggregator | |||
fastNLP.modules.decoder | |||
fastNLP.modules.encoder | |||
fastNLP.modules.dropout | |||
------------------------ | |||
.. automodule:: fastNLP.modules.dropout | |||
:members: | |||
fastNLP.modules.other\_modules | |||
------------------------------- | |||
.. automodule:: fastNLP.modules.other_modules | |||
.. automodule:: fastNLP.modules | |||
:members: | |||
:undoc-members: | |||
:show-inheritance: | |||
fastNLP.modules.utils | |||
---------------------- | |||
.. automodule:: fastNLP.modules.utils | |||
:members: | |||
Submodules
----------- | |||
.. toctree:: | |||
:titlesonly: | |||
.. automodule:: fastNLP.modules | |||
:members: | |||
fastNLP.modules.aggregator | |||
fastNLP.modules.decoder | |||
fastNLP.modules.encoder |
@@ -1,13 +1,20 @@ | |||
fastNLP | |||
======== | |||
API Documentation
=================
.. automodule:: fastNLP | |||
:members: | |||
:undoc-members: | |||
:show-inheritance: | |||
Internal Modules
----------------
.. toctree:: | |||
:titlesonly: | |||
:maxdepth: 3 | |||
fastNLP.api | |||
fastNLP.core | |||
fastNLP.io | |||
fastNLP.models | |||
fastNLP.modules | |||
fastNLP.models | |||
.. automodule:: fastNLP | |||
:members: |
@@ -1,63 +1,79 @@ | |||
fastNLP documentation | |||
fastNLP Chinese Documentation
=============================
A Modularized and Extensible Toolkit for Natural Language Processing. Currently still in incubation. | |||
fastNLP is a lightweight NLP toolkit. You can use it to quickly build a named-entity recognition (NER), Chinese word segmentation, or text classification system,
or use it to construct complex network models for research. Its features include:
Introduction | |||
- A unified tabular data container that keeps preprocessing simple and clear, with built-in DataSet Loaders for many datasets that remove the need for preprocessing code.
- Handy NLP utilities, such as pretrained-embedding loading and intermediate-result caching.
- Thorough Chinese documentation.
- Many high-level modules, such as Variational LSTM, Transformer, and CRF.
- Packaged models such as CNNText and Biaffine, ready for direct use.
- A convenient and extensible Trainer, with many built-in callbacks for experiment logging, exception catching, and more.
Built-in Components
-------------------
FastNLP is a modular Natural Language Processing system based on | |||
PyTorch, built for fast development of NLP models. | |||
Most neural networks used for NLP tasks can be viewed as a composition of three kinds of modules: encoder, aggregator, and decoder.
A deep learning NLP model is the composition of three types of modules: | |||
.. image:: figures/text_classification.png | |||
fastNLP 在 :mod:`~fastNLP.modules` 模块中内置了三种模块的诸多组件,可以帮助用户快速搭建自己所需的网络。 | |||
三种模块的功能和常见组件如下: | |||
+-----------------------+-----------------------+-----------------------+ | |||
| module type | functionality | example | | |||
+=======================+=======================+=======================+ | |||
| encoder | encode the input into | embedding, RNN, CNN, | | |||
| | some abstract | transformer | | |||
| | representation | | | |||
| encoder | 将输入编码为具有具 | embedding, RNN, CNN, | | |||
| | 有表示能力的向量 | transformer | | |||
+-----------------------+-----------------------+-----------------------+ | |||
| aggregator | aggregate and reduce | self-attention, | | |||
| | information | max-pooling | | |||
| aggregator | 从多个向量中聚合信息 | self-attention, | | |||
| | | max-pooling | | |||
+-----------------------+-----------------------+-----------------------+ | |||
| decoder | decode the | MLP, CRF | | |||
| | representation into | | | |||
| | the output | | | |||
| decoder | 将具有某种表示意义的 | MLP, CRF | | |||
| | 向量解码为需要的输出 | | | |||
| | 形式 | | | |||
+-----------------------+-----------------------+-----------------------+ | |||
For example: | |||
.. image:: figures/text_classification.png | |||
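The encoder → aggregator → decoder composition above can be sketched in a few lines of plain PyTorch. The toy class below is illustrative only, not one of fastNLP's built-in modules:

```python
import torch
import torch.nn as nn

class TinyTextClassifier(nn.Module):
    """A toy composition of the three module types: encoder -> aggregator -> decoder."""

    def __init__(self, vocab_size, embed_dim, num_classes):
        super().__init__()
        self.encoder = nn.Embedding(vocab_size, embed_dim)  # encoder: tokens -> vectors
        self.decoder = nn.Linear(embed_dim, num_classes)    # decoder: vector -> output

    def forward(self, words):
        x = self.encoder(words)           # (batch, seq_len, embed_dim)
        x, _ = torch.max(x, dim=1)        # aggregator: max-pooling over the sequence
        return {"pred": self.decoder(x)}  # fastNLP models return a dict

model = TinyTextClassifier(vocab_size=100, embed_dim=16, num_classes=5)
out = model(torch.randint(0, 100, (2, 7)))  # a batch of 2 sequences of length 7
print(tuple(out["pred"].shape))  # (2, 5)
```

Real models differ only in which encoder, aggregator, and decoder components they plug into this skeleton.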
Built-in Models
----------------

fastNLP ships with complete models such as :class:`~fastNLP.models.CNNText` and
:class:`~fastNLP.models.SeqLabeling` in the :mod:`~fastNLP.models` module, ready for direct use.

.. todo::
    Describe these models in a table (model name + description + results on tasks)

User's Guide
----------------

.. toctree::
    :maxdepth: 1

    Installation <user/installation>
    Quickstart <user/quickstart>
    Detailed Guide <user/tutorial_one>

API Reference
-------------

Besides the user's guide, you can also consult the API reference to find the tools you need.

.. toctree::
    :maxdepth: 2

    fastNLP API <fastNLP>

fitlog
------

fitlog is a tool developed by our team for logging experiments and managing code.
You can read its documentation `here <https://fitlog.readthedocs.io/zh/latest/>`_.

Indices and Tables
==================

* :ref:`genindex`
fastNLP
=======

.. toctree::
    :titlesonly:
    :maxdepth: 4

    fastNLP
fastNLP 10-Minute Tutorial
==========================

The original tutorial is at https://github.com/fastnlp/fastNLP/blob/master/tutorials/fastnlp_10min_tutorial.ipynb

fastNLP provides convenient utilities for preprocessing data and for training and testing models.

DataSet & Instance
------------------

fastNLP uses DataSet and Instance to store and process data. Each DataSet represents a dataset and each Instance represents a sample. A DataSet holds multiple Instances, and each Instance can store arbitrary user-defined content.

There are several read\_\* methods that read data from files into a DataSet with ease.
.. code:: ipython3

    from fastNLP import DataSet
    from fastNLP import Instance

    # read data from a csv file into a DataSet
    win_path = "C:\\Users\zyfeng\Desktop\FudanNLP\\fastNLP\\test\\data_for_tests\\tutorial_sample_dataset.csv"
    dataset = DataSet.read_csv(win_path, headers=('raw_sentence', 'label'), sep='\t')
    print(dataset[0])

.. parsed-literal::

    {'raw_sentence': A series of escapades demonstrating the adage that what is good for the goose is also good for the gander , some of which occasionally amuses but none of which amounts to much of a story .,
    'label': 1}

.. code:: ipython3

    # DataSet.append(Instance) adds a new sample
    dataset.append(Instance(raw_sentence='fake data', label='0'))
    dataset[-1]

.. parsed-literal::

    {'raw_sentence': fake data,
    'label': 0}

.. code:: ipython3

    # DataSet.apply(func, new_field_name) preprocesses the data
    # lowercase all letters
    dataset.apply(lambda x: x['raw_sentence'].lower(), new_field_name='raw_sentence')

    # convert the labels to int
    dataset.apply(lambda x: int(x['label']), new_field_name='label_seq', is_target=True)

    # drop empty sentences, then split each sentence on whitespace
    dataset.drop(lambda x: len(x['raw_sentence'].split()) == 0)
    def split_sent(ins):
        return ins['raw_sentence'].split()
    dataset.apply(split_sent, new_field_name='words', is_input=True)

.. code:: ipython3

    # DataSet.drop(func) filters out samples
    # drop sentences shorter than a given length
    dataset.drop(lambda x: len(x['words']) <= 3)

.. code:: ipython3

    # split into training and test sets
    train_data, test_data = dataset.split(0.3)
    print("Train size: ", len(train_data))
    print("Test size: ", len(test_data))

.. parsed-literal::

    Train size:  54
    Test size:
Vocabulary
----------

fastNLP's Vocabulary makes it easy to build a vocabulary and map words to indices.

.. code:: ipython3

    from fastNLP import Vocabulary

    # build the vocabulary with Vocabulary.add(word)
    vocab = Vocabulary(min_freq=2)
    train_data.apply(lambda x: [vocab.add(word) for word in x['words']])
    vocab.build_vocab()

    # index the sentences with Vocabulary.to_index(word)
    train_data.apply(lambda x: [vocab.to_index(word) for word in x['words']], new_field_name='word_seq', is_input=True)
    test_data.apply(lambda x: [vocab.to_index(word) for word in x['words']], new_field_name='word_seq', is_input=True)

    print(test_data[0])

.. parsed-literal::

    {'raw_sentence': the plot is romantic comedy boilerplate from start to finish .,
    'label': 2,
    'label_seq': 2,
    'words': ['the', 'plot', 'is', 'romantic', 'comedy', 'boilerplate', 'from', 'start', 'to', 'finish', '.'],
    'word_seq': [2, 13, 9, 24, 25, 26, 15, 27, 11, 28, 3]}
.. code:: ipython3

    # if you are working on projects such as reinforcement learning or GANs, you can also iterate the dataset directly
    from fastNLP.core.batch import Batch
    from fastNLP.core.sampler import RandomSampler

    batch_iterator = Batch(dataset=train_data, batch_size=2, sampler=RandomSampler())
    for batch_x, batch_y in batch_iterator:
        print("batch_x has: ", batch_x)
        print("batch_y has: ", batch_y)
        break

.. parsed-literal::

    batch_x has:  {'words': array([list(['this', 'kind', 'of', 'hands-on', 'storytelling', 'is', 'ultimately', 'what', 'makes', 'shanghai', 'ghetto', 'move', 'beyond', 'a', 'good', ',', 'dry', ',', 'reliable', 'textbook', 'and', 'what', 'allows', 'it', 'to', 'rank', 'with', 'its', 'worthy', 'predecessors', '.']),
           list(['the', 'entire', 'movie', 'is', 'filled', 'with', 'deja', 'vu', 'moments', '.'])],
          dtype=object), 'word_seq': tensor([[  19,  184,    6,    1,  481,    9,  206,   50,   91, 1210, 1609, 1330,
              495,    5,   63,    4, 1269,    4,    1, 1184,    7,   50, 1050,   10,
                8, 1611,   16,   21, 1039,    1,    2],
            [   3,  711,   22,    9, 1282,   16, 2482, 2483,  200,    2,    0,    0,
                0,    0,    0,    0,    0,    0,    0,    0,    0,    0,    0,    0,
                0,    0,    0,    0,    0,    0,    0]])}
    batch_y has:  {'label_seq': tensor([3, 2])}
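In the ``word_seq`` tensor above, the shorter sentence is padded with 0 up to the maximum length in the batch. A minimal pure-Python sketch of this padding step (``pad_batch`` and ``pad_idx`` are illustrative names, not fastNLP API):

```python
def pad_batch(seqs, pad_idx=0):
    """Pad index sequences with pad_idx so every row has the batch's max length."""
    max_len = max(len(s) for s in seqs)
    return [s + [pad_idx] * (max_len - len(s)) for s in seqs]

batch = pad_batch([[3, 711, 22, 9], [19, 184]])
print(batch)  # [[3, 711, 22, 9], [19, 184, 0, 0]]
```

fastNLP performs this padding automatically inside Batch, using each field's padder.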
Model
-----

.. code:: ipython3

    # define a simple PyTorch model
    from fastNLP.models import CNNText
    model = CNNText(embed_num=len(vocab), embed_dim=50, num_classes=5, padding=2, dropout=0.1)
    model

.. parsed-literal::

    CNNText(
      (embed): Embedding(
        (embed): Embedding(77, 50, padding_idx=0)
        (dropout): Dropout(p=0.0)
      )
      (conv_pool): ConvMaxpool(
        (convs): ModuleList(
          (0): Conv1d(50, 3, kernel_size=(3,), stride=(1,), padding=(2,))
          (1): Conv1d(50, 4, kernel_size=(4,), stride=(1,), padding=(2,))
          (2): Conv1d(50, 5, kernel_size=(5,), stride=(1,), padding=(2,))
        )
      )
      (dropout): Dropout(p=0.1)
      (fc): Linear(
        (linear): Linear(in_features=12, out_features=5, bias=True)
      )
    )

Trainer & Tester
----------------

Use fastNLP's Trainer to train the model.
.. code:: ipython3

    from fastNLP import Trainer
    from copy import deepcopy
    from fastNLP import CrossEntropyLoss
    from fastNLP import AccuracyMetric

.. code:: ipython3

    # overfitting sanity check
    copy_model = deepcopy(model)
    overfit_trainer = Trainer(model=copy_model,
                              train_data=test_data,
                              dev_data=test_data,
                              loss=CrossEntropyLoss(pred="output", target="label_seq"),
                              metrics=AccuracyMetric(),
                              n_epochs=10,
                              save_path=None)
    overfit_trainer.train()

.. parsed-literal::

    training epochs started 2018-12-07 14:07:20

.. parsed-literal::

    HBox(children=(IntProgress(value=0, layout=Layout(flex='2'), max=20), HTML(value='')), layout=Layout(display='…

.. parsed-literal::

    Epoch 1/10. Step:2/20. AccuracyMetric: acc=0.037037
    Epoch 2/10. Step:4/20. AccuracyMetric: acc=0.296296
    Epoch 3/10. Step:6/20. AccuracyMetric: acc=0.333333
    Epoch 4/10. Step:8/20. AccuracyMetric: acc=0.555556
    Epoch 5/10. Step:10/20. AccuracyMetric: acc=0.611111
    Epoch 6/10. Step:12/20. AccuracyMetric: acc=0.481481
    Epoch 7/10. Step:14/20. AccuracyMetric: acc=0.62963
    Epoch 8/10. Step:16/20. AccuracyMetric: acc=0.685185
    Epoch 9/10. Step:18/20. AccuracyMetric: acc=0.722222
    Epoch 10/10. Step:20/20. AccuracyMetric: acc=0.777778

.. code:: ipython3

    # instantiate a Trainer with the model and data, then train
    trainer = Trainer(model=model,
                      train_data=train_data,
                      dev_data=test_data,
                      loss=CrossEntropyLoss(pred="output", target="label_seq"),
                      metrics=AccuracyMetric(),
                      n_epochs=5)
    trainer.train()
    print('Train finished!')
.. parsed-literal::

    training epochs started 2018-12-07 14:08:10

.. parsed-literal::

    HBox(children=(IntProgress(value=0, layout=Layout(flex='2'), max=5), HTML(value='')), layout=Layout(display='i…

.. parsed-literal::

    Epoch 1/5. Step:1/5. AccuracyMetric: acc=0.037037
    Epoch 2/5. Step:2/5. AccuracyMetric: acc=0.037037
    Epoch 3/5. Step:3/5. AccuracyMetric: acc=0.037037
    Epoch 4/5. Step:4/5. AccuracyMetric: acc=0.185185
    Epoch 5/5. Step:5/5. AccuracyMetric: acc=0.240741
    Train finished!

.. code:: ipython3

    from fastNLP import Tester

    tester = Tester(data=test_data, model=model, metrics=AccuracyMetric())
    acc = tester.test()

.. parsed-literal::

    [tester]
    AccuracyMetric: acc=0.240741

In Summary
----------

Pseudocode logic of the fastNLP Trainer
---------------------------------------
1. Prepare a DataSet; suppose it contains the following fields
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

::

    ['raw_sentence', 'word_seq1', 'word_seq2', 'raw_label', 'label']
    Use DataSet.set_input('word_seq1', 'word_seq2', flag=True) to mark 'word_seq1' and 'word_seq2' as input.
    Use DataSet.set_target('label', flag=True) to mark 'label' as target.

2. Initialize the model
~~~~~~~~~~~~~~~~~~~~~~~

::

    class Model(nn.Module):
        def __init__(self):
            xxx
        def forward(self, word_seq1, word_seq2):
            # (1) the parameter names here must match the names of the input fields in the DataSet,
            #     because values are bound by parameter name
            # (2) there may be more input fields than parameters here, but never fewer
            xxxx
            # the output must be a dict

3. The Trainer's training loop
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

::

    (1) Take a batch of batch_size samples from the DataSet and call Model.forward.
    (2) Pass the result of Model.forward together with the fields marked as target into the loss.
        Different models may use different keys in forward's output dict, e.g. {'pred': xxx} or {'output': xxx};
        likewise, the target field may have different names, e.g. 'label' or 'target'.
        To handle this, our losses provide a mapping mechanism.
        For example, CrossEntropyLoss takes (pred, target) as input; if forward's output is {'output': xxx}
        and the target field is 'label', initialize the loss as CrossEntropyLoss(pred='output', target='label').
    (3) Metrics work the same way:
        metric computation also takes values from forward's output and from the target fields, resolved via the same mapping.
FAQ
---

1. Why do input and target need to be set in the DataSet?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

::

    Only data marked as input or target is fetched during training.
    (1.1) Only fields marked as input are considered when binding arguments for Model.forward.
    (1.2) Values passed to the loss or metrics come from:
          (a) the output of Model.forward
          (b) the fields marked as target

2. Fields in the DataSet are bound to forward's parameters by parameter name
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

::

    For example, if x and seq_lens are input fields in the DataSet, then forward should be
        def forward(self, x, seq_lens):
            pass
    Fields are matched to parameters by name.

The overall workflow
--------------------

1. Load data into a DataSet
~~~~~~~~~~~~~~~~~~~~~~~~~~~

2. Preprocess the DataSet with apply
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

::

    (2.1) During preprocessing, mark some fields as input and some as target.

3. Build the model
~~~~~~~~~~~~~~~~~~

::

    (3.1) The parameter names of the forward function must match the names of the fields marked as input in the DataSet.
          For example, if x and seq_lens are input fields, forward should be
              def forward(self, x, seq_lens):
                  pass
          Fields are matched to parameters by name.
    (3.2) The model's forward must return a dict.
          We recommend returning {"pred": xx}.
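The name-based binding described above can be sketched in a few lines of plain Python using the standard library; ``bind_forward_args`` is an illustrative helper, not part of fastNLP:

```python
import inspect

def bind_forward_args(forward, input_fields):
    """Select, by parameter name, the input fields a forward function expects."""
    params = [p for p in inspect.signature(forward).parameters if p != 'self']
    missing = [p for p in params if p not in input_fields]
    if missing:
        # more input fields than parameters is fine; fewer is an error
        raise TypeError(f"input fields missing for parameters: {missing}")
    return {name: input_fields[name] for name in params}

def forward(x, seq_lens):
    return {"pred": x}

batch = {"x": [[1, 2], [3]], "seq_lens": [2, 1], "unused": 0}
print(bind_forward_args(forward, batch))  # {'x': [[1, 2], [3]], 'seq_lens': [2, 1]}
```

This mirrors rule (2) of "Initialize the model": extra input fields are ignored, while a parameter without a matching input field raises an error.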
fastNLP 1-Minute Tutorial
=========================

The original tutorial is at https://github.com/fastnlp/fastNLP/blob/master/tutorials/fastnlp_1min_tutorial.ipynb

step 1
------

Read the dataset.

.. code:: ipython3

    from fastNLP import DataSet

    # linux_path = "../test/data_for_tests/tutorial_sample_dataset.csv"
    win_path = "C:\\Users\zyfeng\Desktop\FudanNLP\\fastNLP\\test\\data_for_tests\\tutorial_sample_dataset.csv"
    ds = DataSet.read_csv(win_path, headers=('raw_sentence', 'label'), sep='\t')

step 2
------

Preprocess the data: 1. convert types 2. split off a validation set 3. build the vocabulary.

.. code:: ipython3

    # lowercase all letters
    ds.apply(lambda x: x['raw_sentence'].lower(), new_field_name='raw_sentence')

    # convert the labels to int
    ds.apply(lambda x: int(x['label']), new_field_name='label_seq', is_target=True)

    def split_sent(ins):
        return ins['raw_sentence'].split()
    ds.apply(split_sent, new_field_name='words', is_input=True)

.. code:: ipython3

    # split into training / validation sets
    train_data, dev_data = ds.split(0.3)
    print("Train size: ", len(train_data))
    print("Test size: ", len(dev_data))

.. parsed-literal::

    Train size:  54
    Test size:  23

.. code:: ipython3

    from fastNLP import Vocabulary
    vocab = Vocabulary(min_freq=2)
    train_data.apply(lambda x: [vocab.add(word) for word in x['words']])

    # index the sentences with Vocabulary.to_index(word)
    train_data.apply(lambda x: [vocab.to_index(word) for word in x['words']], new_field_name='word_seq', is_input=True)
    dev_data.apply(lambda x: [vocab.to_index(word) for word in x['words']], new_field_name='word_seq', is_input=True)

step 3
------

Define the model.

.. code:: ipython3

    from fastNLP.models import CNNText
    model = CNNText(embed_num=len(vocab), embed_dim=50, num_classes=5, padding=2, dropout=0.1)

step 4
------

Start training.

.. code:: ipython3

    from fastNLP import Trainer, CrossEntropyLoss, AccuracyMetric
    trainer = Trainer(model=model,
                      train_data=train_data,
                      dev_data=dev_data,
                      loss=CrossEntropyLoss(),
                      metrics=AccuracyMetric()
                      )
    trainer.train()
    print('Train finished!')

.. parsed-literal::

    training epochs started 2018-12-07 14:03:41

.. parsed-literal::

    HBox(children=(IntProgress(value=0, layout=Layout(flex='2'), max=6), HTML(value='')), layout=Layout(display='i…

.. parsed-literal::

    Epoch 1/3. Step:2/6. AccuracyMetric: acc=0.26087
    Epoch 2/3. Step:4/6. AccuracyMetric: acc=0.347826
    Epoch 3/3. Step:6/6. AccuracyMetric: acc=0.608696
    Train finished!

This concludes the tutorial. See the advanced tutorial for more operations.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -1,5 +0,0 @@ | |||
fastNLP 进阶教程 | |||
=============== | |||
教程原文见 https://github.com/fastnlp/fastNLP/blob/master/tutorials/fastnlp_advanced_tutorial/advance_tutorial.ipynb | |||
@@ -1,5 +0,0 @@ | |||
fastNLP 开发者指南 | |||
=============== | |||
原文见 https://github.com/fastnlp/fastNLP/blob/master/tutorials/tutorial_for_developer.md | |||
===============
Installation
===============

.. contents::
   :local:

fastNLP depends on the following packages::

    torch>=0.4.0
    numpy
    tqdm
    nltk

Installing torch may depend on your operating system and CUDA version; see the
`PyTorch website <https://pytorch.org/get-started/locally/>`_ for details.

With the dependencies in place, you can install fastNLP from the command line:

.. code:: shell

    pip install fastNLP
===============
Quickstart
===============

This is a simple classification task (data from `kaggle <https://www.kaggle.com/c/sentiment-analysis-on-movie-reviews>`_ ):
given a piece of text, predict which of the labels 0~4 it belongs to.

We can use the :class:`~fastNLP.io.CSVLoader` class from fastNLP's io module to read our data from a csv file with ease.

.. code-block:: python

    from fastNLP.io import CSVLoader

    loader = CSVLoader(headers=('raw_sentence', 'label'), sep='\t')
    dataset = loader.load("./sample_data/tutorial_sample_dataset.csv")

The value of ``dataset[0]`` is shown below. Each sample in the dataset contains two fields,
``raw_sentence`` and ``label``, both of type ``str``::

    {'raw_sentence': A series of escapades demonstrating the adage that what is good for the
    goose is also good for the gander , some of which occasionally amuses but none of which
    amounts to much of a story . type=str,
    'label': 1 type=str}

We use the :meth:`~fastNLP.DataSet.apply` method of the :class:`~fastNLP.DataSet` class to lowercase the letters in ``raw_sentence`` and tokenize the sentences.

.. code-block:: python

    dataset.apply(lambda x: x['raw_sentence'].lower(), new_field_name='sentence')
    dataset.apply(lambda x: x['sentence'].split(), new_field_name='words', is_input=True)

Then we use the :class:`~fastNLP.Vocabulary` class to collect the words appearing in the data and convert the word sequences into numeric sequences usable for training.

.. code-block:: python

    from fastNLP import Vocabulary

    vocab = Vocabulary(min_freq=2).from_dataset(dataset, field_name='words')
    vocab.index_dataset(dataset, field_name='words', new_field_name='words')
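Conceptually, ``Vocabulary(min_freq=2)`` keeps only words that occur at least twice and maps everything else to an unknown index. A minimal pure-Python sketch of that idea (not fastNLP's implementation; ``build_vocab`` and the index values are illustrative):

```python
from collections import Counter

def build_vocab(token_lists, min_freq=2, specials=('<pad>', '<unk>')):
    """Map each frequent-enough word to an index; rarer words fall back to <unk>."""
    counts = Counter(w for toks in token_lists for w in toks)
    word2idx = {w: i for i, w in enumerate(specials)}
    for word, freq in counts.most_common():
        if freq >= min_freq:
            word2idx[word] = len(word2idx)
    return word2idx

sents = [["the", "plot", "is", "good"], ["the", "movie", "is", "bad"]]
vocab = build_vocab(sents)
unk = vocab['<unk>']
print([vocab.get(w, unk) for w in ["the", "plot", "is", "rare"]])  # [2, 1, 3, 1]
```

fastNLP's ``from_dataset`` / ``index_dataset`` pair does this counting and index replacement for you, directly on the DataSet.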
Meanwhile, we convert the labels, originally of type str, to integers and mark them as the training ``target``.

.. code-block:: python

    dataset.apply(lambda x: int(x['label']), new_field_name='target', is_target=True)

Now we can import fastNLP's built-in text classification model :class:`~fastNLP.models.CNNText` .

.. code-block:: python

    from fastNLP.models import CNNText

    model = CNNText((len(vocab), 50), num_classes=5, padding=2, dropout=0.1)

The network structure of :class:`~fastNLP.models.CNNText` is as follows::

    CNNText(
      (embed): Embedding(
        177, 50
        (dropout): Dropout(p=0.0)
      )
      (conv_pool): ConvMaxpool(
        (convs): ModuleList(
          (0): Conv1d(50, 3, kernel_size=(3,), stride=(1,), padding=(2,))
          (1): Conv1d(50, 4, kernel_size=(4,), stride=(1,), padding=(2,))
          (2): Conv1d(50, 5, kernel_size=(5,), stride=(1,), padding=(2,))
        )
      )
      (dropout): Dropout(p=0.1)
      (fc): Linear(in_features=12, out_features=5, bias=True)
    )

Next we use the :meth:`~fastNLP.DataSet.split` method of the :class:`~fastNLP.DataSet` class to split the dataset into
``train_data`` and ``dev_data``, used for training and validation respectively.

.. code-block:: python

    train_data, dev_data = dataset.split(0.2)

Finally we train with fastNLP's :class:`~fastNLP.Trainer`. Training requires the model ``model``, the training set ``train_data``,
the validation set ``dev_data``, a loss function ``loss``, and an evaluation metric ``metrics``.
Here the loss is fastNLP's :class:`~fastNLP.CrossEntropyLoss` and the metric is fastNLP's :class:`~fastNLP.AccuracyMetric` accuracy metric.

.. code-block:: python

    from fastNLP import Trainer, CrossEntropyLoss, AccuracyMetric

    trainer = Trainer(model=model, train_data=train_data, dev_data=dev_data,
                      loss=CrossEntropyLoss(), metrics=AccuracyMetric())
    trainer.train()

The training output is as follows::

    input fields after batch(if batch size is 2):
        words: (1)type:torch.Tensor (2)dtype:torch.int64, (3)shape:torch.Size([2, 26])
    target fields after batch(if batch size is 2):
        target: (1)type:torch.Tensor (2)dtype:torch.int64, (3)shape:torch.Size([2])

    training epochs started 2019-05-09-10-59-39
    Evaluation at Epoch 1/10. Step:2/20. AccuracyMetric: acc=0.333333
    Evaluation at Epoch 2/10. Step:4/20. AccuracyMetric: acc=0.533333
    Evaluation at Epoch 3/10. Step:6/20. AccuracyMetric: acc=0.533333
    Evaluation at Epoch 4/10. Step:8/20. AccuracyMetric: acc=0.533333
    Evaluation at Epoch 5/10. Step:10/20. AccuracyMetric: acc=0.6
    Evaluation at Epoch 6/10. Step:12/20. AccuracyMetric: acc=0.8
    Evaluation at Epoch 7/10. Step:14/20. AccuracyMetric: acc=0.8
    Evaluation at Epoch 8/10. Step:16/20. AccuracyMetric: acc=0.733333
    Evaluation at Epoch 9/10. Step:18/20. AccuracyMetric: acc=0.733333
    Evaluation at Epoch 10/10. Step:20/20. AccuracyMetric: acc=0.733333
    In Epoch:6/Step:12, got best dev performance:AccuracyMetric: acc=0.8
    Reloaded the best model.

This tutorial only briefly introduces the fastNLP workflow; see :doc:`/user/tutorial_one` for a detailed walkthrough.
================
Detailed Guide
================

We use the same task as in :doc:`/user/quickstart` for a detailed walkthrough: given a piece of text,
predict which of the labels 0~4 it belongs to
(data from `kaggle <https://www.kaggle.com/c/sentiment-analysis-on-movie-reviews>`_ ).

--------------------
Data Processing
--------------------

Reading data
~~~~~~~~~~~~

We can use the :class:`~fastNLP.io.CSVLoader` class in fastNLP's :mod:`fastNLP.io` module to read our data from a csv file with ease.
The ``dataset`` here is an object of fastNLP's :class:`~fastNLP.DataSet` class.

.. code-block:: python

    from fastNLP.io import CSVLoader

    loader = CSVLoader(headers=('raw_sentence', 'label'), sep='\t')
    dataset = loader.load("./sample_data/tutorial_sample_dataset.csv")

Besides reading data, fastNLP also provides Loader classes for other file types, loaders for embeddings, and more. See :doc:`/fastNLP.io` .

Instance and DataSet
~~~~~~~~~~~~~~~~~~~~

A fastNLP :class:`~fastNLP.DataSet` object is like a two-dimensional table: each column is a :mod:`~fastNLP.core.field`
and each row is an :mod:`~fastNLP.core.instance` . We can manually append :class:`~fastNLP.Instance` objects to a dataset.

.. code-block:: python

    from fastNLP import Instance

    dataset.append(Instance(raw_sentence='fake data', label='0'))

The value of ``dataset[-1]`` is now as follows; each sample contains two
:mod:`~fastNLP.core.field` , ``raw_sentence`` and ``label``, both of type ``str`` ::

    {'raw_sentence': fake data type=str, 'label': 0 type=str}

Modifying fields
~~~~~~~~~~~~~~~~

We use the :meth:`~fastNLP.DataSet.apply` method of the :class:`~fastNLP.DataSet` class to lowercase ``raw_sentence`` and tokenize the sentences,
and we also convert the ``label`` :mod:`~fastNLP.core.field` to integers and rename it ``target``.

.. code-block:: python

    dataset.apply(lambda x: x['raw_sentence'].lower(), new_field_name='sentence')
    dataset.apply_field(lambda x: x.split(), field_name='sentence', new_field_name='words')
    dataset.apply(lambda x: int(x['label']), new_field_name='target')

``words`` and ``target`` are already enough to train :class:`~fastNLP.models.CNNText`, but the documentation of
:class:`~fastNLP.models.CNNText` shows that :meth:`~fastNLP.models.CNNText.forward` also accepts an optional ``seq_len`` argument.
So we use :meth:`~fastNLP.DataSet.apply_field` again to add a :mod:`~fastNLP.core.field` named ``seq_len``.

.. code-block:: python

    dataset.apply_field(lambda x: len(x), field_name='words', new_field_name='seq_len')

Note that :meth:`~fastNLP.DataSet.apply_field` is similar to :meth:`~fastNLP.DataSet.apply`,
but the `lambda` it receives operates on a single :mod:`~fastNLP.core.field` of an :class:`~fastNLP.Instance`,
while the `lambda` passed to :meth:`~fastNLP.DataSet.apply` operates on the whole :class:`~fastNLP.Instance` .

.. note::
    A `lambda` is an anonymous function, an important Python feature. ``lambda x: len(x)`` behaves the same as the function below::

        def func_lambda(x):
            return len(x)

    You can also pass more complex functions to :meth:`~fastNLP.DataSet.apply_field` and :meth:`~fastNLP.DataSet.apply` .

Using Vocabulary
~~~~~~~~~~~~~~~~

We then use the :class:`~fastNLP.Vocabulary` class to collect the words appearing in the data, and use :meth:`~fastNLP.Vocabulary.index_dataset`
to convert the word sequences into numeric sequences usable for training.

.. code-block:: python

    from fastNLP import Vocabulary

    vocab = Vocabulary(min_freq=2).from_dataset(dataset, field_name='words')
    vocab.index_dataset(dataset, field_name='words', new_field_name='words')

Splitting the dataset
~~~~~~~~~~~~~~~~~~~~~

Besides modifying :mod:`~fastNLP.core.field` , we can also split a :class:`~fastNLP.DataSet` into training, development, and test sets.
The code below shows how to use :meth:`~fastNLP.DataSet.split` (in practice it should come after the renaming and input-setting code in the next two sections).

.. code-block:: python

    train_dev_data, test_data = dataset.split(0.1)
    train_data, dev_data = train_dev_data.split(0.1)
    len(train_data), len(dev_data), len(test_data)
------------------------------
Training with Built-in Models
------------------------------

Input/output naming of built-in models
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

fastNLP ships with several complete neural network models (see :doc:`/fastNLP.models` ); we train with :class:`~fastNLP.models.CNNText` here.
To use the built-in :class:`~fastNLP.models.CNNText`, we must rename the :mod:`~fastNLP.core.field` in our :class:`~fastNLP.DataSet` .
In this example the model inputs (the parameters of the forward method) are ``words`` and ``seq_len``, the prediction output is ``pred``,
and the gold answer is ``target``. See :doc:`/fastNLP.core.const` for the naming conventions.

Instead of consulting the documentation, you can also use the :class:`~fastNLP.Const` class for naming. The code below shows the
:meth:`~fastNLP.DataSet.rename_field` method for renaming a :mod:`~fastNLP.core.field` in a :class:`~fastNLP.DataSet` ,
together with the usage of the :class:`~fastNLP.Const` class.

.. code-block:: python

    from fastNLP import Const

    dataset.rename_field('words', Const.INPUT)
    dataset.rename_field('seq_len', Const.INPUT_LEN)
    dataset.rename_field('target', Const.TARGET)

After renaming the :mod:`~fastNLP.core.field` in the :class:`~fastNLP.DataSet` , we still need to declare the inputs and targets used in
training, via the :meth:`~fastNLP.DataSet.set_input` and :meth:`~fastNLP.DataSet.set_target` functions.

.. code-block:: python

    dataset.set_input(Const.INPUT, Const.INPUT_LEN)
    dataset.set_target(Const.TARGET)
Quick training
~~~~~~~~~~~~~~

Now we can import the built-in text classification model :class:`~fastNLP.models.CNNText` and train it with :class:`~fastNLP.Trainer`
(the ``loss`` and ``metrics`` used here are defined in the following two sections).

.. code-block:: python

    from fastNLP.models import CNNText
    from fastNLP import Trainer

    model_cnn = CNNText((len(vocab), 50), num_classes=5, padding=2, dropout=0.1)

    trainer = Trainer(model=model_cnn, train_data=train_data, dev_data=dev_data,
                      loss=loss, metrics=metrics)
    trainer.train()

The training output is as follows::

    input fields after batch(if batch size is 2):
        words: (1)type:torch.Tensor (2)dtype:torch.int64, (3)shape:torch.Size([2, 26])
    target fields after batch(if batch size is 2):
        target: (1)type:torch.Tensor (2)dtype:torch.int64, (3)shape:torch.Size([2])

    training epochs started 2019-05-09-10-59-39
    Evaluation at Epoch 1/10. Step:2/20. AccuracyMetric: acc=0.333333
    Evaluation at Epoch 2/10. Step:4/20. AccuracyMetric: acc=0.533333
    Evaluation at Epoch 3/10. Step:6/20. AccuracyMetric: acc=0.533333
    Evaluation at Epoch 4/10. Step:8/20. AccuracyMetric: acc=0.533333
    Evaluation at Epoch 5/10. Step:10/20. AccuracyMetric: acc=0.6
    Evaluation at Epoch 6/10. Step:12/20. AccuracyMetric: acc=0.8
    Evaluation at Epoch 7/10. Step:14/20. AccuracyMetric: acc=0.8
    Evaluation at Epoch 8/10. Step:16/20. AccuracyMetric: acc=0.733333
    Evaluation at Epoch 9/10. Step:18/20. AccuracyMetric: acc=0.733333
    Evaluation at Epoch 10/10. Step:20/20. AccuracyMetric: acc=0.733333
    In Epoch:6/Step:12, got best dev performance:AccuracyMetric: acc=0.8
    Reloaded the best model.
Loss function
~~~~~~~~~~~~~

Training requires a loss function. Below is the cross-entropy loss commonly used in classification. Note its **initialization parameters**:
the ``pred`` argument names a key in the dict returned by the model's forward method,
and the ``target`` argument names the :mod:`~fastNLP.core.field` in the :class:`~fastNLP.DataSet` that holds the labels.
Here we use :class:`~fastNLP.Const` to help with naming; if the return values of your own model's forward method or
the :mod:`~fastNLP.core.field` names in your dataset differ from this example, set ``pred`` and ``target`` to match your own code.

.. code-block:: python

    from fastNLP import CrossEntropyLoss

    # in this example, loss = CrossEntropyLoss() is equivalent to the line below
    loss = CrossEntropyLoss(pred=Const.OUTPUT, target=Const.TARGET)

Evaluation metric
~~~~~~~~~~~~~~~~~

Training also requires an evaluation metric; here we use accuracy. The parameter `naming rules` are the same as above:
``pred`` names a key in the dict returned by forward, and ``target`` names the label :mod:`~fastNLP.core.field` in the :class:`~fastNLP.DataSet` .

.. code-block:: python

    from fastNLP import AccuracyMetric

    # in this example, metrics = AccuracyMetric() is equivalent to the line below
    metrics = AccuracyMetric(pred=Const.OUTPUT, target=Const.TARGET)

Quick testing
~~~~~~~~~~~~~

Matching :class:`~fastNLP.Trainer`, fastNLP also provides :class:`~fastNLP.Tester` for quick testing, used as follows:

.. code-block:: python

    from fastNLP import Tester

    tester = Tester(test_data, model_cnn, metrics=AccuracyMetric())
    tester.test()
------------------------
Writing Your Own Model
------------------------

Because fastNLP is a framework built on `PyTorch <https://pytorch.org/>`_ , we can write our own neural network models as PyTorch models.
Unlike a standard PyTorch model, the forward method of a fastNLP model returns a dict that must contain at least a "pred" key,
and the parameter names of forward must match the names set with :meth:`~fastNLP.DataSet.set_input` in the :class:`~fastNLP.DataSet` .

The model is defined as follows:

.. code-block:: python

    import torch
    import torch.nn as nn

    class LSTMText(nn.Module):
        def __init__(self, vocab_size, embedding_dim, output_dim, hidden_dim=64, num_layers=2, dropout=0.5):
            super().__init__()

            self.embedding = nn.Embedding(vocab_size, embedding_dim)
            self.lstm = nn.LSTM(embedding_dim, hidden_dim, num_layers=num_layers, bidirectional=True, dropout=dropout)
            self.fc = nn.Linear(hidden_dim * 2, output_dim)
            self.dropout = nn.Dropout(dropout)

        def forward(self, words):
            # (input) words : (batch_size, seq_len)
            words = words.permute(1, 0)
            # words : (seq_len, batch_size)

            embedded = self.dropout(self.embedding(words))
            # embedded : (seq_len, batch_size, embedding_dim)
            output, (hidden, cell) = self.lstm(embedded)
            # output: (seq_len, batch_size, hidden_dim * 2)
            # hidden: (num_layers * 2, batch_size, hidden_dim)
            # cell: (num_layers * 2, batch_size, hidden_dim)

            hidden = torch.cat((hidden[-2, :, :], hidden[-1, :, :]), dim=1)
            hidden = self.dropout(hidden)
            # hidden: (batch_size, hidden_dim * 2)

            pred = self.fc(hidden.squeeze(0))
            # pred: (batch_size, output_dim)
            return {"pred": pred}

The model is used the same way as the built-in :class:`~fastNLP.models.CNNText` :

.. code-block:: python

    model_lstm = LSTMText(len(vocab), 50, 5)

    trainer = Trainer(model=model_lstm, train_data=train_data, dev_data=dev_data,
                      loss=loss, metrics=metrics)
    trainer.train()

    tester = Tester(test_data, model_lstm, metrics=AccuracyMetric())
    tester.test()

.. todo::
    Writing models with :doc:`/fastNLP.modules`
-------------------------- | |||
自己编写训练过程 | |||
-------------------------- | |||
如果你想用类似 PyTorch 的使用方法,自己编写训练过程,你可以参考下面这段代码。其中使用了 fastNLP 提供的 :class:`~fastNLP.Batch` | |||
来获得小批量训练的小批量数据,使用 :class:`~fastNLP.BucketSampler` 做为 :class:`~fastNLP.Batch` 的参数来选择采样的方式。 | |||
这段代码中使用了 PyTorch 的 `torch.optim.Adam` 优化器 和 `torch.nn.CrossEntropyLoss` 损失函数,并自己计算了正确率 | |||
.. code-block:: python | |||
from fastNLP import BucketSampler | |||
from fastNLP import Batch | |||
import torch | |||
import time | |||
model = CNNText((len(vocab),50), num_classes=5, padding=2, dropout=0.1) | |||
def train(epoch, data): | |||
optim = torch.optim.Adam(model.parameters(), lr=0.001) | |||
lossfunc = torch.nn.CrossEntropyLoss() | |||
batch_size = 32 | |||
train_sampler = BucketSampler(batch_size=batch_size, seq_len_field_name='seq_len') | |||
train_batch = Batch(batch_size=batch_size, dataset=data, sampler=train_sampler) | |||
start_time = time.time() | |||
for i in range(epoch): | |||
loss_list = [] | |||
for batch_x, batch_y in train_batch: | |||
optim.zero_grad() | |||
output = model(batch_x['words']) | |||
loss = lossfunc(output['pred'], batch_y['target']) | |||
loss.backward() | |||
optim.step() | |||
loss_list.append(loss.item()) | |||
print('Epoch {:d} Avg Loss: {:.2f}'.format(i, sum(loss_list) / len(loss_list)),end=" ") | |||
print('{:d}ms'.format(round((time.time()-start_time)*1000))) | |||
loss_list.clear() | |||
train(10, train_data) | |||
tester = Tester(test_data, model, metrics=AccuracyMetric()) | |||
tester.test() | |||
The output of this code is as follows::

    Epoch 0 Avg Loss: 2.76 17ms
    Epoch 1 Avg Loss: 2.55 29ms
    Epoch 2 Avg Loss: 2.37 41ms
    Epoch 3 Avg Loss: 2.30 53ms
    Epoch 4 Avg Loss: 2.12 65ms
    Epoch 5 Avg Loss: 2.16 76ms
    Epoch 6 Avg Loss: 1.88 88ms
    Epoch 7 Avg Loss: 1.84 99ms
    Epoch 8 Avg Loss: 1.71 111ms
    Epoch 9 Avg Loss: 1.62 122ms
    [tester]
    AccuracyMetric: acc=0.142857
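The :class:`~fastNLP.BucketSampler` used above groups instances of similar length into the same batch to reduce padding. The idea can be sketched in plain Python; this is a simplified illustration (the function name and parameters here are illustrative, not fastNLP's actual implementation):

```python
import random

def bucket_sample(seq_lens, batch_size, num_buckets=2, seed=0):
    """Return batches of indices where each batch holds sequences of
    similar length (illustrative sketch of the bucketing idea)."""
    rng = random.Random(seed)
    # sort indices by sequence length, then split the order into buckets
    order = sorted(range(len(seq_lens)), key=lambda i: seq_lens[i])
    bucket_size = (len(order) + num_buckets - 1) // num_buckets
    buckets = [order[i:i + bucket_size] for i in range(0, len(order), bucket_size)]
    batches = []
    for bucket in buckets:
        rng.shuffle(bucket)  # shuffle within a bucket for randomness
        for i in range(0, len(bucket), batch_size):
            batches.append(bucket[i:i + batch_size])
    rng.shuffle(batches)  # shuffle the order in which batches are served
    return batches

lens = [3, 15, 4, 16, 5, 14]
batches = bucket_sample(lens, batch_size=2, num_buckets=2)
```

Because sorting happens before bucketing, every batch contains sequences of nearly equal length, so little padding is wasted, while the two shuffles keep the ordering stochastic.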
----------------------------------------
Enhancing the Trainer with Callbacks
----------------------------------------

If you don't want to implement the whole training loop yourself, but only want to add some functionality of your own
during training (for example, printing the total time elapsed from the start of training to the end of the current
batch), you can use the :class:`~fastNLP.Callback` class provided by fastNLP. In the example below, we implement this
feature by subclassing :class:`~fastNLP.Callback`.
.. code-block:: python

    from fastNLP import Callback

    start_time = time.time()

    class MyCallback(Callback):
        def on_epoch_end(self):
            print('Sum Time: {:d}ms\n\n'.format(round((time.time()-start_time)*1000)))


    model = CNNText((len(vocab),50), num_classes=5, padding=2, dropout=0.1)
    trainer = Trainer(model=model, train_data=train_data, dev_data=dev_data,
                      loss=CrossEntropyLoss(), metrics=AccuracyMetric(), callbacks=[MyCallback()])
    trainer.train()
The training output is as follows::

    input fields after batch(if batch size is 2):
        words: (1)type:torch.Tensor (2)dtype:torch.int64, (3)shape:torch.Size([2, 16])
        seq_len: (1)type:torch.Tensor (2)dtype:torch.int64, (3)shape:torch.Size([2])
    target fields after batch(if batch size is 2):
        target: (1)type:torch.Tensor (2)dtype:torch.int64, (3)shape:torch.Size([2])

    training epochs started 2019-05-12-21-38-40

    Evaluation at Epoch 1/10. Step:2/20. AccuracyMetric: acc=0.285714
    Sum Time: 51ms

    …………………………

    Evaluation at Epoch 10/10. Step:20/20. AccuracyMetric: acc=0.857143
    Sum Time: 212ms

    In Epoch:10/Step:20, got best dev performance:AccuracyMetric: acc=0.857143
    Reloaded the best model.
This example only illustrates how to use the :class:`~fastNLP.Callback` class. Many features needed in practice
(such as negative sampling, learning rate decay and early stopping) are already implemented in fastNLP. You can
import and use them directly; see :doc:`/fastNLP.core.callback` for details.
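Under the hood, the Trainer simply invokes each registered callback's hook method at fixed points in the training loop. The mechanism can be sketched in plain Python; this is a simplified illustration of the pattern, not fastNLP's actual code (all class and method names below are illustrative):

```python
class Callback:
    # no-op hooks; subclasses override only the ones they need
    def on_epoch_begin(self): pass
    def on_epoch_end(self): pass

class Trainer:
    def __init__(self, n_epochs, callbacks):
        self.n_epochs = n_epochs
        self.callbacks = callbacks

    def _fire(self, hook):
        # forward one hook call to every registered callback
        for cb in self.callbacks:
            getattr(cb, hook)()

    def train(self):
        for _ in range(self.n_epochs):
            self._fire('on_epoch_begin')
            # ... forward pass / loss / backward / optimizer step go here ...
            self._fire('on_epoch_end')

class CountingCallback(Callback):
    def __init__(self):
        self.epochs_seen = 0
    def on_epoch_end(self):
        self.epochs_seen += 1

cb = CountingCallback()
Trainer(n_epochs=3, callbacks=[cb]).train()
```

The design keeps the training loop itself fixed while letting users inject behavior at well-defined points, which is why features like early stopping can be shipped as reusable callback classes.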
@@ -0,0 +1,5 @@ | |||
=================
Research Guide
=================

This article describes how to use fastNLP together with fitlog for research experiments.
@@ -1,3 +1,59 @@ | |||
""" | |||
fastNLP 由 :mod:`~fastNLP.core` 、 :mod:`~fastNLP.io` 、:mod:`~fastNLP.modules`、:mod:`~fastNLP.models` | |||
等子模块组成,你可以点进去查看每个模块的文档。 | |||
- :mod:`~fastNLP.core` 是fastNLP 的核心模块,包括 DataSet、 Trainer、 Tester 等组件。详见文档 :doc:`/fastNLP.core` | |||
- :mod:`~fastNLP.io` 是实现输入输出的模块,包括了数据集的读取,模型的存取等功能。详见文档 :doc:`/fastNLP.io` | |||
- :mod:`~fastNLP.modules` 包含了用于搭建神经网络模型的诸多组件,可以帮助用户快速搭建自己所需的网络。详见文档 :doc:`/fastNLP.modules` | |||
- :mod:`~fastNLP.models` 包含了一些使用 fastNLP 实现的完整网络模型,包括CNNText、SeqLabeling等常见模型。详见文档 :doc:`/fastNLP.models` | |||
fastNLP 中最常用的组件可以直接从 fastNLP 包中 import ,他们的文档如下: | |||
""" | |||
__all__ = [ | |||
"Instance", | |||
"FieldArray", | |||
"Batch", | |||
"Vocabulary", | |||
"DataSet", | |||
"Const", | |||
"Trainer", | |||
"Tester", | |||
"Callback", | |||
"GradientClipCallback", | |||
"EarlyStopCallback", | |||
"TensorboardCallback", | |||
"LRScheduler", | |||
"ControlC", | |||
"Padder", | |||
"AutoPadder", | |||
"EngChar2DPadder", | |||
"AccuracyMetric", | |||
"SpanFPreRecMetric", | |||
"SQuADMetric", | |||
"Optimizer", | |||
"SGD", | |||
"Adam", | |||
"Sampler", | |||
"SequentialSampler", | |||
"BucketSampler", | |||
"RandomSampler", | |||
"LossFunc", | |||
"CrossEntropyLoss", | |||
"L1Loss", "BCELoss", | |||
"NLLLoss", | |||
"LossInForward", | |||
"cache_results" | |||
] | |||
__version__ = '0.4.0' | |||
from .core import * | |||
from . import models | |||
from . import modules |
@@ -1 +0,0 @@ | |||
from .api import CWS, POS, Parser |
@@ -1,13 +1,30 @@ | |||
""" | |||
core 模块里实现了 fastNLP 的核心框架,常用的功能都可以从 fastNLP 包中直接 import。当然你也同样可以从 core 模块的子模块中 import, | |||
例如 Batch 组件有两种 import 的方式:: | |||
# 直接从 fastNLP 中 import | |||
from fastNLP import Batch | |||
# 从 core 模块的子模块 batch 中 import | |||
from fastNLP.core.batch import Batch | |||
对于常用的功能,你只需要在 :doc:`fastNLP` 中查看即可。如果想了解各个子模块的具体作用,您可以在下面找到每个子模块的具体文档。 | |||
.. todo:: | |||
介绍core 的子模块的分工,好像必要性不大 | |||
""" | |||
from .batch import Batch | |||
# from .dataset import DataSet | |||
from .fieldarray import FieldArray | |||
from .callback import Callback, GradientClipCallback, EarlyStopCallback, TensorboardCallback, LRScheduler, ControlC | |||
from .const import Const | |||
from .dataset import DataSet | |||
from .field import FieldArray, Padder, AutoPadder, EngChar2DPadder | |||
from .instance import Instance | |||
from .losses import LossFunc, CrossEntropyLoss, L1Loss, BCELoss, NLLLoss, LossInForward | |||
from .metrics import AccuracyMetric | |||
from .metrics import AccuracyMetric, SpanFPreRecMetric, SQuADMetric | |||
from .optimizer import Optimizer, SGD, Adam | |||
from .sampler import SequentialSampler, BucketSampler, RandomSampler, BaseSampler | |||
from .sampler import SequentialSampler, BucketSampler, RandomSampler, Sampler | |||
from .tester import Tester | |||
from .trainer import Trainer | |||
from .utils import cache_results, seq_len_to_mask | |||
from .vocabulary import Vocabulary | |||
from ..io.dataset_loader import DataSet | |||
@@ -1,28 +1,64 @@ | |||
""" | |||
batch 模块实现了 fastNLP 所需的 Batch 类。 | |||
""" | |||
__all__ = [ | |||
"Batch" | |||
] | |||
import atexit | |||
from queue import Empty, Full | |||
import numpy as np | |||
import torch | |||
from fastNLP.core.sampler import RandomSampler | |||
import torch.multiprocessing as mp | |||
class Batch(object): | |||
"""Batch is an iterable object which iterates over mini-batches. | |||
from .sampler import RandomSampler | |||
Example:: | |||
_python_is_exit = False | |||
for batch_x, batch_y in Batch(data_set, batch_size=16, sampler=SequentialSampler()): | |||
# ... | |||
:param DataSet dataset: a DataSet object | |||
:param int batch_size: the size of the batch | |||
:param Sampler sampler: a Sampler object | |||
:param bool as_numpy: If True, return Numpy array. Otherwise, return torch tensors. | |||
:param bool prefetch: If True, use multiprocessing to fetch next batch when training. | |||
:param str or torch.device device: the batch's device, if as_numpy is True, device is ignored. | |||
""" | |||
def _set_python_is_exit(): | |||
global _python_is_exit | |||
_python_is_exit = True | |||
def __init__(self, dataset, batch_size, sampler=RandomSampler(), as_numpy=False, prefetch=False): | |||
atexit.register(_set_python_is_exit) | |||
class Batch(object): | |||
""" | |||
别名::class:`fastNLP.Batch` :class:`fastNLP.core.batch.Batch` | |||
Batch 用于从 `DataSet` 中按一定的顺序, 依次按 ``batch_size`` 的大小将数据取出. | |||
组成 `x` 和 `y` | |||
Example:: | |||
batch = Batch(data_set, batch_size=16, sampler=SequentialSampler()) | |||
num_batch = len(batch) | |||
for batch_x, batch_y in batch: | |||
# do stuff ... | |||
:param dataset: :class:`~fastNLP.DataSet` 对象, 数据集 | |||
:param int batch_size: 取出的batch大小 | |||
:param sampler: 规定使用的 :class:`~fastNLP.Sampler` 方式. 若为 ``None`` , 使用 :class:`~fastNLP.RandomSampler`. | |||
Default: ``None`` | |||
:param bool as_numpy: 若为 ``True`` , 输出batch为 numpy.array. 否则为 :class:`torch.Tensor`. | |||
Default: ``False`` | |||
:param bool prefetch: 若为 ``True`` 使用多进程预先取出下一batch. | |||
Default: ``False`` | |||
""" | |||
def __init__(self, dataset, batch_size, sampler=None, as_numpy=False, prefetch=False): | |||
self.dataset = dataset | |||
self.batch_size = batch_size | |||
if sampler is None: | |||
sampler = RandomSampler() | |||
self.sampler = sampler | |||
self.as_numpy = as_numpy | |||
self.idx_list = None | |||
@@ -31,37 +67,38 @@ class Batch(object): | |||
self.cur_batch_indices = None | |||
self.prefetch = prefetch | |||
self.lengths = 0 | |||
def fetch_one(self): | |||
if self.curidx >= len(self.idx_list): | |||
return None | |||
else: | |||
endidx = min(self.curidx + self.batch_size, len(self.idx_list)) | |||
batch_x, batch_y = {}, {} | |||
indices = self.idx_list[self.curidx:endidx] | |||
self.cur_batch_indices = indices | |||
for field_name, field in self.dataset.get_all_fields().items(): | |||
if field.is_target or field.is_input: | |||
batch = field.get(indices) | |||
if not self.as_numpy and field.padder is not None: | |||
batch = to_tensor(batch, field.dtype) | |||
batch = _to_tensor(batch, field.dtype) | |||
if field.is_target: | |||
batch_y[field_name] = batch | |||
if field.is_input: | |||
batch_x[field_name] = batch | |||
self.curidx = endidx | |||
return batch_x, batch_y | |||
def __iter__(self): | |||
""" | |||
        Iterate over the dataset and fetch batch data. The fetch process does not block the iteration process.
:return: | |||
""" | |||
if self.prefetch: | |||
return run_batch_iter(self) | |||
return self._run_batch_iter(self) | |||
def batch_iter(): | |||
self.init_iter() | |||
while 1: | |||
@@ -69,21 +106,78 @@ class Batch(object): | |||
if res is None: | |||
break | |||
yield res | |||
return batch_iter() | |||
def init_iter(self): | |||
self.idx_list = self.sampler(self.dataset) | |||
self.curidx = 0 | |||
self.lengths = self.dataset.get_length() | |||
def __len__(self): | |||
return self.num_batches | |||
def get_batch_indices(self): | |||
""" | |||
取得当前batch在DataSet中所在的index下标序列 | |||
:return list(int) indexes: 下标序列 | |||
""" | |||
return self.cur_batch_indices | |||
@staticmethod | |||
def _run_fetch(batch, q): | |||
try: | |||
global _python_is_exit | |||
batch.init_iter() | |||
# print('start fetch') | |||
while 1: | |||
res = batch.fetch_one() | |||
# print('fetch one') | |||
while 1: | |||
try: | |||
q.put(res, timeout=3) | |||
break | |||
except Full: | |||
if _python_is_exit: | |||
return | |||
if res is None: | |||
# print('fetch done, waiting processing') | |||
break | |||
# print('fetch exit') | |||
except Exception as e: | |||
q.put(e) | |||
finally: | |||
q.join() | |||
@staticmethod | |||
def _run_batch_iter(batch): | |||
q = mp.JoinableQueue(maxsize=10) | |||
fetch_p = mp.Process(target=Batch._run_fetch, args=(batch, q)) | |||
fetch_p.daemon = True | |||
fetch_p.start() | |||
# print('fork fetch process') | |||
while 1: | |||
try: | |||
res = q.get(timeout=1) | |||
q.task_done() | |||
# print('get fetched') | |||
if res is None: | |||
break | |||
elif isinstance(res, Exception): | |||
raise res | |||
yield res | |||
except Empty as e: | |||
if fetch_p.is_alive(): | |||
continue | |||
else: | |||
break | |||
fetch_p.terminate() | |||
fetch_p.join() | |||
# print('iter done') | |||
def to_tensor(batch, dtype): | |||
def _to_tensor(batch, dtype): | |||
try: | |||
if dtype in (int, np.int8, np.int16, np.int32, np.int64): | |||
batch = torch.LongTensor(batch) | |||
@@ -92,42 +186,3 @@ def to_tensor(batch, dtype): | |||
except: | |||
pass | |||
return batch | |||
def run_fetch(batch, q): | |||
batch.init_iter() | |||
# print('start fetch') | |||
while 1: | |||
res = batch.fetch_one() | |||
# print('fetch one') | |||
q.put(res) | |||
if res is None: | |||
# print('fetch done, waiting processing') | |||
q.join() | |||
break | |||
# print('fetch exit') | |||
def run_batch_iter(batch): | |||
q = mp.JoinableQueue(maxsize=10) | |||
fetch_p = mp.Process(target=run_fetch, args=(batch, q)) | |||
fetch_p.daemon = True | |||
fetch_p.start() | |||
# print('fork fetch process') | |||
while 1: | |||
try: | |||
res = q.get(timeout=1) | |||
q.task_done() | |||
# print('get fetched') | |||
if res is None: | |||
break | |||
yield res | |||
except Exception as e: | |||
if fetch_p.is_alive(): | |||
continue | |||
else: | |||
break | |||
fetch_p.terminate() | |||
fetch_p.join() | |||
# print('iter done') | |||
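The prefetch path above forks a producer process that fills a bounded `JoinableQueue` while the main process consumes from it. The same producer/consumer idea can be sketched with a background thread; this is a simplified, thread-based illustration of the pattern (the function name is illustrative), not the actual multiprocessing code:

```python
import queue
import threading

def prefetch_iter(make_batches, maxsize=10):
    """Yield batches produced by a background thread; None is the end sentinel."""
    q = queue.Queue(maxsize=maxsize)

    def producer():
        for batch in make_batches():
            q.put(batch)   # blocks when the bounded queue is full
        q.put(None)        # sentinel: no more batches

    t = threading.Thread(target=producer, daemon=True)
    t.start()
    while True:
        res = q.get()
        if res is None:
            break
        yield res
    t.join()

batches = list(prefetch_iter(lambda: iter([[1, 2], [3, 4], [5]])))
```

The bounded queue is what makes this "prefetching": the producer can run at most `maxsize` batches ahead of the consumer, overlapping batch preparation with the training step without unbounded memory growth.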
@@ -1,130 +1,298 @@ | |||
r""" | |||
callback模块实现了 fastNLP 中的许多 callback 类,用于增强 :class:`~fastNLP.Trainer` 类。 | |||
虽然Trainer本身已经集成了一些功能,但仍然不足以囊括训练过程中可能需要到的功能, | |||
比如负采样,learning rate decay, Early Stop等。 | |||
为了解决这个问题fastNLP引入了callback的机制,Callback 是一种在Trainer训练过程中特定阶段会运行的函数集合。 | |||
关于Trainer的详细文档,请参见 :doc:`trainer 模块<fastNLP.core.trainer>` | |||
我们将 :meth:`~fastNLP.Trainer.train` 这个函数内部分为以下的阶段,在对应阶段会触发相应的调用::
callback.on_train_begin() # 开始进行训练 | |||
for i in range(1, n_epochs+1): | |||
callback.on_epoch_begin() # 开始新的epoch | |||
for batch_x, batch_y in Batch: | |||
callback.on_batch_begin(batch_x, batch_y, indices) # batch_x是设置为input的field,batch_y是设置为target的field | |||
获取模型输出 | |||
callback.on_loss_begin() | |||
计算loss | |||
callback.on_backward_begin() # 可以进行一些检查,比如loss是否为None | |||
反向梯度回传 | |||
callback.on_backward_end() # 进行梯度截断等 | |||
进行参数更新 | |||
callback.on_step_end() | |||
callback.on_batch_end() | |||
# 根据设置进行evaluation,比如这是本epoch最后一个batch或者达到一定step | |||
if do evaluation: | |||
callback.on_valid_begin() | |||
进行dev data上的验证 | |||
callback.on_valid_end() # 可以进行在其它数据集上进行验证 | |||
callback.on_epoch_end() # epoch结束调用 | |||
callback.on_train_end() # 训练结束 | |||
callback.on_exception() # 这是一个特殊的步骤,在训练过程中遭遇exception会跳转到这里。 | |||
如下面的例子所示,我们可以使用内置的 callback 类,或者继承 :class:`~fastNLP.core.callback.Callback` | |||
定义自己的 callback 类:: | |||
from fastNLP import Callback, EarlyStopCallback, Trainer, CrossEntropyLoss, AccuracyMetric | |||
from fastNLP.models import CNNText | |||
start_time = time.time() | |||
class MyCallback(Callback): | |||
def on_epoch_end(self): | |||
print('{:d}ms\n\n'.format(round((time.time()-start_time)*1000))) | |||
model = CNNText((len(vocab),50), num_classes=5, padding=2, dropout=0.1) | |||
trainer = Trainer(model=model, train_data=train_data, dev_data=dev_data, loss=CrossEntropyLoss(), | |||
metrics=AccuracyMetric(), callbacks=[MyCallback(),EarlyStopCallback(10)]) | |||
trainer.train() | |||
""" | |||
__all__ = [ | |||
"Callback", | |||
"GradientClipCallback", | |||
"EarlyStopCallback", | |||
"TensorboardCallback", | |||
"LRScheduler", | |||
"ControlC", | |||
"CallbackException", | |||
"EarlyStopError" | |||
] | |||
import os | |||
import torch | |||
from tensorboardX import SummaryWriter | |||
from fastNLP.io.model_io import ModelSaver, ModelLoader | |||
from copy import deepcopy | |||
try: | |||
from tensorboardX import SummaryWriter | |||
tensorboardX_flag = True | |||
except: | |||
tensorboardX_flag = False | |||
from ..io.model_io import ModelSaver, ModelLoader | |||
from .dataset import DataSet | |||
from .tester import Tester | |||
try: | |||
import fitlog | |||
except: | |||
pass | |||
class Callback(object): | |||
"""An Interface for all callbacks. | |||
""" | |||
别名::class:`fastNLP.Callback` :class:`fastNLP.core.callback.Callback` | |||
Any customized callback should implement at least one of the following methods. | |||
Callback是fastNLP中被设计用于增强 :class:`~fastNLP.Trainer` 的类。 | |||
如果Callback被传递给了 Trainer , 则 Trainer 会在对应的阶段调用Callback的函数, | |||
具体调用时机可以通过 :doc:`trainer 模块<fastNLP.core.trainer>` 查看。 | |||
这是Callback的基类,所有的callback必须继承自这个类 | |||
""" | |||
def __init__(self): | |||
super(Callback, self).__init__() | |||
self.trainer = None # 在Trainer内部被重新赋值 | |||
self._trainer = None # 在Trainer内部被重新赋值 | |||
@property | |||
def trainer(self): | |||
""" | |||
该属性可以通过self.trainer获取到,一般情况下不需要使用这个属性。 | |||
""" | |||
return self._trainer | |||
@property | |||
def step(self): | |||
"""当前运行到的step, 范围为[1, self.n_steps+1)""" | |||
return self._trainer.step | |||
@property | |||
def n_steps(self): | |||
"""Trainer一共会运行多少步""" | |||
return self._trainer.n_steps | |||
@property | |||
def batch_size(self): | |||
"""train和evaluate时的batch_size为多大""" | |||
return self._trainer.batch_size | |||
@property | |||
def epoch(self): | |||
"""当前运行的epoch数,范围是[1, self.n_epochs+1)""" | |||
return self._trainer.epoch | |||
@property | |||
def n_epochs(self): | |||
"""一共会运行多少个epoch""" | |||
return self._trainer.n_epochs | |||
@property | |||
def optimizer(self): | |||
"""初始化Trainer时传递的Optimizer""" | |||
return self._trainer.optimizer | |||
@property | |||
def model(self): | |||
"""正在被Trainer训练的模型""" | |||
return self._trainer.model | |||
@property | |||
def pbar(self): | |||
"""如果在Callback中需要打印内容,请使用self.pbar.write(str)。否则可能出现命令行显示效果不太好的问题。在 | |||
on_train_begin(), on_train_end(), on_exception()中请不要使用该属性,通过print输出即可。""" | |||
return self._trainer.pbar | |||
@property | |||
def update_every(self): | |||
"""Trainer中的模型多少次反向传播才进行一次梯度更新,在Trainer初始化时传入的。""" | |||
return self._trainer.update_every | |||
@property | |||
def batch_per_epoch(self): | |||
"""每个epoch一共有多少个batch,只有在on_epoch_begin之后才能调用该属性。""" | |||
return self._trainer.batch_per_epoch | |||
def on_train_begin(self): | |||
# before the main training loop | |||
pass | |||
""" | |||
在Train过程开始之前调用。 | |||
def on_epoch_begin(self, cur_epoch, total_epoch): | |||
# at the beginning of each epoch | |||
:return: | |||
""" | |||
pass | |||
def on_epoch_begin(self): | |||
""" | |||
在每个epoch开始之前调用一次 | |||
def on_batch_begin(self, batch_x, batch_y, indices): | |||
# at the beginning of each step/mini-batch | |||
:return: | |||
""" | |||
pass | |||
def on_batch_begin(self, batch_x, batch_y, indices): | |||
""" | |||
每次采集到一个batch的数据则调用一次。这里对batch_x或batch_y删除添加内容是可以影响到Trainer中内容的。所以在这一步 | |||
可以进行一些负采样之类的操作 | |||
def on_loss_begin(self, batch_y, predict_y): | |||
# after data_forward, and before loss computation | |||
:param dict batch_x: DataSet中被设置为input的field的batch。 | |||
:param dict batch_y: DataSet中被设置为target的field的batch。 | |||
:param list(int) indices: 这次采样使用到的indices,可以通过DataSet[indices]获取出这个batch采出的Instance,在一些 | |||
情况下可以帮助定位是哪个Sample导致了错误。仅在Trainer的prefetch为False时可用。 | |||
:return: | |||
""" | |||
pass | |||
def on_loss_begin(self, batch_y, predict_y): | |||
""" | |||
在计算loss前调用,即这里修改batch_y或predict_y的值是可以影响到loss计算的。 | |||
def on_backward_begin(self, loss, model): | |||
# after loss computation, and before gradient backward | |||
:param dict batch_y: 在DataSet中被设置为target的field的batch集合。 | |||
:param dict predict_y: 模型的forward()返回的结果。 | |||
:return: | |||
""" | |||
pass | |||
def on_backward_begin(self, loss): | |||
""" | |||
在loss得到之后,但在反向传播之前。可能可以进行loss是否为NaN的检查。 | |||
def on_backward_end(self, model): | |||
:param torch.Tensor loss: 计算得到的loss值 | |||
:return: | |||
""" | |||
pass | |||
def on_backward_end(self): | |||
""" | |||
反向梯度传播已完成,但由于update_every的设置,可能并不是每一次调用都有梯度。到这一步,还没有更新参数。 | |||
def on_step_end(self, optimizer): | |||
:return: | |||
""" | |||
pass | |||
def on_step_end(self): | |||
""" | |||
到这里模型的参数已经按照梯度更新。但可能受update_every影响,并不是每次都更新了。 | |||
def on_batch_end(self, *args): | |||
# at the end of each step/mini-batch | |||
:return: | |||
""" | |||
pass | |||
def on_batch_end(self): | |||
""" | |||
这一步与on_step_end是紧接着的。只是为了对称性加上了这一步。 | |||
def on_valid_begin(self): | |||
""" | |||
pass | |||
def on_valid_end(self, eval_result, metric_key, optimizer): | |||
def on_valid_begin(self): | |||
""" | |||
        每次执行验证集的evaluation后会调用。传入eval_result
如果Trainer中设置了验证,则发生验证前会调用该函数 | |||
:param eval_result: Dict[str: Dict[str: float]], evaluation的结果 | |||
:param metric_key: str | |||
:param optimizer: | |||
:return: | |||
""" | |||
pass | |||
def on_epoch_end(self, cur_epoch, n_epoch, optimizer): | |||
def on_valid_end(self, eval_result, metric_key, optimizer, is_better_eval): | |||
""" | |||
每个epoch结束将会调用该方法 | |||
每次执行验证集的evaluation后会调用。 | |||
:param cur_epoch: int, 当前的batch。从1开始。 | |||
:param n_epoch: int, 总的batch数 | |||
:param optimizer: 传入Trainer的optimizer。 | |||
:param Dict[str: Dict[str: float]] eval_result: , evaluation的结果。一个例子为{'AccuracyMetric':{'acc':1.0}},即 | |||
传入的dict是有两层,第一层是metric的名称,第二层是metric的具体指标。 | |||
:param str metric_key: 初始化Trainer时传入的metric_key。 | |||
:param torch.Optimizer optimizer: Trainer中使用的优化器。 | |||
:param bool is_better_eval: 当前dev结果是否比之前的好。 | |||
:return: | |||
""" | |||
pass | |||
def on_train_end(self, model): | |||
def on_epoch_end(self): | |||
""" | |||
每个epoch结束将会调用该方法 | |||
""" | |||
pass | |||
def on_train_end(self): | |||
""" | |||
训练结束,调用该方法 | |||
:param model: nn.Module, 传入Trainer的模型 | |||
:return: | |||
""" | |||
pass | |||
def on_exception(self, exception, model): | |||
def on_exception(self, exception): | |||
""" | |||
当训练过程出现异常,会触发该方法 | |||
:param exception: 某种类型的Exception,比如KeyboardInterrupt等 | |||
:param model: 传入Trainer的模型 | |||
:return: | |||
""" | |||
pass | |||
def transfer(func): | |||
def _transfer(func): | |||
"""装饰器,将对CallbackManager的调用转发到各个Callback子类. | |||
:param func: | |||
:return: | |||
""" | |||
def wrapper(manager, *arg): | |||
returns = [] | |||
for callback in manager.callbacks: | |||
for env_name, env_value in manager.env.items(): | |||
setattr(callback, env_name, env_value) | |||
returns.append(getattr(callback, func.__name__)(*arg)) | |||
return returns | |||
return wrapper | |||
class CallbackManager(Callback): | |||
"""A manager for all callbacks passed into Trainer. | |||
It collects resources inside Trainer and raise callbacks. | |||
""" | |||
def __init__(self, env, callbacks=None): | |||
""" | |||
内部使用的Callback管理类 | |||
:param dict env: The key is the name of the Trainer attribute(str). The value is the attribute itself. | |||
:param Callback callbacks: | |||
:param List[Callback] callbacks: | |||
""" | |||
super(CallbackManager, self).__init__() | |||
# set attribute of trainer environment | |||
self.env = env | |||
self.callbacks = [] | |||
if callbacks is not None: | |||
if isinstance(callbacks, list): | |||
@@ -135,108 +303,87 @@ class CallbackManager(Callback): | |||
raise TypeError(f"Expect sub-classes of Callback. Got {type(obj)}") | |||
else: | |||
raise TypeError(f"Expect callbacks in CallbackManager(callbacks) to be list. Got {type(callbacks)}.") | |||
@transfer | |||
for env_name, env_val in env.items(): | |||
for callback in self.callbacks: | |||
setattr(callback, '_' + env_name, env_val) # Callback.trainer | |||
@_transfer | |||
def on_train_begin(self): | |||
pass | |||
@transfer | |||
def on_epoch_begin(self, cur_epoch, total_epoch): | |||
@_transfer | |||
def on_epoch_begin(self): | |||
pass | |||
@transfer | |||
@_transfer | |||
def on_batch_begin(self, batch_x, batch_y, indices): | |||
pass | |||
@transfer | |||
@_transfer | |||
def on_loss_begin(self, batch_y, predict_y): | |||
pass | |||
@transfer | |||
def on_backward_begin(self, loss, model): | |||
@_transfer | |||
def on_backward_begin(self, loss): | |||
pass | |||
@transfer | |||
def on_backward_end(self, model): | |||
@_transfer | |||
def on_backward_end(self): | |||
pass | |||
@transfer | |||
def on_step_end(self, optimizer): | |||
@_transfer | |||
def on_step_end(self): | |||
pass | |||
@transfer | |||
@_transfer | |||
def on_batch_end(self): | |||
pass | |||
@transfer | |||
@_transfer | |||
def on_valid_begin(self): | |||
pass | |||
@transfer | |||
def on_valid_end(self, eval_result, metric_key, optimizer): | |||
@_transfer | |||
def on_valid_end(self, eval_result, metric_key, optimizer, is_better_eval): | |||
pass | |||
@transfer | |||
def on_epoch_end(self, cur_epoch, n_epoch, optimizer): | |||
@_transfer | |||
def on_epoch_end(self): | |||
pass | |||
@transfer | |||
def on_train_end(self, model): | |||
@_transfer | |||
def on_train_end(self): | |||
pass | |||
@transfer | |||
def on_exception(self, exception, model): | |||
@_transfer | |||
def on_exception(self, exception): | |||
pass | |||
class DummyCallback(Callback): | |||
def on_train_begin(self, *arg): | |||
print(arg) | |||
def on_epoch_end(self, cur_epoch, n_epoch, optimizer): | |||
print(cur_epoch, n_epoch, optimizer) | |||
class EchoCallback(Callback): | |||
def on_train_begin(self): | |||
print("before_train") | |||
def on_epoch_begin(self, cur_epoch, total_epoch): | |||
print("before_epoch") | |||
def on_batch_begin(self, batch_x, batch_y, indices): | |||
print("before_batch") | |||
def on_loss_begin(self, batch_y, predict_y): | |||
print("before_loss") | |||
def on_backward_begin(self, loss, model): | |||
print("before_backward") | |||
def on_batch_end(self): | |||
print("after_batch") | |||
class GradientClipCallback(Callback): | |||
""" | |||
别名::class:`fastNLP.GradientClipCallback` :class:`fastNLP.core.callback.GradientClipCallback` | |||
def on_epoch_end(self, cur_epoch, n_epoch, optimizer): | |||
print("after_epoch") | |||
每次backward前,将parameter的gradient clip到某个范围。 | |||
def on_train_end(self, model): | |||
print("after_train") | |||
:param None,torch.Tensor,List[torch.Tensor] parameters: 一般通过model.parameters()获得。如果为None则默认对Trainer | |||
的model中所有参数进行clip | |||
:param float clip_value: 将gradient 限制到[-clip_value, clip_value]。clip_value应该为正数 | |||
    :param str clip_type: 支持'norm', 'value'两种::

        1 'norm', 将所有参数的gradient整体rescale,使梯度的总范数不超过clip_value
        2 'value', 将gradient限制在[-clip_value, clip_value], 小于-clip_value的gradient被赋值为-clip_value;
          大于clip_value的gradient被赋值为clip_value.
class GradientClipCallback(Callback): | |||
""" | |||
def __init__(self, parameters=None, clip_value=1, clip_type='norm'): | |||
"""每次backward前,将parameter的gradient clip到某个范围。 | |||
:param parameters: None, torch.Tensor或List[torch.Tensor], 一般通过model.parameters()获得。如果为None则默认对Trainer | |||
的model中所有参数进行clip | |||
:param clip_value: float, 将gradient 限制到[-clip_value, clip_value]。clip_value应该为正数 | |||
:param clip_type: str, 支持'norm', 'value'两种。 | |||
(1) 'norm', 将gradient的norm rescale到[-clip_value, clip_value] | |||
(2) 'value', 将gradient限制在[-clip_value, clip_value], 小于-clip_value的gradient被赋值为-clip_value; 大于 | |||
clip_value的gradient被赋值为clip_value. | |||
""" | |||
super().__init__() | |||
from torch import nn | |||
if clip_type == 'norm': | |||
self.clip_fun = nn.utils.clip_grad_norm_ | |||
@@ -246,36 +393,30 @@ class GradientClipCallback(Callback): | |||
raise ValueError("Only supports `norm` or `value` right now.") | |||
self.parameters = parameters | |||
self.clip_value = clip_value | |||
def on_backward_end(self, model): | |||
self.clip_fun(model.parameters(), self.clip_value) | |||
class CallbackException(BaseException): | |||
def __init__(self, msg): | |||
super(CallbackException, self).__init__(msg) | |||
class EarlyStopError(CallbackException): | |||
def __init__(self, msg): | |||
super(EarlyStopError, self).__init__(msg) | |||
def on_backward_end(self): | |||
if self.parameters is None: | |||
self.clip_fun(self.model.parameters(), self.clip_value) | |||
else: | |||
self.clip_fun(self.parameters, self.clip_value) | |||
class EarlyStopCallback(Callback): | |||
def __init__(self, patience): | |||
""" | |||
""" | |||
别名::class:`fastNLP.EarlyStopCallback` :class:`fastNLP.core.callback.EarlyStopCallback` | |||
多少个epoch没有变好就停止训练,相关类 :class:`EarlyStopError` | |||
:param int patience: 停止之前等待的epoch数 | |||
""" | |||
:param int patience: epoch的数量 | |||
""" | |||
def __init__(self, patience): | |||
super(EarlyStopCallback, self).__init__() | |||
self.trainer = None # override by CallbackManager | |||
self.patience = patience | |||
self.wait = 0 | |||
self.epoch = 0 | |||
def on_valid_end(self, eval_result, metric_key, optimizer): | |||
self.epoch += 1 | |||
if not self.trainer._better_eval_result(eval_result): | |||
def on_valid_end(self, eval_result, metric_key, optimizer, is_better_eval): | |||
if not is_better_eval: | |||
# current result is getting worse | |||
if self.wait == self.patience: | |||
raise EarlyStopError("Early stopping raised.") | |||
@@ -283,44 +424,135 @@ class EarlyStopCallback(Callback): | |||
self.wait += 1 | |||
else: | |||
self.wait = 0 | |||
def on_exception(self, exception, model): | |||
def on_exception(self, exception): | |||
if isinstance(exception, EarlyStopError): | |||
print("Early Stopping triggered in epoch {}!".format(self.epoch)) | |||
else: | |||
raise exception # 抛出陌生Error | |||
class FitlogCallback(Callback): | |||
""" | |||
别名: :class:`fastNLP.FitlogCallback` :class:`fastNLP.core.callback.FitlogCallback` | |||
该callback将loss和progress自动写入到fitlog中; 如果Trainer有dev的数据,将自动把dev的结果写入到log中; 同时还支持传入 | |||
一个(或多个)test数据集进行测试(只有在trainer具有dev时才能使用),每次在dev上evaluate之后会在这些数据集上验证一下。 | |||
并将验证结果写入到fitlog中。这些数据集的结果是根据dev上最好的结果报道的,即如果dev在第3个epoch取得了最佳,则 | |||
fitlog中记录的关于这些数据集的结果就是来自第三个epoch的结果。 | |||
:param DataSet,dict(DataSet) data: 传入DataSet对象,会使用多个Trainer中的metric对数据进行验证。如果需要传入多个 | |||
DataSet请通过dict的方式传入,dict的key将作为对应dataset的name传递给fitlog。若tester不为None时,data需要通过 | |||
dict的方式传入。如果仅传入DataSet, 则被命名为test | |||
:param Tester tester: Tester对象,将在on_valid_end时调用。tester中的DataSet会被称为为`test` | |||
:param int verbose: 是否在终端打印内容,0不打印 | |||
:param bool log_exception: fitlog是否记录发生的exception信息 | |||
""" | |||
def __init__(self, data=None, tester=None, verbose=0, log_exception=False): | |||
super().__init__() | |||
self.datasets = {} | |||
self.testers = {} | |||
self._log_exception = log_exception | |||
if tester is not None: | |||
assert isinstance(tester, Tester), "Only fastNLP.Tester allowed." | |||
assert isinstance(data, dict) or data is None, "If tester is not None, only dict[DataSet] allowed for data." | |||
if data is not None: | |||
assert 'test' not in data, "Cannot use `test` as DataSet key, when tester is passed." | |||
setattr(tester, 'verbose', 0) | |||
self.testers['test'] = tester | |||
if isinstance(data, dict): | |||
for key, value in data.items(): | |||
assert isinstance(value, DataSet), f"Only DataSet object is allowed, not {type(value)}." | |||
for key, value in data.items(): | |||
self.datasets[key] = value | |||
elif isinstance(data, DataSet): | |||
self.datasets['test'] = data | |||
else: | |||
raise TypeError("data receives dict[DataSet] or DataSet object.") | |||
self.verbose = verbose | |||
def on_train_begin(self): | |||
if (len(self.datasets)>0 or len(self.testers)>0 ) and self.trainer.dev_data is None: | |||
raise RuntimeError("Trainer has no dev data, you cannot pass extra data to do evaluation.") | |||
if len(self.datasets)>0: | |||
for key, data in self.datasets.items(): | |||
tester = Tester(data=data, model=self.model, batch_size=self.batch_size, metrics=self.trainer.metrics, | |||
verbose=0) | |||
self.testers[key] = tester | |||
fitlog.add_progress(total_steps=self.n_steps) | |||
def on_backward_begin(self, loss): | |||
fitlog.add_loss(loss.item(), name='loss', step=self.step, epoch=self.epoch) | |||
def on_valid_end(self, eval_result, metric_key, optimizer, better_result): | |||
if better_result: | |||
eval_result = deepcopy(eval_result) | |||
eval_result['step'] = self.step | |||
eval_result['epoch'] = self.epoch | |||
fitlog.add_best_metric(eval_result) | |||
fitlog.add_metric(eval_result, step=self.step, epoch=self.epoch) | |||
if len(self.testers)>0: | |||
for key, tester in self.testers.items(): | |||
try: | |||
eval_result = tester.test() | |||
if self.verbose!=0: | |||
self.pbar.write("Evaluation on DataSet {}:".format(key)) | |||
self.pbar.write(tester._format_eval_results(eval_result)) | |||
fitlog.add_metric(eval_result, name=key, step=self.step, epoch=self.epoch) | |||
if better_result: | |||
fitlog.add_best_metric(eval_result, name=key) | |||
except Exception: | |||
self.pbar.write("Exception happens when evaluate on DataSet named `{}`.".format(key)) | |||
def on_train_end(self): | |||
fitlog.finish() | |||
def on_exception(self, exception): | |||
fitlog.finish(status=1) | |||
if self._log_exception: | |||
fitlog.add_other(str(exception), name='except_info') | |||
class LRScheduler(Callback): | |||
def __init__(self, lr_scheduler): | |||
"""对PyTorch LR Scheduler的包装 | |||
""" | |||
别名::class:`fastNLP.LRScheduler` :class:`fastNLP.core.callback.LRScheduler` | |||
:param lr_scheduler: PyTorch的lr_scheduler | |||
""" | |||
对PyTorch LR Scheduler的包装以使得其可以被Trainer所使用 | |||
:param torch.optim.lr_scheduler._LRScheduler lr_scheduler: PyTorch的lr_scheduler | |||
""" | |||
def __init__(self, lr_scheduler): | |||
super(LRScheduler, self).__init__() | |||
import torch.optim | |||
if isinstance(lr_scheduler, torch.optim.lr_scheduler._LRScheduler): | |||
self.scheduler = lr_scheduler | |||
else: | |||
raise ValueError(f"Expect torch.optim.lr_scheduler for LRScheduler. Got {type(lr_scheduler)}.") | |||
def on_epoch_begin(self, cur_epoch, total_epoch): | |||
self.scheduler.step() | |||
print("scheduler step ", "lr=", self.trainer.optimizer.param_groups[0]["lr"]) | |||
def on_epoch_begin(self): | |||
self.scheduler.step(self.epoch) | |||
class ControlC(Callback):
    """
    Alias :class:`fastNLP.ControlC` :class:`fastNLP.core.callback.ControlC`

    :param bool quit_all: if True, exit the whole program when Ctrl+C is detected; otherwise only quit the Trainer
    """

    def __init__(self, quit_all):
        super(ControlC, self).__init__()
        if type(quit_all) != bool:
            raise ValueError("In KeyBoardInterrupt, quit_all argument must be a bool.")
        self.quit_all = quit_all

    def on_exception(self, exception):
        if isinstance(exception, KeyboardInterrupt):
            if self.quit_all is True:
                import sys
@@ -335,7 +567,7 @@ class SmoothValue(object): | |||
    def __init__(self, beta: float):
        self.beta, self.n, self.mov_avg = beta, 0, 0
        self.smooth = None

    def add_value(self, val: float) -> None:
        "Add `val` to calculate updated smoothed value."
        self.n += 1
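The body of `add_value` is elided by the diff hunk below. In the fastai-style LR finder this class is adapted from, the smoothing is a debiased exponential moving average; the sketch below assumes that formula (`SmoothValueSketch` is an illustrative name, not the class in this file):

```python
class SmoothValueSketch:
    """Debiased exponential moving average (assumed SmoothValue behavior)."""

    def __init__(self, beta: float):
        self.beta, self.n, self.mov_avg = beta, 0, 0
        self.smooth = None

    def add_value(self, val: float) -> None:
        # standard EMA update, then divide by (1 - beta^n) to remove the
        # zero-initialization bias of the first few steps
        self.n += 1
        self.mov_avg = self.beta * self.mov_avg + (1 - self.beta) * val
        self.smooth = self.mov_avg / (1 - self.beta ** self.n)

sv = SmoothValueSketch(0.8)
sv.add_value(1.0)
# debiasing makes the first smoothed value equal the first input
assert abs(sv.smooth - 1.0) < 1e-9
```

With this debiasing, `smooth` tracks recent losses without the cold-start lag a plain moving average would show.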
@@ -344,48 +576,58 @@ class SmoothValue(object): | |||
class LRFinder(Callback):
    """
    Alias :class:`fastNLP.LRFinder` :class:`fastNLP.core.callback.LRFinder`

    Uses the first epoch to search for the best learning rate, which is applied from the second epoch on

    :param float start_lr: lower bound of the learning rate
    :param float end_lr: upper bound of the learning rate
    """

    def __init__(self, start_lr=1e-6, end_lr=10):
        super(LRFinder, self).__init__()
        self.start_lr, self.end_lr = start_lr, end_lr

        self.stop = False
        self.best_loss = 0.
        self.best_lr = None
        self.loss_history = []
        self.smooth_value = SmoothValue(0.8)
        self.opt = None
        self.find = None
        self.loader = ModelLoader()

    @property
    def lr_gen(self):
        scale = (self.end_lr - self.start_lr) / self.batch_per_epoch
        return (self.start_lr + scale * (step + 1) for step in range(self.batch_per_epoch))

    @property
    def num_it(self):
        return self.batch_per_epoch

    def on_epoch_begin(self):
        if self.epoch == 1:  # first epoch
            self.opt = self.trainer.optimizer  # pytorch optimizer
            self.opt.param_groups[0]["lr"] = self.start_lr
            # save model
            ModelSaver("tmp").save_pytorch(self.trainer.model, param_only=True)
            self.find = True
    def on_backward_begin(self, loss):
        if self.find:
            if torch.isnan(loss) or self.stop is True:
                self.stop = True
                return
            loss_val = loss.detach().mean().item()
            self.loss_history.append(loss_val)
            self.smooth_value.add_value(loss_val)
            if self.best_loss == 0. or self.smooth_value.smooth < self.best_loss:
                self.best_loss = self.smooth_value.smooth
                self.best_lr = self.opt.param_groups[0]["lr"]
    def on_batch_end(self, *args):
        if self.find:
            lr = next(self.lr_gen, None)
@@ -394,24 +636,31 @@ class LRFinder(Callback): | |||
                return
            self.opt.param_groups[0]["lr"] = lr
            # self.loader.load_pytorch(self.trainer.model, "tmp")
    def on_epoch_end(self):
        if self.epoch == 1:  # first epoch
            self.opt.param_groups[0]["lr"] = self.best_lr
            self.find = False
            # reset model
            ModelLoader().load_pytorch(self.trainer.model, "tmp")
            self.pbar.write("Model reset. \nFind best lr={}".format(self.best_lr))
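The schedule that `lr_gen` explores is a linear ramp from `start_lr` to `end_lr` across the batches of the first epoch; the standalone arithmetic looks like this:

```python
# Linear LR ramp used by LRFinder's lr_gen property: one value per batch,
# reaching end_lr exactly at the last batch of the first epoch
start_lr, end_lr, batch_per_epoch = 1e-6, 10.0, 5
scale = (end_lr - start_lr) / batch_per_epoch
lrs = [start_lr + scale * (step + 1) for step in range(batch_per_epoch)]
assert len(lrs) == batch_per_epoch
assert abs(lrs[-1] - end_lr) < 1e-9  # last batch hits the upper bound
```

Each batch the callback applies the next value from this ramp and records the smoothed loss; the LR with the lowest smoothed loss becomes `best_lr`.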
class TensorboardCallback(Callback):
    """
    Alias :class:`fastNLP.TensorboardCallback` :class:`fastNLP.core.callback.TensorboardCallback`

    Accepts one or more of the following strings as arguments:
    - "model"
    - "loss"
    - "metric"

    .. warning::
        fastNLP has stopped maintaining this feature; please wait for the next fastNLP release compatible with
        PyTorch 1.1, or use fitlog, which integrates tightly with fastNLP (see :doc:`/user/with_fitlog`).
    """
    def __init__(self, *options):
        super(TensorboardCallback, self).__init__()
        args = {"model", "loss", "metric"}
@@ -421,15 +670,18 @@ class TensorboardCallback(Callback): | |||
        self.options = options
        self._summary_writer = None
        self.graph_added = False
    def on_train_begin(self):
        save_dir = self.trainer.save_path
        if save_dir is None:
            path = os.path.join("./", 'tensorboard_logs_{}'.format(self.trainer.start_time))
        else:
            path = os.path.join(save_dir, 'tensorboard_logs_{}'.format(self.trainer.start_time))
        if tensorboardX_flag:
            self._summary_writer = SummaryWriter(path)
        else:
            self._summary_writer = None
    def on_batch_begin(self, batch_x, batch_y, indices):
        if "model" in self.options and self.graph_added is False:
            # tensorboardX has a serious bug here; drawing the model graph is not possible for now
@@ -439,37 +691,53 @@ class TensorboardCallback(Callback): | |||
            # args = args[0] if len(args) == 1 else args
            # self._summary_writer.add_graph(self.trainer.model, torch.zeros(32, 2))
            self.graph_added = True
    def on_backward_begin(self, loss):
        if "loss" in self.options and self._summary_writer:
            self._summary_writer.add_scalar("loss", loss.item(), global_step=self.trainer.step)

        if "model" in self.options and self._summary_writer:
            for name, param in self.trainer.model.named_parameters():
                if param.requires_grad:
                    self._summary_writer.add_scalar(name + "_mean", param.mean(), global_step=self.trainer.step)
                    # self._summary_writer.add_scalar(name + "_std", param.std(), global_step=self.trainer.step)
                    self._summary_writer.add_scalar(name + "_grad_mean", param.grad.mean(),
                                                    global_step=self.trainer.step)
    def on_valid_end(self, eval_result, metric_key, optimizer, is_better_eval):
        if "metric" in self.options and self._summary_writer:
            for name, metric in eval_result.items():
                for metric_key, metric_val in metric.items():
                    self._summary_writer.add_scalar("valid_{}_{}".format(name, metric_key), metric_val,
                                                    global_step=self.trainer.step)
    def on_train_end(self):
        if self._summary_writer:
            self._summary_writer.close()
            del self._summary_writer

    def on_exception(self, exception):
        if hasattr(self, "_summary_writer"):
            self._summary_writer.close()
            del self._summary_writer
if __name__ == "__main__":
    manager = CallbackManager(env={"n_epoch": 3}, callbacks=[DummyCallback(), DummyCallback()])
    manager.on_train_begin(10, 11, 12)
    # print(manager.after_epoch())
class CallbackException(BaseException):
    """
    To break out of training from a callback, raise a CallbackException and catch it in on_exception.

    :param str msg: the exception message.
    """

    def __init__(self, msg):
        super(CallbackException, self).__init__(msg)


class EarlyStopError(CallbackException):
    """
    Used to break out of the Trainer's training loop when early stopping.
    """

    def __init__(self, msg):
        super(EarlyStopError, self).__init__(msg)
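The intended control flow — a callback raises `EarlyStopError`, and the training loop catches any `CallbackException` subclass to stop cleanly — can be sketched as follows (the loop shown is a stand-in, not the actual Trainer code):

```python
# Self-contained sketch of the exception-based early-stop control flow.
class CallbackException(BaseException):
    def __init__(self, msg):
        super().__init__(msg)

class EarlyStopError(CallbackException):
    pass

def training_loop(patience_exceeded):
    # stand-in for the Trainer loop: a callback raises, the loop catches
    try:
        if patience_exceeded:
            raise EarlyStopError("early stopping")
        return "finished"
    except CallbackException as e:
        return "stopped: {}".format(e)

assert training_loop(True) == "stopped: early stopping"
```

Deriving from `BaseException` rather than `Exception` keeps these control-flow exceptions from being swallowed by generic `except Exception` blocks inside user code.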
@@ -0,0 +1,59 @@ | |||
class Const:
    """
    Naming constants for fastNLP fields.

    .. todo::
        turn the list below into a table

    Full list::

        INPUT       sequence input of the model     words   (numbered: words1, words2)
        CHAR_INPUT  character input of the model    chars   (numbered: chars1, chars2)
        INPUT_LEN   sequence length                 seq_len (numbered: seq_len1, seq_len2)
        OUTPUT      model output                    pred    (numbered: pred1, pred2)
        TARGET      ground-truth target             target  (numbered: target1, target2)
        LOSS        loss                            loss    (numbered: loss1, loss2)
    """
    INPUT = 'words'
    CHAR_INPUT = 'chars'
    INPUT_LEN = 'seq_len'
    OUTPUT = 'pred'
    TARGET = 'target'
    LOSS = 'loss'

    @staticmethod
    def INPUTS(i):
        """Name of the i-th ``INPUT``"""
        i = int(i) + 1
        return Const.INPUT + str(i)

    @staticmethod
    def CHAR_INPUTS(i):
        """Name of the i-th ``CHAR_INPUT``"""
        i = int(i) + 1
        return Const.CHAR_INPUT + str(i)

    @staticmethod
    def INPUT_LENS(i):
        """Name of the i-th ``INPUT_LEN``"""
        i = int(i) + 1
        return Const.INPUT_LEN + str(i)

    @staticmethod
    def OUTPUTS(i):
        """Name of the i-th ``OUTPUT``"""
        i = int(i) + 1
        return Const.OUTPUT + str(i)

    @staticmethod
    def TARGETS(i):
        """Name of the i-th ``TARGET``"""
        i = int(i) + 1
        return Const.TARGET + str(i)

    @staticmethod
    def LOSSES(i):
        """Name of the i-th ``LOSS``"""
        i = int(i) + 1
        return Const.LOSS + str(i)
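One detail worth calling out: the argument to `INPUTS` etc. is 0-indexed, while the produced field names carry 1-based suffixes. A minimal reproduction of that behavior (`ConstSketch` is an illustrative stand-in for the class above):

```python
class ConstSketch:
    # minimal reproduction of Const's naming scheme
    INPUT = 'words'

    @staticmethod
    def INPUTS(i):
        """Name of the i-th INPUT field: 0-indexed argument, 1-based suffix."""
        return ConstSketch.INPUT + str(int(i) + 1)

print(ConstSketch.INPUTS(0))  # words1
```

Using these constants instead of string literals keeps field names consistent between DataSet, model `forward` signatures, and metrics.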
@@ -1,121 +1,37 @@ | |||
"""
The field module implements FieldArray and several Padders. A FieldArray is how one column of a
:class:`~fastNLP.DataSet` is stored; for the underlying design, see :doc:`fastNLP.core.dataset`
"""
__all__ = [
    "FieldArray",
    "Padder",
    "AutoPadder",
    "EngChar2DPadder"
]

from copy import deepcopy

import numpy as np
class FieldArray(object):
    """
    Alias :class:`fastNLP.FieldArray` :class:`fastNLP.core.field.FieldArray`

    FieldArray is the type used to store one field of a :class:`~fastNLP.DataSet`.

    :param str name: name of the FieldArray
    :param list,numpy.ndarray content: the list's elements may be list, int or float
    :param bool is_target: whether this field is a target field.
    :param bool is_input: whether this field is an input field.
    :param padder: a :class:`~fastNLP.Padder` object. The padder assigned to a FieldArray is deepcopied; to change a
        padder parameter you must go through fieldarray.set_pad_val(). Defaults to None, i.e. use
        :class:`~fastNLP.AutoPadder`.
    :param bool ignore_type: whether to ignore the type of this field. Can be set to True if the field does not need
        to be converted to torch.FloatTensor or torch.LongTensor. See :class:`~fastNLP.DataSet` for the exact meaning.
    """

    def __init__(self, name, content, is_target=None, is_input=None, padder=None, ignore_type=False):
        self.name = name
        if isinstance(content, list):
            # if the DataSet is initialized from a dict, content may be a 2-d list / 2-d array / 3-d list
@@ -132,30 +48,37 @@ class FieldArray(object): | |||
            raise TypeError("content in FieldArray can only be list or numpy.ndarray, got {}.".format(type(content)))
        if len(content) == 0:
            raise RuntimeError("Cannot initialize FieldArray with empty list.")
        self.content = content  # 1-d, 2-d or 3-d list; shapes may be ragged
        self.content_dim = None  # dimensionality of the content list

        if padder is None:
            padder = AutoPadder(pad_val=0)
        else:
            assert isinstance(padder, Padder), "padder must be of type Padder."
            padder = deepcopy(padder)
        self.set_padder(padder)
        self.ignore_type = ignore_type

        self.BASIC_TYPES = (int, float, str)  # accepted basic Python types in content (np.ndarray not included here)
        self.pytype = None
        self.dtype = None
        self._is_input = None
        self._is_target = None

        if is_input is not None or is_target is not None:
            self.is_input = is_input
            self.is_target = is_target

    def _set_dtype(self):
        if self.ignore_type is False:
            self.pytype = self._type_detection(self.content)
            self.dtype = self._map_to_np_type(self.pytype)
    @property
    def is_input(self):
        return self._is_input

    @is_input.setter
    def is_input(self, value):
        """
@@ -164,33 +87,34 @@ class FieldArray(object): | |||
        if value is True:
            self._set_dtype()
        self._is_input = value

    @property
    def is_target(self):
        return self._is_target
    @is_target.setter
    def is_target(self, value):
        """
        Called when field_array.is_target is set to True / False
        """
        if value is True:
            self._set_dtype()
        self._is_target = value
    def _type_detection(self, content):
        """
        Called when this field is set as is_input or is_target
        """
        if len(content) == 0:
            raise RuntimeError("Empty list in Field {}.".format(self.name))

        type_set = set([type(item) for item in content])
        if list in type_set:
            if len(type_set) > 1:
                # list mixed with non-list
                raise RuntimeError("Mixed data types in Field {}: {}".format(self.name, list(type_set)))
            # list with more than one dimension
            inner_type_set = set()
            for l in content:
@@ -213,7 +137,7 @@ class FieldArray(object): | |||
                    return self._basic_type_detection(inner_inner_type_set)
                else:
                    # list mixed with non-list
                    raise RuntimeError("Mixed data types in Field {}: {}".format(self.name, list(inner_type_set)))
        else:
            # 1-d list
            for content_type in type_set:
@@ -222,7 +146,7 @@ class FieldArray(object): | |||
                        self.name, self.BASIC_TYPES, content_type))
            self.content_dim = 1
            return self._basic_type_detection(type_set)
    def _basic_type_detection(self, type_set):
        """
        :param type_set: a set of Python types
@@ -237,21 +161,21 @@ class FieldArray(object): | |||
                return float
            else:
                # str mixed with int or float
                raise RuntimeError("Mixed data types in Field {}: {}".format(self.name, list(type_set)))
        else:
            # str, int and float mixed together
            raise RuntimeError("Mixed data types in Field {}: {}".format(self.name, list(type_set)))

    def _1d_list_check(self, val):
        """Raise an error if val is not a 1-d list
        """
        type_set = set((type(obj) for obj in val))
        if any(obj not in self.BASIC_TYPES for obj in type_set):
            raise ValueError("Mixed data types in Field {}: {}".format(self.name, list(type_set)))
        self._basic_type_detection(type_set)
        # otherwise: _basic_type_detection will raise error
        return True

    def _2d_list_check(self, val):
        """Raise an error if val is not a 2-d list
        """
@@ -264,110 +188,132 @@ class FieldArray(object): | |||
                    inner_type_set.add(type(obj))
        self._basic_type_detection(inner_type_set)
        return True

    @staticmethod
    def _map_to_np_type(basic_type):
        type_mapping = {int: np.int64, float: np.float64, str: np.str, np.ndarray: np.ndarray}
        return type_mapping[basic_type]

    def __repr__(self):
        return "FieldArray {}: {}".format(self.name, self.content.__repr__())
    def append(self, val):
        """
        Append val to the end of this field. If the field has already been set as input or target, the type of val
        is checked against the existing content before appending.

        :param Any val: the value to append.
        """
        if self.ignore_type is False:
            if isinstance(val, list):
                pass
            elif isinstance(val, tuple):  # make sure the outermost container is a list
                val = list(val)
            elif isinstance(val, np.ndarray):
                val = val.tolist()
            elif any((isinstance(val, t) for t in self.BASIC_TYPES)):
                pass
            else:
                raise RuntimeError(
                    "Unexpected data type {}. Should be list, np.array, or {}".format(type(val), self.BASIC_TYPES))

            if self.is_input is True or self.is_target is True:
                if type(val) == list:
                    if len(val) == 0:
                        raise ValueError("Cannot append an empty list.")
                    if self.content_dim == 2 and self._1d_list_check(val):
                        # 1-d list check
                        pass
                    elif self.content_dim == 3 and self._2d_list_check(val):
                        # 2-d list check
                        pass
                    else:
                        raise RuntimeError(
                            "Dimension not matched: expect dim={}, got {}.".format(self.content_dim - 1, val))
                elif type(val) in self.BASIC_TYPES and self.content_dim == 1:
                    # scalar check
                    if type(val) == float and self.pytype == int:
                        self.pytype = float
                        self.dtype = self._map_to_np_type(self.pytype)
                else:
                    raise RuntimeError(
                        "Unexpected data type {}. Should be list, np.array, or {}".format(type(val), self.BASIC_TYPES))
        self.content.append(val)
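The dimension check inside `append` enforces that a field whose stored content is N-dimensional only accepts values of dimension N-1 (content_dim 2 routes through `_1d_list_check`, content_dim 3 through `_2d_list_check`). A standalone sketch of that rule (`list_dim` is an illustrative helper, not part of fastNLP):

```python
def list_dim(val):
    """Nesting depth of a Python list; 0 for scalars (illustrative helper)."""
    d = 0
    while isinstance(val, list):
        d += 1
        val = val[0] if val else None
    return d

content = [[1, 2], [3, 4, 5]]   # a field with content_dim == 2
assert list_dim([6, 7]) == list_dim(content) - 1       # 1-d value: append allowed
assert list_dim([[6], [7]]) != list_dim(content) - 1   # 2-d value: would be rejected
```

This is what guarantees that every instance in a batch can later be stacked into one rectangular array by the padder.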
    def __getitem__(self, indices):
        return self.get(indices, pad=False)

    def __setitem__(self, idx, val):
        assert isinstance(idx, int)
        self.content[idx] = val
    def get(self, indices, pad=True):
        """
        Return the content at the given indices

        :param int,List[int] indices: fetch the content corresponding to these indices.
        :param bool pad: whether to pad the returned result; only effective when indices is a List[int]
        :return: the content at the given indices, either a single value or a List
        """
        if isinstance(indices, int):
            return self.content[indices]
        if self.is_input is False and self.is_target is False:
            raise RuntimeError("Please specify either is_input or is_target is True for {}".format(self.name))

        contents = [self.content[i] for i in indices]
        if self.padder is None or pad is False:
            return np.array(contents)
        else:
            return self.padder(contents, field_name=self.name, field_ele_dtype=self.dtype)
    def set_padder(self, padder):
        """
        Set the padder used when this field is padded; if None, no padding is performed.

        :param padder: a :class:`~fastNLP.Padder` object; set to None to remove the padder.
        """
        if padder is not None:
            assert isinstance(padder, Padder), "padder must be of type Padder."
            self.padder = deepcopy(padder)
        else:
            self.padder = None
    def set_pad_val(self, pad_val):
        """
        Change the padder's pad_val.

        :param int pad_val: set this field's pad value to this value.
        """
        if self.padder is not None:
            self.padder.set_pad_val(pad_val)
        return self
    def __len__(self):
        """
        Returns the size of FieldArray.

        :return int length:
        """
        return len(self.content)
    def to(self, other):
        """
        Copy other's attributes to this FieldArray (other must be a FieldArray).
        The copied attributes are is_input, is_target, padder and ignore_type

        :param other: :class:`~fastNLP.FieldArray` the field to copy attributes from
        :return: :class:`~fastNLP.FieldArray`
        """
        assert isinstance(other, FieldArray), "Only support FieldArray type, not {}.".format(type(other))

        self.is_input = other.is_input
        self.is_target = other.is_target
        self.padder = other.padder
        self.ignore_type = other.ignore_type

        return self
def _is_iterable(content):
    try:
        _ = (e for e in content)
    except TypeError:
@@ -375,24 +321,161 @@ def is_iterable(content): | |||
    return True
class Padder:
    """
    Alias :class:`fastNLP.Padder` :class:`fastNLP.core.field.Padder`

    Every padder must inherit from this class and override the __call__ method.
    A padder pads a batch. The elements passed in are handled in place: modifying an element directly may change
    the data, so deepcopy before any in-place modification.

    .. py:function:: __call__(self, contents, field_name, field_ele_dtype):

        :param list(Any) contents: the elements passed in are in-place; modifying an element directly may change
            the data, so deepcopy before any in-place modification.
        :param str field_name: name of the field.
        :param np.int64,np.float64,np.str,None field_ele_dtype: dtype of the innermost elements of this field.
            None if the field's ignore_type is True.
        :return: np.array([padded_element])
    """

    def __init__(self, pad_val=0, **kwargs):
        self.pad_val = pad_val

    def set_pad_val(self, pad_val):
        self.pad_val = pad_val

    def __call__(self, contents, field_name, field_ele_dtype):
        """
        The input is a List. Suppose we have the following DataSet.

        :param list(Any) contents: the elements passed in are in-place; modifying an element directly may change
            the data, so deepcopy before any in-place modification.
        :param str field_name: name of the field.
        :param np.int64,np.float64,np.str,None field_ele_dtype: dtype of the innermost elements of this field.
            None if the field's ignore_type is True.
        :return: np.array([padded_element])

        Example::

            from fastNLP import DataSet
            from fastNLP import Instance
            dataset = DataSet()
            dataset.append(Instance(sent='this is a demo', length=4,
                                    chars=[['t', 'h', 'i', 's'], ['i', 's'], ['a'], ['d', 'e', 'm', 'o']]))
            dataset.append(Instance(sent='another one', length=2,
                                    chars=[['a', 'n', 'o', 't', 'h', 'e', 'r'], ['o', 'n', 'e']]))

            If we call
            batch = dataset.get([0,1], pad=True)
            the sent field's padder __call__ will receive
            [
                'this is a demo',
                'another one'
            ]
            the length field's padder __call__ will receive
            [4, 2]
            the chars field's padder __call__ will receive
            [
                [['t', 'h', 'i', 's'], ['i', 's'], ['a'], ['d', 'e', 'm', 'o']],
                [['a', 'n', 'o', 't', 'h', 'e', 'r'], ['o', 'n', 'e']]
            ]
            i.e. the content of one field from every instance is gathered into a List and passed in
        """
        raise NotImplementedError
class AutoPadder(Padder):
    """
    Alias :class:`fastNLP.AutoPadder` :class:`fastNLP.core.field.AutoPadder`

    Decides automatically from the contents whether padding is needed.

    1 If the element type (the dtype of the innermost elements of the field, visible via FieldArray.dtype; e.g.
    ['This', 'is', ...] has element type np.str and [[1,2], ...] has element type np.int64) is not
    (np.int64, np.float64), no padding is performed

    2 If the element type is (np.int64, np.float64),

    2.1 if the field content is a single (np.int64, np.float64) value, e.g. seq_len, no padding is performed

    2.2 if the field content is a List, the Lists within a batch are padded to the same length. If that List
    contains nested Lists that also need padding, use a different padder.
    That is, a field of the form [1, 2, 3, ...] per Instance can be padded; [[1,2], [3,4, ...]] cannot
    """

    def __init__(self, pad_val=0):
        """
        :param pad_val: int, this index is used at padded positions
        """
        super().__init__(pad_val=pad_val)
    def _is_two_dimension(self, contents):
        """
        Check whether contents has exactly two dimensions. [[1,2], [3]] has two dimensions;
        [[[1,2], [3, 4, 5]], [[4,5]]] has three
        :param contents:
        :return:
        """
        value = contents[0]
        if isinstance(value, (np.ndarray, list)):
            value = value[0]
            if isinstance(value, (np.ndarray, list)):
                return False
            return True
        return False
    def __call__(self, contents, field_name, field_ele_dtype):
        if not _is_iterable(contents[0]):
            array = np.array([content for content in contents], dtype=field_ele_dtype)
        elif field_ele_dtype in (np.int64, np.float64) and self._is_two_dimension(contents):
            max_len = max([len(content) for content in contents])
            array = np.full((len(contents), max_len), self.pad_val, dtype=field_ele_dtype)
            for i, content in enumerate(contents):
                array[i][:len(content)] = content
        elif field_ele_dtype is None:
            array = np.array(contents)  # when ignore_type=True, return contents directly
        else:  # should only be str
            array = np.array([content for content in contents])
        return array
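The core padding step in `AutoPadder.__call__` — fill a (batch, max_len) array with `pad_val` and copy each sequence in — can be sketched standalone (`auto_pad` is an illustrative function, not the fastNLP class):

```python
import numpy as np

def auto_pad(contents, pad_val=0):
    # pad a batch of 1-d int sequences to the batch's max length, as AutoPadder does
    max_len = max(len(content) for content in contents)
    array = np.full((len(contents), max_len), pad_val, dtype=np.int64)
    for i, content in enumerate(contents):
        array[i, :len(content)] = content
    return array

batch = [[1, 2, 3], [4, 5]]
print(auto_pad(batch))
# [[1 2 3]
#  [4 5 0]]
```

Note the padding length is dynamic per batch, which keeps short batches small instead of padding everything to a global maximum.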
class EngChar2DPadder(Padder):
    """
    Alias :class:`fastNLP.EngChar2DPadder` :class:`fastNLP.core.field.EngChar2DPadder`

    Performs character-level 2D padding for English. The field content should look like
    [['T', 'h', 'i', 's'], ['a'], ['d', 'e', 'm', 'o']], but this Padder can only handle the case where the
    indices are ints.

    The padded batch has shape (batch_size, max_sentence_length, max_word_length), where max_sentence_length is
    the longest sentence in the batch and max_word_length is the longest word in the batch

    Example::

        from fastNLP import DataSet
        from fastNLP import EngChar2DPadder
        from fastNLP import Vocabulary
        dataset = DataSet({'sent': ['This is the first demo', 'This is the second demo']})
        dataset.apply(lambda ins:[list(word) for word in ins['sent'].split()], new_field_name='chars')
        vocab = Vocabulary()
        vocab.from_dataset(dataset, field_name='chars')
        vocab.index_dataset(dataset, field_name='chars')
        dataset.set_input('chars')
        padder = EngChar2DPadder()
        dataset.set_padder('chars', padder)  # the chars field now uses EngChar2DPadder
    """

    def __init__(self, pad_val=0, pad_length=0):
        """
        :param pad_val: int, this index is used at padded positions
        :param pad_length: int, if 0, pad to the longest word length in the batch; if greater than 0, pad or
            truncate every word to this length.
        """
        super().__init__(pad_val=pad_val)
        self.pad_length = pad_length
    def _exactly_three_dims(self, contents, field_name):
        """
        Check that the given contents has exactly 3 dimensions; raise an error otherwise. Conceptually, the first
        dimension is batch, the second is word, the third is character
@@ -411,10 +494,10 @@ class EngChar2DPadder(PadderBase): | |||
            value = value[0]
        except:
            raise ValueError("Field:{} only has two dimensions.".format(field_name))

        if _is_iterable(value):
            raise ValueError("Field:{} has more than 3 dimension.".format(field_name))
    def __call__(self, contents, field_name, field_ele_dtype):
        """
        Expects input similar to
@@ -441,12 +524,12 @@ class EngChar2DPadder(PadderBase): | |||
        max_sent_length = max(len(word_lst) for word_lst in contents)
        batch_size = len(contents)
        dtype = type(contents[0][0][0])

        padded_array = np.full((batch_size, max_sent_length, max_char_length), fill_value=self.pad_val,
                               dtype=dtype)
        for b_idx, word_lst in enumerate(contents):
            for c_idx, char_lst in enumerate(word_lst):
                chars = char_lst[:max_char_length]
                padded_array[b_idx, c_idx, :len(chars)] = chars

        return padded_array
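The 2D case pads along both the sentence axis and the word axis; a standalone sketch of the shape logic when `pad_length=0` (dynamic per-batch word length; `char_pad` is an illustrative function, not the fastNLP class):

```python
import numpy as np

def char_pad(contents, pad_val=0):
    # produce (batch_size, max_sentence_length, max_word_length), as EngChar2DPadder does
    max_char = max(len(chars) for words in contents for chars in words)
    max_sent = max(len(words) for words in contents)
    out = np.full((len(contents), max_sent, max_char), pad_val, dtype=np.int64)
    for b, words in enumerate(contents):
        for w, chars in enumerate(words):
            out[b, w, :len(chars)] = chars[:max_char]
    return out

batch = [[[1, 2, 3], [4]], [[5, 6]]]
print(char_pad(batch).shape)  # (2, 2, 3)
```

Entirely-missing words (here the second word of the second sentence) come out as all-`pad_val` rows, which is what downstream character encoders expect to mask.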
@@ -1,38 +1,52 @@ | |||
"""
The instance module implements Instance, which corresponds to a sample in fastNLP: one sample can be thought of as
an Instance object. For a walkthrough, see the table in the documentation of :doc:`fastNLP.core.dataset`
"""
__all__ = [
    "Instance"
]


class Instance(object):
    """
    Alias :class:`fastNLP.Instance` :class:`fastNLP.core.instance.Instance`

    Instance is the class corresponding to one sample in fastNLP. Every sample is an Instance object.
    Instance is usually used together with :class:`~fastNLP.DataSet`; it is initialized as in the Example below

    Example::

        >>>from fastNLP import Instance
        >>>ins = Instance(field_1=[1, 1, 1], field_2=[2, 2, 2])
        >>>ins["field_1"]
        [1, 1, 1]
        >>>ins.add_field("field_3", [3, 3, 3])
        >>>ins = Instance(**{'x1': 1, 'x2':np.zeros((3, 4))})
    """
def __init__(self, **fields): | |||
""" | |||
:param fields: 可能是一维或者二维的 list or np.array | |||
""" | |||
self.fields = fields | |||
def add_field(self, field_name, field): | |||
"""Add a new field to the instance. | |||
""" | |||
向Instance中增加一个field | |||
:param field_name: str, the name of the field. | |||
:param str field_name: 新增field的名称 | |||
:param Any field: 新增field的内容 | |||
""" | |||
self.fields[field_name] = field | |||
def __getitem__(self, name): | |||
if name in self.fields: | |||
return self.fields[name] | |||
else: | |||
raise KeyError("{} not found".format(name)) | |||
def __setitem__(self, name, field): | |||
return self.add_field(name, field) | |||
def __repr__(self): | |||
s = '\'' | |||
return "{" + ",\n".join( | |||
@@ -1,33 +1,50 @@
"""
The losses module defines the loss functions needed by fastNLP, usually passed to a
:class:`~fastNLP.Trainer` as an argument.
"""
__all__ = [
    "LossBase",
    "LossFunc",
    "LossInForward",
    "CrossEntropyLoss",
    "BCELoss",
    "L1Loss",
    "NLLLoss"
]

import inspect
from collections import defaultdict

import torch
import torch.nn.functional as F

from .utils import _CheckError
from .utils import _CheckRes
from .utils import _build_args
from .utils import _check_arg_dict_list
from .utils import _check_function_or_method
from .utils import _get_func_signature


class LossBase(object):
    """
    Base class of all losses. See the source code if you want to understand how it works.
    """

    def __init__(self):
        self.param_map = {}
        self._checked = False

    def get_loss(self, *args, **kwargs):
        raise NotImplementedError
    def _init_param_map(self, key_map=None, **kwargs):
        """Check the validity of key_map and the other parameter mappings, and add them to self.param_map.

        :param dict key_map: a dict describing the key mappings
        :param kwargs: every key-value pair among the keyword arguments is also treated as a mapping
        :return: None
        """
        value_counter = defaultdict(set)
@@ -55,21 +72,21 @@ class LossBase(object):
        for value, key_set in value_counter.items():
            if len(key_set) > 1:
                raise ValueError(f"Several parameters:{key_set} are provided with one output {value}.")

        # check consistency between signature and param_map
        func_spect = inspect.getfullargspec(self.get_loss)
        func_args = [arg for arg in func_spect.args if arg != 'self']
        for func_param, input_param in self.param_map.items():
            if func_param not in func_args:
                raise NameError(
                    f"Parameter `{func_param}` is not in {_get_func_signature(self.get_loss)}. Please check the "
                    f"initialization parameters, or change its signature.")

        # evaluate should not have varargs.
        # if func_spect.varargs:
        #     raise NameError(f"Delete `*{func_spect.varargs}` in {_get_func_signature(self.get_loss)}(Do not use "
        #                     f"positional argument.).")

    def _fast_param_map(self, pred_dict, target_dict):
        """Only used as an inner function. When pred_dict and target_dict are unambiguous (e.g. each
        has a single element), the user does not need to pass a key_map.
@@ -84,34 +101,34 @@ class LossBase(object):
            fast_param['target'] = list(target_dict.values())[0]
            return fast_param
        return fast_param
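The fast path above rests on a simple observation: when both dicts hold exactly one value each, the pred/target assignment is unambiguous and no user-supplied key_map is needed. A minimal stand-alone sketch of that idea (not the library code itself):

```python
def fast_param_map(pred_dict, target_dict):
    """If each dict holds exactly one value, the pred/target mapping is
    unambiguous -- a sketch of the idea behind LossBase._fast_param_map."""
    fast_param = {}
    if len(pred_dict) == 1 and len(target_dict) == 1:
        fast_param['pred'] = list(pred_dict.values())[0]
        fast_param['target'] = list(target_dict.values())[0]
    return fast_param

# one key on each side: mapping is inferred, whatever the keys are called
print(fast_param_map({'output': [0, 1]}, {'label': [0, 1]}))
```

With two or more keys on either side the sketch returns an empty dict, which is the signal to fall back to the full param_map machinery.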
    def __call__(self, pred_dict, target_dict, check=False):
        """
        :param dict pred_dict: the dict returned by the model's forward function
        :param dict target_dict: the dict made of the key-value pairs in DataSet.batch_y
        :param Boolean check: whether to re-check the mapping on every call; defaults to no check
        :return:
        """
        fast_param = self._fast_param_map(pred_dict, target_dict)
        if fast_param:
            loss = self.get_loss(**fast_param)
            return loss

        if not self._checked:
            # 1. check consistency between signature and param_map
            func_spect = inspect.getfullargspec(self.get_loss)
            func_args = set([arg for arg in func_spect.args if arg != 'self'])
            for func_arg, input_arg in self.param_map.items():
                if func_arg not in func_args:
                    raise NameError(f"`{func_arg}` not in {_get_func_signature(self.get_loss)}.")

            # 2. only part of the param_map are passed, left are not
            for arg in func_args:
                if arg not in self.param_map:
                    self.param_map[arg] = arg  # This param does not need mapping.
            self._evaluate_args = func_args
            self._reverse_param_map = {input_arg: func_arg for func_arg, input_arg in self.param_map.items()}

        # need to wrap inputs in dict.
        mapped_pred_dict = {}
        mapped_target_dict = {}
@@ -131,7 +148,7 @@ class LossBase(object):
                    not_duplicate_flag += 1
                if not_duplicate_flag == 3:
                    duplicated.append(input_arg)

        # missing
        if not self._checked:
            check_res = _check_arg_dict_list(self.get_loss, [mapped_pred_dict, mapped_target_dict])
@@ -141,37 +158,50 @@ class LossBase(object):
            for idx, func_arg in enumerate(missing):
                # Don't delete `` in this information, nor add ``
                replaced_missing[idx] = f"{self.param_map[func_arg]}" + f"(assign to `{func_arg}` " \
                                                                       f"in `{self.__class__.__name__}`)"

            check_res = _CheckRes(missing=replaced_missing,
                                  unused=check_res.unused,
                                  duplicated=duplicated,
                                  required=check_res.required,
                                  all_needed=check_res.all_needed,
                                  varargs=check_res.varargs)

            if check_res.missing or check_res.duplicated:
                raise _CheckError(check_res=check_res,
                                  func_signature=_get_func_signature(self.get_loss))
        refined_args = _build_args(self.get_loss, **mapped_pred_dict, **mapped_target_dict)

        loss = self.get_loss(**refined_args)
        self._checked = True

        return loss
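The core of `__call__` is renaming: keys from the model output and the target dict are remapped onto the loss function's own argument names before the call. A self-contained sketch of that renaming step, using only `inspect` (the function and mapping names here are illustrative, not fastNLP API):

```python
import inspect

def map_and_call(func, param_map, pred_dict, target_dict):
    """Rename the keys of pred_dict/target_dict according to param_map
    (func-argument name -> provided key name) and call func with the
    matching arguments -- a sketch of what LossBase.__call__ does."""
    func_args = [a for a in inspect.getfullargspec(func).args if a != 'self']
    # every argument not mentioned in param_map maps to itself
    full_map = {arg: param_map.get(arg, arg) for arg in func_args}
    provided = {**pred_dict, **target_dict}
    kwargs = {arg: provided[name] for arg, name in full_map.items() if name in provided}
    return func(**kwargs)

def get_loss(pred, target):
    # a toy mean-absolute-error "loss" on plain lists
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)

# 'target' is provided under the key 'label'; the map bridges the names
loss = map_and_call(get_loss, {'target': 'label'}, {'pred': [1.0, 2.0]}, {'label': [0.0, 2.0]})
```

This is why renaming a field in the DataSet, or an argument of `get_loss`, only requires updating the mapping rather than the loss code.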
class LossFunc(LossBase):
    """
    Alias: :class:`fastNLP.LossFunc` :class:`fastNLP.core.losses.LossFunc`

    A wrapper that lets users plug in a custom loss function.

    :param func: the user-defined loss function; it should be a function or an object for which
        callable(func) is True
    :param dict key_map: the parameter mapping table. Keys are loss function parameter names, values
        are Model/DataSet parameter names. At training time, fastNLP's Trainer looks in the model's
        return value and in the target=True fields of the training DataSet for a parameter named
        value, and passes it to func as the parameter named key.
    :param kwargs: besides key_map, mappings can also be set via keyword arguments

    Example::

        >>> func = torch.nn.CrossEntropyLoss()
        >>> loss_func = LossFunc(func, input="pred", target="label")
        # This builds a loss class whose loss is computed by func: a parameter named `pred` is looked
        # up in the model's return value or in the target=True fields of the DataSet and passed to
        # func as the parameter named `input`; a parameter named `label` is looked up and passed to
        # func as the parameter named `target`.
    """

    def __init__(self, func, key_map=None, **kwargs):
        super(LossFunc, self).__init__()
        _check_function_or_method(func)
        if key_map is not None:
@@ -181,78 +211,129 @@ class LossFunc(LossBase):
        if len(kwargs) > 0:
            for key, val in kwargs.items():
                self.param_map.update({key: val})

        self.get_loss = func
class CrossEntropyLoss(LossBase):
    """
    Alias: :class:`fastNLP.CrossEntropyLoss` :class:`fastNLP.core.losses.CrossEntropyLoss`

    Cross-entropy loss.

    :param pred: the mapping for `pred` in the parameter map; None means the mapping is `pred` -> `pred`
    :param target: the mapping for `target` in the parameter map; None means the mapping is `target` -> `target`
    :param padding_idx: the padding index; entries of target equal to padding_idx are ignored when
        computing the loss

    Example::

        >>> loss = CrossEntropyLoss(pred='pred', target='label', padding_idx=0)
    """

    def __init__(self, pred=None, target=None, padding_idx=-100):
        # TODO some checks are needed: when F.cross_entropy gets a pred of shape (16, 10, 4), the
        # TODO target should in principle be (16, 10), but in practice (16, 4) is required
        super(CrossEntropyLoss, self).__init__()
        self._init_param_map(pred=pred, target=target)
        self.padding_idx = padding_idx

    def get_loss(self, pred, target):
        return F.cross_entropy(input=pred, target=target,
                               ignore_index=self.padding_idx)


class L1Loss(LossBase):
    """
    Alias: :class:`fastNLP.L1Loss` :class:`fastNLP.core.losses.L1Loss`

    L1 loss.

    :param pred: the mapping for `pred` in the parameter map; None means the mapping is `pred` -> `pred`
    :param target: the mapping for `target` in the parameter map; None means the mapping is `target` -> `target`
    """

    def __init__(self, pred=None, target=None):
        super(L1Loss, self).__init__()
        self._init_param_map(pred=pred, target=target)

    def get_loss(self, pred, target):
        return F.l1_loss(input=pred, target=target)


class BCELoss(LossBase):
    """
    Alias: :class:`fastNLP.BCELoss` :class:`fastNLP.core.losses.BCELoss`

    Binary cross-entropy loss.

    :param pred: the mapping for `pred` in the parameter map; None means the mapping is `pred` -> `pred`
    :param target: the mapping for `target` in the parameter map; None means the mapping is `target` -> `target`
    """

    def __init__(self, pred=None, target=None):
        super(BCELoss, self).__init__()
        self._init_param_map(pred=pred, target=target)

    def get_loss(self, pred, target):
        return F.binary_cross_entropy(input=pred, target=target)


class NLLLoss(LossBase):
    """
    Alias: :class:`fastNLP.NLLLoss` :class:`fastNLP.core.losses.NLLLoss`

    Negative log-likelihood loss.

    :param pred: the mapping for `pred` in the parameter map; None means the mapping is `pred` -> `pred`
    :param target: the mapping for `target` in the parameter map; None means the mapping is `target` -> `target`
    """

    def __init__(self, pred=None, target=None):
        super(NLLLoss, self).__init__()
        self._init_param_map(pred=pred, target=target)

    def get_loss(self, pred, target):
        return F.nll_loss(input=pred, target=target)
class LossInForward(LossBase):
    """
    Alias: :class:`fastNLP.LossInForward` :class:`fastNLP.core.losses.LossInForward`

    Reads the loss out of the dict returned by forward().

    :param str loss_key: the key under which forward() stores the loss; defaults to 'loss'
    """

    def __init__(self, loss_key='loss'):
        super().__init__()
        if not isinstance(loss_key, str):
            raise TypeError(f"Only str allowed for loss_key, got {type(loss_key)}.")
        self.loss_key = loss_key

    def get_loss(self, **kwargs):
        if self.loss_key not in kwargs:
            check_res = _CheckRes(
                missing=[self.loss_key + f"(assign to `{self.loss_key}` in `{self.__class__.__name__}`"],
                unused=[],
                duplicated=[],
                required=[],
                all_needed=[],
                varargs=[])
            raise _CheckError(check_res=check_res, func_signature=_get_func_signature(self.get_loss))
        return kwargs[self.loss_key]

    def __call__(self, pred_dict, target_dict, check=False):
        loss = self.get_loss(**pred_dict)

        if not (isinstance(loss, torch.Tensor) and len(loss.size()) == 0):
            if not isinstance(loss, torch.Tensor):
                raise TypeError(f"Loss expected to be a torch.Tensor, got {type(loss)}")
            # a multi-GPU forward may return one loss per device; reduce to the mean
            loss = torch.sum(loss) / (loss.view(-1)).size(0)
            # raise RuntimeError(f"The size of loss expects to be torch.Size([]), got {loss.size()}")

        return loss
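The two behaviours of LossInForward, looking the loss up by key and averaging a non-scalar loss (as DataParallel returns one loss per GPU), can be sketched without torch; `loss_in_forward` below is a hypothetical pure-Python analogue, not the fastNLP class:

```python
def loss_in_forward(forward_out, loss_key='loss'):
    """Fetch a precomputed loss from the dict returned by forward(),
    raising when the key is missing; if several per-device losses come
    back (list/tuple), reduce them to their mean, as the real class does
    with torch.sum / view."""
    if loss_key not in forward_out:
        raise KeyError(f"`{loss_key}` not found in forward() output")
    loss = forward_out[loss_key]
    if isinstance(loss, (list, tuple)):
        loss = sum(loss) / len(loss)
    return loss

# scalar loss passes through; a per-GPU pair is averaged
single = loss_in_forward({'loss': 0.3, 'pred': [1, 0]})
reduced = loss_in_forward({'loss': (1.0, 3.0)})
```

The averaging step is what the "防止多卡的情况导致无法正确计算loss" commit in the log above addressed.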
@@ -271,7 +352,7 @@ def squash(predict, truth, **kwargs):
    :param predict: Tensor, model output
    :param truth: Tensor, truth from dataset
    :param kwargs: extra arguments
    :return predict, truth: predict & truth after processing
    """
    return predict.view(-1, predict.size()[-1]), truth.view(-1, )
@@ -315,20 +396,20 @@ def mask(predict, truth, **kwargs):
    :param predict: Tensor, [batch_size , max_len , tag_size]
    :param truth: Tensor, [batch_size , max_len]
    :param kwargs: extra arguments, kwargs["mask"]: ByteTensor, [batch_size , max_len], the mask
        Tensor. The positions that are 1 will be selected.
    :return predict, truth: predict & truth after processing
    """
    if kwargs.get("mask") is None:
        return predict, truth
    mask = kwargs["mask"]

    predict, truth = squash(predict, truth)
    mask = mask.view(-1, )

    predict = torch.masked_select(predict.permute(1, 0), mask).view(predict.size()[-1], -1).permute(1, 0)
    truth = torch.masked_select(truth, mask)

    return predict, truth
@@ -343,4 +424,3 @@ def make_mask(lens, tar_len):
    mask = [torch.ge(lens, i + 1) for i in range(tar_len)]
    mask = torch.stack(mask, 1)
    return mask
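What `squash` and `mask` do together, flatten the (batch, seq) structure and keep only the positions where the mask is 1, can be shown with plain lists instead of tensors (a sketch under that simplification; the real functions operate on torch Tensors):

```python
def squash_and_mask(predict, truth, mask):
    """Flatten (batch, seq) nested lists and keep only positions where
    mask is 1 -- a pure-Python sketch of the squash()/mask() helpers."""
    flat_pred = [p for row in predict for p in row]
    flat_truth = [t for row in truth for t in row]
    flat_mask = [m for row in mask for m in row]
    kept = [(p, t) for p, t, m in zip(flat_pred, flat_truth, flat_mask) if m]
    preds, truths = zip(*kept) if kept else ((), ())
    return list(preds), list(truths)

# the last position of the second sequence is padding (mask 0) and is dropped
p, t = squash_and_mask([[1, 2], [3, 4]], [[1, 0], [3, 0]], [[1, 1], [1, 0]])
```

Dropping padded positions before the loss is exactly why `padding_idx` / masks matter for sequence models.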
@@ -1,57 +1,82 @@
"""
The optimizer module defines the optimizers needed by fastNLP, usually passed to a
:class:`~fastNLP.Trainer` as an argument.
"""
__all__ = [
    "Optimizer",
    "SGD",
    "Adam"
]

import torch


class Optimizer(object):
    """
    Alias: :class:`fastNLP.Optimizer` :class:`fastNLP.core.optimizer.Optimizer`

    :param model_params: a generator. E.g. ``model.parameters()`` for PyTorch models.
    :param kwargs: additional parameters.
    """

    def __init__(self, model_params, **kwargs):
        if model_params is not None and not hasattr(model_params, "__next__"):
            raise RuntimeError("model parameters should be a generator, rather than {}.".format(type(model_params)))
        self.model_params = model_params
        self.settings = kwargs

    def construct_from_pytorch(self, model_params):
        raise NotImplementedError

    def _get_require_grads_param(self, params):
        """
        Drop the parameters in params that do not require gradients.

        :param iterable params: parameters
        :return: list(nn.Parameters)
        """
        return [param for param in params if param.requires_grad]


class SGD(Optimizer):
    """
    Alias: :class:`fastNLP.SGD` :class:`fastNLP.core.optimizer.SGD`

    :param float lr: learning rate. Default: 0.001
    :param float momentum: momentum. Default: 0
    :param model_params: a generator. E.g. ``model.parameters()`` for PyTorch models.
    """

    def __init__(self, lr=0.001, momentum=0, model_params=None):
        if not isinstance(lr, float):
            raise TypeError("learning rate has to be float.")
        super(SGD, self).__init__(model_params, lr=lr, momentum=momentum)

    def construct_from_pytorch(self, model_params):
        if self.model_params is None:
            # careful! generator cannot be assigned.
            return torch.optim.SGD(self._get_require_grads_param(model_params), **self.settings)
        else:
            return torch.optim.SGD(self._get_require_grads_param(self.model_params), **self.settings)


class Adam(Optimizer):
    """
    Alias: :class:`fastNLP.Adam` :class:`fastNLP.core.optimizer.Adam`

    :param float lr: learning rate
    :param float weight_decay:
    :param model_params: a generator. E.g. ``model.parameters()`` for PyTorch models.
    """

    def __init__(self, lr=0.001, weight_decay=0, betas=(0.9, 0.999), eps=1e-8, amsgrad=False, model_params=None):
        if not isinstance(lr, float):
            raise TypeError("learning rate has to be float.")
        super(Adam, self).__init__(model_params, lr=lr, betas=betas, eps=eps, amsgrad=amsgrad,
                                   weight_decay=weight_decay)

    def construct_from_pytorch(self, model_params):
        if self.model_params is None:
            # careful! generator cannot be assigned.
            return torch.optim.Adam(self._get_require_grads_param(model_params), **self.settings)
        else:
            return torch.optim.Adam(self._get_require_grads_param(self.model_params), **self.settings)
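The `_get_require_grads_param` filter simply drops frozen parameters before they reach the underlying torch optimizer. A torch-free sketch of that filter; the `Param` class below is a hypothetical stand-in for `torch.nn.Parameter`, carrying only the `requires_grad` flag:

```python
class Param:
    """Hypothetical stand-in for torch.nn.Parameter: only name + requires_grad."""
    def __init__(self, name, requires_grad=True):
        self.name = name
        self.requires_grad = requires_grad

def get_require_grads_param(params):
    """Keep only the parameters that require gradients, so frozen
    (e.g. pretrained-embedding) weights never reach the optimizer --
    mirroring Optimizer._get_require_grads_param."""
    return [p for p in params if p.requires_grad]

trainable = get_require_grads_param([Param('w'), Param('frozen_emb', requires_grad=False)])
```

Filtering here, rather than in the training loop, means every optimizer built through `construct_from_pytorch` gets the same behaviour.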
@@ -1,15 +1,20 @@
"""
.. todo::
    check whether this class is still needed
"""
from collections import defaultdict

import torch

from . import Batch
from . import DataSet
from . import SequentialSampler
from .utils import _build_args


class Predictor(object):
    """
    An interface for predicting outputs based on trained models.

    It does not care about evaluations of the model, which is different from Tester.
    This is a high-level model wrapper to be called by FastNLP.
@@ -1,89 +1,93 @@
"""
The sampler submodule implements the samplers needed by fastNLP.
"""
__all__ = [
    "Sampler",
    "BucketSampler",
    "SequentialSampler",
    "RandomSampler"
]

from itertools import chain

import numpy as np


class Sampler(object):
    """
    Alias: :class:`fastNLP.Sampler` :class:`fastNLP.core.sampler.Sampler`

    The base class of all samplers. It decides in which order the elements of the data are drawn.

    Subclasses must implement the ``__call__`` method: it takes a `DataSet` object and returns a
    list of element indices.
    """

    def __call__(self, data_set):
        """
        :param DataSet data_set: the `DataSet` object to sample from
        :return result: list(int), the index sequence; elements of ``data_set`` are taken in the
            order given by ``result``
        """
        raise NotImplementedError


class SequentialSampler(Sampler):
    """
    Alias: :class:`fastNLP.SequentialSampler` :class:`fastNLP.core.sampler.SequentialSampler`

    A `Sampler` that draws elements in their original order.
    """

    def __call__(self, data_set):
        return list(range(len(data_set)))


class RandomSampler(Sampler):
    """
    Alias: :class:`fastNLP.RandomSampler` :class:`fastNLP.core.sampler.RandomSampler`

    A `Sampler` that draws elements in random order.
    """

    def __call__(self, data_set):
        return list(np.random.permutation(len(data_set)))


class BucketSampler(Sampler):
    """
    Alias: :class:`fastNLP.BucketSampler` :class:`fastNLP.core.sampler.BucketSampler`

    A bucketed `RandomSampler`: it randomly draws elements of similar length together.

    :param int num_buckets: the number of buckets
    :param int batch_size: the batch size
    :param str seq_len_field_name: the name of the `field` holding the sequence lengths
    """

    def __init__(self, num_buckets=10, batch_size=32, seq_len_field_name='seq_len'):
        self.num_buckets = num_buckets
        self.batch_size = batch_size
        self.seq_len_field_name = seq_len_field_name

    def __call__(self, data_set):
        seq_lens = data_set.get_all_fields()[self.seq_len_field_name].content
        total_sample_num = len(seq_lens)

        bucket_indexes = []
        assert total_sample_num >= self.num_buckets, "The number of samples is smaller than the number of buckets."
        num_sample_per_bucket = total_sample_num // self.num_buckets
        for i in range(self.num_buckets):
            bucket_indexes.append([num_sample_per_bucket * i, num_sample_per_bucket * (i + 1)])
        bucket_indexes[-1][1] = total_sample_num

        sorted_seq_lens = list(sorted([(idx, seq_len) for
                                       idx, seq_len in zip(range(total_sample_num), seq_lens)],
                                      key=lambda x: x[1]))

        batchs = []

        left_init_indexes = []
        for b_idx in range(self.num_buckets):
            start_idx = bucket_indexes[b_idx][0]
@@ -98,7 +102,7 @@ class BucketSampler(BaseSampler):
        if len(left_init_indexes) != 0:
            batchs.append(left_init_indexes)
        np.random.shuffle(batchs)

        return list(chain(*batchs))
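The bucket idea, sort by length, batch within length-homogeneous buckets, then shuffle the batches, can be sketched in stdlib Python (an illustration of the technique, not the BucketSampler implementation; the fixed `seed` is only there to make the sketch deterministic):

```python
import random

def bucket_sample(seq_lens, num_buckets, batch_size, seed=0):
    """Group indices of similar sequence length into buckets, shuffle and
    batch within each bucket, then shuffle the batches -- the idea behind
    BucketSampler, which keeps padding per batch small."""
    rng = random.Random(seed)
    order = sorted(range(len(seq_lens)), key=lambda i: seq_lens[i])
    bucket_size = max(1, len(order) // num_buckets)
    buckets = [order[i:i + bucket_size] for i in range(0, len(order), bucket_size)]
    batches = []
    for bucket in buckets:
        rng.shuffle(bucket)                      # randomness within a bucket
        for i in range(0, len(bucket), batch_size):
            batches.append(bucket[i:i + batch_size])
    rng.shuffle(batches)                         # randomness across batches
    return [idx for batch in batches for idx in batch]

indices = bucket_sample([5, 1, 4, 2, 3, 6], num_buckets=2, batch_size=2)
```

Every index appears exactly once, but elements that land in the same batch have similar lengths, so less padding is wasted than with a plain RandomSampler.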
@@ -136,10 +140,10 @@ def k_means_1d(x, k, max_iter=100):
    if len(sorted_x) < k:
        raise ValueError("too few buckets")
    gap = len(sorted_x) / k

    centroids = np.array([sorted_x[int(x * gap)] for x in range(k)])
    assign = None

    for i in range(max_iter):
        # Cluster Assignment step
        assign = np.array([np.argmin([np.absolute(x_i - x) for x in centroids]) for x_i in x])
@@ -171,7 +175,7 @@ def k_means_bucketing(lengths, buckets):
    bucket_data = [[] for _ in buckets]
    num_buckets = len(buckets)
    _, assignments = k_means_1d(lengths, num_buckets)

    for idx, bucket_id in enumerate(assignments):
        if buckets[bucket_id] is None or lengths[idx] <= buckets[bucket_id]:
            bucket_data[bucket_id].append(idx)
@@ -1,50 +1,109 @@
"""
The tester module implements the Tester class needed by fastNLP, which runs a performance test
given data, a model, and metrics.

Example::

    import numpy as np
    import torch
    from torch import nn
    from fastNLP import Tester
    from fastNLP import DataSet
    from fastNLP import AccuracyMetric

    class Model(nn.Module):
        def __init__(self):
            super().__init__()
            self.fc = nn.Linear(1, 1)
        def forward(self, a):
            return {'pred': self.fc(a.unsqueeze(1)).squeeze(1)}

    model = Model()

    dataset = DataSet({'a': np.arange(10, dtype=float), 'b': np.arange(10, dtype=float)*2})

    dataset.set_input('a')
    dataset.set_target('b')

    tester = Tester(dataset, model, metrics=AccuracyMetric())
    eval_results = tester.test()

The metric mapping rules here are the same as in :class:`fastNLP.Trainer`; for usage details see
part 1.3 of :doc:`the trainer module<fastNLP.core.trainer>`.

Before evaluation starts, Tester calls model.eval() to mark the start of the evaluation phase
(disabling nn.Dropout() etc.), and after evaluation finishes it calls model.train() to restore the
training state.
"""
import warnings

import torch
import torch.nn as nn

from .batch import Batch
from .dataset import DataSet
from .metrics import _prepare_metrics
from .sampler import SequentialSampler
from .utils import _CheckError
from .utils import _build_args
from .utils import _check_loss_evaluate
from .utils import _move_dict_value_to_device
from .utils import _get_func_signature
from .utils import _get_model_device
from .utils import _move_model_to_device

__all__ = [
    "Tester"
]


class Tester(object):
    """
    Alias: :class:`fastNLP.Tester` :class:`fastNLP.core.tester.Tester`

    Tester runs a performance test given data, a model, and metrics. Pass in a model, data, and
    metrics to evaluate.

    :param data: the dataset to test on, of type :class:`~fastNLP.DataSet`
    :param torch.nn.module model: the model to use
    :param metrics: a :class:`~fastNLP.core.metrics.MetricBase` or a list of
        :class:`~fastNLP.core.metrics.MetricBase`
    :param int batch_size: the batch size to use during evaluation
    :param str,int,torch.device,list(int) device: the device to load the model onto. Defaults to
        None, i.e. the Tester does not manage where the model computes. The following inputs are
        supported:

        1. str: ['cpu', 'cuda', 'cuda:0', 'cuda:1', ...], i.e. the cpu, the first visible GPU, the
           first visible GPU, and the second visible GPU, respectively;

        2. torch.device: load the model onto the given torch.device;

        3. int: use the GPU whose device_id equals this value;

        4. list(int): if more than one device is given, wrap the model with torch.nn.DataParallel
           and use the given devices;

        5. None: do nothing to the model. If the model passed in is a torch.nn.DataParallel, this
           value must be None.

        If predictions are produced via predict(), multi-GPU (DataParallel) evaluation is not
        possible, and only the model on the first GPU is used.
    :param int verbose: if 0, print nothing; if 1, print the evaluation result.
    """

    def __init__(self, data, model, metrics, batch_size=16, device=None, verbose=1):
        super(Tester, self).__init__()

        if not isinstance(data, DataSet):
            raise TypeError(f"The type of data must be `fastNLP.DataSet`, got `{type(data)}`.")
        if not isinstance(model, nn.Module):
            raise TypeError(f"The type of model must be `torch.nn.Module`, got `{type(model)}`.")

        self.metrics = _prepare_metrics(metrics)

        self.data = data
        self._model = _move_model_to_device(model, device=device)
        self.batch_size = batch_size
        self.verbose = verbose

        # predict() cannot be used if the model is wrapped in DataParallel
        if isinstance(self._model, nn.DataParallel):
            if hasattr(self._model.module, 'predict') and not hasattr(self._model, 'predict'):
                warnings.warn("Cannot use DataParallel to test your model, because your model offers a predict()"
                              " function, while DataParallel has no predict() function.")
                self._model = self._model.module

        # check predict
        if hasattr(self._model, 'predict'):
            self._predict_func = self._model.predict
@@ -54,14 +113,15 @@ class Tester(object):
                                f"for evaluation, not `{type(self._predict_func)}`.")
        else:
            self._predict_func = self._model.forward

    def test(self):
        """Start the evaluation and return the results.

        :return Dict[Dict]: a two-level nested dict; the first level is keyed by metric name, and
            the second level holds that metric's figures.
            An example for AccuracyMetric is {'AccuracyMetric': {'acc': 1.0}}.
        """
        # turn on the testing mode; clean up the history
        self._model_device = _get_model_device(self._model)
        network = self._model
        self._mode(network, is_test=True)
        data_iterator = Batch(self.data, self.batch_size, sampler=SequentialSampler(), as_numpy=False)
@@ -72,28 +132,28 @@ class Tester(object):
                    _move_dict_value_to_device(batch_x, batch_y, device=self._model_device)
                    pred_dict = self._data_forward(self._predict_func, batch_x)
                    if not isinstance(pred_dict, dict):
                        raise TypeError(f"The return value of {_get_func_signature(self._predict_func)} "
                                        f"must be `dict`, got {type(pred_dict)}.")
                    for metric in self.metrics:
                        metric(pred_dict, batch_y)
                for metric in self.metrics:
                    eval_result = metric.get_metric()
                    if not isinstance(eval_result, dict):
                        raise TypeError(f"The return value of {_get_func_signature(metric.get_metric)} must be "
                                        f"`dict`, got {type(eval_result)}")
                    metric_name = metric.__class__.__name__
                    eval_results[metric_name] = eval_result
            except _CheckError as e:
                prev_func_signature = _get_func_signature(self._predict_func)
                _check_loss_evaluate(prev_func_signature=prev_func_signature, func_signature=e.func_signature,
                                     check_res=e.check_res, pred_dict=pred_dict, target_dict=batch_y,
                                     dataset=self.data, check_level=0)

        if self.verbose >= 1:
            print("[tester] \n{}".format(self._format_eval_results(eval_results)))
        self._mode(network, is_test=False)
        return eval_results

    def _mode(self, model, is_test=False):
        """Set train mode or test mode. This is for PyTorch currently.
@@ -105,13 +165,13 @@ class Tester(object):
            model.eval()
        else:
            model.train()

    def _data_forward(self, func, x):
        """A forward pass of the model."""
        x = _build_args(func, **x)
        y = func(**x)
        return y

    def _format_eval_results(self, results):
        """Override this method to support more print formats.
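The evaluation loop accumulates metric state over every batch and only then calls `get_metric()`, nesting the result under the metric's class name. That accumulate-then-collect shape can be illustrated with a minimal metric object; `AccuracyMetricSketch` is hypothetical, not the fastNLP class:

```python
class AccuracyMetricSketch:
    """A minimal accumulate-then-get metric, shaped like the objects
    Tester.test() iterates over (hypothetical, not fastNLP's AccuracyMetric)."""
    def __init__(self):
        self.correct = 0
        self.total = 0

    def __call__(self, pred_dict, target_dict):
        # called once per batch, accumulating state
        for p, t in zip(pred_dict['pred'], target_dict['target']):
            self.correct += int(p == t)
            self.total += 1

    def get_metric(self):
        # called once after all batches
        return {'acc': self.correct / self.total}

metric = AccuracyMetricSketch()
for batch_x, batch_y in [({'pred': [1, 0]}, {'target': [1, 1]}),
                         ({'pred': [1]}, {'target': [1]})]:
    metric(batch_x, batch_y)
eval_results = {type(metric).__name__: metric.get_metric()}
```

The outer dict keyed by `type(metric).__name__` is exactly the `{'AccuracyMetric': {'acc': 1.0}}` structure the `test()` docstring describes.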
@@ -1,87 +1,428 @@ | |||
r""" | |||
Trainer在fastNLP中用于组织单任务的训练过程,可以避免用户在不同训练任务中重复撰写以下步骤的代码
(1) epoch循环; | |||
(2) 将数据分成不同的Batch; | |||
(3) 对Batch进行pad; | |||
(4) 每个epoch结束或一定step后进行验证集验证; | |||
(5) 保存获得更好验证性能的模型。 | |||
1 Trainer的基本使用 | |||
下面的例子是使用神经网络来预测一个序列中是否有偶数个1。
Example:: | |||
import numpy as np | |||
from torch import nn | |||
import torch | |||
import torch.nn.functional as F | |||
from torch.optim import SGD | |||
from fastNLP import DataSet | |||
from fastNLP import Trainer | |||
from fastNLP import CrossEntropyLoss | |||
from fastNLP import AccuracyMetric | |||
from fastNLP.modules.decoder import MLP | |||
# 模型 | |||
class Model(nn.Module): | |||
def __init__(self, input_num): | |||
super().__init__() | |||
self.fcs = MLP([input_num, 40, 40, 2], 'relu') | |||
def forward(self, x): | |||
x = self.fcs(x) | |||
return {'pred': x} | |||
model = Model(10) | |||
# 生成数据 | |||
def generate_pseudo_dataset(num_samples):
data = np.random.randint(2, size=(num_samples, 10))
label = np.sum(data, axis=1) % 2
dataset = DataSet({'x': data.astype(float), 'label': label})
dataset.set_input('x')
dataset.set_target('label')
return dataset
tr_dataset = generate_pseudo_dataset(1000)
dev_data = generate_pseudo_dataset(100)
# 训练 | |||
trainer = Trainer(tr_dataset, model, loss=CrossEntropyLoss(target='label'), | |||
optimizer=SGD(model.parameters(), lr=0.1),n_epochs=1000, | |||
dev_data = dev_data, metrics=AccuracyMetric(target='label')) | |||
trainer.train() | |||
由上面的例子可以看出通过使用Trainer,可以使得训练部分的代码大幅减少。 | |||
使用Trainer需要满足以下几个条件: | |||
1.1 模型 | |||
1 模型的forward()的参数名需要与DataSet中的名字对应。实际上fastNLP在将DataSet中的数据传递给模型forward()时,是 | |||
通过匹配名称实现的。所以上例中,如果Model的forward函数修改为forward(self, data), 则DataSet中的'x'这个field就应该 | |||
改名为'data'。 | |||
2 传递给forward()的参数是DataSet中被设置为input的那些field。但如果forward()中没有对应的参数,则不会将数据传递 | |||
给forward()。例如,DataSet中'x1', 'x2'都是input,但是模型的函数为forward(self, x1), 那么'x2'不会传递给forward()。 | |||
3 模型的forward()返回值需要为一个dict。 | |||
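上述按名称匹配参数的过程,可以用下面这段不依赖fastNLP的纯Python代码来示意(`build_args`等函数名只是示意用的假设,并非fastNLP的真实接口):

```python
import inspect

def build_args(func, **kwargs):
    # 示意:只把名字能与func形参对应上的key传入;缺少无默认值的形参则报错
    params = inspect.signature(func).parameters
    missing = [name for name, p in params.items()
               if p.default is inspect.Parameter.empty and name not in kwargs]
    if missing:
        raise NameError("missing param: {}".format(missing))
    return {name: kwargs[name] for name in params if name in kwargs}

def forward(x1):
    # 假设的forward:只需要'x1'
    return {'pred': x1 * 2}

batch_x = {'x1': 3, 'x2': 4}   # 'x2'虽然是input,但forward()不需要,不会被传入
args = build_args(forward, **batch_x)
print(forward(**args))          # {'pred': 6}
```

如果把DataSet中的field改名而forward的形参没有同步修改,对应的就是这里的NameError。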
1.2 Loss | |||
fastNLP中为了不限制forward函数的返回内容数量(比如一些复杂任务需要返回多个内容,如Dependency Parsing),
:mod:`Loss<fastNLP.core.losses>` 与 :mod:`Metric<fastNLP.core.metrics>` 都使用了通过名称来匹配相应内容的策略。如上面的例子中
Example:: | |||
trainer = Trainer(tr_dataset, model, loss=CrossEntropyLoss(target='label'), | |||
optimizer=SGD(model.parameters(), lr=0.1),n_epochs=1000, | |||
dev_data = dev_data, metrics=AccuracyMetric(target='label')) | |||
loss被设置为了 :class:`~fastNLP.CrossEntropyLoss` , 但在初始化的时候传入了target='label'这个参数, | |||
:class:`~fastNLP.CrossEntropyLoss` 的初始化参数为(pred=None, target=None, padding_idx=-100)。 | |||
这里的两个参数分别为计算CrossEntropy时需要使用到的模型的预测值与真实值。 | |||
其中 `pred` 一般来自于模型forward()的返回结果,`target` 一般是来自于DataSet中被设置为target的field。 | |||
由于每个人对真实值或者model的返回值取名并不一样,所以fastNLP的 :mod:`Loss<fastNLP.core.losses>` 提供一种类似于映射的机制来匹配对应的值, | |||
比如这里 :class:`~fastNLP.CrossEntropyLoss` 将尝试找到名为'label'的内容来作为真实值得到loss; | |||
而pred=None, 则 :class:`~fastNLP.CrossEntropyLoss` 使用'pred'作为名称匹配预测值, | |||
正好forward的返回值也叫pred,所以这里不需要申明pred。 | |||
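这种在初始化时指定映射、再到forward()输出与target field中按名称取值的机制,大致可以如下示意(`map_args`为假设的函数名,只是对真实实现的简化):

```python
def map_args(key_map, pred_dict, target_dict):
    # 示意:key_map形如{'pred': 'pred', 'target': 'label'},
    # 在forward()的输出与被设为target的field中按名称查找对应的值
    merged = dict(pred_dict)
    merged.update(target_dict)
    missing = [v for v in key_map.values() if v not in merged]
    if missing:
        raise NameError("missing param: {}".format(missing))
    return {k: merged[v] for k, v in key_map.items()}

# CrossEntropyLoss(target='label')相当于key_map={'pred': 'pred', 'target': 'label'}
args = map_args({'pred': 'pred', 'target': 'label'},
                pred_dict={'pred': [0.2, 0.8]},   # forward()的返回值
                target_dict={'label': 1})          # DataSet中被设为target的field
print(args)   # {'pred': [0.2, 0.8], 'target': 1}
```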
尽管fastNLP使用了映射机制来使得loss的计算变得比较灵活,但有些情况下loss必须在模型中进行计算,比如使用了CRF的模型。 | |||
fastNLP中提供了 :class:`~fastNLP.LossInForward` 这个loss。 | |||
这个loss的原理是直接在forward()的返回结果中找到loss_key(默认寻找'loss')指定的那个tensor,并使用它作为loss。 | |||
如果Trainer初始化没有提供loss则默认使用 :class:`~fastNLP.LossInForward` 。 | |||
.. todo:: | |||
补充一个例子 详细例子可以参照 | |||
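LossInForward取loss的逻辑本身很简单,可以粗略示意如下(同样只是示意代码,并非fastNLP源码):

```python
def loss_in_forward(forward_output, loss_key='loss'):
    # 示意:直接从forward()返回的dict中取出loss_key对应的值作为loss
    if loss_key not in forward_output:
        raise KeyError("no `{}` in forward output: {}".format(
            loss_key, list(forward_output.keys())))
    return forward_output[loss_key]

print(loss_in_forward({'loss': 0.25, 'pred': None}))   # 0.25
```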
1.3 Metric | |||
:mod:`Metric<fastNLP.core.metrics>` 使用了与上述Loss一样的策略,即使用名称进行匹配。 | |||
AccuracyMetric(target='label')的情况与CrossEntropyLoss 是同理的。 | |||
在进行验证时,可能用到的计算与forward()中不太一致,没有办法直接从forward()的结果中得到预测值,这时模型可以提供一个predict()方法, | |||
如果提供的模型具有predict方法,则在模型验证时将调用predict()方法获取预测结果, | |||
传入到predict()的参数也是从DataSet中被设置为input的field中选择出来的; | |||
与forward()一样,返回值需要为一个dict。 | |||
.. todo:: | |||
补充一个例子 具体例子可以参考 | |||
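验证时在predict()与forward()之间进行选择的逻辑,可以示意为(纯示意代码,函数名为假设):

```python
def choose_eval_func(model):
    # 示意:模型若提供predict()则验证时优先使用,否则退回forward()
    if hasattr(model, 'predict') and callable(model.predict):
        return model.predict
    return model.forward

class OnlyForward:
    def forward(self, a):
        return {'pred': a}

class WithPredict(OnlyForward):
    def predict(self, a):
        return {'pred': a + 1}

print(choose_eval_func(OnlyForward())(1))   # {'pred': 1}
print(choose_eval_func(WithPredict())(1))   # {'pred': 2}
```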
2 Trainer的代码检查 | |||
由于在fastNLP中采取了映射的机制,所以难免可能存在对应出错的情况。Trainer提供一种映射检查机制,可以通过check_code_level来进行控制。
比如下面的例子中,展示了由于各种原因产生的报错
Example2.1 | |||
:: | |||
import numpy as np | |||
from torch import nn | |||
import torch | |||
from torch.optim import SGD | |||
from fastNLP import Trainer | |||
from fastNLP import DataSet | |||
class Model(nn.Module): | |||
def __init__(self): | |||
super().__init__() | |||
self.fc = nn.Linear(1, 1) | |||
def forward(self, x, b): | |||
loss = torch.mean((self.fc(x)-b)**2) | |||
return {'loss': loss} | |||
model = Model() | |||
dataset = DataSet({'a': np.arange(10), 'b':np.arange(10)*2}) | |||
dataset.set_input('a', 'b') | |||
trainer = Trainer(dataset, model, SGD(model.parameters()))
# 会报以下的错误 | |||
# input fields after batch(if batch size is 2): | |||
# a: (1)type:torch.Tensor (2)dtype:torch.int64, (3)shape:torch.Size([2]) | |||
# b: (1)type:torch.Tensor (2)dtype:torch.int64, (3)shape:torch.Size([2]) | |||
# There is no target field. | |||
# .... | |||
# NameError: | |||
# Problems occurred when calling Model.forward(self, x, b) | |||
# missing param: ['x'] | |||
# unused field: ['a'] | |||
# Suggestion: You need to provide ['x'] in DataSet and set it as input. | |||
这里就是由于在Trainer初始化的时候,fastNLP会尝试使用一个batch_size=2的batch去运行一遍forward()以及backward()。这里有两类 | |||
信息可以为你提供参考 | |||
1 'input fields after batch...'这部分显示的是train dataset经过Batch操作后,每个field对应的类型以及shape。这里
因为train dataset没有target所以没有显示。根据这里可以看出是否正确将需要的内容设置为了input或target。 | |||
2 NameError,NameError发生在映射出错的情况。这里报错的原因是由于尝试进行forward计算时(可以通过Model.forward(self, x, b)判断 | |||
出当前是在调取forward),却没有获取到forward()函数中需要的'x';在报错信息中同时指出了缺'x',而'a'没有被使用,那么可能 | |||
就是由于field的名称不对。这里将dataset中'a'这个field的名称改为'x',或者model的参数从'x'修改为'a'都可以解决问题。 | |||
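上面报错中的missing param/unused field,大致是通过类似下面的对比检查出来的(`check_forward`为示意用的假设函数,并非fastNLP内部实现):

```python
import inspect

def check_forward(func, input_fields):
    # 示意:对比forward()的形参与被设为input的field,
    # 找出forward缺少的参数与DataSet中未被使用的field
    params = [p for p in inspect.signature(func).parameters if p != 'self']
    missing = [p for p in params if p not in input_fields]
    unused = [f for f in input_fields if f not in params]
    return missing, unused

def forward(x, b):
    ...

missing, unused = check_forward(forward, input_fields=['a', 'b'])
print(missing, unused)   # ['x'] ['a']
```

与上面的报错一致:缺少'x',而'a'没有被使用。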
下面的例子是由于loss计算的时候找不到需要的值 | |||
Example2.2 | |||
:: | |||
import numpy as np | |||
from torch import nn | |||
from torch.optim import SGD | |||
from fastNLP import Trainer | |||
from fastNLP import DataSet | |||
from fastNLP import L1Loss | |||
import torch | |||
class Model(nn.Module): | |||
def __init__(self): | |||
super().__init__() | |||
self.fc = nn.Linear(1, 1) | |||
def forward(self, a): | |||
return {'pred_b': self.fc(a.unsqueeze(1)).squeeze(1), 'No use':1} | |||
model = Model() | |||
dataset = DataSet({'a': np.arange(10, dtype=float), 'b':np.arange(10, dtype=float)*2}) | |||
dataset.set_input('a') | |||
dataset.set_target('b') | |||
trainer = Trainer(dataset, model, loss=L1Loss(target='label'), optimizer=SGD(model.parameters(), lr=0.001)) | |||
# 报错信息如下 | |||
# input fields after batch(if batch size is 2): | |||
# a: (1)type:torch.Tensor (2)dtype:torch.float32, (3)shape:torch.Size([2]) | |||
# target fields after batch(if batch size is 2): | |||
# b: (1)type:torch.Tensor (2)dtype:torch.float32, (3)shape:torch.Size([2]) | |||
# .... | |||
# NameError: | |||
# Problems occurred when calling L1Loss.get_loss(self, pred, target) | |||
# missing param: ['pred(assign to `pred` in `L1Loss`)', 'label(assign to `target` in `L1Loss`)'] | |||
# unused field: ['b'] | |||
# unused param: ['pred_b', 'No use'] | |||
# target field: ['b'] | |||
# param from Model.forward(self, a): ['pred_b', 'No use'] | |||
# Suggestion: (1). Check key assignment for `target` when initialize L1Loss. Or provide `label` in DataSet or output of Model.forward(self, a). | |||
# (2). Check key assignment for `pred` when initialize L1Loss. Or provide `pred` in DataSet or output of Model.forward(self, a). | |||
报错信息也包含两部分: | |||
1 第一部分与上面是一样的 | |||
2 这里报错的原因是由于计算loss的时候找不到相应的值(通过L1Loss.get_loss(self, pred, target)判断出来的); | |||
报错的原因是因为 `pred` 和 `label` (我们在初始化L1Loss时将target指定为了label)都没有找到。 | |||
这里'unused field'是DataSet中出现了,但却没有被设置为input或者target的field; | |||
'unused param'是forward()中返回且没有被使用到的内容;'target field'是被设置为了target的field; | |||
'param from Model.forward(self, a)'是forward()返回的所有key。"Suggestion"是关于当前错误处理的建议。 | |||
但是在一些情况下,比如forward()的返回值只有一个,target也只有一个,fastNLP不会进行名称匹配,而是直接将forward()的结果作为pred,
将DataSet中的target作为target。上面的例子在返回值中多加入一个'No use',正是为了避免这种自动匹配,从而展示名称匹配的过程。
下面是带有dev dataset时如果出现错误会发生的报错, | |||
Example2.3 | |||
:: | |||
import numpy as np | |||
from torch import nn | |||
from torch.optim import SGD | |||
from fastNLP import Trainer | |||
from fastNLP import DataSet | |||
from fastNLP import AccuracyMetric | |||
import torch | |||
class Model(nn.Module): | |||
def __init__(self): | |||
super().__init__() | |||
self.fc = nn.Linear(1, 1) | |||
def forward(self, a, b): | |||
loss = torch.mean((self.fc(a.float().unsqueeze(1))-b.float())**2) | |||
return {'loss': loss} | |||
def predict(self, a): # 使用predict()进行验证 | |||
return {'output':self.fc(a.float().unsqueeze(1))} #这里return的值不包含'pred'这个key | |||
model = Model() | |||
dataset = DataSet({'a': np.arange(10), 'b':np.arange(10)*2}) | |||
dev_data = DataSet({'a': np.arange(10, 20), 'b':np.arange(10, 20)*2}) | |||
dataset.set_input('a', 'b') | |||
dev_data.set_input('a') # 这里没有设置target | |||
trainer = Trainer(dataset, model, loss=None, optimizer=SGD(model.parameters(), lr=0.001), | |||
dev_data=dev_data, metrics=AccuracyMetric()) | |||
# 报错信息 | |||
# ... | |||
# NameError: | |||
# Problems occurred when calling AccuracyMetric.evaluate(self, pred, target, seq_len=None) | |||
# missing param: ['pred(assign to `pred` in `AccuracyMetric`)', 'target(assign to `target` in `AccuracyMetric`)'] | |||
# unused param: ['output'] | |||
# target field: [] | |||
# param from Model.predict(self, a): ['output'] | |||
# Suggestion: (1). Check key assignment for `pred` when initialize AccuracyMetric. Or provide `pred` in DataSet or output of Model.predict(self, a). | |||
# (2). Check key assignment for `target` when initialize AccuracyMetric. Or provide `target` in DataSet or output of Model.predict(self, a). | |||
报错信息和前面都是类似的,但是可以通过'AccuracyMetric.evaluate(self, pred, target, seq_len=None)'看出这里是evaluation | |||
的时候发生了错误。这样避免了需要在完成一整个epoch的训练才能发现evaluation弄错的情况。这里的修改是通过在初始化metric的时候 | |||
指明通过'output'获取`pred`, 即AccuracyMetric(pred='output')。 | |||
可以通过check_code_level调节检查的强度。默认为0,即仅在出现错误时报错停止;设为-1则关闭检查。
3 Trainer与callback | |||
虽然Trainer本身已经集成了一些功能,但仍然不足以囊括训练过程中可能用到的功能,比如负采样,learning rate decay, Early Stop等。
为了解决这个问题fastNLP引入了callback的机制,:class:`~fastNLP.Callback` 是一种在Trainer训练过程中特定阶段会运行的函数集合, | |||
所有的 :class:`~fastNLP.Callback` 都具有on_*(比如on_train_start, on_backward_begin)等函数。 | |||
如果 Callback 实现了该函数,则Trainer运行至对应阶段,会进行调用,例如:: | |||
from fastNLP import Callback, EarlyStopCallback, Trainer, CrossEntropyLoss, AccuracyMetric | |||
from fastNLP.models import CNNText | |||
import time
start_time = time.time()
class MyCallback(Callback): | |||
def on_epoch_end(self): | |||
print('{:d}ms\n\n'.format(round((time.time()-start_time)*1000))) | |||
model = CNNText((len(vocab),50), num_classes=5, padding=2, dropout=0.1) | |||
trainer = Trainer(model=model, train_data=train_data, dev_data=dev_data, loss=CrossEntropyLoss(), | |||
metrics=AccuracyMetric(), callbacks=[MyCallback(),EarlyStopCallback(10)]) | |||
trainer.train() | |||
这里,我们通过继承 :class:`~fastNLP.Callback` 类定义了自己的 callback,并和内置的 :class:`~fastNLP.EarlyStopCallback`
一起传给了 :class:`~fastNLP.Trainer` ,增强了 :class:`~fastNLP.Trainer` 的功能。
fastNLP已经自带了很多callback函数供使用,可以参考 :doc:`fastNLP.core.callback` 。 | |||
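callback机制的核心,不过是在训练的各个阶段依次调用每个callback的同名方法,可以简化示意如下(非fastNLP的真实实现):

```python
class Callback:
    # 示意:回调基类,各阶段方法默认什么都不做
    def on_epoch_end(self):
        pass

class CallbackManager:
    # 示意:Trainer在对应阶段调用manager,manager再依次调用每个callback
    def __init__(self, callbacks):
        self.callbacks = callbacks
    def on_epoch_end(self):
        for cb in self.callbacks:
            cb.on_epoch_end()

calls = []
class Recorder(Callback):
    def on_epoch_end(self):
        calls.append('epoch_end')

CallbackManager([Recorder(), Recorder()]).on_epoch_end()
print(calls)   # ['epoch_end', 'epoch_end']
```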
""" | |||
__all__ = [ | |||
"Trainer" | |||
] | |||
import os | |||
import time | |||
from datetime import datetime, timedelta
import numpy as np | |||
import torch | |||
import torch.nn as nn
try:
from tqdm.auto import tqdm
except:
from .utils import _pseudo_tqdm as tqdm
from .batch import Batch | |||
from .callback import CallbackManager, CallbackException | |||
from .dataset import DataSet | |||
from .losses import _prepare_losser | |||
from .metrics import _prepare_metrics | |||
from .optimizer import Optimizer | |||
from .sampler import Sampler | |||
from .sampler import RandomSampler | |||
from .sampler import SequentialSampler | |||
from .tester import Tester | |||
from .utils import _CheckError | |||
from .utils import _build_args | |||
from .utils import _check_forward_error | |||
from .utils import _check_loss_evaluate | |||
from .utils import _move_dict_value_to_device | |||
from .utils import _get_func_signature | |||
from .utils import _get_model_device | |||
from .utils import _move_model_to_device | |||
class Trainer(object): | |||
""" | |||
别名::class:`fastNLP.Trainer` :class:`fastNLP.core.trainer.Trainer` | |||
Trainer在fastNLP中用于组织单任务的训练过程,可以避免用户在不同训练任务中重复撰写 | |||
(1) epoch循环; | |||
(2) 将数据分成不同的Batch; | |||
(3) 对Batch进行pad; | |||
(4) 每个epoch结束或一定step后进行验证集验证; | |||
(5) 保存获得更好验证性能的模型等。 | |||
详细的介绍参见 :doc:`fastNLP.core.trainer` | |||
:param train_data: 训练集, :class:`~fastNLP.DataSet` 类型。 | |||
:param nn.modules model: 待训练的模型 | |||
:param optimizer: `torch.optim.Optimizer` 优化器。如果为None,则Trainer使用默认的Adam(model.parameters(), lr=4e-3)这个优化器 | |||
:param int batch_size: 训练和验证的时候的batch大小。 | |||
:param loss: 使用的 :class:`~fastNLP.core.losses.LossBase` 对象。当为None时,默认使用 :class:`~fastNLP.LossInForward` | |||
:param sampler: Batch数据生成的顺序, :class:`~fastNLP.Sampler` 类型。如果为None,默认使用 :class:`~fastNLP.RandomSampler` | |||
:param update_every: int, 多少步更新一次梯度。用于希望累计梯度的场景,比如需要128的batch_size, 但是直接设为128 | |||
会导致内存不足,通过设置batch_size=32, update_every=4达到目的。当optimizer为None时,该参数无效。 | |||
:param int n_epochs: 需要优化迭代多少次。 | |||
:param int print_every: 多少次反向传播更新tqdm显示的loss; 如果use_tqdm=False, 则多少次反向传播打印loss。 | |||
:param dev_data: 用于做验证的DataSet, :class:`~fastNLP.DataSet` 类型。 | |||
:param metrics: 验证的评估函数。可以只使用一个 :class:`Metric<fastNLP.core.metrics.MetricBase>` , | |||
也可以使用多个 :class:`Metric<fastNLP.core.metrics.MetricBase>` ,通过列表传入。 | |||
如验证时取得了更好的验证结果(如果有多个Metric,以列表中第一个Metric为准),且save_path不为None, | |||
则保存当前模型。Metric种类详见 :doc:`metrics模块 <fastNLP.core.metrics>` 。仅在传入dev_data时有效。 | |||
:param str,None metric_key: :class:`Metric<fastNLP.core.metrics.MetricBase>` 有时会有多个指标, | |||
比如 :class:`~fastNLP.core.metrics.SpanFPreRecMetric` 中包含了'f', 'pre', 'rec'。此时需 | |||
要指定以哪个指标为准。另外有些指标是越小效果越好,比如语言模型的困惑度,这种情况下,在key前面增加一个'-'来表 | |||
明验证时,值越小越好(比如: "-ppl")。仅在传入dev_data时有效。 | |||
:param int validate_every: 多少个step在验证集上验证一次; 如果为-1,则每个epoch结束验证一次。仅在传入dev_data时有效。 | |||
:param str,None save_path: 将模型保存路径。如果为None,则不保存模型。如果dev_data为None,则保存最后一次迭代的模型。 | |||
保存的时候不仅保存了参数,还保存了模型结构。即便使用DataParallel,这里也只保存模型。 | |||
:param prefetch: bool, 是否使用额外的进程来产生batch数据。理论上会使得Batch迭代更快。
:param bool use_tqdm: 是否使用tqdm来显示训练进度; 如果为False,则将loss打印在终端中。 | |||
:param str,int,torch.device,list(int) device: 将模型load到哪个设备。默认为None,即Trainer不对模型 | |||
的计算位置进行管理。支持以下的输入: | |||
1. str: ['cpu', 'cuda', 'cuda:0', 'cuda:1', ...] 依次为'cpu'中, 可见的第一个GPU中, 可见的第一个GPU中, | |||
可见的第二个GPU中; | |||
2. torch.device:将模型装载到torch.device上。 | |||
3. int: 将使用device_id为该值的gpu进行训练 | |||
4. list(int):如果多于1个device,将使用torch.nn.DataParallel包裹model, 并使用传入的device。 | |||
5. None. 为None则不对模型进行任何处理,如果传入的model为torch.nn.DataParallel该值必须为None。 | |||
已知可能会出现的问题:Adagrad优化器可能无法正常使用这个参数,请手动管理模型位置。 | |||
:param list(callbacks) callbacks: 用于在train过程中起调节作用的回调函数。比如early stop,negative sampling等可以 | |||
通过callback机制实现。 可使用的callback参见 :doc:`callback模块 <fastNLP.core.callback>` | |||
:param int check_code_level: 模型检查等级. -1: 不进行检查; 0: 仅出现错误时停止; 1: 如果有field没有被使用, | |||
报告警告信息; 2: 有任何field没有被使用都报错. 检查的原理是通过使用很小的batch(默认2个sample)来运行代码,但是 | |||
这个过程理论上不会修改任何参数,只是会检查能否运行。但如果(1)模型中存在将batch_size写为某个固定值的情况; | |||
(2)模型中存在累加前向计算次数的,可能会多计算1次。以上情况建议将check_code_level设置为-1。 | |||
""" | |||
def __init__(self, train_data, model, optimizer=None, loss=None, | |||
batch_size=32, sampler=None, update_every=1, | |||
n_epochs=10, print_every=5, | |||
dev_data=None, metrics=None, metric_key=None, | |||
validate_every=-1, save_path=None, | |||
prefetch=False, use_tqdm=True, device=None, | |||
callbacks=None, | |||
check_code_level=0): | |||
super(Trainer, self).__init__() | |||
if not isinstance(train_data, DataSet): | |||
raise TypeError(f"The type of train_data must be fastNLP.DataSet, got {type(train_data)}.") | |||
if not isinstance(model, nn.Module): | |||
raise TypeError(f"The type of model must be torch.nn.Module, got {type(model)}.") | |||
# check metrics and dev_data | |||
if (not metrics) and dev_data is not None: | |||
raise ValueError("No metric for dev_data evaluation.") | |||
if metrics and (dev_data is None): | |||
raise ValueError("No dev_data for evaluations, pass dev_data or set metrics to None. ") | |||
# check update every | |||
assert update_every >= 1, "update_every must be no less than 1." | |||
self.update_every = int(update_every) | |||
# check save_path | |||
if not (save_path is None or isinstance(save_path, str)): | |||
raise ValueError("save_path can only be None or `str`.") | |||
# prepare evaluate | |||
metrics = _prepare_metrics(metrics) | |||
# parse metric_key | |||
# increase_better is True. It means the exp result gets better if the indicator increases. | |||
# It is true by default. | |||
@@ -91,19 +432,20 @@ class Trainer(object): | |||
self.metric_key = metric_key[1:] if metric_key[0] == "+" or metric_key[0] == "-" else metric_key | |||
elif len(metrics) > 0: | |||
self.metric_key = metrics[0].__class__.__name__.lower().strip('metric') | |||
# prepare loss | |||
losser = _prepare_losser(loss) | |||
# sampler check | |||
if sampler is not None and not isinstance(sampler, Sampler):
raise ValueError("The type of sampler should be fastNLP.Sampler, got {}.".format(type(sampler)))
if check_code_level > -1: | |||
_check_code(dataset=train_data, model=model, losser=losser, metrics=metrics, dev_data=dev_data, | |||
metric_key=metric_key, check_level=check_code_level, | |||
batch_size=min(batch_size, DEFAULT_CHECK_BATCH_SIZE)) | |||
# _check_code 是 fastNLP 帮助你检查代码是否正确的方法 。如果你在错误栈中看到这行注释,请认真检查你的代码 | |||
self.train_data = train_data | |||
self.dev_data = dev_data # If None, No validation. | |||
self.model = model | |||
@@ -111,73 +453,61 @@ class Trainer(object): | |||
self.metrics = metrics | |||
self.n_epochs = int(n_epochs) | |||
self.batch_size = int(batch_size) | |||
self.save_path = save_path | |||
self.print_every = int(print_every) | |||
self.validate_every = int(validate_every) if validate_every != 0 else -1
self.best_metric_indicator = None | |||
self.best_dev_epoch = None | |||
self.best_dev_step = None | |||
self.best_dev_perf = None | |||
self.sampler = sampler if sampler is not None else RandomSampler()
self.prefetch = prefetch | |||
self.n_steps = (len(self.train_data) // self.batch_size + int( | |||
len(self.train_data) % self.batch_size != 0)) * self.n_epochs | |||
self.model = _move_model_to_device(self.model, device=device) | |||
if isinstance(optimizer, torch.optim.Optimizer): | |||
self.optimizer = optimizer | |||
elif isinstance(optimizer, Optimizer): | |||
self.optimizer = optimizer.construct_from_pytorch(model.parameters()) | |||
elif optimizer is None: | |||
self.optimizer = torch.optim.Adam(model.parameters(), lr=4e-3) | |||
else: | |||
raise TypeError("optimizer can only be torch.optim.Optimizer type, not {}.".format(type(optimizer)))
self.use_tqdm = use_tqdm | |||
self.pbar = None | |||
self.print_every = abs(self.print_every) | |||
if self.dev_data is not None: | |||
self.tester = Tester(model=self.model, | |||
data=self.dev_data, | |||
metrics=self.metrics, | |||
batch_size=self.batch_size, | |||
device=None,  # 由上面的部分处理device
verbose=0) | |||
self.step = 0 | |||
self.start_time = None # start timestamp | |||
self.callback_manager = CallbackManager(env={"trainer": self}, | |||
callbacks=callbacks) | |||
def train(self, load_best_model=True): | |||
""" | |||
开始训练过程。主要有以下几个步骤:: | |||
for epoch in range(num_epochs): | |||
# 使用Batch从DataSet中按批取出数据,并自动对DataSet中dtype为(float, int)的fields进行padding,转换为Tensor。
非float、int类型的field将不会被转换为Tensor,且不进行padding。
for batch_x, batch_y in Batch(DataSet) | |||
# batch_x是一个dict, 被设为input的field会出现在这个dict中, | |||
key为DataSet中的field_name, value为该field的value | |||
# batch_y也是一个dict,被设为target的field会出现在这个dict中, | |||
key为DataSet中的field_name, value为该field的value | |||
2. 将batch_x的数据送入到model.forward函数中,并获取结果。这里我们就是通过匹配batch_x中的key与forward函数的形 | |||
参完成参数传递。例如, | |||
forward(self, x, seq_lens) # fastNLP会在batch_x中找到key为"x"的value传递给x,key为"seq_lens"的 | |||
value传递给seq_lens。若在batch_x中没有找到所有必须要传递的参数,就会报错。如果forward存在默认参数 | |||
而且默认参数这个key没有在batch_x中,则使用默认参数。 | |||
3. 将batch_y与model.forward的结果一并送入loss中计算loss。loss计算时一般都涉及到pred与target。但是在不同情况 | |||
中,可能pred称为output或prediction, target称为y或label。fastNLP通过初始化loss时传入的映射找到pred或 | |||
target。比如在初始化Trainer时初始化loss为CrossEntropyLoss(pred='output', target='y'), 那么fastNLP计 | |||
算loss时,就会使用"output"在batch_y与forward的结果中找到pred;使用"y"在batch_y与forward的结果中找target | |||
, 并完成loss的计算。 | |||
4. 获取到loss之后,进行反向求导并更新梯度 | |||
根据需要适时进行验证集测试
根据metrics进行evaluation,并根据是否提供了save_path判断是否存储模型 | |||
使用该函数使Trainer开始训练。 | |||
:param bool load_best_model: 该参数只有在初始化提供了dev_data的情况下有效,如果True, trainer将在返回之前重新加载dev表现 | |||
最好的模型参数。 | |||
:return dict: 返回一个字典类型的数据, | |||
内含以下内容:: | |||
seconds: float, 表示训练时长 | |||
以下三个内容只有在提供了dev_data的情况下会有。 | |||
best_eval: Dict of Dict, 表示evaluation的结果。第一层的key为Metric的名称,第二层的key为具体的Metric | |||
best_epoch: int,在第几个epoch取得的最佳值 | |||
best_step: int, 在第几个step(batch)更新取得的最佳值 | |||
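上面第4步中配合update_every的梯度累积,可以用下面的纯Python代码示意(`train_steps`为假设的函数名,对应_train/_update中按step取模的判断):

```python
def train_steps(n_batches, update_every):
    # 示意:梯度累积——每个batch都backward,但每update_every个batch才真正更新一次参数
    updates = []
    for step in range(n_batches):           # step从0开始计数
        # loss = loss / update_every; loss.backward()  在每个batch都执行
        if (step + 1) % update_every == 0:  # 对应_update()中的判断
            updates.append(step)            # 此时才调用optimizer.step()
    return updates

print(train_steps(8, 4))   # [3, 7]:在第4、8个batch处更新参数
```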
""" | |||
results = {} | |||
@@ -186,25 +516,24 @@ class Trainer(object): | |||
results['seconds'] = 0. | |||
return results | |||
try: | |||
self._model_device = _get_model_device(self.model)
self._mode(self.model, is_test=False) | |||
self._load_best_model = load_best_model | |||
self.start_time = str(datetime.now().strftime('%Y-%m-%d-%H-%M-%S')) | |||
start_time = time.time() | |||
print("training epochs started " + self.start_time, flush=True) | |||
try: | |||
self.callback_manager.on_train_begin() | |||
self._train() | |||
self.callback_manager.on_train_end()
except (CallbackException, KeyboardInterrupt) as e: | |||
self.callback_manager.on_exception(e) | |||
if self.dev_data is not None and hasattr(self, 'best_dev_perf'): | |||
print( | |||
"\nIn Epoch:{}/Step:{}, got best dev performance:".format(self.best_dev_epoch, self.best_dev_step) + | |||
self.tester._format_eval_results(self.best_dev_perf), ) | |||
results['best_eval'] = self.best_dev_perf | |||
results['best_epoch'] = self.best_dev_epoch | |||
results['best_step'] = self.best_dev_step | |||
@@ -218,49 +547,55 @@ class Trainer(object): | |||
finally: | |||
pass | |||
results['seconds'] = round(time.time() - start_time, 2) | |||
return results | |||
def _train(self): | |||
if not self.use_tqdm: | |||
from fastNLP.core.utils import _pseudo_tqdm as inner_tqdm
else: | |||
inner_tqdm = tqdm | |||
self.step = 0 | |||
self.epoch = 0 | |||
start = time.time() | |||
with inner_tqdm(total=self.n_steps, postfix='loss:{0:<6.5f}', leave=False, dynamic_ncols=True) as pbar:
self.pbar = pbar | |||
avg_loss = 0 | |||
data_iterator = Batch(self.train_data, batch_size=self.batch_size, sampler=self.sampler, as_numpy=False, | |||
prefetch=self.prefetch) | |||
self.batch_per_epoch = data_iterator.num_batches | |||
for epoch in range(1, self.n_epochs + 1): | |||
self.epoch = epoch | |||
pbar.set_description_str(desc="Epoch {}/{}".format(epoch, self.n_epochs)) | |||
# early stopping | |||
self.callback_manager.on_epoch_begin()
for batch_x, batch_y in data_iterator: | |||
_move_dict_value_to_device(batch_x, batch_y, device=self._model_device) | |||
indices = data_iterator.get_batch_indices() | |||
# negative sampling; replace unknown; re-weight batch_y | |||
self.callback_manager.on_batch_begin(batch_x, batch_y, indices) | |||
prediction = self._data_forward(self.model, batch_x) | |||
# edit prediction | |||
self.callback_manager.on_loss_begin(batch_y, prediction) | |||
loss = self._compute_loss(prediction, batch_y).mean()
avg_loss += loss.item() | |||
loss = loss / self.update_every | |||
# Is loss NaN or inf? requires_grad = False | |||
self.callback_manager.on_backward_begin(loss)
self._grad_backward(loss) | |||
self.callback_manager.on_backward_end()
self._update() | |||
self.callback_manager.on_step_end()
if self.step % self.print_every == 0:
avg_loss = float(avg_loss) / self.print_every | |||
if self.use_tqdm: | |||
print_output = "loss:{0:<6.5f}".format(avg_loss)
pbar.update(self.print_every) | |||
else: | |||
end = time.time() | |||
@@ -269,43 +604,45 @@ class Trainer(object): | |||
epoch, self.step, avg_loss, diff) | |||
pbar.set_postfix_str(print_output) | |||
avg_loss = 0 | |||
self.step += 1 | |||
self.callback_manager.on_batch_end() | |||
if ((self.validate_every > 0 and self.step % self.validate_every == 0) or | |||
(self.validate_every < 0 and self.step % len(data_iterator) == 0)) \ | |||
and self.dev_data is not None: | |||
eval_res = self._do_validation(epoch=epoch, step=self.step) | |||
eval_str = "Evaluation at Epoch {}/{}. Step:{}/{}. ".format(epoch, self.n_epochs, self.step, | |||
self.n_steps) + \
self.tester._format_eval_results(eval_res) | |||
pbar.write(eval_str + '\n')
# ================= mini-batch end ==================== # | |||
# lr decay; early stopping | |||
self.callback_manager.on_epoch_end()
# =============== epochs end =================== # | |||
pbar.close() | |||
self.pbar = None | |||
# ============ tqdm end ============== # | |||
def _do_validation(self, epoch, step): | |||
self.callback_manager.on_valid_begin() | |||
res = self.tester.test() | |||
is_better_eval = False | |||
if self._better_eval_result(res): | |||
if self.save_path is not None: | |||
self._save_model(self.model, | |||
"best_" + "_".join([self.model.__class__.__name__, self.metric_key, self.start_time]))
elif self._load_best_model:
self._best_model_states = {name: param.cpu().clone() for name, param in self.model.named_parameters()} | |||
self.best_dev_perf = res | |||
self.best_dev_epoch = epoch | |||
self.best_dev_step = step | |||
is_better_eval = True | |||
# get validation results; adjust optimizer | |||
self.callback_manager.on_valid_end(res, self.metric_key, self.optimizer, is_better_eval)
return res | |||
def _mode(self, model, is_test=False): | |||
"""Train mode or Test mode. This is for PyTorch currently. | |||
@@ -317,20 +654,22 @@ class Trainer(object): | |||
model.eval() | |||
else: | |||
model.train() | |||
def _update(self): | |||
"""Perform weight update on a model. | |||
""" | |||
if self.optimizer is not None and (self.step + 1) % self.update_every == 0:
self.optimizer.step()
def _data_forward(self, network, x): | |||
x = _build_args(network.forward, **x) | |||
y = network(**x) | |||
if not isinstance(y, dict): | |||
raise TypeError(f"The return value of {get_func_signature(network.forward)} should be dict, got {type(y)}.") | |||
raise TypeError( | |||
f"The return value of {_get_func_signature(network.forward)} should be dict, got {type(y)}.") | |||
return y | |||
    def _grad_backward(self, loss):
        """Compute gradient with link rules.
@@ -338,9 +677,10 @@ class Trainer(object):
        For PyTorch, just do "loss.backward()"
        """
        if self.step % self.update_every == 0:
            self.model.zero_grad()
        loss.backward()

    def _compute_loss(self, predict, truth):
        """Compute loss given prediction and ground truth.
@@ -349,7 +689,7 @@ class Trainer(object):
        :return: a scalar
        """
        return self.losser(predict, truth)
    def _save_model(self, model, model_name, only_param=False):
        """ Save a state_dict or model stripped of GPU-specific information.

        :param model:
@@ -359,6 +699,10 @@ class Trainer(object):
        """
        if self.save_path is not None:
            model_path = os.path.join(self.save_path, model_name)
            if not os.path.exists(self.save_path):
                os.makedirs(self.save_path, exist_ok=True)
            if isinstance(model, nn.DataParallel):
                model = model.module
            if only_param:
                state_dict = model.state_dict()
                for key in state_dict:
@@ -367,8 +711,8 @@ class Trainer(object):
            else:
                model.cpu()
                torch.save(model, model_path)
                model.to(self._model_device)
    def _load_model(self, model, model_name, only_param=False):
        # returns a bool indicating whether the model was successfully reloaded
        if self.save_path is not None:
@@ -377,13 +721,16 @@ class Trainer(object):
                states = torch.load(model_path)
            else:
                states = torch.load(model_path).state_dict()
            if isinstance(model, nn.DataParallel):
                model.module.load_state_dict(states)
            else:
                model.load_state_dict(states)
        elif hasattr(self, "_best_model_states"):
            model.load_state_dict(self._best_model_states)
        else:
            return False
        return True
    def _better_eval_result(self, metrics):
        """Check if the current epoch yields better validation results.
@@ -411,6 +758,7 @@ class Trainer(object):
DEFAULT_CHECK_BATCH_SIZE = 2
DEFAULT_CHECK_NUM_BATCH = 2


def _get_value_info(_dict):
    # given a dict value, return information about this dict's value. Return list of str
    strs = []
@@ -427,27 +775,28 @@ def _get_value_info(_dict):
        strs.append(_str)
    return strs
def _check_code(dataset, model, losser, metrics, batch_size=DEFAULT_CHECK_BATCH_SIZE,
                dev_data=None, metric_key=None,
                check_level=0):
    # check the get_loss method
    model_device = model.parameters().__next__().device
    batch = Batch(dataset=dataset, batch_size=batch_size, sampler=SequentialSampler())
    for batch_count, (batch_x, batch_y) in enumerate(batch):
        _move_dict_value_to_device(batch_x, batch_y, device=model_device)
        # forward check
        if batch_count == 0:
            info_str = ""
            input_fields = _get_value_info(batch_x)
            target_fields = _get_value_info(batch_y)
            if len(input_fields) > 0:
                info_str += "input fields after batch(if batch size is {}):\n".format(batch_size)
                info_str += "\n".join(input_fields)
                info_str += '\n'
            else:
                raise RuntimeError("There is no input field.")
            if len(target_fields) > 0:
                info_str += "target fields after batch(if batch size is {}):\n".format(batch_size)
                info_str += "\n".join(target_fields)
                info_str += '\n'
@@ -455,14 +804,14 @@ def _check_code(dataset, model, losser, metrics, batch_size=DEFAULT_CHECK_BATCH_
                info_str += 'There is no target field.'
            print(info_str)
            _check_forward_error(forward_func=model.forward, dataset=dataset,
                                 batch_x=batch_x, check_level=check_level)

        refined_batch_x = _build_args(model.forward, **batch_x)
        pred_dict = model(**refined_batch_x)
        func_signature = _get_func_signature(model.forward)
        if not isinstance(pred_dict, dict):
            raise TypeError(f"The return value of {func_signature} should be `dict`, not `{type(pred_dict)}`.")
try: | |||
loss = losser(pred_dict, batch_y) | |||
@@ -470,23 +819,23 @@ def _check_code(dataset, model, losser, metrics, batch_size=DEFAULT_CHECK_BATCH_ | |||
if batch_count == 0: | |||
if not isinstance(loss, torch.Tensor): | |||
raise TypeError( | |||
f"The return value of {get_func_signature(losser.get_loss)} should be `torch.Tensor`, " | |||
f"The return value of {_get_func_signature(losser.get_loss)} should be `torch.Tensor`, " | |||
f"but got `{type(loss)}`.") | |||
if len(loss.size()) != 0: | |||
raise ValueError( | |||
f"The size of return value of {get_func_signature(losser.get_loss)} is {loss.size()}, " | |||
f"The size of return value of {_get_func_signature(losser.get_loss)} is {loss.size()}, " | |||
f"should be torch.size([])") | |||
loss.backward() | |||
except CheckError as e: | |||
# TODO: another error raised if CheckError caught | |||
pre_func_signature = get_func_signature(model.forward) | |||
except _CheckError as e: | |||
# TODO: another error raised if _CheckError caught | |||
pre_func_signature = _get_func_signature(model.forward) | |||
_check_loss_evaluate(prev_func_signature=pre_func_signature, func_signature=e.func_signature, | |||
check_res=e.check_res, pred_dict=pred_dict, target_dict=batch_y, | |||
dataset=dataset, check_level=check_level) | |||
model.zero_grad() | |||
if batch_count + 1 >= DEFAULT_CHECK_NUM_BATCH: | |||
break | |||
if dev_data is not None: | |||
tester = Tester(data=dev_data[:batch_size * DEFAULT_CHECK_NUM_BATCH], model=model, metrics=metrics, | |||
batch_size=batch_size, verbose=-1) | |||
@@ -500,7 +849,7 @@ def _check_eval_results(metrics, metric_key, metric_list):
    # metric_list: the metrics used for evaluation, from the Trainer's initialization
    if isinstance(metrics, tuple):
        loss, metrics = metrics

    if isinstance(metrics, dict):
        if len(metrics) == 1:
            # only single metric, just use it
@@ -511,7 +860,7 @@ def _check_eval_results(metrics, metric_key, metric_list):
        if metrics_name not in metrics:
            raise RuntimeError(f"{metrics_name} is chosen to do validation, but got {metrics}")
        metric_dict = metrics[metrics_name]

        if len(metric_dict) == 1:
            indicator_val, indicator = list(metric_dict.values())[0], list(metric_dict.keys())[0]
        elif len(metric_dict) > 1 and metric_key is None:
@@ -1,59 +1,274 @@
"""
The utils module implements many tools used inside and outside fastNLP. The one intended for users is the
:func:`cache_results` decorator.
"""
__all__ = [
    "cache_results",
    "seq_len_to_mask"
]

import _pickle
import inspect
import os
import warnings
from collections import Counter, namedtuple

import numpy as np
import torch
import torch.nn as nn

_CheckRes = namedtuple('_CheckRes', ['missing', 'unused', 'duplicated', 'required', 'all_needed',
                                     'varargs'])
def _prepare_cache_filepath(filepath):
    """
    Check whether filepath can serve as a valid cache file path. If it can, the directory is created automatically.

    :param filepath: str.
    :return: None; if the path is invalid, this function raises an error.
    """
    _cache_filepath = os.path.abspath(filepath)
    if os.path.isdir(_cache_filepath):
        raise RuntimeError("The cache_file_path must be a file, not a directory.")
    cache_dir = os.path.dirname(_cache_filepath)
    if not os.path.exists(cache_dir):
        os.makedirs(cache_dir)


# TODO: also save the arguments used when caching; warn on load if they differ.
def cache_results(_cache_fp, _refresh=False, _verbose=1):
    """
    Alias :class:`fastNLP.cache_results` :class:`fastNLP.core.utils.cache_results`

    cache_results is the decorator fastNLP uses to cache data. The example below shows how to use it::

        import time
        import numpy as np
        from fastNLP import cache_results

        @cache_results('cache.pkl')
        def process_data():
            # some time-consuming work, e.g. reading and preprocessing data; time.sleep() stands in for it here
            time.sleep(1)
            return np.random.randint(10, size=(5,))

        start_time = time.time()
        print("res =", process_data())
        print(time.time() - start_time)

        start_time = time.time()
        print("res =", process_data())
        print(time.time() - start_time)

        # The output is as follows: the two results are identical, and the second call takes almost no time
        # Save cache to cache.pkl.
        # res = [5 4 9 1 8]
        # 1.0042750835418701
        # Read cache from cache.pkl.
        # res = [5 4 9 1 8]
        # 0.0040721893310546875

    The second run takes only about 0.004s because it reads the data directly from cache.pkl instead of running the
    preprocessing again.

    Example::

        # Continuing the example above: to generate another cache, e.g. for another dataset, call it like this
        process_data(_cache_fp='cache2.pkl')  # does not affect the previous 'cache.pkl' at all

    _cache_fp above is a parameter recognized by cache_results: it caches/reads the data at 'cache2.pkl', i.e.
    'cache2.pkl' here overrides the default 'cache.pkl'. If you decorate your function with @cache_results(), it gains
    three extra parameters [_cache_fp, _refresh, _verbose]; the example above shows the _cache_fp case. These three
    parameters are not passed into your function, so your own parameter names must not include these three names.

    Example::

        process_data(_cache_fp='cache2.pkl', _refresh=True)  # force regeneration of the preprocessing cache
        # _verbose controls the output: 0 prints nothing; 1 reports whether the cache was read or newly generated

    :param str _cache_fp: where to cache the returned result, or where to read the cache from. If None, cache_results
        has no effect unless _cache_fp is passed at call time.
    :param bool _refresh: whether to regenerate the cache.
    :param int _verbose: whether to print cache information.
    :return:
    """
    def wrapper_(func):
        signature = inspect.signature(func)
        for key, _ in signature.parameters.items():
            if key in ('_cache_fp', '_refresh', '_verbose'):
                raise RuntimeError("The function decorated by cache_results cannot have keyword `{}`.".format(key))

        def wrapper(*args, **kwargs):
            if '_cache_fp' in kwargs:
                cache_filepath = kwargs.pop('_cache_fp')
                assert isinstance(cache_filepath, str), "_cache_fp can only be str."
            else:
                cache_filepath = _cache_fp
            if '_refresh' in kwargs:
                refresh = kwargs.pop('_refresh')
                assert isinstance(refresh, bool), "_refresh can only be bool."
            else:
                refresh = _refresh
            if '_verbose' in kwargs:
                verbose = kwargs.pop('_verbose')
                assert isinstance(verbose, int), "_verbose can only be integer."
            else:
                verbose = _verbose
            refresh_flag = True

            if cache_filepath is not None and refresh is False:
                # load data
                if os.path.exists(cache_filepath):
                    with open(cache_filepath, 'rb') as f:
                        results = _pickle.load(f)
                    if verbose == 1:
                        print("Read cache from {}.".format(cache_filepath))
                    refresh_flag = False

            if refresh_flag:
                results = func(*args, **kwargs)
                if cache_filepath is not None:
                    if results is None:
                        raise RuntimeError("The return value is None. Delete the decorator.")
                    _prepare_cache_filepath(cache_filepath)
                    with open(cache_filepath, 'wb') as f:
                        _pickle.dump(results, f)
                    print("Save cache to {}.".format(cache_filepath))

            return results

        return wrapper

    return wrapper_
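The wrapper above boils down to a classic pickle-backed memoization pattern. Below is a minimal, self-contained sketch of that same pattern; `simple_cache` and `slow_compute` are illustrative names, not part of fastNLP, and the sketch omits the `_refresh`/`_verbose` handling:

```python
# Sketch of the caching pattern cache_results implements: pickle the wrapped
# function's return value on the first call, read it back on later calls.
import os
import pickle
import tempfile
from functools import wraps


def simple_cache(cache_fp):
    def deco(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            if os.path.exists(cache_fp):
                with open(cache_fp, 'rb') as f:
                    return pickle.load(f)       # cache hit: skip the computation
            result = func(*args, **kwargs)      # cache miss: compute and store
            with open(cache_fp, 'wb') as f:
                pickle.dump(result, f)
            return result
        return wrapper
    return deco


cache_path = os.path.join(tempfile.mkdtemp(), 'demo.pkl')
calls = []


@simple_cache(cache_path)
def slow_compute():
    calls.append(1)                             # track how often the body runs
    return [1, 2, 3]


first = slow_compute()
second = slow_compute()                         # served from the pickle file
```

The real decorator adds argument validation and cache-directory creation on top of this core, but the read-or-recompute branch is the same.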
# def save_pickle(obj, pickle_path, file_name):
#     """Save an object into a pickle file.
#
#     :param obj: an object
#     :param pickle_path: str, the directory where the pickle file is to be saved
#     :param file_name: str, the name of the pickle file. In general, it should be ended by "pkl".
#     """
#     if not os.path.exists(pickle_path):
#         os.mkdir(pickle_path)
#         print("make dir {} before saving pickle file".format(pickle_path))
#     with open(os.path.join(pickle_path, file_name), "wb") as f:
#         _pickle.dump(obj, f)
#     print("{} saved in {}".format(file_name, pickle_path))
#
#
# def load_pickle(pickle_path, file_name):
#     """Load an object from a given pickle file.
#
#     :param pickle_path: str, the directory where the pickle file is.
#     :param file_name: str, the name of the pickle file.
#     :return obj: an object stored in the pickle
#     """
#     with open(os.path.join(pickle_path, file_name), "rb") as f:
#         obj = _pickle.load(f)
#     print("{} loaded from {}".format(file_name, pickle_path))
#     return obj
#
#
# def pickle_exist(pickle_path, pickle_name):
#     """Check if a given pickle file exists in the directory.
#
#     :param pickle_path: the directory of target pickle file
#     :param pickle_name: the filename of target pickle file
#     :return: True if file exists else False
#     """
#     if not os.path.exists(pickle_path):
#         os.makedirs(pickle_path)
#     file_name = os.path.join(pickle_path, pickle_name)
#     if os.path.exists(file_name):
#         return True
#     else:
#         return False
def _move_model_to_device(model, device):
    """
    Move model to device.

    :param model: torch.nn.DataParallel or torch.nn.Module. When it is torch.nn.DataParallel, cuda() is simply called
        once; device must be None.
    :param str,int,torch.device,list(int),list(torch.device) device: the device to load the model onto. Default None,
        i.e. the Trainer does not manage where the model computes. The following inputs are supported:

        1. str: ['cpu', 'cuda', 'cuda:0', 'cuda:1', ...] meaning the CPU, the first visible GPU, the first visible
        GPU, and the second visible GPU, respectively;

        2. torch.device: load the model onto that torch.device.

        3. int: train on the GPU whose device_id equals this value.

        4. list(int): if more than one device is given, wrap model in torch.nn.DataParallel using the given devices.

        5. None: do nothing to the model; if the given model is torch.nn.DataParallel, this value must be None.

    :return: torch.nn.DataParallel or torch.nn.Module
    """
    if isinstance(model, torch.nn.parallel.DistributedDataParallel):
        raise RuntimeError("model of `torch.nn.parallel.DistributedDataParallel` is not supported right now.")

    if device is None:
        if isinstance(model, torch.nn.DataParallel):
            model.cuda()
        return model
    else:
        if not torch.cuda.is_available() and (
                device != 'cpu' or (isinstance(device, torch.device) and device.type != 'cpu')):
            raise ValueError("There is no usable gpu. set `device` as `cpu` or `None`.")

    if isinstance(model, torch.nn.DataParallel):
        raise RuntimeError("When model is `torch.nn.DataParallel`, the device has to be `None`.")

    if isinstance(device, int):
        assert device > -1, "device can only be non-negative integer"
        assert torch.cuda.device_count() > device, "Only has {} gpus, cannot use device {}.".format(
            torch.cuda.device_count(),
            device)
        device = torch.device('cuda:{}'.format(device))
    elif isinstance(device, str):
        device = torch.device(device)
        if device.type == 'cuda' and device.index is not None:
            assert device.index < torch.cuda.device_count(), "Only has {} gpus, cannot use device cuda:{}.".format(
                torch.cuda.device_count(),
                device)
    elif isinstance(device, torch.device):
        if device.type == 'cuda' and device.index is not None:
            assert device.index < torch.cuda.device_count(), "Only has {} gpus, cannot use device cuda:{}.".format(
                torch.cuda.device_count(),
                device)
    elif isinstance(device, list):
        types = set([type(d) for d in device])
        assert len(types) == 1, "Mixed type in device, only `int` allowed."
        assert list(types)[0] == int, "Only int supported for multiple devices."
        assert len(set(device)) == len(device), "Duplicated device id found in device."
        for d in device:
            assert d > -1, "Only non-negative device id allowed."
        if len(device) > 1:
            output_device = device[0]
            model = nn.DataParallel(model, device_ids=device, output_device=output_device)
        device = torch.device(device[0])
    else:
        raise TypeError("Unsupported device type.")
    model = model.to(device)
    return model
def _get_model_device(model):
    """
    Given an nn.Module model, return the device it resides on.

    :param model: nn.Module
    :return: torch.device or None; None means the model has no parameters at all.
    """
    assert isinstance(model, nn.Module)

    parameters = list(model.parameters())
    if len(parameters) == 0:
        return None
    else:
        return parameters[0].device
def _build_args(func, **kwargs):
@@ -126,30 +341,35 @@ def _check_arg_dict_list(func, args):
    missing = list(require_args - input_args)
    unused = list(input_args - all_args)
    varargs = [] if not spect.varargs else [spect.varargs]
    return _CheckRes(missing=missing,
                     unused=unused,
                     duplicated=duplicated,
                     required=list(require_args),
                     all_needed=list(all_args),
                     varargs=varargs)
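The bookkeeping that fills `_CheckRes` can be sketched with the standard `inspect` module. `check_args` below is a hypothetical stand-in for `_check_arg_dict_list`, reduced to the `missing`/`unused` fields:

```python
# Compare the keys a caller supplies against a function's signature to find
# required parameters that are missing and supplied keys the function ignores.
import inspect


def check_args(func, supplied):
    spec = inspect.getfullargspec(func)
    args = [a for a in spec.args if a != 'self']
    defaults = spec.defaults or ()
    required = set(args[:len(args) - len(defaults)])  # params without defaults
    all_args = set(args)
    supplied = set(supplied)
    return {'missing': sorted(required - supplied),
            'unused': sorted(supplied - all_args)}


def forward(words, seq_len, mask=None):
    pass


res = check_args(forward, {'words', 'target'})
```

Here `seq_len` is required but not supplied, and `target` is supplied but unknown to `forward`, which is exactly the situation the error messages further down diagnose.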
def _get_func_signature(func):
    """

    Given a function or method, return its signature.
    For example:

    1 function::

        def func(a, b='a', *args):
            xxxx
        _get_func_signature(func) # 'func(a, b='a', *args)'

    2 method::

        class Demo:
            def __init__(self):
                xxx
            def forward(self, a, b='a', **args)
        demo = Demo()
        _get_func_signature(demo.forward) # 'Demo.forward(self, a, b='a', **args)'

    :param func: a function or a method
    :return: str or None
    """
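Signature strings like those in the docstring above can be produced with `inspect`. `build_signature` below is a hypothetical helper, not the actual `_get_func_signature` implementation; note that `inspect.signature` drops `self` for bound methods, unlike the docstring examples:

```python
# Build a 'name(params)' string for a function or bound method using inspect.
import inspect


def build_signature(func):
    sig = str(inspect.signature(func))               # e.g. "(a, b='a', *args)"
    if inspect.ismethod(func):
        qualname = func.__self__.__class__.__name__ + '.' + func.__name__
    else:
        qualname = func.__name__
    return qualname + sig


def func(a, b='a', *args):
    pass


class Demo:
    def forward(self, a, b='a'):
        pass


s1 = build_signature(func)          # "func(a, b='a', *args)"
s2 = build_signature(Demo().forward)  # "Demo.forward(a, b='a')"
```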
@@ -195,9 +415,12 @@ def _move_dict_value_to_device(*args, device: torch.device, non_blocking=False):
    :param args:
    :return:
    """
    if not torch.cuda.is_available():
        return

    if not isinstance(device, torch.device):
        raise TypeError(f"device must be `torch.device`, got `{type(device)}`")

    for arg in args:
        if isinstance(arg, dict):
            for key, value in arg.items():
@@ -207,15 +430,15 @@ def _move_dict_value_to_device(*args, device: torch.device, non_blocking=False):
            raise TypeError("Only support `dict` type right now.")
class _CheckError(Exception):
    """
    _CheckError. Used in losses.LossBase, metrics.MetricBase.
    """

    def __init__(self, check_res: _CheckRes, func_signature: str):
        errs = [f'Problems occurred when calling `{func_signature}`']

        if check_res.varargs:
            errs.append(f"\tvarargs: {check_res.varargs}(Does not support pass positional arguments, please delete it)")
        if check_res.missing:
@@ -224,9 +447,9 @@ class CheckError(Exception):
            errs.append(f"\tduplicated param: {check_res.duplicated}")
        if check_res.unused:
            errs.append(f"\tunused param: {check_res.unused}")

        Exception.__init__(self, '\n'.join(errs))

        self.check_res = check_res
        self.func_signature = func_signature

@@ -236,7 +459,7 @@ WARNING_CHECK_LEVEL = 1
STRICT_CHECK_LEVEL = 2


def _check_loss_evaluate(prev_func_signature: str, func_signature: str, check_res: _CheckRes,
                         pred_dict: dict, target_dict: dict, dataset, check_level=0):
    errs = []
    unuseds = []
@@ -246,7 +469,7 @@ def _check_loss_evaluate(prev_func_signature: str, func_signature: str, check_re
    # if check_res.varargs:
    #     errs.append(f"\tvarargs: *{check_res.varargs}")
    #     suggestions.append(f"Does not support pass positional arguments, please delete *{check_res.varargs}.")

    if check_res.unused:
        for _unused in check_res.unused:
            if _unused in target_dict:
@@ -256,20 +479,19 @@ def _check_loss_evaluate(prev_func_signature: str, func_signature: str, check_re
        if _unused_field:
            unuseds.append(f"\tunused field: {_unused_field}")
        if _unused_param:
            unuseds.append(f"\tunused param: {_unused_param}")  # output from predict or forward

    module_name = func_signature.split('.')[0]
    if check_res.missing:
        errs.append(f"\tmissing param: {check_res.missing}")
        import re
        mapped_missing = []  # parameters for which a mapping was provided
        unmapped_missing = []  # parameters with no mapping specified
        input_func_map = {}
        for _miss_ in check_res.missing:
            # they should look like 'SomeParam(assign to xxx)'
            _miss = _miss_.split('(')[0]
            matches = re.findall("(?<=`)[a-zA-Z0-9]*?(?=`)", _miss_)
            if len(matches) == 2:
                fun_arg, module_name = matches
                input_func_map[_miss] = fun_arg
@@ -279,50 +501,50 @@ def _check_loss_evaluate(prev_func_signature: str, func_signature: str, check_re
                mapped_missing.append(_miss)
            else:
                unmapped_missing.append(_miss)

        for _miss in mapped_missing + unmapped_missing:
            if _miss in dataset:
                suggestions.append(f"Set `{_miss}` as target.")
            else:
                _tmp = ''
                if check_res.unused:
                    _tmp = f"Check key assignment for `{input_func_map.get(_miss,_miss)}` when initialize {module_name}."
                if _tmp:
                    _tmp += f' Or provide `{_miss}` in DataSet or output of {prev_func_signature}.'
                else:
                    _tmp = f'Provide `{_miss}` in DataSet or output of {prev_func_signature}.'
                suggestions.append(_tmp)
        # for _miss in unmapped_missing:
        #     if _miss in dataset:
        #         suggestions.append(f"Set `{_miss}` as target.")
        #     else:
        #         _tmp = ''
        #         if check_res.unused:
        #             _tmp = f"Specify your assignment for `{input_func_map.get(_miss, _miss)}` when initialize {module_name}."
        #         if _tmp:
        #             _tmp += f' Or provide `{_miss}` in DataSet or output of {prev_func_signature}.'
        #         else:
        #             _tmp = f'Provide `{_miss}` in output of {prev_func_signature} or DataSet.'
        #         suggestions.append(_tmp)

    if check_res.duplicated:
        errs.append(f"\tduplicated param: {check_res.duplicated}.")
        suggestions.append(f"Delete {check_res.duplicated} in the output of "
                           f"{prev_func_signature} or do not set {check_res.duplicated} as targets. ")

    if len(errs) > 0:
        errs.extend(unuseds)
    elif check_level == STRICT_CHECK_LEVEL:
        errs.extend(unuseds)

    if len(errs) > 0:
        errs.insert(0, f'Problems occurred when calling {func_signature}')
        sugg_str = ""
        if len(suggestions) > 1:
            for idx, sugg in enumerate(suggestions):
                if idx > 0:
                    sugg_str += '\t\t\t'
                sugg_str += f'({idx + 1}). {sugg}\n'
            sugg_str = sugg_str[:-1]
        else:
            sugg_str += suggestions[0]
@@ -337,14 +559,15 @@ def _check_loss_evaluate(prev_func_signature: str, func_signature: str, check_re
        _unused_warn = f'{check_res.unused} is not used by {module_name}.'
        warnings.warn(message=_unused_warn)


def _check_forward_error(forward_func, batch_x, dataset, check_level):
    check_res = _check_arg_dict_list(forward_func, batch_x)
    func_signature = _get_func_signature(forward_func)

    errs = []
    suggestions = []
    _unused = []

    # if check_res.varargs:
    #     errs.append(f"\tvarargs: {check_res.varargs}")
    #     suggestions.append(f"Does not support pass positional arguments, please delete *{check_res.varargs}.")
@@ -365,20 +588,20 @@ def _check_forward_error(forward_func, batch_x, dataset, check_level):
            #     _tmp += f"Or you might find it in `unused field:`, you can use DataSet.rename_field() to " \
            #             f"rename the field in `unused field:`."
            suggestions.append(_tmp)

    if check_res.unused:
        _unused = [f"\tunused field: {check_res.unused}"]
    if len(errs) > 0:
        errs.extend(_unused)
    elif check_level == STRICT_CHECK_LEVEL:
        errs.extend(_unused)

    if len(errs) > 0:
        errs.insert(0, f'Problems occurred when calling {func_signature}')
        sugg_str = ""
        if len(suggestions) > 1:
            for idx, sugg in enumerate(suggestions):
                sugg_str += f'({idx + 1}). {sugg}'
        else:
            sugg_str += suggestions[0]
        err_str = '\n' + '\n'.join(errs) + '\n\tSuggestion: ' + sugg_str
@@ -389,72 +612,66 @@ def _check_forward_error(forward_func, batch_x, dataset, check_level):
        warnings.warn(message=_unused_warn)
def seq_len_to_mask(seq_len):
    """
    Convert a 1-d array of sequence lengths into a 2-d mask; positions beyond each length are 0.

    Example::

        >>> seq_len = torch.arange(2, 16)
        >>> mask = seq_len_to_mask(seq_len)
        >>> print(mask.size())
        torch.Size([14, 15])
        >>> seq_len = np.arange(2, 16)
        >>> mask = seq_len_to_mask(seq_len)
        >>> print(mask.shape)
        (14, 15)

    :param np.ndarray,torch.LongTensor seq_len: shape (B,)
    :return: np.ndarray or torch.Tensor of shape (B, max_length); elements are bool or torch.uint8
    """
    if isinstance(seq_len, np.ndarray):
        assert len(np.shape(seq_len)) == 1, f"seq_len can only have one dimension, got {len(np.shape(seq_len))}."
        max_len = int(seq_len.max())
        broad_cast_seq_len = np.tile(np.arange(max_len), (len(seq_len), 1))
        mask = broad_cast_seq_len < seq_len.reshape(-1, 1)

    elif isinstance(seq_len, torch.Tensor):
        assert seq_len.dim() == 1, f"seq_len can only have one dimension, got {seq_len.dim()}."
        batch_size = seq_len.size(0)
        max_len = seq_len.max().long()
        broad_cast_seq_len = torch.arange(max_len).expand(batch_size, -1).to(seq_len)
        mask = broad_cast_seq_len.lt(seq_len.unsqueeze(1))
    else:
        raise TypeError("Only support 1-d numpy.ndarray or 1-d torch.Tensor.")

    return mask
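Both branches above implement the same broadcast-compare rule: position j of row i is valid iff j < seq_len[i]. The dependency-free sketch below mirrors the numpy branch with plain lists; `seq_len_to_mask_list` is an illustrative name, not part of fastNLP:

```python
# Pure-Python version of the masking rule: row i gets seq_len[i] leading True
# values followed by False padding, out to the longest sequence in the batch.
def seq_len_to_mask_list(seq_len):
    max_len = max(seq_len)
    return [[j < n for j in range(max_len)] for n in seq_len]


mask = seq_len_to_mask_list([1, 3, 2])
# row 0: [True, False, False]
# row 1: [True, True,  True]
# row 2: [True, True,  False]
```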
class _pseudo_tqdm:
    """
    When tqdm cannot be imported, or use_tqdm is set to False in Trainer, use this class to print instead.
    """

    def __init__(self, **kwargs):
        pass

    def write(self, info):
        print(info)

    def set_postfix_str(self, info):
        print(info)

    def __getattr__(self, item):
        def pass_func(*args, **kwargs):
            pass

        return pass_func

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_val, exc_tb):
        del self
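The class above supports a common graceful-degradation pattern: try the real tqdm, fall back to a stand-in with the same surface. The sketch below is self-contained; `FakeBar` mirrors the idea of `_pseudo_tqdm` (capturing instead of printing so it can be checked), and the try/except import is the assumed usage pattern, not fastNLP's exact code:

```python
# A print-free stand-in that swallows any tqdm-style call it does not define.
class FakeBar:
    def __init__(self, **kwargs):
        self.lines = []

    def write(self, info):
        self.lines.append(info)           # stand-in for print(info)

    def __getattr__(self, item):          # absorb update(), set_postfix_str(), ...
        return lambda *args, **kwargs: None

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_val, exc_tb):
        pass


try:
    from tqdm import tqdm as progress_bar  # real progress bar when available
except ImportError:
    progress_bar = FakeBar                 # silent fallback otherwise

with FakeBar(total=100) as bar:
    bar.write("epoch 1 done")
    bar.update(1)                          # absorbed by __getattr__
    captured = list(bar.lines)
```

Because `__getattr__` returns a no-op callable, caller code written against the tqdm API keeps working unchanged when the fallback is active.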
@@ -1,24 +1,33 @@
__all__ = [
    "Vocabulary"
]

from functools import wraps
from collections import Counter

from .dataset import DataSet


def _check_build_vocab(func):
    """A decorator to make sure the indexing is built before used.

    """

    @wraps(func)  # to solve missing docstring
    def _wrapper(self, *args, **kwargs):
        if self.word2idx is None or self.rebuild is True:
            self.build_vocab()
        return func(self, *args, **kwargs)

    return _wrapper


def _check_build_status(func):
    """A decorator to check whether the vocabulary updates after the last build.

    """

    @wraps(func)  # to solve missing docstring
    def _wrapper(self, *args, **kwargs):
        if self.rebuild is False:
            self.rebuild = True
@@ -27,27 +36,38 @@ def _check_build_status(func):
                    "Adding more words may cause unexpected behaviour of Vocabulary. ".format(
                        self.max_size, func.__name__))
        return func(self, *args, **kwargs)

    return _wrapper
class Vocabulary(object): | |||
""" | |||
别名::class:`fastNLP.Vocabulary` :class:`fastNLP.core.vocabulary.Vocabulary` | |||
用于构建, 存储和使用 `str` 到 `int` 的一一映射 | |||
Example:: | |||
vocab = Vocabulary() | |||
word_list = "this is a word list".split() | |||
vocab.update(word_list) | |||
vocab["word"] # str to int | |||
vocab.to_word(5) # int to str | |||
:param int max_size: `Vocabulary` 的最大大小, 即能存储词的最大数量 | |||
若为 ``None`` , 则不限制大小. Default: ``None`` | |||
:param int min_freq: 能被记录下的词在文本中的最小出现频率, 应大于或等于 1. | |||
若小于该频率, 词语将被视为 `unknown`. 若为 ``None`` , 所有文本中的词都被记录. Default: ``None`` | |||
:param str optional padding: padding的字符. 如果设置为 ``None`` , | |||
则vocabulary中不考虑padding, 也不计入词表大小,为 ``None`` 的情况多在为label建立Vocabulary的情况. | |||
Default: '<pad>' | |||
:param str optional unknown: unknown的字符,所有未被记录的词在转为 `int` 时将被视为unknown. | |||
如果设置为 ``None`` ,则vocabulary中不考虑unknow, 也不计入词表大小. | |||
为 ``None`` 的情况多在为label建立Vocabulary的情况. | |||
Default: '<unk>' | |||
""" | |||
def __init__(self, max_size=None, min_freq=None, padding='<pad>', unknown='<unk>'): | |||
self.max_size = max_size | |||
self.min_freq = min_freq | |||
self.word_count = Counter() | |||
@@ -56,51 +76,55 @@ class Vocabulary(object): | |||
self.word2idx = None | |||
self.idx2word = None | |||
self.rebuild = True | |||
@_check_build_status | |||
def update(self, word_lst): | |||
"""依次增加序列中词在词典中的出现频率 | |||
:param list word_lst: a list of strings | |||
""" | |||
self.word_count.update(word_lst) | |||
@_check_build_status | |||
def add(self, word): | |||
""" | |||
增加一个新词在词典中的出现频率 | |||
:param str word: 新词 | |||
""" | |||
self.word_count[word] += 1 | |||
@_check_build_status | |||
def add_word(self, word): | |||
""" | |||
增加一个新词在词典中的出现频率 | |||
:param str word: 新词 | |||
""" | |||
self.add(word) | |||
@_check_build_status | |||
def add_word_lst(self, word_lst): | |||
""" | |||
依次增加序列中词在词典中的出现频率 | |||
:param list[str] word_lst: 词的序列 | |||
""" | |||
self.update(word_lst) | |||
def build_vocab(self): | |||
""" | |||
根据已经出现的词和出现频率构建词典. 注意: 重复构建可能会改变词典的大小, | |||
但已经记录在词典中的词, 不会改变对应的 `int` | |||
""" | |||
if self.word2idx is None: | |||
self.word2idx = {} | |||
if self.padding is not None: | |||
self.word2idx[self.padding] = len(self.word2idx) | |||
if self.unknown is not None: | |||
self.word2idx[self.unknown] = len(self.word2idx) | |||
max_size = min(self.max_size, len(self.word_count)) if self.max_size else None | |||
words = self.word_count.most_common(max_size) | |||
if self.min_freq is not None: | |||
@@ -111,32 +135,47 @@ class Vocabulary(object): | |||
self.word2idx.update({w: i + start_idx for i, (w, _) in enumerate(words)}) | |||
self.build_reverse_vocab() | |||
self.rebuild = False | |||
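What build_vocab produces can be sketched with toy data (a hypothetical standalone example, assuming padding='<pad>' and unknown='<unk>' occupy the first indices):

```python
from collections import Counter

word_count = Counter("this is a test this is".split())
word2idx = {'<pad>': 0, '<unk>': 1}   # special tokens come first
start_idx = len(word2idx)
words = word_count.most_common()      # ordered by frequency, ties keep insertion order
word2idx.update({w: i + start_idx for i, (w, _) in enumerate(words)})
idx2word = {i: w for w, i in word2idx.items()}
```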
def build_reverse_vocab(self): | |||
""" | |||
基于 "word to index" dict, 构建 "index to word" dict. | |||
""" | |||
self.idx2word = {i: w for w, i in self.word2idx.items()} | |||
@_check_build_vocab | |||
def __len__(self): | |||
return len(self.word2idx) | |||
@_check_build_vocab | |||
def __contains__(self, item): | |||
""" | |||
检查词是否被记录 | |||
:param item: the word | |||
:return: True or False | |||
""" | |||
return item in self.word2idx | |||
def has_word(self, w): | |||
""" | |||
检查词是否被记录 | |||
Example:: | |||
has_abc = vocab.has_word('abc') | |||
# equals to | |||
has_abc = 'abc' in vocab | |||
:param str w: the word
:return: ``True`` or ``False`` | |||
""" | |||
return self.__contains__(w) | |||
@_check_build_vocab | |||
def __getitem__(self, w): | |||
""" | |||
To support usage like:: | |||
vocab[w] | |||
""" | |||
@@ -146,49 +185,174 @@ class Vocabulary(object): | |||
return self.word2idx[self.unknown] | |||
else: | |||
raise ValueError("word {} not in vocabulary".format(w)) | |||
@_check_build_vocab | |||
def index_dataset(self, *datasets, field_name, new_field_name=None): | |||
""" | |||
将DataSet中对应field的词转为数字. | |||
Example:: | |||
# remember to use `field_name` | |||
vocab.index_dataset(train_data, dev_data, test_data, field_name='words') | |||
:param datasets: 需要转index的 class:`~fastNLP.DataSet` , 支持一个或多个(list) | |||
:param str field_name: 需要转index的field, 若有多个 DataSet, 每个DataSet都必须有此 field. | |||
目前仅支持 ``str`` , ``list(str)`` , ``list(list(str))`` | |||
:param str new_field_name: 保存结果的field_name. 若为 ``None`` , 将覆盖原field. | |||
Default: ``None`` | |||
""" | |||
def index_instance(ins): | |||
""" | |||
有几种情况, str, 1d-list, 2d-list | |||
:param ins: | |||
:return: | |||
""" | |||
field = ins[field_name] | |||
if isinstance(field, str): | |||
return self.to_index(field) | |||
elif isinstance(field, list): | |||
if not isinstance(field[0], list): | |||
return [self.to_index(w) for w in field] | |||
else: | |||
if isinstance(field[0][0], list): | |||
raise RuntimeError("Only support field with 2 dimensions.") | |||
return [[self.to_index(c) for c in w] for w in field] | |||
if new_field_name is None: | |||
new_field_name = field_name | |||
for idx, dataset in enumerate(datasets): | |||
if isinstance(dataset, DataSet): | |||
try: | |||
dataset.apply(index_instance, new_field_name=new_field_name) | |||
except Exception as e: | |||
print("When processing the `{}` dataset, the following error occurred.".format(idx)) | |||
raise e | |||
else: | |||
raise RuntimeError("Only DataSet type is allowed.") | |||
def from_dataset(self, *datasets, field_name): | |||
""" | |||
使用dataset的对应field中词构建词典 | |||
Example:: | |||
# remember to use `field_name` | |||
vocab.from_dataset(train_data1, train_data2, field_name='words') | |||
:param datasets: 需要转index的 class:`~fastNLP.DataSet` , 支持一个或多个(list) | |||
:param field_name: 可为 ``str`` 或 ``list(str)`` . | |||
构建词典所使用的 field(s), 支持一个或多个field | |||
若有多个 DataSet, 每个DataSet都必须有这些field. | |||
目前仅支持的field结构: ``str`` , ``list(str)`` , ``list(list(str))`` | |||
:return self: | |||
""" | |||
if isinstance(field_name, str): | |||
field_name = [field_name] | |||
elif not isinstance(field_name, list): | |||
raise TypeError('invalid argument field_name: {}'.format(field_name)) | |||
def construct_vocab(ins): | |||
for fn in field_name: | |||
field = ins[fn] | |||
if isinstance(field, str): | |||
self.add_word(field) | |||
elif isinstance(field, list): | |||
if not isinstance(field[0], list): | |||
self.add_word_lst(field) | |||
else: | |||
if isinstance(field[0][0], list): | |||
raise RuntimeError("Only support field with 2 dimensions.") | |||
[self.add_word_lst(w) for w in field] | |||
for idx, dataset in enumerate(datasets): | |||
if isinstance(dataset, DataSet): | |||
try: | |||
dataset.apply(construct_vocab) | |||
except Exception as e: | |||
print("When processing the `{}` dataset, the following error occurred.".format(idx)) | |||
raise e | |||
else: | |||
raise RuntimeError("Only DataSet type is allowed.") | |||
return self | |||
def to_index(self, w): | |||
""" | |||
将词转为数字. 若词不在词典中被记录, 将视为 unknown, 若 ``unknown=None`` , 将抛出
``ValueError`` | |||
Example:: | |||
index = vocab.to_index('abc') | |||
# equals to | |||
index = vocab['abc'] | |||
:param str w: a word | |||
:return int index: the number | |||
""" | |||
return self.__getitem__(w) | |||
@property | |||
@_check_build_vocab | |||
def unknown_idx(self): | |||
""" | |||
unknown 对应的数字. | |||
""" | |||
if self.unknown is None: | |||
return None | |||
return self.word2idx[self.unknown] | |||
@property | |||
@_check_build_vocab | |||
def padding_idx(self): | |||
""" | |||
padding 对应的数字 | |||
""" | |||
if self.padding is None: | |||
return None | |||
return self.word2idx[self.padding] | |||
@_check_build_vocab | |||
def to_word(self, idx): | |||
""" | |||
给定一个数字, 将其转为对应的词. | |||
:param int idx: the index | |||
:return str word: the word | |||
""" | |||
return self.idx2word[idx] | |||
def clear(self): | |||
""" | |||
删除Vocabulary中的词表数据。相当于重新初始化一下。 | |||
:return: | |||
""" | |||
self.word_count.clear() | |||
self.word2idx = None | |||
self.idx2word = None | |||
self.rebuild = True | |||
def __getstate__(self): | |||
"""Use to prepare data for pickle. | |||
""" | |||
len(self) # make sure vocab has been built | |||
state = self.__dict__.copy() | |||
# no need to pickle idx2word as it can be constructed from word2idx | |||
del state['idx2word'] | |||
return state | |||
def __setstate__(self, state): | |||
"""Use to restore state from pickle. | |||
""" | |||
self.__dict__.update(state) | |||
self.build_reverse_vocab() | |||
def __repr__(self): | |||
return "Vocabulary({}...)".format(list(self.word_count.keys())[:5]) | |||
def __iter__(self): | |||
return iter(list(self.word_count.keys())) |
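The __getstate__/__setstate__ pattern used above (drop the derived idx2word before pickling, rebuild it after restoring) can be sketched on a minimal stand-in class; TinyVocab is hypothetical and only illustrates the hooks:

```python
class TinyVocab:
    # idx2word is derived from word2idx, so it is dropped before pickling
    # and rebuilt after unpickling, exactly like Vocabulary above.
    def __init__(self, words):
        self.word2idx = {w: i for i, w in enumerate(words)}
        self.idx2word = {i: w for w, i in self.word2idx.items()}

    def __getstate__(self):
        state = self.__dict__.copy()
        del state['idx2word']  # no need to pickle the derived dict
        return state

    def __setstate__(self, state):
        self.__dict__.update(state)
        self.idx2word = {i: w for w, i in self.word2idx.items()}

v = TinyVocab(['<pad>', '<unk>', 'hello'])
state = v.__getstate__()           # what pickle would store
restored = TinyVocab.__new__(TinyVocab)
restored.__setstate__(state)       # what pickle would restore
```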
@@ -0,0 +1,31 @@ | |||
""" | |||
用于IO的模块, 具体包括: | |||
1. 用于读入 embedding 的 :doc:`EmbedLoader <fastNLP.io.embed_loader>` 类, | |||
2. 用于读入数据的 :doc:`DataSetLoader <fastNLP.io.dataset_loader>` 类 | |||
3. 用于保存和载入模型的类, 参考 :doc:`/fastNLP.io.model_io` | |||
这些类的使用方法如下: | |||
""" | |||
__all__ = [ | |||
'EmbedLoader', | |||
'DataSetLoader', | |||
'CSVLoader', | |||
'JsonLoader', | |||
'ConllLoader', | |||
'SNLILoader', | |||
'SSTLoader', | |||
'PeopleDailyCorpusLoader', | |||
'Conll2003Loader', | |||
'ModelLoader', | |||
'ModelSaver', | |||
] | |||
from .embed_loader import EmbedLoader | |||
from .dataset_loader import DataSetLoader, CSVLoader, JsonLoader, ConllLoader, SNLILoader, SSTLoader, \ | |||
PeopleDailyCorpusLoader, Conll2003Loader | |||
from .model_io import ModelLoader, ModelSaver |
@@ -1,30 +1,42 @@ | |||
__all__ = [ | |||
"BaseLoader" | |||
] | |||
import _pickle as pickle | |||
import os | |||
class BaseLoader(object): | |||
"""Base loader for all loaders. | |||
""" | |||
各个 Loader 的基类,提供了 API 的参考。 | |||
""" | |||
def __init__(self): | |||
super(BaseLoader, self).__init__() | |||
@staticmethod | |||
def load_lines(data_path): | |||
""" | |||
按行读取,舍弃每行两侧空白字符,返回list of str | |||
:param data_path: 读取数据的路径 | |||
""" | |||
with open(data_path, "r", encoding="utf-8") as f:
text = f.readlines() | |||
return [line.strip() for line in text] | |||
@classmethod | |||
def load(cls, data_path): | |||
""" | |||
先按行读取,去除一行两侧空白,再提取每行的字符。返回list of list of str | |||
:param data_path: | |||
""" | |||
with open(data_path, "r", encoding="utf-8") as f: | |||
text = f.readlines() | |||
return [[word for word in sent.strip()] for sent in text] | |||
@classmethod | |||
def load_with_cache(cls, data_path, cache_path): | |||
"""缓存版的load | |||
@@ -40,22 +52,23 @@ class BaseLoader(object): | |||
class DataLoaderRegister: | |||
"""Register for all data sets. | |||
""" | |||
_readers = {} | |||
@classmethod | |||
def set_reader(cls, reader_cls, read_fn_name): | |||
# def wrapper(reader_cls): | |||
if read_fn_name in cls._readers: | |||
raise KeyError( | |||
'duplicate reader: {} and {} for read_func: {}'.format(cls._readers[read_fn_name], reader_cls, | |||
read_fn_name)) | |||
if hasattr(reader_cls, 'load'): | |||
cls._readers[read_fn_name] = reader_cls().load | |||
return reader_cls | |||
@classmethod | |||
def get_reader(cls, read_fn_name): | |||
if read_fn_name in cls._readers: | |||
return cls._readers[read_fn_name] | |||
raise AttributeError('no read function: {}'.format(read_fn_name)) | |||
# TODO 这个类使用在何处? |
@@ -1,31 +1,48 @@ | |||
""" | |||
用于读入和处理和保存 config 文件 | |||
.. todo:: | |||
这个模块中的类可能被抛弃? | |||
""" | |||
__all__ = [ | |||
"ConfigLoader", | |||
"ConfigSection", | |||
"ConfigSaver" | |||
] | |||
import configparser | |||
import json | |||
import os | |||
from fastNLP.io.base_loader import BaseLoader | |||
from .base_loader import BaseLoader | |||
class ConfigLoader(BaseLoader): | |||
""" | |||
别名::class:`fastNLP.io.ConfigLoader` :class:`fastNLP.io.config_io.ConfigLoader` | |||
读取配置文件的Loader | |||
:param str data_path: 配置文件的路径 | |||
""" | |||
def __init__(self, data_path=None): | |||
super(ConfigLoader, self).__init__() | |||
if data_path is not None: | |||
self.config = self.parse(super(ConfigLoader, self).load(data_path)) | |||
@staticmethod | |||
def parse(string): | |||
raise NotImplementedError | |||
@staticmethod | |||
def load_config(file_path, sections): | |||
""" | |||
把配置文件的section 存入提供的 ``sections`` 中 | |||
:param str file_path: 配置文件的路径 | |||
:param dict sections: 符合如下键值对组成的字典 `section_name(string)` : :class:`~fastNLP.io.ConfigSection` | |||
Example:: | |||
test_args = ConfigSection() | |||
@@ -65,13 +82,16 @@ class ConfigLoader(BaseLoader): | |||
class ConfigSection(object): | |||
""" | |||
别名::class:`fastNLP.io.ConfigSection` :class:`fastNLP.io.config_io.ConfigSection` | |||
ConfigSection是一个存储了一个section中所有键值对的数据结构,推荐使用此类的实例来配合 :meth:`ConfigLoader.load_config` 使用 | |||
""" | |||
def __init__(self): | |||
super(ConfigSection, self).__init__() | |||
def __getitem__(self, key): | |||
""" | |||
:param key: str, the name of the attribute | |||
@@ -84,7 +104,7 @@ class ConfigSection(object): | |||
if key in self.__dict__.keys(): | |||
return getattr(self, key) | |||
raise AttributeError("do NOT have attribute %s" % key) | |||
def __setitem__(self, key, value): | |||
""" | |||
:param key: str, the name of the attribute | |||
@@ -99,14 +119,14 @@ class ConfigSection(object): | |||
raise AttributeError("attr %s except %s but got %s" % | |||
(key, str(type(getattr(self, key))), str(type(value)))) | |||
setattr(self, key, value) | |||
def __contains__(self, item): | |||
""" | |||
:param item: The key of item. | |||
:return: True if the key in self.__dict__.keys() else False. | |||
""" | |||
return item in self.__dict__.keys() | |||
def __eq__(self, other): | |||
"""Overwrite the == operator | |||
@@ -118,15 +138,15 @@ class ConfigSection(object): | |||
return False | |||
if getattr(self, k) != getattr(other, k):
return False | |||
for k in other.__dict__.keys(): | |||
if k not in self.__dict__.keys(): | |||
return False | |||
if getattr(self, k) != getattr(other, k):
return False | |||
return True | |||
def __ne__(self, other): | |||
"""Overwrite the != operator | |||
@@ -134,25 +154,30 @@ class ConfigSection(object): | |||
:return: | |||
""" | |||
return not self.__eq__(other) | |||
@property | |||
def data(self): | |||
return self.__dict__ | |||
class ConfigSaver(object): | |||
""" | |||
别名::class:`fastNLP.io.ConfigSaver` :class:`fastNLP.io.config_io.ConfigSaver` | |||
ConfigSaver 是用来存储配置文件并解决相关冲突的类 | |||
:param str file_path: 配置文件的路径 | |||
""" | |||
def __init__(self, file_path): | |||
self.file_path = file_path | |||
if not os.path.exists(self.file_path): | |||
raise FileNotFoundError("file {} NOT found!".__format__(self.file_path)) | |||
def _get_section(self, sect_name): | |||
""" | |||
This is the function to get the section with the section name. | |||
:param sect_name: the name of the section to load.
:return: The section. | |||
@@ -160,25 +185,26 @@ class ConfigSaver(object): | |||
sect = ConfigSection() | |||
ConfigLoader().load_config(self.file_path, {sect_name: sect}) | |||
return sect | |||
def _read_section(self): | |||
""" | |||
This is the function to read sections from the config file. | |||
:return: sect_list, sect_key_list | |||
sect_list: A list of ConfigSection(). | |||
sect_key_list: A list of names in sect_list. | |||
""" | |||
sect_name = None | |||
sect_list = {} | |||
sect_key_list = [] | |||
single_section = {} | |||
single_section_key = [] | |||
with open(self.file_path, 'r') as f: | |||
lines = f.readlines() | |||
for line in lines: | |||
if line.startswith('[') and line.endswith(']\n'): | |||
if sect_name is None: | |||
@@ -190,31 +216,32 @@ class ConfigSaver(object): | |||
sect_key_list.append(sect_name) | |||
sect_name = line[1: -2] | |||
continue | |||
if line.startswith('#'): | |||
single_section[line] = '#' | |||
single_section_key.append(line) | |||
continue | |||
if line.startswith('\n'): | |||
single_section_key.append('\n') | |||
continue | |||
if '=' not in line: | |||
raise RuntimeError("can NOT load config file {}".__format__(self.file_path)) | |||
key = line.split('=', maxsplit=1)[0].strip() | |||
value = line.split('=', maxsplit=1)[1].strip() + '\n' | |||
single_section[key] = value | |||
single_section_key.append(key) | |||
if sect_name is not None: | |||
sect_list[sect_name] = single_section, single_section_key | |||
sect_key_list.append(sect_name) | |||
return sect_list, sect_key_list | |||
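The core of _read_section above is splitting an ini-style file into per-section line groups; a minimal self-contained sketch of that idea (simplified: no key/value parsing, hypothetical section names):

```python
import io

def read_sections(f):
    # split an ini-style file into {section_name: list of raw body lines}
    sections, name = {}, None
    for line in f:
        if line.startswith('[') and line.rstrip('\n').endswith(']'):
            name = line.rstrip('\n')[1:-1]   # strip the surrounding brackets
            sections[name] = []
        elif name is not None:
            sections[name].append(line.rstrip('\n'))
    return sections

sects = read_sections(io.StringIO("[model]\nlr = 0.1\n[data]\npath = ./x\n"))
```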
def _write_section(self, sect_list, sect_key_list): | |||
""" | |||
This is the function to write config file with section list and name list. | |||
:param sect_list: A list of ConfigSection() to be written into the file.
:param sect_key_list: A list of names of sect_list.
@@ -233,12 +260,13 @@ class ConfigSaver(object): | |||
continue | |||
f.write(key + ' = ' + single_section[key]) | |||
f.write('\n') | |||
def save_config_file(self, section_name, section): | |||
""" | |||
这个方法可以用来修改并保存配置文件中单独的一个 section | |||
:param str section_name: 需要保存的 section 的名字. | |||
:param section: 你需要修改并保存的 section, :class:`~fastNLP.io.ConfigSection` 类型
""" | |||
section_file = self._get_section(section_name) | |||
if len(section_file.__dict__.keys()) == 0: # the section not in the file before | |||
@@ -264,11 +292,11 @@ class ConfigSaver(object): | |||
break | |||
if not change_file: | |||
return | |||
sect_list, sect_key_list = self._read_section() | |||
if section_name not in sect_key_list: | |||
raise AttributeError() | |||
sect, sect_key = sect_list[section_name] | |||
for k in section.__dict__.keys(): | |||
if k not in sect_key: | |||
@@ -1,126 +1,155 @@ | |||
__all__ = [ | |||
"EmbedLoader" | |||
] | |||
import os | |||
import warnings | |||
import numpy as np | |||
import torch | |||
from fastNLP.core.vocabulary import Vocabulary | |||
from fastNLP.io.base_loader import BaseLoader | |||
from ..core.vocabulary import Vocabulary | |||
from .base_loader import BaseLoader | |||
class EmbedLoader(BaseLoader): | |||
""" | |||
别名::class:`fastNLP.io.EmbedLoader` :class:`fastNLP.io.embed_loader.EmbedLoader` | |||
用于读取预训练的embedding, 读取结果可直接载入为模型参数。 | |||
""" | |||
def __init__(self): | |||
super(EmbedLoader, self).__init__() | |||
@staticmethod | |||
def load_with_vocab(embed_filepath, vocab, dtype=np.float32, normalize=True, error='ignore'): | |||
""" | |||
从embed_filepath这个预训练的词向量中抽取出vocab这个词表的词的embedding。EmbedLoader将自动判断embed_filepath是 | |||
word2vec(第一行只有两个元素)还是glove格式的数据。 | |||
:param str embed_filepath: 预训练的embedding的路径。 | |||
:param vocab: 词表 :class:`~fastNLP.Vocabulary` 类型,读取出现在vocab中的词的embedding。 | |||
没有出现在vocab中的词的embedding将通过找到的词的embedding的正态分布采样出来,以使得整个Embedding是同分布的。 | |||
:param dtype: 读出的embedding的类型 | |||
:param bool normalize: 是否将每个vector归一化到norm为1 | |||
:param str error: `ignore` , `strict` ; 如果 `ignore` ,错误将自动跳过; 如果 `strict` , 错误将抛出。 | |||
这里主要可能出错的地方在于词表有空行或者词表出现了维度不一致。 | |||
:return numpy.ndarray: shape为 [len(vocab), dimension], dimension由pretrain的embedding决定。 | |||
""" | |||
assert isinstance(vocab, Vocabulary), "Only fastNLP.Vocabulary is supported." | |||
if not os.path.exists(embed_filepath): | |||
raise FileNotFoundError("`{}` does not exist.".format(embed_filepath)) | |||
with open(embed_filepath, 'r', encoding='utf-8') as f: | |||
hit_flags = np.zeros(len(vocab), dtype=bool) | |||
line = f.readline().strip() | |||
parts = line.split() | |||
start_idx = 0 | |||
if len(parts) == 2: | |||
dim = int(parts[1]) | |||
start_idx += 1 | |||
else: | |||
dim = len(parts) - 1 | |||
f.seek(0) | |||
matrix = np.random.randn(len(vocab), dim).astype(dtype) | |||
for idx, line in enumerate(f, start_idx): | |||
try: | |||
parts = line.strip().split() | |||
if parts[0] in vocab: | |||
index = vocab.to_index(parts[0]) | |||
matrix[index] = np.fromstring(' '.join(parts[1:]), sep=' ', dtype=dtype, count=dim) | |||
hit_flags[index] = True | |||
except Exception as e: | |||
if error == 'ignore': | |||
warnings.warn("Error occurred at the {} line.".format(idx)) | |||
else: | |||
print("Error occurred at the {} line.".format(idx)) | |||
raise e | |||
total_hits = sum(hit_flags) | |||
print("Found {} out of {} words in the pre-training embedding.".format(total_hits, len(vocab))) | |||
found_vectors = matrix[hit_flags] | |||
if len(found_vectors) != 0: | |||
mean = np.mean(found_vectors, axis=0, keepdims=True) | |||
std = np.std(found_vectors, axis=0, keepdims=True) | |||
unfound_vec_num = len(vocab) - total_hits | |||
r_vecs = np.random.randn(unfound_vec_num, dim).astype(dtype) * std + mean | |||
matrix[hit_flags == False] = r_vecs | |||
if normalize: | |||
matrix /= np.linalg.norm(matrix, axis=1, keepdims=True) | |||
return matrix | |||
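The fallback used above, re-sampling the rows that were not found in the pre-trained file from the mean/std of the rows that were found, then normalizing, can be sketched with toy numpy data (all values here are assumptions for illustration):

```python
import numpy as np

rng = np.random.RandomState(0)
matrix = rng.randn(4, 3).astype(np.float32)      # 4 words, dim 3
hit_flags = np.array([True, False, True, False])  # which rows came from the file

found = matrix[hit_flags]
mean = found.mean(axis=0, keepdims=True)
std = found.std(axis=0, keepdims=True)
# missing rows sampled so the whole matrix stays roughly同分布
matrix[~hit_flags] = rng.randn(int((~hit_flags).sum()), 3).astype(np.float32) * std + mean
matrix /= np.linalg.norm(matrix, axis=1, keepdims=True)  # normalize each row to unit norm
```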
@staticmethod | |||
def load_without_vocab(embed_filepath, dtype=np.float32, padding='<pad>', unknown='<unk>', normalize=True, | |||
error='ignore'): | |||
""" | |||
从embed_filepath中读取预训练的word vector。根据预训练的词表读取embedding并生成一个对应的Vocabulary。 | |||
:param str embed_filepath: 预训练的embedding的路径。 | |||
:param dtype: 读出的embedding的类型 | |||
:param str padding: the padding tag for vocabulary. | |||
:param str unknown: the unknown tag for vocabulary. | |||
:param bool normalize: 是否将每个vector归一化到norm为1 | |||
:param str error: `ignore` , `strict` ; 如果 `ignore` ,错误将自动跳过; 如果 `strict` , 错误将抛出。这里主要可能出错的地 | |||
方在于词表有空行或者词表出现了维度不一致。 | |||
:return numpy.ndarray: Vocabulary Embedding的shape是[词表大小+x, 词表维度], "词表大小+x"是由于最终的大小还取决与 | |||
是否使用padding, 以及unknown有没有在词表中找到对应的词。 Vocabulary中的词的顺序与Embedding的顺序是一一对应的。 | |||
""" | |||
vocab = Vocabulary(padding=padding, unknown=unknown) | |||
vec_dict = {} | |||
found_unknown = False | |||
found_pad = False | |||
with open(embed_filepath, 'r', encoding='utf-8') as f: | |||
line = f.readline() | |||
start = 1 | |||
dim = -1 | |||
if len(line.strip().split()) != 2: | |||
f.seek(0) | |||
start = 0 | |||
for idx, line in enumerate(f, start=start): | |||
try: | |||
parts = line.strip().split() | |||
word = parts[0] | |||
if dim == -1: | |||
dim = len(parts) - 1 | |||
vec = np.fromstring(' '.join(parts[1:]), sep=' ', dtype=dtype, count=dim) | |||
vec_dict[word] = vec | |||
vocab.add_word(word) | |||
if unknown is not None and unknown == word: | |||
found_unknown = True | |||
if padding is not None and padding == word:
found_pad = True | |||
except Exception as e: | |||
if error == 'ignore': | |||
warnings.warn("Error occurred at the {} line.".format(idx)) | |||
pass | |||
else: | |||
print("Error occurred at the {} line.".format(idx)) | |||
raise e | |||
if dim == -1: | |||
raise RuntimeError("{} is an empty file.".format(embed_filepath)) | |||
matrix = np.random.randn(len(vocab), dim).astype(dtype) | |||
if (unknown is not None and not found_unknown) or (padding is not None and not found_pad): | |||
start_idx = 0 | |||
if padding is not None: | |||
start_idx += 1 | |||
if unknown is not None: | |||
start_idx += 1 | |||
mean = np.mean(matrix[start_idx:], axis=0, keepdims=True) | |||
std = np.std(matrix[start_idx:], axis=0, keepdims=True) | |||
if (unknown is not None and not found_unknown): | |||
matrix[start_idx - 1] = np.random.randn(1, dim).astype(dtype) * std + mean | |||
if (padding is not None and not found_pad): | |||
matrix[0] = np.random.randn(1, dim).astype(dtype) * std + mean | |||
for key, vec in vec_dict.items(): | |||
index = vocab.to_index(key) | |||
matrix[index] = vec | |||
if normalize: | |||
matrix /= np.linalg.norm(matrix, axis=1, keepdims=True) | |||
return matrix, vocab |
@@ -0,0 +1,118 @@ | |||
""" | |||
此模块用于给其它模块提供读取文件的函数,没有为用户提供 API | |||
""" | |||
import json | |||
def _read_csv(path, encoding='utf-8', headers=None, sep=',', dropna=True): | |||
""" | |||
Construct a generator to read csv items. | |||
:param path: file path | |||
:param encoding: file's encoding, default: utf-8 | |||
:param headers: file's headers, if None, make file's first line as headers. default: None | |||
:param sep: separator for each column. default: ',' | |||
:param dropna: whether to ignore and drop invalid data;
    if False, raise ValueError when reading invalid data. default: True
:return: generator, every time yield (line number, csv item) | |||
""" | |||
with open(path, 'r', encoding=encoding) as f: | |||
start_idx = 0 | |||
if headers is None: | |||
headers = f.readline().rstrip('\r\n') | |||
headers = headers.split(sep) | |||
start_idx += 1 | |||
elif not isinstance(headers, (list, tuple)): | |||
raise TypeError("headers should be list or tuple, not {}." \ | |||
.format(type(headers))) | |||
for line_idx, line in enumerate(f, start_idx): | |||
contents = line.rstrip('\r\n').split(sep) | |||
if len(contents) != len(headers): | |||
if dropna: | |||
continue | |||
else: | |||
raise ValueError("Line {} has {} parts, while header has {} parts." \ | |||
.format(line_idx, len(contents), len(headers))) | |||
_dict = {} | |||
for header, content in zip(headers, contents): | |||
_dict[header] = content | |||
yield line_idx, _dict | |||
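A usage sketch of the `(line number, dict)` protocol that `_read_csv` yields. The file name and contents are hypothetical, and the compact `read_csv` below inlines the same header-splitting logic so the example is self-contained:

```python
import os
import tempfile

# a tiny CSV written to a temp file purely for illustration
path = os.path.join(tempfile.mkdtemp(), "toy.csv")
with open(path, "w", encoding="utf-8") as f:
    f.write("word,count\nhello,3\nworld,5\n")

def read_csv(path, sep=","):
    # first line is the header; each following line yields (line number, {header: value})
    with open(path, encoding="utf-8") as f:
        headers = f.readline().rstrip("\r\n").split(sep)
        for line_idx, line in enumerate(f, 1):
            contents = line.rstrip("\r\n").split(sep)
            yield line_idx, dict(zip(headers, contents))

rows = list(read_csv(path))
# rows == [(1, {'word': 'hello', 'count': '3'}), (2, {'word': 'world', 'count': '5'})]
```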
def _read_json(path, encoding='utf-8', fields=None, dropna=True): | |||
""" | |||
Construct a generator to read json items. | |||
:param path: file path | |||
:param encoding: file's encoding, default: utf-8 | |||
:param fields: json object's fields that needed, if None, all fields are needed. default: None | |||
:param dropna: whether to ignore and drop invalid data; | |||
if False, raise ValueError when reading invalid data. default: True | |||
:return: generator, every time yield (line number, json item) | |||
""" | |||
if fields: | |||
fields = set(fields) | |||
with open(path, 'r', encoding=encoding) as f: | |||
for line_idx, line in enumerate(f): | |||
data = json.loads(line) | |||
if fields is None: | |||
yield line_idx, data | |||
continue | |||
_res = {} | |||
for k, v in data.items(): | |||
if k in fields: | |||
_res[k] = v | |||
if len(_res) < len(fields): | |||
if dropna: | |||
continue | |||
else: | |||
raise ValueError('invalid instance at line: {}'.format(line_idx)) | |||
yield line_idx, _res | |||
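The same idea for JSON-lines input: one JSON object per line, optionally filtered down to the requested fields. The records below are hypothetical, and the inline `read_json` mirrors the field-filtering logic of `_read_json` above:

```python
import json
import os
import tempfile

# two hypothetical JSON-lines records, one with an extra field
path = os.path.join(tempfile.mkdtemp(), "toy.jsonl")
with open(path, "w", encoding="utf-8") as f:
    f.write(json.dumps({"text": "hello", "label": 1, "extra": "x"}) + "\n")
    f.write(json.dumps({"text": "world", "label": 0}) + "\n")

def read_json(path, fields=None):
    # yield (line number, record), keeping only the requested fields when given
    fields = set(fields) if fields else None
    with open(path, encoding="utf-8") as f:
        for line_idx, line in enumerate(f):
            data = json.loads(line)
            if fields is None:
                yield line_idx, data
            else:
                yield line_idx, {k: v for k, v in data.items() if k in fields}

items = list(read_json(path, fields=["text", "label"]))
# items == [(0, {'text': 'hello', 'label': 1}), (1, {'text': 'world', 'label': 0})]
```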
def _read_conll(path, encoding='utf-8', indexes=None, dropna=True): | |||
""" | |||
Construct a generator to read conll items. | |||
:param path: file path | |||
:param encoding: file's encoding, default: utf-8 | |||
:param indexes: conll object's column indexes that needed, if None, all columns are needed. default: None | |||
:param dropna: whether to ignore and drop invalid data; | |||
if False, raise ValueError when reading invalid data. default: True | |||
:return: generator, every time yield (line number, conll item) | |||
""" | |||
def parse_conll(sample): | |||
sample = list(map(list, zip(*sample))) | |||
sample = [sample[i] for i in indexes] if indexes is not None else sample  # keep all columns when indexes is None | |||
for f in sample: | |||
if len(f) <= 0: | |||
raise ValueError('empty field') | |||
return sample | |||
with open(path, 'r', encoding=encoding) as f: | |||
sample = [] | |||
start = next(f) | |||
if '-DOCSTART-' not in start: | |||
sample.append(start.split()) | |||
for line_idx, line in enumerate(f, 1): | |||
if line.startswith('\n'): | |||
if len(sample): | |||
try: | |||
res = parse_conll(sample) | |||
except Exception as e: | |||
sample = []  # reset the buffer so a bad sentence does not leak into the next one | |||
if dropna: | |||
continue | |||
raise ValueError('invalid instance at line: {}'.format(line_idx)) from e | |||
sample = [] | |||
yield line_idx, res | |||
elif line.startswith('#'): | |||
continue | |||
else: | |||
sample.append(line.split()) | |||
if len(sample) > 0: | |||
try: | |||
res = parse_conll(sample) | |||
yield line_idx, res | |||
except Exception as e: | |||
if dropna: | |||
return | |||
raise ValueError('invalid instance at line: {}'.format(line_idx)) from e | |||
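CoNLL input is blank-line-separated sentences with one token per line, and `parse_conll` transposes the token rows into column lists via `zip(*sample)`. A self-contained sketch of that grouping and transposition on toy data (the word/tag columns are illustrative):

```python
# illustrative CoNLL-style input: two sentences, columns = (word, POS tag)
lines = ["I PRP", "run VBP", "", "He PRP", "runs VBZ", ""]

def read_conll(lines):
    # group token lines into sentences, then transpose rows into column lists,
    # mirroring the zip(*sample) trick in parse_conll above
    sample = []
    for line in lines:
        if not line.strip():          # blank line ends the current sentence
            if sample:
                yield list(map(list, zip(*sample)))
                sample = []
        else:
            sample.append(line.split())
    if sample:                        # flush a trailing sentence without a final blank line
        yield list(map(list, zip(*sample)))

sentences = list(read_conll(lines))
# sentences[0] == [['I', 'run'], ['PRP', 'VBP']]  (one list per column)
```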
@@ -1,35 +0,0 @@ | |||
import logging | |||
import os | |||
def create_logger(logger_name, log_path, log_format=None, log_level=logging.INFO): | |||
"""Create a logger. | |||
:param str logger_name: | |||
:param str log_path: | |||
:param log_format: | |||
:param log_level: | |||
:return: logger | |||
To use a logger:: | |||
logger.debug("this is a debug message") | |||
logger.info("this is a info message") | |||
logger.warning("this is a warning message") | |||
logger.error("this is an error message") | |||
""" | |||
logger = logging.getLogger(logger_name) | |||
logger.setLevel(log_level) | |||
if log_path is None: | |||
handler = logging.StreamHandler() | |||
else: | |||
os.stat(os.path.dirname(os.path.abspath(log_path))) | |||
handler = logging.FileHandler(log_path) | |||
handler.setLevel(log_level) | |||
if log_format is None: | |||
log_format = "[%(asctime)s %(name)-13s %(levelname)s %(process)d %(thread)d " \ | |||
"%(filename)s:%(lineno)-5d] %(message)s" | |||
formatter = logging.Formatter(log_format) | |||
handler.setFormatter(formatter) | |||
logger.addHandler(handler) | |||
return logger |
@@ -1,53 +1,72 @@ | |||
""" | |||
Utilities for loading and saving models | |||
""" | |||
__all__ = [ | |||
"ModelLoader", | |||
"ModelSaver" | |||
] | |||
import torch | |||
from fastNLP.io.base_loader import BaseLoader | |||
from .base_loader import BaseLoader | |||
class ModelLoader(BaseLoader): | |||
""" | |||
Loader for models. | |||
""" | |||
Alias: :class:`fastNLP.io.ModelLoader` :class:`fastNLP.io.model_io.ModelLoader` | |||
Loader for models | |||
""" | |||
def __init__(self): | |||
super(ModelLoader, self).__init__() | |||
@staticmethod | |||
def load_pytorch(empty_model, model_path): | |||
"""Load model parameters from ".pkl" files into the empty PyTorch model. | |||
""" | |||
Load PyTorch model parameters from a ".pkl" file | |||
:param empty_model: a PyTorch model with initialized parameters. | |||
:param str model_path: the path to the saved model. | |||
:param empty_model: a PyTorch model with initialized parameters | |||
:param str model_path: path to the saved model | |||
""" | |||
empty_model.load_state_dict(torch.load(model_path)) | |||
@staticmethod | |||
def load_pytorch_model(model_path): | |||
"""Load the entire model. | |||
""" | |||
Load the entire model | |||
:param str model_path: the path to the saved model. | |||
:param str model_path: path to the saved model | |||
""" | |||
return torch.load(model_path) | |||
class ModelSaver(object): | |||
"""Save a model | |||
""" | |||
Alias: :class:`fastNLP.io.ModelSaver` :class:`fastNLP.io.model_io.ModelSaver` | |||
:param str save_path: the path to the saving directory. | |||
Example:: | |||
Saver for models | |||
Example:: | |||
saver = ModelSaver("./save/model_ckpt_100.pkl") | |||
saver.save_pytorch(model) | |||
saver = ModelSaver("./save/model_ckpt_100.pkl") | |||
saver.save_pytorch(model) | |||
""" | |||
def __init__(self, save_path): | |||
self.save_path = save_path | |||
""" | |||
:param save_path: path where the model will be saved | |||
""" | |||
self.save_path = save_path | |||
def save_pytorch(self, model, param_only=True): | |||
"""Save a pytorch model into ".pkl" file. | |||
""" | |||
Save a PyTorch model to a ".pkl" file | |||
:param model: a PyTorch model | |||
:param bool param_only: whether only to save the model parameters or the entire model. | |||
:param model: a PyTorch model | |||
:param bool param_only: whether to save only the model parameters (otherwise save the entire model) | |||
""" | |||
if param_only is True: | |||
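`ModelSaver` and `ModelLoader` pair into a save/load round trip, and `param_only=True` amounts to pickling the parameter dict (what `torch.save(model.state_dict(), path)` does). A stand-in sketch of that round trip using plain `pickle` and a toy parameter dict, so it needs no PyTorch; the path and parameter names are illustrative:

```python
import os
import pickle
import tempfile

# a stand-in "model": its parameters as a dict, like PyTorch's state_dict()
params = {"weight": [1.0, 2.0], "bias": [0.5]}

path = os.path.join(tempfile.mkdtemp(), "model_ckpt.pkl")
# param_only=True: persist just the parameters, not the whole model object
with open(path, "wb") as f:
    pickle.dump(params, f)

# the load side restores the same parameter dict to fill an "empty" model
with open(path, "rb") as f:
    restored = pickle.load(f)
```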
@@ -1,6 +1,34 @@ | |||
""" | |||
The :mod:`~fastNLP.models` module of fastNLP ships complete, ready-to-use models such as :class:`~fastNLP.models.CNNText` | |||
and :class:`~fastNLP.models.SeqLabeling`. | |||
.. todo:: | |||
Introduce these models (consistent with the homepage) | |||
""" | |||
__all__ = [ | |||
"CNNText", | |||
"SeqLabeling", | |||
"AdvSeqLabel", | |||
"ESIM", | |||
"StarTransEnc", | |||
"STSeqLabel", | |||
"STNLICls", | |||
"STSeqCls", | |||
"BiaffineParser", | |||
"GraphParser" | |||
] | |||
from .base_model import BaseModel | |||
from .bert import BertForMultipleChoice, BertForQuestionAnswering, BertForSequenceClassification, \ | |||
BertForTokenClassification | |||
from .biaffine_parser import BiaffineParser, GraphParser | |||
from .char_language_model import CharLM | |||
from .cnn_text_classification import CNNText | |||
from .sequence_modeling import SeqLabeling, AdvSeqLabel | |||
from .sequence_labeling import SeqLabeling, AdvSeqLabel | |||
from .snli import ESIM | |||
from .star_transformer import StarTransEnc, STSeqCls, STNLICls, STSeqLabel |
@@ -1,18 +1,18 @@ | |||
import torch | |||
from fastNLP.modules.decoder.MLP import MLP | |||
from ..modules.decoder.mlp import MLP | |||
class BaseModel(torch.nn.Module): | |||
"""Base PyTorch model for all models. | |||
""" | |||
def __init__(self): | |||
super(BaseModel, self).__init__() | |||
def fit(self, train_data, dev_data=None, **train_args): | |||
pass | |||
def predict(self, *args, **kwargs): | |||
raise NotImplementedError | |||
@@ -21,9 +21,9 @@ class NaiveClassifier(BaseModel): | |||
def __init__(self, in_feature_dim, out_feature_dim): | |||
super(NaiveClassifier, self).__init__() | |||
self.mlp = MLP([in_feature_dim, in_feature_dim, out_feature_dim]) | |||
def forward(self, x): | |||
return {"predict": torch.sigmoid(self.mlp(x))} | |||
def predict(self, x): | |||
return {"predict": torch.sigmoid(self.mlp(x)) > 0.5} |
@@ -2,361 +2,292 @@ | |||
bert.py is modified from huggingface/pytorch-pretrained-BERT, which is licensed under the Apache License 2.0. | |||
""" | |||
import copy | |||
import json | |||
import math | |||
import os | |||
import torch | |||
from torch import nn | |||
CONFIG_FILE = 'bert_config.json' | |||
MODEL_WEIGHTS = 'pytorch_model.bin' | |||
def gelu(x): | |||
return x * 0.5 * (1.0 + torch.erf(x / math.sqrt(2.0))) | |||
def swish(x): | |||
return x * torch.sigmoid(x) | |||
ACT2FN = {"gelu": gelu, "relu": torch.nn.functional.relu, "swish": swish} | |||
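The `gelu` entry in `ACT2FN` uses the exact erf formulation above; a pure-`math` check of its two characteristic properties (zero at the origin, approaching the identity for large positive inputs):

```python
import math

def gelu(x):
    # same formula as the torch version above: x * 0.5 * (1 + erf(x / sqrt(2)))
    return x * 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

at_zero = gelu(0.0)    # exactly 0.0
at_large = gelu(10.0)  # ~10.0, since erf(10/sqrt(2)) is ~1
```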
class BertLayerNorm(nn.Module): | |||
def __init__(self, hidden_size, eps=1e-12): | |||
super(BertLayerNorm, self).__init__() | |||
self.weight = nn.Parameter(torch.ones(hidden_size)) | |||
self.bias = nn.Parameter(torch.zeros(hidden_size)) | |||
self.variance_epsilon = eps | |||
def forward(self, x): | |||
u = x.mean(-1, keepdim=True) | |||
s = (x - u).pow(2).mean(-1, keepdim=True) | |||
x = (x - u) / torch.sqrt(s + self.variance_epsilon) | |||
return self.weight * x + self.bias | |||
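`BertLayerNorm.forward` normalizes over the last dimension with the *biased* variance (`.pow(2).mean(-1)`) and puts the epsilon inside the square root. A pure-Python mirror of that forward pass for a single vector, with illustrative weight/bias values:

```python
import math

def layer_norm(xs, weight, bias, eps=1e-12):
    # pure-Python version of BertLayerNorm.forward for one vector
    u = sum(xs) / len(xs)                         # mean over the last dimension
    s = sum((x - u) ** 2 for x in xs) / len(xs)   # biased variance, matching .pow(2).mean(-1)
    return [w * ((x - u) / math.sqrt(s + eps)) + b
            for x, w, b in zip(xs, weight, bias)]

out = layer_norm([1.0, 3.0], weight=[1.0, 1.0], bias=[0.0, 0.0])
# mean 2, variance 1 -> out is approximately [-1.0, 1.0]
```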
class BertEmbeddings(nn.Module): | |||
def __init__(self, vocab_size, hidden_size, max_position_embeddings, type_vocab_size, hidden_dropout_prob): | |||
super(BertEmbeddings, self).__init__() | |||
self.word_embeddings = nn.Embedding(vocab_size, hidden_size) | |||
self.position_embeddings = nn.Embedding(max_position_embeddings, hidden_size) | |||
self.token_type_embeddings = nn.Embedding(type_vocab_size, hidden_size) | |||
# self.LayerNorm is not snake-cased to stick with TensorFlow model variable name and be able to load | |||
# any TensorFlow checkpoint file | |||
self.LayerNorm = BertLayerNorm(hidden_size, eps=1e-12) | |||
self.dropout = nn.Dropout(hidden_dropout_prob) | |||
def forward(self, input_ids, token_type_ids=None): | |||
seq_length = input_ids.size(1) | |||
position_ids = torch.arange(seq_length, dtype=torch.long, device=input_ids.device) | |||
position_ids = position_ids.unsqueeze(0).expand_as(input_ids) | |||
if token_type_ids is None: | |||
token_type_ids = torch.zeros_like(input_ids) | |||
words_embeddings = self.word_embeddings(input_ids) | |||
position_embeddings = self.position_embeddings(position_ids) | |||
token_type_embeddings = self.token_type_embeddings(token_type_ids) | |||
embeddings = words_embeddings + position_embeddings + token_type_embeddings | |||
embeddings = self.LayerNorm(embeddings) | |||
embeddings = self.dropout(embeddings) | |||
return embeddings | |||
class BertSelfAttention(nn.Module): | |||
def __init__(self, hidden_size, num_attention_heads, attention_probs_dropout_prob): | |||
super(BertSelfAttention, self).__init__() | |||
if hidden_size % num_attention_heads != 0: | |||
raise ValueError( | |||
"The hidden size (%d) is not a multiple of the number of attention " | |||
"heads (%d)" % (hidden_size, num_attention_heads)) | |||
self.num_attention_heads = num_attention_heads | |||
self.attention_head_size = int(hidden_size / num_attention_heads) | |||
self.all_head_size = self.num_attention_heads * self.attention_head_size | |||
self.query = nn.Linear(hidden_size, self.all_head_size) | |||
self.key = nn.Linear(hidden_size, self.all_head_size) | |||
self.value = nn.Linear(hidden_size, self.all_head_size) | |||
self.dropout = nn.Dropout(attention_probs_dropout_prob) | |||
def transpose_for_scores(self, x): | |||
new_x_shape = x.size()[:-1] + (self.num_attention_heads, self.attention_head_size) | |||
x = x.view(*new_x_shape) | |||
return x.permute(0, 2, 1, 3) | |||
def forward(self, hidden_states, attention_mask): | |||
mixed_query_layer = self.query(hidden_states) | |||
mixed_key_layer = self.key(hidden_states) | |||
mixed_value_layer = self.value(hidden_states) | |||
query_layer = self.transpose_for_scores(mixed_query_layer) | |||
key_layer = self.transpose_for_scores(mixed_key_layer) | |||
value_layer = self.transpose_for_scores(mixed_value_layer) | |||
# Take the dot product between "query" and "key" to get the raw attention scores. | |||
attention_scores = torch.matmul(query_layer, key_layer.transpose(-1, -2)) | |||
attention_scores = attention_scores / math.sqrt(self.attention_head_size) | |||
# Apply the attention mask (precomputed for all layers in the BertModel forward() function) | |||
attention_scores = attention_scores + attention_mask | |||
# Normalize the attention scores to probabilities. | |||
attention_probs = nn.Softmax(dim=-1)(attention_scores) | |||
# This is actually dropping out entire tokens to attend to, which might | |||
# seem a bit unusual, but is taken from the original Transformer paper. | |||
attention_probs = self.dropout(attention_probs) | |||
context_layer = torch.matmul(attention_probs, value_layer) | |||
context_layer = context_layer.permute(0, 2, 1, 3).contiguous() | |||
new_context_layer_shape = context_layer.size()[:-2] + (self.all_head_size,) | |||
context_layer = context_layer.view(*new_context_layer_shape) | |||
return context_layer | |||
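The core of `BertSelfAttention.forward` is scaled dot-product attention: scores are `q·k / sqrt(head_size)`, softmaxed into probabilities, then used to weight the values. A minimal single-head, single-query sketch with toy numbers (no batching or head reshaping):

```python
import math

def softmax(xs):
    # numerically stable softmax over a list of scores
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

# one query attending over two keys; head size d = 2 (toy values)
q = [1.0, 0.0]
keys = [[1.0, 0.0], [0.0, 1.0]]
values = [[1.0, 0.0], [0.0, 1.0]]
d = len(q)

# raw scores q.k / sqrt(d), as in the forward pass above
scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
probs = softmax(scores)
# context vector = probability-weighted sum of the values
context = [sum(p * v[i] for p, v in zip(probs, values)) for i in range(d)]
```

Because the query matches the first key more closely, the first attention weight dominates and the context vector leans toward the first value.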
class BertSelfOutput(nn.Module): | |||
def __init__(self, hidden_size, hidden_dropout_prob): | |||
super(BertSelfOutput, self).__init__() | |||
self.dense = nn.Linear(hidden_size, hidden_size) | |||
self.LayerNorm = BertLayerNorm(hidden_size, eps=1e-12) | |||
self.dropout = nn.Dropout(hidden_dropout_prob) | |||
def forward(self, hidden_states, input_tensor): | |||
hidden_states = self.dense(hidden_states) | |||
hidden_states = self.dropout(hidden_states) | |||
hidden_states = self.LayerNorm(hidden_states + input_tensor) | |||
return hidden_states | |||
class BertAttention(nn.Module): | |||
def __init__(self, hidden_size, num_attention_heads, attention_probs_dropout_prob, hidden_dropout_prob): | |||
super(BertAttention, self).__init__() | |||
self.self = BertSelfAttention(hidden_size, num_attention_heads, attention_probs_dropout_prob) | |||
self.output = BertSelfOutput(hidden_size, hidden_dropout_prob) | |||
def forward(self, input_tensor, attention_mask): | |||
self_output = self.self(input_tensor, attention_mask) | |||
attention_output = self.output(self_output, input_tensor) | |||
return attention_output | |||
class BertIntermediate(nn.Module): | |||
def __init__(self, hidden_size, intermediate_size, hidden_act): | |||
super(BertIntermediate, self).__init__() | |||
self.dense = nn.Linear(hidden_size, intermediate_size) | |||
self.intermediate_act_fn = ACT2FN[hidden_act] \ | |||
if isinstance(hidden_act, str) else hidden_act | |||
def forward(self, hidden_states): | |||
hidden_states = self.dense(hidden_states) | |||
hidden_states = self.intermediate_act_fn(hidden_states) | |||
return hidden_states | |||
class BertOutput(nn.Module): | |||
def __init__(self, hidden_size, intermediate_size, hidden_dropout_prob): | |||
super(BertOutput, self).__init__() | |||
self.dense = nn.Linear(intermediate_size, hidden_size) | |||
self.LayerNorm = BertLayerNorm(hidden_size, eps=1e-12) | |||
self.dropout = nn.Dropout(hidden_dropout_prob) | |||
def forward(self, hidden_states, input_tensor): | |||
hidden_states = self.dense(hidden_states) | |||
hidden_states = self.dropout(hidden_states) | |||
hidden_states = self.LayerNorm(hidden_states + input_tensor) | |||
return hidden_states | |||
class BertLayer(nn.Module): | |||
def __init__(self, hidden_size, num_attention_heads, attention_probs_dropout_prob, hidden_dropout_prob, | |||
intermediate_size, hidden_act): | |||
super(BertLayer, self).__init__() | |||
self.attention = BertAttention(hidden_size, num_attention_heads, attention_probs_dropout_prob, | |||
hidden_dropout_prob) | |||
self.intermediate = BertIntermediate(hidden_size, intermediate_size, hidden_act) | |||
self.output = BertOutput(hidden_size, intermediate_size, hidden_dropout_prob) | |||
def forward(self, hidden_states, attention_mask): | |||
attention_output = self.attention(hidden_states, attention_mask) | |||
intermediate_output = self.intermediate(attention_output) | |||
layer_output = self.output(intermediate_output, attention_output) | |||
return layer_output | |||
class BertEncoder(nn.Module): | |||
def __init__(self, num_hidden_layers, hidden_size, num_attention_heads, attention_probs_dropout_prob, | |||
hidden_dropout_prob, | |||
intermediate_size, hidden_act): | |||
super(BertEncoder, self).__init__() | |||
layer = BertLayer(hidden_size, num_attention_heads, attention_probs_dropout_prob, hidden_dropout_prob, | |||
intermediate_size, hidden_act) | |||
self.layer = nn.ModuleList([copy.deepcopy(layer) for _ in range(num_hidden_layers)]) | |||
def forward(self, hidden_states, attention_mask, output_all_encoded_layers=True): | |||
all_encoder_layers = [] | |||
for layer_module in self.layer: | |||
hidden_states = layer_module(hidden_states, attention_mask) | |||
if output_all_encoded_layers: | |||
all_encoder_layers.append(hidden_states) | |||
if not output_all_encoded_layers: | |||
all_encoder_layers.append(hidden_states) | |||
return all_encoder_layers | |||
class BertPooler(nn.Module): | |||
def __init__(self, hidden_size): | |||
super(BertPooler, self).__init__() | |||
self.dense = nn.Linear(hidden_size, hidden_size) | |||
self.activation = nn.Tanh() | |||
def forward(self, hidden_states): | |||
# We "pool" the model by simply taking the hidden state corresponding | |||
# to the first token. | |||
first_token_tensor = hidden_states[:, 0] | |||
pooled_output = self.dense(first_token_tensor) | |||
pooled_output = self.activation(pooled_output) | |||
return pooled_output | |||
class BertModel(nn.Module): | |||
"""Bidirectional Embedding Representations from Transformers. | |||
If you want to use pre-trained weights, please download from the following sources provided by pytorch-pretrained-BERT. | |||
sources:: | |||
'bert-base-uncased': "https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-uncased.tar.gz", | |||
'bert-large-uncased': "https://s3.amazonaws.com/models.huggingface.co/bert/bert-large-uncased.tar.gz", | |||
'bert-base-cased': "https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-cased.tar.gz", | |||
'bert-large-cased': "https://s3.amazonaws.com/models.huggingface.co/bert/bert-large-cased.tar.gz", | |||
'bert-base-multilingual-uncased': "https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-multilingual-uncased.tar.gz", | |||
'bert-base-multilingual-cased': "https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-multilingual-cased.tar.gz", | |||
'bert-base-chinese': "https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-chinese.tar.gz", | |||
Construct a BERT model with pre-trained weights:: | |||
model = BertModel.from_pretrained("path/to/weights/directory") | |||
from .base_model import BaseModel | |||
from ..core.const import Const | |||
from ..modules.encoder import BertModel | |||
class BertForSequenceClassification(BaseModel): | |||
"""BERT model for classification. | |||
This module is composed of the BERT model with a linear layer on top of | |||
the pooled output. | |||
Params: | |||
`config`: a BertConfig class instance with the configuration to build a new model. | |||
`num_labels`: the number of classes for the classifier. Default = 2. | |||
Inputs: | |||
`input_ids`: a torch.LongTensor of shape [batch_size, sequence_length] | |||
with the word token indices in the vocabulary. Items in the batch should begin with the special "CLS" token. (see the tokens preprocessing logic in the scripts | |||
`extract_features.py`, `run_classifier.py` and `run_squad.py`) | |||
`token_type_ids`: an optional torch.LongTensor of shape [batch_size, sequence_length] with the token | |||
types indices selected in [0, 1]. Type 0 corresponds to a `sentence A` and type 1 corresponds to | |||
a `sentence B` token (see BERT paper for more details). | |||
`attention_mask`: an optional torch.LongTensor of shape [batch_size, sequence_length] with indices | |||
selected in [0, 1]. It's a mask to be used if the input sequence length is smaller than the max | |||
input sequence length in the current batch. It's the mask that we typically use for attention when | |||
a batch has varying length sentences. | |||
`labels`: labels for the classification output: torch.LongTensor of shape [batch_size] | |||
with indices selected in [0, ..., num_labels]. | |||
Outputs: | |||
if `labels` is not `None`: | |||
Outputs the CrossEntropy classification loss of the output with the labels. | |||
if `labels` is `None`: | |||
Outputs the classification logits of shape [batch_size, num_labels]. | |||
Example usage: | |||
```python | |||
# Already been converted into WordPiece token ids | |||
input_ids = torch.LongTensor([[31, 51, 99], [15, 5, 0]]) | |||
input_mask = torch.LongTensor([[1, 1, 1], [1, 1, 0]]) | |||
token_type_ids = torch.LongTensor([[0, 0, 1], [0, 1, 0]]) | |||
config = BertConfig(vocab_size_or_config_json_file=32000, hidden_size=768, | |||
num_hidden_layers=12, num_attention_heads=12, intermediate_size=3072) | |||
num_labels = 2 | |||
model = BertForSequenceClassification(config, num_labels) | |||
logits = model(input_ids, token_type_ids, input_mask) | |||
``` | |||
""" | |||
def __init__(self, vocab_size, | |||
hidden_size=768, | |||
num_hidden_layers=12, | |||
num_attention_heads=12, | |||
intermediate_size=3072, | |||
hidden_act="gelu", | |||
hidden_dropout_prob=0.1, | |||
attention_probs_dropout_prob=0.1, | |||
max_position_embeddings=512, | |||
type_vocab_size=2, | |||
initializer_range=0.02, **kwargs): | |||
super(BertModel, self).__init__() | |||
self.embeddings = BertEmbeddings(vocab_size, hidden_size, max_position_embeddings, | |||
type_vocab_size, hidden_dropout_prob) | |||
self.encoder = BertEncoder(num_hidden_layers, hidden_size, num_attention_heads, | |||
attention_probs_dropout_prob, hidden_dropout_prob, intermediate_size, | |||
hidden_act) | |||
self.pooler = BertPooler(hidden_size) | |||
self.initializer_range = initializer_range | |||
self.apply(self.init_bert_weights) | |||
def init_bert_weights(self, module): | |||
if isinstance(module, (nn.Linear, nn.Embedding)): | |||
# Slightly different from the TF version which uses truncated_normal for initialization | |||
# cf https://github.com/pytorch/pytorch/pull/5617 | |||
module.weight.data.normal_(mean=0.0, std=self.initializer_range) | |||
elif isinstance(module, BertLayerNorm): | |||
module.bias.data.zero_() | |||
module.weight.data.fill_(1.0) | |||
if isinstance(module, nn.Linear) and module.bias is not None: | |||
module.bias.data.zero_() | |||
def forward(self, input_ids, token_type_ids=None, attention_mask=None, output_all_encoded_layers=True): | |||
if attention_mask is None: | |||
attention_mask = torch.ones_like(input_ids) | |||
if token_type_ids is None: | |||
token_type_ids = torch.zeros_like(input_ids) | |||
# We create a 3D attention mask from a 2D tensor mask. | |||
# Sizes are [batch_size, 1, 1, to_seq_length] | |||
# So we can broadcast to [batch_size, num_heads, from_seq_length, to_seq_length] | |||
# this attention mask is more simple than the triangular masking of causal attention | |||
# used in OpenAI GPT, we just need to prepare the broadcast dimension here. | |||
extended_attention_mask = attention_mask.unsqueeze(1).unsqueeze(2) | |||
# Since attention_mask is 1.0 for positions we want to attend and 0.0 for | |||
# masked positions, this operation will create a tensor which is 0.0 for | |||
# positions we want to attend and -10000.0 for masked positions. | |||
# Since we are adding it to the raw scores before the softmax, this is | |||
# effectively the same as removing these entirely. | |||
extended_attention_mask = extended_attention_mask.to(dtype=next(self.parameters()).dtype) # fp16 compatibility | |||
extended_attention_mask = (1.0 - extended_attention_mask) * -10000.0 | |||
embedding_output = self.embeddings(input_ids, token_type_ids) | |||
encoded_layers = self.encoder(embedding_output, | |||
extended_attention_mask, | |||
output_all_encoded_layers=output_all_encoded_layers) | |||
sequence_output = encoded_layers[-1] | |||
pooled_output = self.pooler(sequence_output) | |||
if not output_all_encoded_layers: | |||
encoded_layers = encoded_layers[-1] | |||
return encoded_layers, pooled_output | |||
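The additive mask construction in `BertModel.forward` maps keep-positions (1) to 0.0 and masked positions (0) to -10000.0, so adding it to the raw scores before the softmax effectively removes masked tokens. The same transform on a plain Python list:

```python
# mirror of `(1.0 - extended_attention_mask) * -10000.0` above, on one sequence
attention_mask = [1, 1, 0]          # 1 = attend, 0 = padding
extended = [(1.0 - m) * -10000.0 for m in attention_mask]
# extended == [0.0, 0.0, -10000.0]: masked positions get a huge negative score
```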
@classmethod | |||
def from_pretrained(cls, pretrained_model_dir, state_dict=None, *inputs, **kwargs): | |||
# Load config | |||
config_file = os.path.join(pretrained_model_dir, CONFIG_FILE) | |||
config = json.load(open(config_file, "r")) | |||
# config = BertConfig.from_json_file(config_file) | |||
# logger.info("Model config {}".format(config)) | |||
# Instantiate model. | |||
model = cls(*inputs, **config, **kwargs) | |||
if state_dict is None: | |||
weights_path = os.path.join(pretrained_model_dir, MODEL_WEIGHTS) | |||
state_dict = torch.load(weights_path) | |||
old_keys = [] | |||
new_keys = [] | |||
for key in state_dict.keys(): | |||
new_key = None | |||
if 'gamma' in key: | |||
new_key = key.replace('gamma', 'weight') | |||
if 'beta' in key: | |||
new_key = key.replace('beta', 'bias') | |||
if new_key: | |||
old_keys.append(key) | |||
new_keys.append(new_key) | |||
for old_key, new_key in zip(old_keys, new_keys): | |||
state_dict[new_key] = state_dict.pop(old_key) | |||
missing_keys = [] | |||
unexpected_keys = [] | |||
error_msgs = [] | |||
# copy state_dict so _load_from_state_dict can modify it | |||
metadata = getattr(state_dict, '_metadata', None) | |||
state_dict = state_dict.copy() | |||
if metadata is not None: | |||
state_dict._metadata = metadata | |||
def load(module, prefix=''): | |||
local_metadata = {} if metadata is None else metadata.get(prefix[:-1], {}) | |||
module._load_from_state_dict( | |||
state_dict, prefix, local_metadata, True, missing_keys, unexpected_keys, error_msgs) | |||
for name, child in module._modules.items(): | |||
if child is not None: | |||
load(child, prefix + name + '.') | |||
load(model, prefix='' if hasattr(model, 'bert') else 'bert.') | |||
if len(missing_keys) > 0: | |||
print("Weights of {} not initialized from pretrained model: {}".format( | |||
model.__class__.__name__, missing_keys)) | |||
if len(unexpected_keys) > 0: | |||
print("Weights from pretrained model not used in {}: {}".format( | |||
model.__class__.__name__, unexpected_keys)) | |||
return model | |||
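The gamma/beta renaming in `from_pretrained` exists because TensorFlow-era checkpoints call the LayerNorm parameters `gamma`/`beta`, while PyTorch calls them `weight`/`bias`. A compact, standalone version of that dict transform (the checkpoint keys below are hypothetical, and chained `replace` stands in for the old-key/new-key lists above):

```python
# hypothetical TF-style checkpoint keys with gamma/beta LayerNorm names
state_dict = {
    "bert.encoder.layer.0.LayerNorm.gamma": 1,
    "bert.encoder.layer.0.LayerNorm.beta": 2,
    "bert.pooler.dense.weight": 3,
}

# rename gamma -> weight and beta -> bias; all other keys pass through untouched
renamed = {
    key.replace("gamma", "weight").replace("beta", "bias"): value
    for key, value in state_dict.items()
}
```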
def __init__(self, config, num_labels, bert_dir): | |||
super(BertForSequenceClassification, self).__init__() | |||
self.num_labels = num_labels | |||
self.bert = BertModel.from_pretrained(bert_dir) | |||
self.dropout = nn.Dropout(config.hidden_dropout_prob) | |||
self.classifier = nn.Linear(config.hidden_size, num_labels) | |||
def forward(self, input_ids, token_type_ids=None, attention_mask=None, labels=None): | |||
_, pooled_output = self.bert(input_ids, token_type_ids, attention_mask, output_all_encoded_layers=False) | |||
pooled_output = self.dropout(pooled_output) | |||
logits = self.classifier(pooled_output) | |||
if labels is not None: | |||
loss_fct = nn.CrossEntropyLoss() | |||
loss = loss_fct(logits.view(-1, self.num_labels), labels.view(-1)) | |||
return {Const.OUTPUT: logits, Const.LOSS: loss} | |||
else: | |||
return {Const.OUTPUT: logits} | |||
def predict(self, input_ids, token_type_ids=None, attention_mask=None): | |||
logits = self.forward(input_ids, token_type_ids, attention_mask)[Const.OUTPUT] | |||
return {Const.OUTPUT: torch.argmax(logits, dim=-1)} | |||
class BertForMultipleChoice(BaseModel): | |||
"""BERT model for multiple choice tasks. | |||
This module is composed of the BERT model with a linear layer on top of | |||
the pooled output. | |||
Params: | |||
`config`: a BertConfig class instance with the configuration to build a new model. | |||
`num_choices`: the number of classes for the classifier. Default = 2. | |||
Inputs: | |||
`input_ids`: a torch.LongTensor of shape [batch_size, num_choices, sequence_length] | |||
with the word token indices in the vocabulary(see the tokens preprocessing logic in the scripts | |||
`extract_features.py`, `run_classifier.py` and `run_squad.py`) | |||
`token_type_ids`: an optional torch.LongTensor of shape [batch_size, num_choices, sequence_length] | |||
with the token types indices selected in [0, 1]. Type 0 corresponds to a `sentence A` | |||
and type 1 corresponds to a `sentence B` token (see BERT paper for more details). | |||
`attention_mask`: an optional torch.LongTensor of shape [batch_size, num_choices, sequence_length] with indices | |||
selected in [0, 1]. It's a mask to be used if the input sequence length is smaller than the max | |||
input sequence length in the current batch. It's the mask that we typically use for attention when | |||
a batch has varying length sentences. | |||
`labels`: labels for the classification output: torch.LongTensor of shape [batch_size] | |||
with indices selected in [0, ..., num_choices]. | |||
Outputs: | |||
if `labels` is not `None`: | |||
Outputs the CrossEntropy classification loss of the output with the labels. | |||
if `labels` is `None`: | |||
Outputs the classification logits of shape [batch_size, num_labels]. | |||
Example usage: | |||
```python | |||
# Already been converted into WordPiece token ids | |||
input_ids = torch.LongTensor([[[31, 51, 99], [15, 5, 0]], [[12, 16, 42], [14, 28, 57]]]) | |||
input_mask = torch.LongTensor([[[1, 1, 1], [1, 1, 0]],[[1,1,0], [1, 0, 0]]]) | |||
token_type_ids = torch.LongTensor([[[0, 0, 1], [0, 1, 0]],[[0, 1, 1], [0, 0, 1]]]) | |||
config = BertConfig(vocab_size_or_config_json_file=32000, hidden_size=768, | |||
num_hidden_layers=12, num_attention_heads=12, intermediate_size=3072) | |||
num_choices = 2 | |||
model = BertForMultipleChoice(config, num_choices, bert_dir) | |||
logits = model(input_ids, token_type_ids, input_mask) | |||
``` | |||
""" | |||
def __init__(self, config, num_choices, bert_dir): | |||
super(BertForMultipleChoice, self).__init__() | |||
self.num_choices = num_choices | |||
self.bert = BertModel.from_pretrained(bert_dir) | |||
self.dropout = nn.Dropout(config.hidden_dropout_prob) | |||
self.classifier = nn.Linear(config.hidden_size, 1) | |||
def forward(self, input_ids, token_type_ids=None, attention_mask=None, labels=None): | |||
flat_input_ids = input_ids.view(-1, input_ids.size(-1)) | |||
flat_token_type_ids = token_type_ids.view(-1, token_type_ids.size(-1)) | |||
flat_attention_mask = attention_mask.view(-1, attention_mask.size(-1)) | |||
_, pooled_output = self.bert(flat_input_ids, flat_token_type_ids, flat_attention_mask, output_all_encoded_layers=False) | |||
pooled_output = self.dropout(pooled_output) | |||
logits = self.classifier(pooled_output) | |||
reshaped_logits = logits.view(-1, self.num_choices) | |||
if labels is not None: | |||
loss_fct = nn.CrossEntropyLoss() | |||
loss = loss_fct(reshaped_logits, labels) | |||
return {Const.OUTPUT: reshaped_logits, Const.LOSS: loss} | |||
else: | |||
return {Const.OUTPUT: reshaped_logits} | |||
def predict(self, input_ids, token_type_ids=None, attention_mask=None): | |||
logits = self.forward(input_ids, token_type_ids, attention_mask)[Const.OUTPUT] | |||
return {Const.OUTPUT: torch.argmax(logits, dim=-1)} | |||
class BertForTokenClassification(BaseModel):
    """BERT model for token-level classification.
    This module is composed of the BERT model with a linear layer on top of
    the full hidden state of the last layer.

    Params:
        `config`: a BertConfig class instance with the configuration to build a new model.
        `num_labels`: the number of classes for the classifier. Default = 2.
        `bert_dir`: a directory that contains the BERT parameters in the file `pytorch_model.bin`

    Inputs:
        `input_ids`: a torch.LongTensor of shape [batch_size, sequence_length]
            with the word token indices in the vocabulary (see the token preprocessing logic in the scripts
            `extract_features.py`, `run_classifier.py` and `run_squad.py`)
        `token_type_ids`: an optional torch.LongTensor of shape [batch_size, sequence_length] with the token
            type indices selected in [0, 1]. Type 0 corresponds to a `sentence A` and type 1 corresponds to
            a `sentence B` token (see the BERT paper for more details).
        `attention_mask`: an optional torch.LongTensor of shape [batch_size, sequence_length] with indices
            selected in [0, 1]. It's a mask to be used if the input sequence length is smaller than the max
            input sequence length in the current batch. It's the mask that we typically use for attention when
            a batch has varying length sentences.
        `labels`: labels for the classification output: torch.LongTensor of shape [batch_size, sequence_length]
            with indices selected in [0, ..., num_labels-1].

    Outputs:
        if `labels` is not `None`:
            Outputs the CrossEntropy classification loss of the output with the labels.
        if `labels` is `None`:
            Outputs the classification logits of shape [batch_size, sequence_length, num_labels].

    Example usage:
    ```python
    # Already been converted into WordPiece token ids
    input_ids = torch.LongTensor([[31, 51, 99], [15, 5, 0]])
    input_mask = torch.LongTensor([[1, 1, 1], [1, 1, 0]])
    token_type_ids = torch.LongTensor([[0, 0, 1], [0, 1, 0]])

    config = BertConfig(vocab_size_or_config_json_file=32000, hidden_size=768,
                        num_hidden_layers=12, num_attention_heads=12, intermediate_size=3072)

    num_labels = 2
    bert_dir = 'your-bert-file-dir'
    model = BertForTokenClassification(config, num_labels, bert_dir)
    logits = model(input_ids, token_type_ids, input_mask)[Const.OUTPUT]
    ```
    """
    def __init__(self, config, num_labels, bert_dir):
        super(BertForTokenClassification, self).__init__()
        self.num_labels = num_labels
        self.bert = BertModel.from_pretrained(bert_dir)
        self.dropout = nn.Dropout(config.hidden_dropout_prob)
        self.classifier = nn.Linear(config.hidden_size, num_labels)

    def forward(self, input_ids, token_type_ids=None, attention_mask=None, labels=None):
        sequence_output, _ = self.bert(input_ids, token_type_ids, attention_mask, output_all_encoded_layers=False)
        sequence_output = self.dropout(sequence_output)
        logits = self.classifier(sequence_output)

        if labels is not None:
            loss_fct = nn.CrossEntropyLoss()
            # Only keep active parts of the loss
            if attention_mask is not None:
                active_loss = attention_mask.view(-1) == 1
                active_logits = logits.view(-1, self.num_labels)[active_loss]
                active_labels = labels.view(-1)[active_loss]
                loss = loss_fct(active_logits, active_labels)
            else:
                loss = loss_fct(logits.view(-1, self.num_labels), labels.view(-1))
            return {Const.OUTPUT: logits, Const.LOSS: loss}
        else:
            return {Const.OUTPUT: logits}

    def predict(self, input_ids, token_type_ids=None, attention_mask=None):
        logits = self.forward(input_ids, token_type_ids, attention_mask)[Const.OUTPUT]
        return {Const.OUTPUT: torch.argmax(logits, dim=-1)}
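As a quick illustration of the active-loss masking above, the sketch below (with made-up logits, labels, and a batch of 2 sentences of length 3) shows how `attention_mask` drops the padded position before the cross-entropy is computed, exactly as in `forward()`:

```python
import torch
import torch.nn as nn

# Hypothetical data: batch=2, seq_len=3, num_labels=2; last token of sample 2 is padding.
logits = torch.tensor([[[2.0, 0.0], [0.0, 2.0], [1.0, 1.0]],
                       [[2.0, 0.0], [1.0, 1.0], [0.0, 2.0]]])
labels = torch.tensor([[0, 1, 0], [0, 0, 1]])
attention_mask = torch.tensor([[1, 1, 1], [1, 1, 0]])

loss_fct = nn.CrossEntropyLoss()
# Flatten, then keep only positions where the mask is 1.
active = attention_mask.view(-1) == 1
active_logits = logits.view(-1, 2)[active]   # 5 of the 6 positions survive
active_labels = labels.view(-1)[active]
masked_loss = loss_fct(active_logits, active_labels)

# Without masking, the padded position would also contribute to the loss.
full_loss = loss_fct(logits.view(-1, 2), labels.view(-1))
```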
class BertForQuestionAnswering(BaseModel):
    """BERT model for Question Answering (span extraction).
    This module is composed of the BERT model with a linear layer on top of
    the sequence output that computes start_logits and end_logits

    Params:
        `config`: a BertConfig class instance with the configuration to build a new model.
        `bert_dir`: a directory that contains the BERT parameters in the file `pytorch_model.bin`

    Inputs:
        `input_ids`: a torch.LongTensor of shape [batch_size, sequence_length]
            with the word token indices in the vocabulary (see the token preprocessing logic in the scripts
            `extract_features.py`, `run_classifier.py` and `run_squad.py`)
        `token_type_ids`: an optional torch.LongTensor of shape [batch_size, sequence_length] with the token
            type indices selected in [0, 1]. Type 0 corresponds to a `sentence A` and type 1 corresponds to
            a `sentence B` token (see the BERT paper for more details).
        `attention_mask`: an optional torch.LongTensor of shape [batch_size, sequence_length] with indices
            selected in [0, 1]. It's a mask to be used if the input sequence length is smaller than the max
            input sequence length in the current batch. It's the mask that we typically use for attention when
            a batch has varying length sentences.
        `start_positions`: position of the first token for the labeled span: torch.LongTensor of shape [batch_size].
            Positions are clamped to the length of the sequence and positions outside of the sequence are not taken
            into account for computing the loss.
        `end_positions`: position of the last token for the labeled span: torch.LongTensor of shape [batch_size].
            Positions are clamped to the length of the sequence and positions outside of the sequence are not taken
            into account for computing the loss.

    Outputs:
        if `start_positions` and `end_positions` are not `None`:
            Outputs the total_loss which is the sum of the CrossEntropy loss for the start and end token positions.
        if `start_positions` or `end_positions` is `None`:
            Outputs start_logits and end_logits, which are the logits respectively for the start and end
            position tokens, each of shape [batch_size, sequence_length].

    Example usage:
    ```python
    # Already been converted into WordPiece token ids
    input_ids = torch.LongTensor([[31, 51, 99], [15, 5, 0]])
    input_mask = torch.LongTensor([[1, 1, 1], [1, 1, 0]])
    token_type_ids = torch.LongTensor([[0, 0, 1], [0, 1, 0]])

    config = BertConfig(vocab_size_or_config_json_file=32000, hidden_size=768,
                        num_hidden_layers=12, num_attention_heads=12, intermediate_size=3072)

    bert_dir = 'your-bert-file-dir'
    model = BertForQuestionAnswering(config, bert_dir)
    pred = model(input_ids, token_type_ids, input_mask)
    start_logits, end_logits = pred[Const.OUTPUTS(0)], pred[Const.OUTPUTS(1)]
    ```
    """
    def __init__(self, config, bert_dir):
        super(BertForQuestionAnswering, self).__init__()
        self.bert = BertModel.from_pretrained(bert_dir)
        # TODO check with Google if it's normal there is no dropout on the token classifier of SQuAD in the TF version
        # self.dropout = nn.Dropout(config.hidden_dropout_prob)
        self.qa_outputs = nn.Linear(config.hidden_size, 2)

    def forward(self, input_ids, token_type_ids=None, attention_mask=None, start_positions=None, end_positions=None):
        sequence_output, _ = self.bert(input_ids, token_type_ids, attention_mask, output_all_encoded_layers=False)
        logits = self.qa_outputs(sequence_output)
        start_logits, end_logits = logits.split(1, dim=-1)
        start_logits = start_logits.squeeze(-1)
        end_logits = end_logits.squeeze(-1)

        if start_positions is not None and end_positions is not None:
            # If we are on multi-GPU, split adds a dimension
            if len(start_positions.size()) > 1:
                start_positions = start_positions.squeeze(-1)
            if len(end_positions.size()) > 1:
                end_positions = end_positions.squeeze(-1)
            # sometimes the start/end positions are outside our model inputs, we ignore these terms
            ignored_index = start_logits.size(1)
            start_positions.clamp_(0, ignored_index)
            end_positions.clamp_(0, ignored_index)

            loss_fct = nn.CrossEntropyLoss(ignore_index=ignored_index)
            start_loss = loss_fct(start_logits, start_positions)
            end_loss = loss_fct(end_logits, end_positions)
            total_loss = (start_loss + end_loss) / 2
            return {Const.OUTPUTS(0): start_logits, Const.OUTPUTS(1): end_logits, Const.LOSS: total_loss}
        else:
            return {Const.OUTPUTS(0): start_logits, Const.OUTPUTS(1): end_logits}

    def predict(self, input_ids, token_type_ids=None, attention_mask=None, **kwargs):
        logits = self.forward(input_ids, token_type_ids, attention_mask)
        start_logits = logits[Const.OUTPUTS(0)]
        end_logits = logits[Const.OUTPUTS(1)]
        return {Const.OUTPUTS(0): torch.argmax(start_logits, dim=-1),
                Const.OUTPUTS(1): torch.argmax(end_logits, dim=-1)}
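The span prediction in `predict` above reduces to one `argmax` per logit head. A minimal sketch with hand-picked toy logits (batch=1, seq_len=5, hypothetical values):

```python
import torch

# Toy start/end logits; the true span is tokens 1..3 by construction.
start_logits = torch.tensor([[0.1, 2.0, 0.3, 0.2, 0.1]])
end_logits = torch.tensor([[0.1, 0.2, 0.3, 3.0, 0.1]])

start = torch.argmax(start_logits, dim=-1)  # position of the span's first token
end = torch.argmax(end_logits, dim=-1)      # position of the span's last token
# The predicted answer span covers tokens start..end inclusive.
span = (start.item(), end.item())
```

Note that independent argmaxes can yield `end < start`; SQuAD-style decoders usually search for the best valid (start, end) pair instead, which the model above leaves to post-processing.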
@@ -1,22 +1,31 @@
"""
Biaffine Dependency Parser 的 Pytorch 实现.
"""
__all__ = [
    "BiaffineParser",
    "GraphParser"
]

import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F

from collections import defaultdict

from ..core.const import Const as C
from ..core.losses import LossFunc
from ..core.metrics import MetricBase
from ..core.utils import seq_len_to_mask
from ..modules.dropout import TimestepDropout
from ..modules.encoder.transformer import TransformerEncoder
from ..modules.encoder.variational_rnn import VarLSTM
from ..modules.utils import initial_parameter
from ..modules.utils import get_embeddings
from .base_model import BaseModel
def _mst(scores):
    """
    with some modification to support parser output for MST decoding
    https://github.com/tdozat/Parser/blob/0739216129cd39d69997d28cbc4133b360ea3934/lib/models/nn.py#L692
@@ -42,7 +51,7 @@ def mst(scores):
                               scores[roots, new_heads] / root_scores)]
        heads[roots] = new_heads
        heads[new_root] = 0

    edges = defaultdict(set)
    vertices = set((0,))
    for dep, head in enumerate(heads[tokens]):
@@ -71,7 +80,7 @@ def mst(scores):
            heads[changed_cycle] = new_head
            edges[new_head].add(changed_cycle)
            edges[old_head].remove(changed_cycle)

    return heads
@@ -86,7 +95,7 @@ def _find_cycle(vertices, edges):
    _lowlinks = {}
    _onstack = defaultdict(lambda: False)
    _SCCs = []

    def _strongconnect(v):
        nonlocal _index
        _indices[v] = _index
@@ -94,38 +103,49 @@ def _find_cycle(vertices, edges):
        _index += 1
        _stack.append(v)
        _onstack[v] = True

        for w in edges[v]:
            if w not in _indices:
                _strongconnect(w)
                _lowlinks[v] = min(_lowlinks[v], _lowlinks[w])
            elif _onstack[w]:
                _lowlinks[v] = min(_lowlinks[v], _indices[w])

        if _lowlinks[v] == _indices[v]:
            SCC = set()
            while True:
                w = _stack.pop()
                _onstack[w] = False
                SCC.add(w)
                if not (w != v):
                    break
            _SCCs.append(SCC)

    for v in vertices:
        if v not in _indices:
            _strongconnect(v)

    return [SCC for SCC in _SCCs if len(SCC) > 1]
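`_find_cycle` is Tarjan's strongly-connected-components algorithm, keeping only components with more than one vertex (i.e. cycles in the head graph). A self-contained sketch of the same logic, with a hypothetical 3-token head assignment where tokens 1 and 2 point at each other:

```python
from collections import defaultdict

def find_cycles(vertices, edges):
    """Return SCCs with more than one vertex (cycles), Tarjan-style."""
    index = 0
    stack, indices, lowlinks, onstack, sccs = [], {}, {}, defaultdict(bool), []

    def strongconnect(v):
        nonlocal index
        indices[v] = lowlinks[v] = index
        index += 1
        stack.append(v)
        onstack[v] = True
        for w in edges[v]:
            if w not in indices:
                strongconnect(w)
                lowlinks[v] = min(lowlinks[v], lowlinks[w])
            elif onstack[w]:
                lowlinks[v] = min(lowlinks[v], indices[w])
        if lowlinks[v] == indices[v]:  # v is the root of an SCC
            scc = set()
            while True:
                w = stack.pop()
                onstack[w] = False
                scc.add(w)
                if w == v:
                    break
            sccs.append(scc)

    for v in vertices:
        if v not in indices:
            strongconnect(v)
    return [s for s in sccs if len(s) > 1]

# edges[head] holds that head's dependents: 1 -> 2 and 2 -> 1 form a cycle.
edges = defaultdict(set)
for dep, head in [(1, 2), (2, 1)]:
    edges[head].add(dep)
cycles = find_cycles({0, 1, 2}, edges)  # [{1, 2}]
```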
class GraphParser(BaseModel):
    """
    基于图的parser base class, 支持贪婪解码和最大生成树解码
    """
    def __init__(self):
        super(GraphParser, self).__init__()

    @staticmethod
    def greedy_decoder(arc_matrix, mask=None):
        """
        贪心解码方式, 输入图, 输出贪心解码的parsing结果, 不保证合法的构成树

        :param arc_matrix: [batch, seq_len, seq_len] 输入图矩阵
        :param mask: [batch, seq_len] 输入图的padding mask, 有内容的部分为 1, 否则为 0.
            若为 ``None`` 时, 默认为全1向量. Default: ``None``
        :return heads: [batch, seq_len] 每个元素在树中对应的head(parent)预测结果
        """
        _, seq_len, _ = arc_matrix.shape
        matrix = arc_matrix + torch.diag(arc_matrix.new(seq_len).fill_(-np.inf))
        flip_mask = (mask == 0).byte()
@@ -134,24 +154,37 @@ class GraphParser(BaseModel):
        if mask is not None:
            heads *= mask.long()
        return heads

    @staticmethod
    def mst_decoder(arc_matrix, mask=None):
        """
        用最大生成树算法, 计算parsing结果, 保证输出合法的树结构

        :param arc_matrix: [batch, seq_len, seq_len] 输入图矩阵
        :param mask: [batch, seq_len] 输入图的padding mask, 有内容的部分为 1, 否则为 0.
            若为 ``None`` 时, 默认为全1向量. Default: ``None``
        :return heads: [batch, seq_len] 每个元素在树中对应的head(parent)预测结果
        """
        batch_size, seq_len, _ = arc_matrix.shape
        matrix = arc_matrix.clone()
        ans = matrix.new_zeros(batch_size, seq_len).long()
        lens = (mask.long()).sum(1) if mask is not None else torch.zeros(batch_size) + seq_len
        batch_idx = torch.arange(batch_size, dtype=torch.long, device=lens.device)
        for i, graph in enumerate(matrix):
            len_i = lens[i]
            ans[i, :len_i] = torch.as_tensor(_mst(graph.detach()[:len_i, :len_i].cpu().numpy()), device=ans.device)
        if mask is not None:
            ans *= mask.long()
        return ans
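The greedy path of `greedy_decoder` is just a row-wise argmax after forbidding self-loops. A sketch with a hypothetical 4-position score matrix (position 0 is the `<root>`; its own head entry is meaningless and gets masked downstream):

```python
import torch

# scores[0, i, j] = score of token i choosing token j as its head (made-up numbers).
scores = torch.tensor([[[0.0, 0.3, 0.2, 0.1],
                        [9.0, 0.0, 1.0, 1.0],
                        [1.0, 8.0, 0.0, 1.0],
                        [1.0, 1.0, 7.0, 0.0]]])
seq_len = scores.size(1)
# Forbid self-loops by putting -inf on the diagonal, as greedy_decoder does.
matrix = scores + torch.diag(scores.new(seq_len).fill_(-float('inf')))
heads = matrix.argmax(dim=2)  # token 1 -> root, token 2 -> 1, token 3 -> 2
```

Greedy decoding picks each token's head independently, so the result may contain cycles; `mst_decoder` guarantees a well-formed tree at the cost of running `_mst` per sentence on CPU.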
class ArcBiaffine(nn.Module):
    """
    Biaffine Dependency Parser 的子模块, 用于构建预测边的图

    :param hidden_size: 输入的特征维度
    :param bias: 是否使用bias. Default: ``True``
    """
    def __init__(self, hidden_size, bias=True):
        super(ArcBiaffine, self).__init__()
        self.U = nn.Parameter(torch.Tensor(hidden_size, hidden_size), requires_grad=True)
@@ -161,13 +194,13 @@ class ArcBiaffine(nn.Module):
        else:
            self.register_parameter("bias", None)
        initial_parameter(self)

    def forward(self, head, dep):
        """
        :param head: arc-head tensor [batch, length, hidden]
        :param dep: arc-dependent tensor [batch, length, hidden]
        :return output: tensor [batch, length, length]
        """
        output = dep.matmul(self.U)
        output = output.bmm(head.transpose(-1, -2))
@@ -177,41 +210,72 @@ class ArcBiaffine(nn.Module):
class LabelBilinear(nn.Module):
    """
    Biaffine Dependency Parser 的子模块, 用于构建预测边类别的图

    :param in1_features: 输入的特征1维度
    :param in2_features: 输入的特征2维度
    :param num_label: 边类别的个数
    :param bias: 是否使用bias. Default: ``True``
    """
    def __init__(self, in1_features, in2_features, num_label, bias=True):
        super(LabelBilinear, self).__init__()
        self.bilinear = nn.Bilinear(in1_features, in2_features, num_label, bias=bias)
        self.lin = nn.Linear(in1_features + in2_features, num_label, bias=False)

    def forward(self, x1, x2):
        """
        :param x1: [batch, seq_len, hidden] 输入特征1, 即label-head
        :param x2: [batch, seq_len, hidden] 输入特征2, 即label-dep
        :return output: [batch, seq_len, num_cls] 每个元素对应类别的概率图
        """
        output = self.bilinear(x1, x2)
        output += self.lin(torch.cat([x1, x2], dim=2))
        return output
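The biaffine arc score in `ArcBiaffine.forward` is `dep_i^T U head_j` for every (dependent, head) pair, computed as one batched matmul. The sketch below reproduces that with random tensors; the bias line follows one common biaffine formulation (the exact bias handling is elided by the hunk above, so treat it as an assumption):

```python
import torch

batch, length, hidden = 2, 5, 8
head = torch.randn(batch, length, hidden)  # arc-head representations
dep = torch.randn(batch, length, hidden)   # arc-dependent representations
U = torch.randn(hidden, hidden)
bias = torch.randn(hidden)

# dep_i^T U head_j for every (i, j) pair: [B, L, H] @ [H, H] -> [B, L, H],
# then bmm with head^T -> [B, L, L], the same two lines as ArcBiaffine.forward.
output = dep.matmul(U).bmm(head.transpose(-1, -2))
# Assumed bias term: adds head_j . bias to column j for every row.
output = output + head.matmul(bias).unsqueeze(1)
```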
class BiaffineParser(GraphParser):
    """
    别名::class:`fastNLP.models.BiaffineParser` :class:`fastNLP.models.biaffine_parser.BiaffineParser`

    Biaffine Dependency Parser 实现.
    论文参考 `Deep Biaffine Attention for Neural Dependency Parsing (Dozat and Manning, 2016) <https://arxiv.org/abs/1611.01734>`_ .

    :param init_embed: 单词词典, 可以是 tuple, 包括(num_embeddings, embedding_dim), 即
        embedding的大小和每个词的维度. 也可以传入 nn.Embedding 对象, 此时就以传入的对象作为embedding
    :param pos_vocab_size: part-of-speech 词典大小
    :param pos_emb_dim: part-of-speech 向量维度
    :param num_label: 边的类别个数
    :param rnn_layers: rnn encoder的层数
    :param rnn_hidden_size: rnn encoder 的隐状态维度
    :param arc_mlp_size: 边预测的MLP维度
    :param label_mlp_size: 类别预测的MLP维度
    :param dropout: dropout概率.
    :param encoder: encoder类别, 可选 ('lstm', 'var-lstm', 'transformer'). Default: lstm
    :param use_greedy_infer: 是否在inference时使用贪心算法.
        若 ``False`` , 使用更加精确但相对缓慢的MST算法. Default: ``False``
    """
    def __init__(self,
                 init_embed,
                 pos_vocab_size,
                 pos_emb_dim,
                 num_label,
                 rnn_layers=1,
                 rnn_hidden_size=200,
                 arc_mlp_size=100,
                 label_mlp_size=100,
                 dropout=0.3,
                 encoder='lstm',
                 use_greedy_infer=False):
        super(BiaffineParser, self).__init__()
        rnn_out_size = 2 * rnn_hidden_size
        word_hid_dim = pos_hid_dim = rnn_hidden_size
        self.word_embedding = get_embeddings(init_embed)
        word_emb_dim = self.word_embedding.embedding_dim
        self.pos_embedding = nn.Embedding(num_embeddings=pos_vocab_size, embedding_dim=pos_emb_dim)
        self.word_fc = nn.Linear(word_emb_dim, word_hid_dim)
        self.pos_fc = nn.Linear(pos_emb_dim, pos_hid_dim)
@@ -242,20 +306,20 @@ class BiaffineParser(GraphParser):
            if (d_k * n_head) != rnn_out_size:
                raise ValueError('unsupported rnn_out_size: {} for transformer'.format(rnn_out_size))
            self.position_emb = nn.Embedding(num_embeddings=self.max_len,
                                             embedding_dim=rnn_out_size)
            self.encoder = TransformerEncoder(num_layers=rnn_layers,
                                              model_size=rnn_out_size,
                                              inner_size=1024,
                                              key_size=d_k,
                                              value_size=d_v,
                                              num_head=n_head,
                                              dropout=dropout)
        else:
            raise ValueError('unsupported encoder type: {}'.format(encoder))

        self.mlp = nn.Sequential(nn.Linear(rnn_out_size, arc_mlp_size * 2 + label_mlp_size * 2),
                                 nn.ELU(),
                                 TimestepDropout(p=dropout))
        self.arc_mlp_size = arc_mlp_size
        self.label_mlp_size = label_mlp_size
        self.arc_predictor = ArcBiaffine(arc_mlp_size, bias=True)
@@ -263,7 +327,7 @@ class BiaffineParser(GraphParser):
        self.use_greedy_infer = use_greedy_infer
        self.reset_parameters()
        self.dropout = dropout

    def reset_parameters(self):
        for m in self.modules():
            if isinstance(m, nn.Embedding):
@@ -274,167 +338,210 @@ class BiaffineParser(GraphParser):
            else:
                for p in m.parameters():
                    nn.init.normal_(p, 0, 0.1)
    def forward(self, words1, words2, seq_len, target1=None):
        """模型forward阶段

        :param words1: [batch_size, seq_len] 输入word序列
        :param words2: [batch_size, seq_len] 输入pos序列
        :param seq_len: [batch_size, seq_len] 输入序列长度
        :param target1: [batch_size, seq_len] 输入真实标注的heads, 仅在训练阶段有效,
            用于训练label分类器. 若为 ``None`` , 使用预测的heads输入到label分类器
            Default: ``None``
        :return dict: parsing结果::

            pred1: [batch_size, seq_len, seq_len] 边预测logits
            pred2: [batch_size, seq_len, num_label] label预测logits
            pred3: [batch_size, seq_len] heads的预测结果, 在 ``target1=None`` 时预测
        """
        # prepare embeddings
        batch_size, length = words1.shape

        # get sequence mask
        mask = seq_len_to_mask(seq_len).long()

        word = self.word_embedding(words1)  # [N,L] -> [N,L,C_0]
        pos = self.pos_embedding(words2)  # [N,L] -> [N,L,C_1]

        word, pos = self.word_fc(word), self.pos_fc(pos)
        word, pos = self.word_norm(word), self.pos_norm(pos)
        x = torch.cat([word, pos], dim=2)  # -> [N,L,C]

        # encoder, extract features
        if self.encoder_name.endswith('lstm'):
            sort_lens, sort_idx = torch.sort(seq_len, dim=0, descending=True)
            x = x[sort_idx]
            x = nn.utils.rnn.pack_padded_sequence(x, sort_lens, batch_first=True)
            feat, _ = self.encoder(x)  # -> [N,L,C]
            feat, _ = nn.utils.rnn.pad_packed_sequence(feat, batch_first=True)
            _, unsort_idx = torch.sort(sort_idx, dim=0, descending=False)
            feat = feat[unsort_idx]
        else:
            seq_range = torch.arange(length, dtype=torch.long, device=x.device)[None, :]
            x = x + self.position_emb(seq_range)
            feat = self.encoder(x, mask.float())

        # for arc biaffine
        # mlp, reduce dim
        feat = self.mlp(feat)
        arc_sz, label_sz = self.arc_mlp_size, self.label_mlp_size
        arc_dep, arc_head = feat[:, :, :arc_sz], feat[:, :, arc_sz:2 * arc_sz]
        label_dep, label_head = feat[:, :, 2 * arc_sz:2 * arc_sz + label_sz], feat[:, :, 2 * arc_sz + label_sz:]

        # biaffine arc classifier
        arc_pred = self.arc_predictor(arc_head, arc_dep)  # [N, L, L]

        # use gold or predicted arc to predict label
        if target1 is None or not self.training:
            # use greedy decoding in training
            if self.training or self.use_greedy_infer:
                heads = self.greedy_decoder(arc_pred, mask)
            else:
                heads = self.mst_decoder(arc_pred, mask)
            head_pred = heads
        else:
            assert self.training  # must be training mode
            if target1 is None:
                heads = self.greedy_decoder(arc_pred, mask)
                head_pred = heads
            else:
                head_pred = None
                heads = target1

        batch_range = torch.arange(start=0, end=batch_size, dtype=torch.long, device=words1.device).unsqueeze(1)
        label_head = label_head[batch_range, heads].contiguous()
        label_pred = self.label_predictor(label_head, label_dep)  # [N, L, num_label]
        res_dict = {C.OUTPUTS(0): arc_pred, C.OUTPUTS(1): label_pred}
        if head_pred is not None:
            res_dict[C.OUTPUTS(2)] = head_pred
        return res_dict
    @staticmethod
    def loss(pred1, pred2, target1, target2, seq_len):
        """
        计算parser的loss

        :param pred1: [batch_size, seq_len, seq_len] 边预测logits
        :param pred2: [batch_size, seq_len, num_label] label预测logits
        :param target1: [batch_size, seq_len] 真实边的标注
        :param target2: [batch_size, seq_len] 真实类别的标注
        :param seq_len: [batch_size, seq_len] 真实目标的长度
        :return loss: scalar
        """
        batch_size, length, _ = pred1.shape
        mask = seq_len_to_mask(seq_len)
        flip_mask = (mask == 0)
        _arc_pred = pred1.clone()
        _arc_pred.masked_fill_(flip_mask.unsqueeze(1), -float('inf'))
        arc_logits = F.log_softmax(_arc_pred, dim=2)
        label_logits = F.log_softmax(pred2, dim=2)
        batch_index = torch.arange(batch_size, device=arc_logits.device, dtype=torch.long).unsqueeze(1)
        child_index = torch.arange(length, device=arc_logits.device, dtype=torch.long).unsqueeze(0)
        arc_loss = arc_logits[batch_index, child_index, target1]
        label_loss = label_logits[batch_index, child_index, target2]

        byte_mask = flip_mask.byte()
        arc_loss.masked_fill_(byte_mask, 0)
        label_loss.masked_fill_(byte_mask, 0)
        arc_nll = -arc_loss.mean()
        label_nll = -label_loss.mean()
        return arc_nll + label_nll
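The arc half of the loss above masks padding in two places: padded *heads* get `-inf` before the softmax (so they can never receive probability mass), while padded *children* are zeroed after gathering (so they contribute nothing to the mean). A sketch with hypothetical data (batch=1, length=4, last position padded):

```python
import torch
import torch.nn.functional as F

pred1 = torch.randn(1, 4, 4)            # toy arc logits
target1 = torch.tensor([[0, 0, 1, 0]])  # gold head index per position
mask = torch.tensor([[1, 1, 1, 0]])

flip_mask = mask == 0
arc = pred1.clone()
# Padded heads can never be selected: mask along the head (last) dimension.
arc.masked_fill_(flip_mask.unsqueeze(1), -float('inf'))
arc_logits = F.log_softmax(arc, dim=2)

batch_index = torch.arange(1).unsqueeze(1)
child_index = torch.arange(4).unsqueeze(0)
arc_loss = arc_logits[batch_index, child_index, target1]
# Padded children contribute zero to the loss.
arc_loss = arc_loss.masked_fill(flip_mask, 0)
nll = -arc_loss.mean()
```

Note the mean still divides by the full `batch * length`, so the zeroed padding positions dilute the average slightly; the method above inherits the same behavior.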
    def predict(self, words1, words2, seq_len):
        """模型预测API

        :param words1: [batch_size, seq_len] 输入word序列
        :param words2: [batch_size, seq_len] 输入pos序列
        :param seq_len: [batch_size, seq_len] 输入序列长度
        :return dict: parsing结果::

            pred1: [batch_size, seq_len] heads的预测结果
            pred2: [batch_size, seq_len, num_label] label预测logits
        """
        res = self(words1, words2, seq_len)
        output = {}
        output[C.OUTPUTS(0)] = res.pop(C.OUTPUTS(2))
        _, label_pred = res.pop(C.OUTPUTS(1)).max(2)
        output[C.OUTPUTS(1)] = label_pred
        return output
class ParserLoss(LossFunc):
    """
    别名::class:`fastNLP.models.ParserLoss` :class:`fastNLP.models.biaffine_parser.ParserLoss`

    计算parser的loss

    :param pred1: [batch_size, seq_len, seq_len] 边预测logits
    :param pred2: [batch_size, seq_len, num_label] label预测logits
    :param target1: [batch_size, seq_len] 真实边的标注
    :param target2: [batch_size, seq_len] 真实类别的标注
    :param seq_len: [batch_size, seq_len] 真实目标的长度
    :return loss: scalar
    """
    def __init__(self, pred1=None, pred2=None,
                 target1=None, target2=None,
                 seq_len=None):
        super(ParserLoss, self).__init__(BiaffineParser.loss,
                                         pred1=pred1,
                                         pred2=pred2,
                                         target1=target1,
                                         target2=target2,
                                         seq_len=seq_len)
class ParserMetric(MetricBase):
    """
    别名::class:`fastNLP.models.ParserMetric` :class:`fastNLP.models.biaffine_parser.ParserMetric`

    评估parser的性能

    :param pred1: 边预测logits
    :param pred2: label预测logits
    :param target1: 真实边的标注
    :param target2: 真实类别的标注
    :param seq_len: 序列长度
    :return dict: 评估结果::

        UAS: 不带label时, 边预测的准确率
        LAS: 同时预测边和label的准确率
    """
    def __init__(self, pred1=None, pred2=None,
                 target1=None, target2=None, seq_len=None):
        super().__init__()
        self._init_param_map(pred1=pred1, pred2=pred2,
                             target1=target1, target2=target2,
                             seq_len=seq_len)
        self.num_arc = 0
        self.num_label = 0
        self.num_sample = 0

    def get_metric(self, reset=True):
        res = {'UAS': self.num_arc * 1.0 / self.num_sample, 'LAS': self.num_label * 1.0 / self.num_sample}
        if reset:
            self.num_sample = self.num_label = self.num_arc = 0
        return res

    def evaluate(self, pred1, pred2, target1, target2, seq_len=None):
        """Evaluate the performance of prediction.
        """
        if seq_len is None:
            seq_mask = pred1.new_ones(pred1.size(), dtype=torch.long)
        else:
            seq_mask = seq_len_to_mask(seq_len.long()).long()
        # mask out <root> tag
        seq_mask[:, 0] = 0
        head_pred_correct = (pred1 == target1).long() * seq_mask
        label_pred_correct = (pred2 == target2).long() * head_pred_correct
        self.num_arc += head_pred_correct.sum().item()
        self.num_label += label_pred_correct.sum().item()
        self.num_sample += seq_mask.sum().item()
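The UAS/LAS bookkeeping above is easy to verify by hand: UAS counts arcs with the correct head, while LAS additionally requires the correct label *on a correct arc*. A sketch with a hypothetical length-4 sentence (index 0 is `<root>` and is excluded, as in `evaluate()`):

```python
import torch

head_pred = torch.tensor([[0, 0, 1, 1]])   # predicted heads (made-up)
head_true = torch.tensor([[0, 0, 1, 2]])   # token 3's head is wrong
label_pred = torch.tensor([[0, 3, 5, 7]])  # token 2's label is wrong
label_true = torch.tensor([[0, 3, 4, 7]])
mask = torch.tensor([[0, 1, 1, 1]])        # <root> excluded

arc_correct = (head_pred == head_true).long() * mask      # tokens 1 and 2
# A label only counts when its arc is also correct.
label_correct = (label_pred == label_true).long() * arc_correct  # token 1 only
uas = arc_correct.sum().item() / mask.sum().item()  # 2/3
las = label_correct.sum().item() / mask.sum().item()  # 1/3
```

Note token 3's correct label (7) does not count toward LAS because its head is wrong, which is exactly the standard LAS definition.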
@@ -1,131 +0,0 @@
import torch
import torch.nn as nn
import torch.nn.functional as F

from fastNLP.modules.encoder.lstm import LSTM


class Highway(nn.Module):
    """Highway network"""

    def __init__(self, input_size):
        super(Highway, self).__init__()
        self.fc1 = nn.Linear(input_size, input_size, bias=True)
        self.fc2 = nn.Linear(input_size, input_size, bias=True)

    def forward(self, x):
        t = torch.sigmoid(self.fc1(x))  # transform gate
        return torch.mul(t, F.relu(self.fc2(x))) + torch.mul(1 - t, x)
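The highway combination `t * relu(fc2(x)) + (1 - t) * x` interpolates between a transformed and an untouched copy of the input, gated per dimension. A functional sketch with hand-picked values (pretending `fc1`/`fc2` produced the given gate and hidden activations):

```python
import torch

x = torch.tensor([1.0, -2.0])
t = torch.tensor([1.0, 0.0])               # gate fully open / fully closed (hypothetical)
h = torch.relu(torch.tensor([3.0, 5.0]))   # pretend fc2(x) output

# Open gate passes the transform, closed gate carries x through unchanged.
out = t * h + (1 - t) * x  # [3.0, -2.0]
```

The carry path `(1 - t) * x` is what lets gradients flow through many stacked layers, which is why the character encoder below stacks two of them.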
class CharLM(nn.Module): | |||
"""CNN + highway network + LSTM | |||
# Input: | |||
4D tensor with shape [batch_size, in_channel, height, width] | |||
# Output: | |||
2D Tensor with shape [batch_size, vocab_size] | |||
# Arguments: | |||
char_emb_dim: the size of each character's attention | |||
word_emb_dim: the size of each word's attention | |||
vocab_size: num of unique words | |||
num_char: num of characters | |||
use_gpu: True or False | |||
""" | |||
def __init__(self, char_emb_dim, word_emb_dim, | |||
vocab_size, num_char): | |||
super(CharLM, self).__init__() | |||
self.char_emb_dim = char_emb_dim | |||
self.word_emb_dim = word_emb_dim | |||
self.vocab_size = vocab_size | |||
# char attention layer | |||
self.char_embed = nn.Embedding(num_char, char_emb_dim) | |||
# convolutions of filters with different sizes | |||
self.convolutions = [] | |||
# list of tuples: (the number of filter, width) | |||
self.filter_num_width = [(25, 1), (50, 2), (75, 3), (100, 4), (125, 5), (150, 6)] | |||
for out_channel, filter_width in self.filter_num_width: | |||
self.convolutions.append( | |||
nn.Conv2d( | |||
1, # in_channel | |||
out_channel, # out_channel | |||
kernel_size=(char_emb_dim, filter_width), # (height, width) | |||
bias=True | |||
) | |||
) | |||
self.highway_input_dim = sum([x for x, y in self.filter_num_width]) | |||
self.batch_norm = nn.BatchNorm1d(self.highway_input_dim, affine=False) | |||
# highway net | |||
self.highway1 = Highway(self.highway_input_dim) | |||
self.highway2 = Highway(self.highway_input_dim) | |||
# LSTM | |||
self.lstm_num_layers = 2 | |||
self.lstm = LSTM(self.highway_input_dim, hidden_size=self.word_emb_dim, num_layers=self.lstm_num_layers, | |||
dropout=0.5) | |||
# output layer | |||
self.dropout = nn.Dropout(p=0.5) | |||
self.linear = nn.Linear(self.word_emb_dim, self.vocab_size) | |||
def forward(self, x): | |||
# Input: Variable of Tensor with shape [num_seq, seq_len, max_word_len+2] | |||
# Return: Variable of Tensor with shape [num_words, len(word_dict)] | |||
lstm_batch_size = x.size()[0] | |||
lstm_seq_len = x.size()[1] | |||
x = x.contiguous().view(-1, x.size()[2]) | |||
# [num_seq*seq_len, max_word_len+2] | |||
x = self.char_embed(x) | |||
# [num_seq*seq_len, max_word_len+2, char_emb_dim] | |||
x = torch.transpose(x.view(x.size()[0], 1, x.size()[1], -1), 2, 3) | |||
# [num_seq*seq_len, 1, max_word_len+2, char_emb_dim] | |||
x = self.conv_layers(x) | |||
# [num_seq*seq_len, total_num_filters] | |||
x = self.batch_norm(x) | |||
# [num_seq*seq_len, total_num_filters] | |||
x = self.highway1(x) | |||
x = self.highway2(x) | |||
# [num_seq*seq_len, total_num_filters] | |||
x = x.contiguous().view(lstm_batch_size, lstm_seq_len, -1) | |||
# [num_seq, seq_len, total_num_filters] | |||
x = self.lstm(x) | |||
# [seq_len, num_seq, hidden_size] | |||
x = self.dropout(x) | |||
# [seq_len, num_seq, hidden_size] | |||
x = x.contiguous().view(lstm_batch_size * lstm_seq_len, -1) | |||
# [num_seq*seq_len, hidden_size] | |||
x = self.linear(x) | |||
# [num_seq*seq_len, vocab_size] | |||
return x | |||
def conv_layers(self, x): | |||
chosen_list = list() | |||
for conv in self.convolutions: | |||
feature_map = torch.tanh(conv(x))  # F.tanh is deprecated | |||
# (batch_size, out_channel, 1, max_word_len-width+1) | |||
chosen = torch.max(feature_map, 3)[0] | |||
# (batch_size, out_channel, 1) | |||
chosen = chosen.squeeze(-1)  # squeeze only the last dim; a plain squeeze() would also collapse batch_size == 1 | |||
# (batch_size, out_channel) | |||
chosen_list.append(chosen) | |||
# (batch_size, total_num_filers) | |||
return torch.cat(chosen_list, 1) |
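conv_layers keeps, for every filter, only its strongest response along the word — max-over-time pooling, the torch.max(feature_map, 3) step above. A sketch of that pooling on plain lists (the function name and data are illustrative only):

```python
def max_over_time(feature_maps):
    """feature_maps: one activation sequence (list of floats) per
    filter. Returns one scalar per filter: its strongest response
    anywhere along the sequence."""
    return [max(fm) for fm in feature_maps]

# Two filters sliding over a 4-step input:
pooled = max_over_time([[0.1, 0.9, 0.3, 0.2],
                        [0.5, 0.4, 0.8, 0.7]])
# pooled == [0.9, 0.8]
```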
@@ -1,57 +1,68 @@ | |||
# python: 3.6 | |||
# encoding: utf-8 | |||
__all__ = [ | |||
"CNNText" | |||
] | |||
import torch | |||
import torch.nn as nn | |||
# import torch.nn.functional as F | |||
import fastNLP.modules.encoder as encoder | |||
from ..core.const import Const as C | |||
from ..modules import encoder | |||
class CNNText(torch.nn.Module): | |||
""" | |||
Text classification model by character CNN, the implementation of paper | |||
'Yoon Kim. 2014. Convolution Neural Networks for Sentence | |||
Classification.' | |||
""" | |||
别名::class:`fastNLP.models.CNNText` :class:`fastNLP.models.cnn_text_classification.CNNText` | |||
def __init__(self, embed_num, | |||
embed_dim, | |||
使用CNN进行文本分类的模型 | |||
'Yoon Kim. 2014. Convolution Neural Networks for Sentence Classification.' | |||
:param tuple(int,int),torch.FloatTensor,nn.Embedding,numpy.ndarray init_embed: Embedding的大小(传入tuple(int, int), | |||
第一个int为vocab_zie, 第二个int为embed_dim); 如果为Tensor, Embedding, ndarray等则直接使用该值初始化Embedding | |||
:param int num_classes: 一共有多少类 | |||
:param int,tuple(int) out_channels: 输出channel的数量。如果为list,则需要与kernel_sizes的数量保持一致 | |||
:param int,tuple(int) kernel_sizes: 输出channel的kernel大小。 | |||
:param int padding: 对句子前后的pad的大小, 用0填充。 | |||
:param float dropout: Dropout的大小 | |||
""" | |||
def __init__(self, init_embed, | |||
num_classes, | |||
kernel_nums=(3, 4, 5), | |||
kernel_sizes=(3, 4, 5), | |||
padding=0, | |||
dropout=0.5): | |||
super(CNNText, self).__init__() | |||
self.embed = encoder.Embedding(init_embed) | |||
self.conv_pool = encoder.ConvMaxpool( | |||
in_channels=self.embed.embedding_dim, | |||
out_channels=kernel_nums, | |||
kernel_sizes=kernel_sizes, | |||
padding=padding) | |||
self.dropout = nn.Dropout(dropout) | |||
self.fc = nn.Linear(sum(kernel_nums), num_classes) | |||
def forward(self, words, seq_len=None): | |||
""" | |||
:param torch.LongTensor words: [batch_size, seq_len], word indices of the sentences | |||
:param torch.LongTensor seq_len: [batch,] length of each sentence | |||
:return output: dict of torch.LongTensor, [batch_size, num_classes] | |||
""" | |||
x = self.embed(words) # [N,L] -> [N,L,C] | |||
x = self.conv_pool(x) # [N,L,C] -> [N,C] | |||
x = self.dropout(x) | |||
x = self.fc(x) # [N,C] -> [N, N_class] | |||
return {C.OUTPUT: x} | |||
def predict(self, words, seq_len=None): | |||
""" | |||
:param torch.LongTensor words: [batch_size, seq_len], word indices of the sentences | |||
:param torch.LongTensor seq_len: [batch,] length of each sentence | |||
:return predict: dict of torch.LongTensor, [batch_size, ] | |||
""" | |||
output = self(words, seq_len) | |||
_, predict = output[C.OUTPUT].max(dim=1) | |||
return {C.OUTPUT: predict} | |||
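predict() collapses the class scores to one label per sentence with an argmax over dim 1. The same reduction on plain lists (a hedged sketch; predict_labels is a hypothetical helper, not part of fastNLP):

```python
def predict_labels(scores):
    """scores: [batch_size][num_classes] raw logits. Returns the
    index of the highest-scoring class per row, mirroring
    output.max(dim=1) in the model's predict()."""
    return [row.index(max(row)) for row in scores]

labels = predict_labels([[0.2, 1.5, -0.3],
                         [2.0, 0.1, 0.4]])
# labels == [1, 0]
```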
@@ -5,9 +5,9 @@ import os | |||
import torch | |||
import torch.nn.functional as F | |||
import fastNLP | |||
from . import enas_utils as utils | |||
from .enas_utils import Node | |||
def _construct_dags(prev_nodes, activations, func_names, num_blocks): | |||
@@ -1,17 +1,18 @@ | |||
""" | |||
Module containing the shared RNN model. | |||
Code Modified from https://github.com/carpedm20/ENAS-pytorch | |||
""" | |||
import collections | |||
import numpy as np | |||
import torch | |||
import torch.nn as nn | |||
import torch.nn.functional as F | |||
from torch.autograd import Variable | |||
from . import enas_utils as utils | |||
from .base_model import BaseModel | |||
def _get_dropped_weights(w_raw, dropout_p, is_training): | |||
"""Drops out weights to implement DropConnect. | |||
@@ -36,12 +37,13 @@ def _get_dropped_weights(w_raw, dropout_p, is_training): | |||
The above TODO is the reason for the hacky check for `torch.nn.Parameter`. | |||
""" | |||
dropped_w = F.dropout(w_raw, p=dropout_p, training=is_training) | |||
if isinstance(dropped_w, torch.nn.Parameter): | |||
dropped_w = dropped_w.clone() | |||
return dropped_w | |||
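_get_dropped_weights implements DropConnect: unlike ordinary dropout, the random zeroing (with 1/(1-p) rescaling of the survivors) is applied to the weights rather than the activations, and is skipped entirely at eval time. A plain-Python sketch of that behavior (drop_connect is an illustrative name, not the library API):

```python
import random

def drop_connect(weights, p, training, rng=random.Random(0)):
    """Zero each weight with probability p and rescale the
    survivors by 1/(1-p); identity when training is False."""
    if not training or p == 0.0:
        return list(weights)
    keep = 1.0 - p
    return [w / keep if rng.random() < keep else 0.0 for w in weights]
```

The rescaling keeps the expected value of each weight unchanged between train and eval, which is why no correction is needed at test time.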
class EmbeddingDropout(torch.nn.Embedding): | |||
"""Class for dropping out embeddings by zero'ing out parameters in the | |||
embedding matrix. | |||
@@ -54,6 +56,7 @@ class EmbeddingDropout(torch.nn.Embedding): | |||
See 'A Theoretically Grounded Application of Dropout in Recurrent Neural | |||
Networks', (Gal and Ghahramani, 2016). | |||
""" | |||
def __init__(self, | |||
num_embeddings, | |||
embedding_dim, | |||
@@ -84,14 +87,14 @@ class EmbeddingDropout(torch.nn.Embedding): | |||
assert (dropout >= 0.0) and (dropout < 1.0), ('Dropout must be >= 0.0 ' | |||
'and < 1.0') | |||
self.scale = scale | |||
def forward(self, inputs): # pylint:disable=arguments-differ | |||
"""Embeds `inputs` with the dropped out embedding weight matrix.""" | |||
if self.training: | |||
dropout = self.dropout | |||
else: | |||
dropout = 0 | |||
if dropout: | |||
mask = self.weight.data.new(self.weight.size(0), 1) | |||
mask.bernoulli_(1 - dropout) | |||
@@ -102,7 +105,7 @@ class EmbeddingDropout(torch.nn.Embedding): | |||
masked_weight = self.weight | |||
if self.scale and self.scale != 1: | |||
masked_weight = masked_weight * self.scale | |||
return F.embedding(inputs, | |||
masked_weight, | |||
max_norm=self.max_norm, | |||
@@ -115,7 +118,7 @@ class LockedDropout(nn.Module): | |||
# code from https://github.com/salesforce/awd-lstm-lm/blob/master/locked_dropout.py | |||
def __init__(self): | |||
super().__init__() | |||
def forward(self, x, dropout=0.5): | |||
if not self.training or not dropout: | |||
return x | |||
@@ -127,11 +130,12 @@ class LockedDropout(nn.Module): | |||
class ENASModel(BaseModel): | |||
"""Shared RNN model.""" | |||
def __init__(self, embed_num, num_classes, num_blocks=4, cuda=False, shared_hid=1000, shared_embed=1000): | |||
super(ENASModel, self).__init__() | |||
self.use_cuda = cuda | |||
self.shared_hid = shared_hid | |||
self.num_blocks = num_blocks | |||
self.decoder = nn.Linear(self.shared_hid, num_classes) | |||
@@ -140,16 +144,16 @@ class ENASModel(BaseModel): | |||
dropout=0.1) | |||
self.lockdrop = LockedDropout() | |||
self.dag = None | |||
# Tie weights | |||
# self.decoder.weight = self.encoder.weight | |||
# Since W^{x, c} and W^{h, c} are always summed, there | |||
# is no point duplicating their bias offset parameter. Likewise for | |||
# W^{x, h} and W^{h, h}. | |||
self.w_xc = nn.Linear(shared_embed, self.shared_hid) | |||
self.w_xh = nn.Linear(shared_embed, self.shared_hid) | |||
# The raw weights are stored here because the hidden-to-hidden weights | |||
# are weight dropped on the forward pass. | |||
self.w_hc_raw = torch.nn.Parameter( | |||
@@ -158,10 +162,10 @@ class ENASModel(BaseModel): | |||
torch.Tensor(self.shared_hid, self.shared_hid)) | |||
self.w_hc = None | |||
self.w_hh = None | |||
self.w_h = collections.defaultdict(dict) | |||
self.w_c = collections.defaultdict(dict) | |||
for idx in range(self.num_blocks): | |||
for jdx in range(idx + 1, self.num_blocks): | |||
self.w_h[idx][jdx] = nn.Linear(self.shared_hid, | |||
@@ -170,48 +174,47 @@ class ENASModel(BaseModel): | |||
self.w_c[idx][jdx] = nn.Linear(self.shared_hid, | |||
self.shared_hid, | |||
bias=False) | |||
self._w_h = nn.ModuleList([self.w_h[idx][jdx] | |||
for idx in self.w_h | |||
for jdx in self.w_h[idx]]) | |||
self._w_c = nn.ModuleList([self.w_c[idx][jdx] | |||
for idx in self.w_c | |||
for jdx in self.w_c[idx]]) | |||
self.batch_norm = None | |||
# if args.mode == 'train': | |||
# self.batch_norm = nn.BatchNorm1d(self.shared_hid) | |||
# else: | |||
# self.batch_norm = None | |||
self.reset_parameters() | |||
self.static_init_hidden = utils.keydefaultdict(self.init_hidden) | |||
def setDAG(self, dag): | |||
if self.dag is None: | |||
self.dag = dag | |||
def forward(self, word_seq, hidden=None): | |||
inputs = torch.transpose(word_seq, 0, 1) | |||
time_steps = inputs.size(0) | |||
batch_size = inputs.size(1) | |||
self.w_hh = _get_dropped_weights(self.w_hh_raw, | |||
0.5, | |||
self.training) | |||
self.w_hc = _get_dropped_weights(self.w_hc_raw, | |||
0.5, | |||
self.training) | |||
# hidden = self.static_init_hidden[batch_size] if hidden is None else hidden | |||
hidden = self.static_init_hidden[batch_size] | |||
embed = self.encoder(inputs) | |||
embed = self.lockdrop(embed, 0.65 if self.training else 0) | |||
# The norm of hidden states are clipped here because | |||
# otherwise ENAS is especially prone to exploding activations on the | |||
# forward pass. This could probably be fixed in a more elegant way, but | |||
@@ -227,7 +230,7 @@ class ENASModel(BaseModel): | |||
for step in range(time_steps): | |||
x_t = embed[step] | |||
logit, hidden = self.cell(x_t, hidden, self.dag) | |||
hidden_norms = hidden.norm(dim=-1) | |||
max_norm = 25.0 | |||
if hidden_norms.data.max() > max_norm: | |||
@@ -238,60 +241,60 @@ class ENASModel(BaseModel): | |||
# because the PyTorch slicing and slice assignment is too | |||
# flaky. | |||
hidden_norms = hidden_norms.data.cpu().numpy() | |||
clipped_num += 1 | |||
if hidden_norms.max() > max_clipped_norm: | |||
max_clipped_norm = hidden_norms.max() | |||
clip_select = hidden_norms > max_norm | |||
clip_norms = hidden_norms[clip_select] | |||
mask = np.ones(hidden.size()) | |||
normalizer = max_norm / clip_norms | |||
normalizer = normalizer[:, np.newaxis] | |||
mask[clip_select] = normalizer | |||
if self.use_cuda: | |||
hidden *= torch.autograd.Variable( | |||
torch.FloatTensor(mask).cuda(), requires_grad=False) | |||
else: | |||
hidden *= torch.autograd.Variable( | |||
torch.FloatTensor(mask), requires_grad=False) | |||
logits.append(logit) | |||
h1tohT.append(hidden) | |||
h1tohT = torch.stack(h1tohT) | |||
output = torch.stack(logits) | |||
raw_output = output | |||
output = self.lockdrop(output, 0.4 if self.training else 0) | |||
# Pooling | |||
output = torch.mean(output, 0) | |||
decoded = self.decoder(output) | |||
extra_out = {'dropped': decoded, | |||
'hiddens': h1tohT, | |||
'raw': raw_output} | |||
return {'pred': decoded, 'hidden': hidden, 'extra_out': extra_out} | |||
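The forward pass above rescales any hidden row whose L2 norm exceeds max_norm back down to exactly max_norm (the mask/normalizer block), since ENAS is prone to exploding activations. The same renormalization sketched on plain lists (clip_row_norms is an illustrative name):

```python
def clip_row_norms(rows, max_norm=25.0):
    """Scale each row whose L2 norm exceeds max_norm down to
    exactly max_norm; rows under the threshold pass unchanged."""
    clipped = []
    for row in rows:
        norm = sum(v * v for v in row) ** 0.5
        scale = max_norm / norm if norm > max_norm else 1.0
        clipped.append([v * scale for v in row])
    return clipped

rows = clip_row_norms([[30.0, 40.0], [3.0, 4.0]])
# first row (norm 50) is rescaled to norm 25; second row is untouched
```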
def cell(self, x, h_prev, dag): | |||
"""Computes a single pass through the discovered RNN cell.""" | |||
c = {} | |||
h = {} | |||
f = {} | |||
f[0] = self.get_f(dag[-1][0].name) | |||
c[0] = torch.sigmoid(self.w_xc(x) + F.linear(h_prev, self.w_hc, None)) | |||
h[0] = (c[0] * f[0](self.w_xh(x) + F.linear(h_prev, self.w_hh, None)) + | |||
(1 - c[0]) * h_prev) | |||
leaf_node_ids = [] | |||
q = collections.deque() | |||
q.append(0) | |||
# Computes connections from the parent nodes `node_id` | |||
# to their child nodes `next_id` recursively, skipping leaf nodes. A | |||
# leaf node is a node whose id == `self.num_blocks`. | |||
@@ -307,10 +310,10 @@ class ENASModel(BaseModel): | |||
while True: | |||
if len(q) == 0: | |||
break | |||
node_id = q.popleft() | |||
nodes = dag[node_id] | |||
for next_node in nodes: | |||
next_id = next_node.id | |||
if next_id == self.num_blocks: | |||
@@ -318,38 +321,38 @@ class ENASModel(BaseModel): | |||
assert len(nodes) == 1, ('parent of leaf node should have ' | |||
'only one child') | |||
continue | |||
w_h = self.w_h[node_id][next_id] | |||
w_c = self.w_c[node_id][next_id] | |||
f[next_id] = self.get_f(next_node.name) | |||
c[next_id] = torch.sigmoid(w_c(h[node_id])) | |||
h[next_id] = (c[next_id] * f[next_id](w_h(h[node_id])) + | |||
(1 - c[next_id]) * h[node_id]) | |||
q.append(next_id) | |||
# Instead of averaging loose ends, perhaps there should | |||
# be a set of separate unshared weights for each "loose" connection | |||
# between each node in a cell and the output. | |||
# | |||
# As it stands, all weights W^h_{ij} are doing double duty by | |||
# connecting both from i to j, as well as from i to the output. | |||
# average all the loose ends | |||
leaf_nodes = [h[node_id] for node_id in leaf_node_ids] | |||
output = torch.mean(torch.stack(leaf_nodes, 2), -1) | |||
# stabilizing the Updates of omega | |||
if self.batch_norm is not None: | |||
output = self.batch_norm(output) | |||
return output, h[self.num_blocks - 1] | |||
def init_hidden(self, batch_size): | |||
zeros = torch.zeros(batch_size, self.shared_hid) | |||
return utils.get_variable(zeros, self.use_cuda, requires_grad=False) | |||
def get_f(self, name): | |||
name = name.lower() | |||
if name == 'relu': | |||
@@ -361,22 +364,21 @@ class ENASModel(BaseModel): | |||
elif name == 'sigmoid': | |||
f = torch.sigmoid | |||
return f | |||
@property | |||
def num_parameters(self): | |||
def size(p): | |||
return np.prod(p.size()) | |||
return sum([size(param) for param in self.parameters()]) | |||
def reset_parameters(self): | |||
init_range = 0.025 | |||
# init_range = 0.025 if self.args.mode == 'train' else 0.04 | |||
for param in self.parameters(): | |||
param.data.uniform_(-init_range, init_range) | |||
self.decoder.bias.data.fill_(0) | |||
def predict(self, word_seq): | |||
""" | |||
@@ -1,30 +1,25 @@ | |||
# Code Modified from https://github.com/carpedm20/ENAS-pytorch | |||
import os | |||
import math | |||
import time | |||
from datetime import datetime, timedelta | |||
import numpy as np | |||
import torch | |||
from torch import nn | |||
from torch.optim import Adam | |||
try: | |||
from tqdm.auto import tqdm | |||
except: | |||
from ..core.utils import _pseudo_tqdm as tqdm | |||
from ..core.trainer import Trainer | |||
from ..core.batch import Batch | |||
from ..core.callback import CallbackManager, CallbackException | |||
from ..core.dataset import DataSet | |||
from ..core.utils import _move_dict_value_to_device | |||
from . import enas_utils as utils | |||
from ..core.utils import _build_args | |||
def _get_no_grad_ctx_mgr(): | |||
@@ -34,8 +29,9 @@ def _get_no_grad_ctx_mgr(): | |||
return torch.no_grad() | |||
class ENASTrainer(Trainer): | |||
"""A class to wrap training code.""" | |||
def __init__(self, train_data, model, controller, **kwargs): | |||
"""Constructor for training algorithm. | |||
:param DataSet train_data: the training data | |||
@@ -48,30 +44,31 @@ class ENASTrainer(fastNLP.Trainer): | |||
self.controller_step = 0 | |||
self.shared_step = 0 | |||
self.max_length = 35 | |||
self.shared = model | |||
self.controller = controller | |||
self.shared_optim = Adam( | |||
self.shared.parameters(), | |||
lr=20.0, | |||
weight_decay=1e-7) | |||
self.controller_optim = Adam( | |||
self.controller.parameters(), | |||
lr=3.5e-4) | |||
def train(self, load_best_model=True): | |||
""" | |||
:param bool load_best_model: effective only when dev_data was provided at construction; if True, the trainer | |||
reloads the model parameters that performed best on dev before returning. | |||
:return results: a dict with the following entries:: | |||
seconds: float, training time in seconds | |||
(the following three entries are present only when dev_data was provided) | |||
best_eval: Dict of Dict, the evaluation results | |||
best_epoch: int, the epoch at which the best value was reached | |||
best_step: int, the step (batch update) at which the best value was reached | |||
""" | |||
results = {} | |||
@@ -80,25 +77,26 @@ class ENASTrainer(fastNLP.Trainer): | |||
results['seconds'] = 0. | |||
return results | |||
try: | |||
if torch.cuda.is_available() and "cuda" in self.device: | |||
self.model = self.model.cuda() | |||
self._model_device = self.model.parameters().__next__().device | |||
self._mode(self.model, is_test=False) | |||
self.start_time = str(datetime.now().strftime('%Y-%m-%d-%H-%M-%S')) | |||
start_time = time.time() | |||
print("training epochs started " + self.start_time, flush=True) | |||
try: | |||
self.callback_manager.on_train_begin() | |||
self._train() | |||
self.callback_manager.on_train_end() | |||
except (CallbackException, KeyboardInterrupt) as e: | |||
self.callback_manager.on_exception(e) | |||
if self.dev_data is not None: | |||
print("\nIn Epoch:{}/Step:{}, got best dev performance:".format(self.best_dev_epoch, self.best_dev_step) + | |||
self.tester._format_eval_results(self.best_dev_perf),) | |||
print( | |||
"\nIn Epoch:{}/Step:{}, got best dev performance:".format(self.best_dev_epoch, self.best_dev_step) + | |||
self.tester._format_eval_results(self.best_dev_perf), ) | |||
results['best_eval'] = self.best_dev_perf | |||
results['best_epoch'] = self.best_dev_epoch | |||
results['best_step'] = self.best_dev_step | |||
@@ -112,12 +110,12 @@ class ENASTrainer(fastNLP.Trainer): | |||
finally: | |||
pass | |||
results['seconds'] = round(time.time() - start_time, 2) | |||
return results | |||
def _train(self): | |||
if not self.use_tqdm: | |||
from fastNLP.core.utils import _pseudo_tqdm as inner_tqdm | |||
else: | |||
inner_tqdm = tqdm | |||
self.step = 0 | |||
@@ -128,21 +126,21 @@ class ENASTrainer(fastNLP.Trainer): | |||
avg_loss = 0 | |||
data_iterator = Batch(self.train_data, batch_size=self.batch_size, sampler=self.sampler, as_numpy=False, | |||
prefetch=self.prefetch) | |||
for epoch in range(1, self.n_epochs + 1): | |||
pbar.set_description_str(desc="Epoch {}/{}".format(epoch, self.n_epochs)) | |||
last_stage = (epoch > self.n_epochs + 1 - self.final_epochs) | |||
if epoch == self.n_epochs + 1 - self.final_epochs: | |||
print('Entering the final stage. (Only train the selected structure)') | |||
# early stopping | |||
self.callback_manager.on_epoch_begin() | |||
# 1. Training the shared parameters omega of the child models | |||
self.train_shared(pbar) | |||
# 2. Training the controller parameters theta | |||
if not last_stage: | |||
self.train_controller() | |||
if ((self.validate_every > 0 and self.step % self.validate_every == 0) or | |||
(self.validate_every < 0 and self.step % len(data_iterator) == 0)) \ | |||
and self.dev_data is not None: | |||
@@ -151,16 +149,15 @@ class ENASTrainer(fastNLP.Trainer): | |||
eval_res = self._do_validation(epoch=epoch, step=self.step) | |||
eval_str = "Evaluation at Epoch {}/{}. Step:{}/{}. ".format(epoch, self.n_epochs, self.step, | |||
total_steps) + \ | |||
self.tester._format_eval_results(eval_res) | |||
pbar.write(eval_str) | |||
# lr decay; early stopping | |||
self.callback_manager.on_epoch_end() | |||
# =============== epochs end =================== # | |||
pbar.close() | |||
# ============ tqdm end ============== # | |||
def get_loss(self, inputs, targets, hidden, dags): | |||
"""Computes the loss for the same batch for M models. | |||
@@ -169,7 +166,7 @@ class ENASTrainer(fastNLP.Trainer): | |||
""" | |||
if not isinstance(dags, list): | |||
dags = [dags] | |||
loss = 0 | |||
for dag in dags: | |||
self.shared.setDAG(dag) | |||
@@ -177,14 +174,14 @@ class ENASTrainer(fastNLP.Trainer): | |||
inputs['hidden'] = hidden | |||
result = self.shared(**inputs) | |||
output, hidden, extra_out = result['pred'], result['hidden'], result['extra_out'] | |||
self.callback_manager.on_loss_begin(targets, result) | |||
sample_loss = self._compute_loss(result, targets) | |||
loss += sample_loss | |||
assert len(dags) == 1, 'there are multiple `hidden` for multiple `dags`' | |||
return loss, hidden, extra_out | |||
def train_shared(self, pbar=None, max_step=None, dag=None): | |||
"""Train the language model for 400 steps of minibatches of 64 | |||
examples. | |||
@@ -202,9 +199,9 @@ class ENASTrainer(fastNLP.Trainer): | |||
model = self.shared | |||
model.train() | |||
self.controller.eval() | |||
hidden = self.shared.init_hidden(self.batch_size) | |||
abs_max_grad = 0 | |||
abs_max_hidden_norm = 0 | |||
step = 0 | |||
@@ -213,15 +210,15 @@ class ENASTrainer(fastNLP.Trainer): | |||
train_idx = 0 | |||
avg_loss = 0 | |||
data_iterator = Batch(self.train_data, batch_size=self.batch_size, sampler=self.sampler, as_numpy=False, | |||
prefetch=self.prefetch) | |||
for batch_x, batch_y in data_iterator: | |||
_move_dict_value_to_device(batch_x, batch_y, device=self._model_device) | |||
indices = data_iterator.get_batch_indices() | |||
# negative sampling; replace unknown; re-weight batch_y | |||
self.callback_manager.on_batch_begin(batch_x, batch_y, indices) | |||
# prediction = self._data_forward(self.model, batch_x) | |||
dags = self.controller.sample(1) | |||
inputs, targets = batch_x, batch_y | |||
# self.callback_manager.on_loss_begin(batch_y, prediction) | |||
@@ -230,18 +227,18 @@ class ENASTrainer(fastNLP.Trainer): | |||
hidden, | |||
dags) | |||
hidden.detach_() | |||
avg_loss += loss.item() | |||
# Is loss NaN or inf? requires_grad = False | |||
self.callback_manager.on_backward_begin(loss) | |||
self._grad_backward(loss) | |||
self.callback_manager.on_backward_end() | |||
self._update() | |||
self.callback_manager.on_step_end() | |||
if (self.step + 1) % self.print_every == 0: | |||
if self.use_tqdm: | |||
print_output = "loss:{0:<6.5f}".format(avg_loss / self.print_every) | |||
pbar.update(self.print_every) | |||
@@ -257,30 +254,29 @@ class ENASTrainer(fastNLP.Trainer): | |||
self.shared_step += 1 | |||
self.callback_manager.on_batch_end() | |||
# ================= mini-batch end ==================== # | |||
def get_reward(self, dag, entropies, hidden, valid_idx=0): | |||
"""Computes the perplexity of a single sampled model on a minibatch of | |||
validation data. | |||
""" | |||
if not isinstance(entropies, np.ndarray): | |||
entropies = entropies.data.cpu().numpy() | |||
data_iterator = Batch(self.dev_data, batch_size=self.batch_size, sampler=self.sampler, as_numpy=False, | |||
prefetch=self.prefetch) | |||
for inputs, targets in data_iterator: | |||
valid_loss, hidden, _ = self.get_loss(inputs, targets, hidden, dag) | |||
valid_loss = utils.to_item(valid_loss.data) | |||
valid_ppl = math.exp(valid_loss) | |||
R = 80 / valid_ppl | |||
rewards = R + 1e-4 * entropies | |||
return rewards, hidden | |||
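get_reward turns the validation loss into a reward by way of perplexity, R = 80 / exp(valid_loss), then adds a small exploration bonus of 1e-4 per unit of controller entropy. The arithmetic as a standalone sketch (enas_reward is a hypothetical name for illustration):

```python
import math

def enas_reward(valid_loss, entropies):
    """valid_loss: mean cross-entropy on a validation batch.
    Returns per-step rewards R + 1e-4 * entropy, mirroring
    get_reward above (lower perplexity -> higher reward)."""
    valid_ppl = math.exp(valid_loss)
    R = 80.0 / valid_ppl
    return [R + 1e-4 * e for e in entropies]
```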
def train_controller(self): | |||
"""Fixes the shared parameters and updates the controller parameters. | |||
@@ -298,13 +294,13 @@ class ENASTrainer(fastNLP.Trainer): | |||
# Why can't we call shared.eval() here? Leads to loss | |||
# being uniformly zero for the controller. | |||
# self.shared.eval() | |||
avg_reward_base = None | |||
baseline = None | |||
adv_history = [] | |||
entropy_history = [] | |||
reward_history = [] | |||
hidden = self.shared.init_hidden(self.batch_size) | |||
total_loss = 0 | |||
valid_idx = 0 | |||
@@ -312,7 +308,7 @@ class ENASTrainer(fastNLP.Trainer): | |||
# sample models | |||
dags, log_probs, entropies = self.controller.sample( | |||
with_details=True) | |||
# calculate reward | |||
np_entropies = entropies.data.cpu().numpy() | |||
# No gradients should be backpropagated to the | |||
@@ -322,40 +318,39 @@ class ENASTrainer(fastNLP.Trainer): | |||
np_entropies, | |||
hidden, | |||
valid_idx) | |||
reward_history.extend(rewards) | |||
entropy_history.extend(np_entropies) | |||
# moving average baseline | |||
if baseline is None: | |||
baseline = rewards | |||
else: | |||
decay = 0.95 | |||
baseline = decay * baseline + (1 - decay) * rewards | |||
adv = rewards - baseline | |||
adv_history.extend(adv) | |||
# policy loss | |||
loss = -log_probs * utils.get_variable(adv, | |||
'cuda' in self.device, | |||
requires_grad=False) | |||
loss = loss.sum() # or loss.mean() | |||
# update | |||
self.controller_optim.zero_grad() | |||
loss.backward() | |||
self.controller_optim.step() | |||
total_loss += utils.to_item(loss.data) | |||
if ((step % 50) == 0) and (step > 0): | |||
reward_history, adv_history, entropy_history = [], [], [] | |||
total_loss = 0 | |||
self.controller_step += 1 | |||
# prev_valid_idx = valid_idx | |||
# valid_idx = ((valid_idx + self.max_length) % | |||
@@ -364,16 +359,16 @@ class ENASTrainer(fastNLP.Trainer): | |||
# # validation data, we reset the hidden states. | |||
# if prev_valid_idx > valid_idx: | |||
# hidden = self.shared.init_hidden(self.batch_size) | |||
def derive(self, sample_num=10, valid_idx=0): | |||
"""We are always deriving based on the very first batch | |||
of validation data? This seems wrong... | |||
""" | |||
hidden = self.shared.init_hidden(self.batch_size) | |||
dags, _, entropies = self.controller.sample(sample_num, | |||
with_details=True) | |||
max_R = 0 | |||
best_dag = None | |||
for dag in dags: | |||
@@ -381,5 +376,5 @@ class ENASTrainer(fastNLP.Trainer): | |||
if R.max() > max_R: | |||
max_R = R.max() | |||
best_dag = dag | |||
self.model.setDAG(best_dag) |
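train_controller is REINFORCE with an exponential moving-average baseline: baseline ← 0.95·baseline + 0.05·reward, advantage = reward − baseline, and the policy loss is −log_prob · advantage. A sketch of the baseline/advantage bookkeeping (advantage_stream is an illustrative helper, not part of the trainer):

```python
def advantage_stream(rewards, decay=0.95):
    """Returns the advantage of each reward against a moving-average
    baseline, mirroring the baseline update in train_controller.
    The first reward initializes the baseline, so its advantage is 0."""
    baseline = None
    advantages = []
    for r in rewards:
        if baseline is None:
            baseline = r
        else:
            baseline = decay * baseline + (1 - decay) * r
        advantages.append(r - baseline)
    return advantages
```

Subtracting the baseline does not change the expected policy gradient but greatly reduces its variance, which is why the controller trains on advantages rather than raw rewards.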
@@ -1,24 +1,20 @@ | |||
# Code Modified from https://github.com/carpedm20/ENAS-pytorch | |||
from __future__ import print_function | |||
import collections | |||
from datetime import datetime | |||
import os | |||
import json | |||
import numpy as np | |||
import torch | |||
from torch.autograd import Variable | |||
def detach(h): | |||
if isinstance(h, Variable): | |||
return Variable(h.data) | |||
else: | |||
return tuple(detach(v) for v in h) | |||
def get_variable(inputs, cuda=False, **kwargs): | |||
if type(inputs) in [list, np.ndarray]: | |||
inputs = torch.Tensor(inputs) | |||
@@ -28,10 +24,12 @@ def get_variable(inputs, cuda=False, **kwargs): | |||
out = Variable(inputs, **kwargs) | |||
return out | |||
def update_lr(optimizer, lr): | |||
for param_group in optimizer.param_groups: | |||
param_group['lr'] = lr | |||
Node = collections.namedtuple('Node', ['id', 'name']) | |||
@@ -48,9 +46,9 @@ def to_item(x): | |||
"""Converts x, possibly scalar and possibly tensor, to a Python scalar.""" | |||
if isinstance(x, (float, int)): | |||
return x | |||
if float(torch.__version__[0:3]) < 0.4: | |||
assert (x.dim() == 1) and (len(x) == 1) | |||
return x[0] | |||
return x.item() |
@@ -0,0 +1,233 @@ | |||
""" | |||
This module implements two sequence labeling models | |||
""" | |||
__all__ = [ | |||
"SeqLabeling", | |||
"AdvSeqLabel" | |||
] | |||
import torch | |||
import torch.nn as nn | |||
from .base_model import BaseModel | |||
from ..modules import decoder, encoder | |||
from ..modules.decoder.crf import allowed_transitions | |||
from ..core.utils import seq_len_to_mask | |||
from ..core.const import Const as C | |||
class SeqLabeling(BaseModel): | |||
""" | |||
Alias: :class:`fastNLP.models.SeqLabeling` :class:`fastNLP.models.sequence_labeling.SeqLabeling` | |||
A basic sequence labeling model. | |||
Base class for sequence labeling. The structure is one Embedding layer, one LSTM (unidirectional, single layer), one FC layer, and one CRF layer. | |||
:param tuple(int,int),torch.FloatTensor,nn.Embedding,numpy.ndarray init_embed: the embedding to use (a | |||
tuple(int, int) gives (vocab_size, embed_dim); a Tensor, Embedding or ndarray is used to initialize the Embedding directly) | |||
:param int hidden_size: size of the LSTM hidden layer | |||
:param int num_classes: number of target classes | |||
""" | |||
def __init__(self, init_embed, hidden_size, num_classes): | |||
super(SeqLabeling, self).__init__() | |||
self.Embedding = encoder.embedding.Embedding(init_embed) | |||
self.Rnn = encoder.lstm.LSTM(self.Embedding.embedding_dim, hidden_size) | |||
self.Linear = nn.Linear(hidden_size, num_classes) | |||
self.Crf = decoder.crf.ConditionalRandomField(num_classes) | |||
self.mask = None | |||
def forward(self, words, seq_len, target): | |||
""" | |||
:param torch.LongTensor words: [batch_size, max_len], indices of the sequence | |||
:param torch.LongTensor seq_len: [batch_size,], length of each sequence | |||
:param torch.LongTensor target: [batch_size, max_len], target tags of the sequence | |||
:return: {C.LOSS: loss}, where loss is a scalar CRF negative log-likelihood, used in training | |||
""" | |||
assert words.shape[0] == seq_len.shape[0] | |||
assert target.shape == words.shape | |||
self.mask = self._make_mask(words, seq_len) | |||
x = self.Embedding(words) | |||
# [batch_size, max_len, word_emb_dim] | |||
x, _ = self.Rnn(x, seq_len) | |||
# [batch_size, max_len, hidden_size * direction] | |||
x = self.Linear(x) | |||
# [batch_size, max_len, num_classes] | |||
return {C.LOSS: self._internal_loss(x, target)} | |||
def predict(self, words, seq_len): | |||
""" | |||
用于在预测时使用 | |||
:param torch.LongTensor words: [batch_size, max_len] | |||
:param torch.LongTensor seq_len: [batch_size,] | |||
:return: {'pred': xx}, [batch_size, max_len] | |||
""" | |||
self.mask = self._make_mask(words, seq_len) | |||
x = self.Embedding(words) | |||
# [batch_size, max_len, word_emb_dim] | |||
x, _ = self.Rnn(x, seq_len) | |||
# [batch_size, max_len, hidden_size * direction] | |||
x = self.Linear(x) | |||
# [batch_size, max_len, num_classes] | |||
pred = self._decode(x) | |||
return {C.OUTPUT: pred} | |||
def _internal_loss(self, x, y): | |||
""" | |||
Negative log likelihood loss. | |||
:param x: Tensor, [batch_size, max_len, tag_size] | |||
:param y: Tensor, [batch_size, max_len] | |||
:return loss: a scalar Tensor | |||
""" | |||
x = x.float() | |||
y = y.long() | |||
assert x.shape[:2] == y.shape | |||
assert y.shape == self.mask.shape | |||
total_loss = self.Crf(x, y, self.mask) | |||
return torch.mean(total_loss) | |||
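The negative log-likelihood that `_internal_loss` delegates to `self.Crf` can be illustrated with a small pure-Python sketch. This is a hypothetical single-sentence helper, not fastNLP's `ConditionalRandomField` (masking, batching, and start/end transitions are omitted):

```python
import math

def crf_nll(emissions, transitions, tags):
    # Negative log-likelihood of one tag path under a linear-chain CRF:
    # log Z (computed with the forward algorithm) minus the gold path score.
    n = len(emissions[0])
    # score of the gold path: emission scores plus pairwise transition scores
    gold = emissions[0][tags[0]]
    for t in range(1, len(tags)):
        gold += transitions[tags[t - 1]][tags[t]] + emissions[t][tags[t]]
    # log-space forward scores over all paths
    alpha = list(emissions[0])
    for emit in emissions[1:]:
        alpha = [math.log(sum(math.exp(alpha[i] + transitions[i][j]) for i in range(n))) + emit[j]
                 for j in range(n)]
    log_z = math.log(sum(math.exp(a) for a in alpha))
    return log_z - gold
```

With uniform scores the loss reduces to log of the number of paths, which gives a quick sanity check of the partition computation.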
def _make_mask(self, x, seq_len): | |||
batch_size, max_len = x.size(0), x.size(1) | |||
mask = seq_len_to_mask(seq_len) | |||
mask = mask.view(batch_size, max_len) | |||
mask = mask.to(x).float() | |||
return mask | |||
def _decode(self, x): | |||
""" | |||
:param torch.FloatTensor x: [batch_size, max_len, tag_size] | |||
:return prediction: [batch_size, max_len] | |||
""" | |||
tag_seq, _ = self.Crf.viterbi_decode(x, self.mask) | |||
return tag_seq | |||
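`_make_mask` builds the padding mask from sequence lengths via `seq_len_to_mask`. A pure-Python sketch of that behavior, operating on lists rather than tensors (illustrative, not the fastNLP utility itself):

```python
def seq_len_to_mask(seq_len, max_len=None):
    # 1 for real tokens, 0 for padding; one row per sequence in the batch
    max_len = max_len if max_len is not None else max(seq_len)
    return [[1 if pos < length else 0 for pos in range(max_len)]
            for length in seq_len]
```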
class AdvSeqLabel(nn.Module): | |||
""" | |||
别名::class:`fastNLP.models.AdvSeqLabel` :class:`fastNLP.models.sequence_labeling.AdvSeqLabel` | |||
更复杂的Sequence Labelling模型。结构为Embedding, LayerNorm, 双向LSTM(两层),FC,LayerNorm,DropOut,FC,CRF。 | |||
:param tuple(int,int),torch.FloatTensor,nn.Embedding,numpy.ndarray init_embed: Embedding的大小(传入tuple(int, int), | |||
第一个int为vocab_zie, 第二个int为embed_dim); 如果为Tensor, Embedding, ndarray等则直接使用该值初始化Embedding | |||
:param int hidden_size: LSTM的隐层大小 | |||
:param int num_classes: 有多少个类 | |||
:param float dropout: LSTM中以及DropOut层的drop概率 | |||
:param dict id2words: tag id转为其tag word的表。用于在CRF解码时防止解出非法的顺序,比如'BMES'这个标签规范中,'S' | |||
不能出现在'B'之后。这里也支持类似与'B-NN',即'-'前为标签类型的指示,后面为具体的tag的情况。这里不但会保证 | |||
'B-NN'后面不为'S-NN'还会保证'B-NN'后面不会出现'M-xx'(任何非'M-NN'和'E-NN'的情况。) | |||
:param str encoding_type: 支持"BIO", "BMES", "BEMSO", 只有在id2words不为None的情况有用。 | |||
""" | |||
def __init__(self, init_embed, hidden_size, num_classes, dropout=0.3, id2words=None, encoding_type='bmes'): | |||
super().__init__() | |||
self.Embedding = encoder.embedding.Embedding(init_embed) | |||
self.norm1 = torch.nn.LayerNorm(self.Embedding.embedding_dim) | |||
self.Rnn = encoder.LSTM(input_size=self.Embedding.embedding_dim, hidden_size=hidden_size, num_layers=2, | |||
dropout=dropout, | |||
bidirectional=True, batch_first=True) | |||
self.Linear1 = nn.Linear(hidden_size * 2, hidden_size * 2 // 3) | |||
self.norm2 = torch.nn.LayerNorm(hidden_size * 2 // 3) | |||
self.relu = torch.nn.LeakyReLU() | |||
self.drop = torch.nn.Dropout(dropout) | |||
self.Linear2 = nn.Linear(hidden_size * 2 // 3, num_classes) | |||
if id2words is None: | |||
self.Crf = decoder.crf.ConditionalRandomField(num_classes, include_start_end_trans=False) | |||
else: | |||
self.Crf = decoder.crf.ConditionalRandomField(num_classes, include_start_end_trans=False, | |||
allowed_transitions=allowed_transitions(id2words, | |||
encoding_type=encoding_type)) | |||
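The constraint that `id2words` enables can be re-derived for the BMES scheme in a few lines. This is an illustrative sketch of the rule, not fastNLP's `allowed_transitions()` (start/end transitions and the other encodings are ignored here):

```python
def is_transition_allowed(from_tag, to_tag):
    # BMES rule: tags look like 'B-NN'; the part before '-' is the position
    # indicator, the part after it is the label type (may be empty, e.g. 'S').
    f_pos, _, f_type = from_tag.partition('-')
    t_pos, _, t_type = to_tag.partition('-')
    if f_pos in ('B', 'M'):
        # inside a span: may only continue that same span
        return t_pos in ('M', 'E') and f_type == t_type
    # after 'E' or 'S', a new span ('B') or a singleton ('S') of any type may start
    return t_pos in ('B', 'S')
```

This is exactly the behavior described in the docstring above: after 'B-NN', only 'M-NN' or 'E-NN' is legal.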
def _decode(self, x): | |||
""" | |||
:param torch.FloatTensor x: [batch_size, max_len, tag_size] | |||
:return torch.LongTensor, [batch_size, max_len] | |||
""" | |||
tag_seq, _ = self.Crf.viterbi_decode(x, self.mask) | |||
return tag_seq | |||
def _internal_loss(self, x, y): | |||
""" | |||
Negative log likelihood loss. | |||
:param x: Tensor, [batch_size, max_len, tag_size] | |||
:param y: Tensor, [batch_size, max_len] | |||
:return loss: a scalar Tensor | |||
""" | |||
x = x.float() | |||
y = y.long() | |||
assert x.shape[:2] == y.shape | |||
assert y.shape == self.mask.shape | |||
total_loss = self.Crf(x, y, self.mask) | |||
return torch.mean(total_loss) | |||
def _make_mask(self, x, seq_len): | |||
batch_size, max_len = x.size(0), x.size(1) | |||
mask = seq_len_to_mask(seq_len) | |||
mask = mask.view(batch_size, max_len) | |||
mask = mask.to(x).float() | |||
return mask | |||
def _forward(self, words, seq_len, target=None): | |||
""" | |||
:param torch.LongTensor words: [batch_size, mex_len] | |||
:param torch.LongTensor seq_len:[batch_size, ] | |||
:param torch.LongTensor target: [batch_size, max_len] | |||
:return y: If truth is None, return list of [decode path(list)]. Used in testing and predicting. | |||
If truth is not None, return loss, a scalar. Used in training. | |||
""" | |||
words = words.long() | |||
seq_len = seq_len.long() | |||
self.mask = self._make_mask(words, seq_len) | |||
# seq_len = seq_len.long() | |||
target = target.long() if target is not None else None | |||
if next(self.parameters()).is_cuda: | |||
words = words.cuda() | |||
self.mask = self.mask.cuda() | |||
x = self.Embedding(words) | |||
x = self.norm1(x) | |||
# [batch_size, max_len, word_emb_dim] | |||
x, _ = self.Rnn(x, seq_len=seq_len) | |||
x = self.Linear1(x) | |||
x = self.norm2(x) | |||
x = self.relu(x) | |||
x = self.drop(x) | |||
x = self.Linear2(x) | |||
if target is not None: | |||
return {"loss": self._internal_loss(x, target)} | |||
else: | |||
return {"pred": self._decode(x)} | |||
def forward(self, words, seq_len, target): | |||
""" | |||
:param torch.LongTensor words: [batch_size, mex_len] | |||
:param torch.LongTensor seq_len: [batch_size, ] | |||
:param torch.LongTensor target: [batch_size, max_len], 目标 | |||
:return torch.Tensor: a scalar loss | |||
""" | |||
return self._forward(words, seq_len, target) | |||
def predict(self, words, seq_len): | |||
""" | |||
:param torch.LongTensor words: [batch_size, mex_len] | |||
:param torch.LongTensor seq_len: [batch_size, ] | |||
:return torch.LongTensor: [batch_size, max_len] | |||
""" | |||
return self._forward(words, seq_len) |
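Both models decode with `Crf.viterbi_decode`. The dynamic program behind it can be sketched in pure Python; this is a simplified single-sentence version (no masking, batching, or start/end transitions), not fastNLP's implementation:

```python
def viterbi_decode(emissions, transitions):
    # emissions: [seq_len][n_tags] local scores; transitions: [n_tags][n_tags]
    # score of moving from tag i to tag j. Returns the best-scoring tag path.
    n_tags = len(emissions[0])
    score = list(emissions[0])
    backpointers = []
    for emit in emissions[1:]:
        new_score, ptrs = [], []
        for j in range(n_tags):
            # best previous tag for ending at tag j now
            best_i = max(range(n_tags), key=lambda i: score[i] + transitions[i][j])
            new_score.append(score[best_i] + transitions[best_i][j] + emit[j])
            ptrs.append(best_i)
        score = new_score
        backpointers.append(ptrs)
    # follow backpointers from the best final tag
    best = max(range(n_tags), key=lambda j: score[j])
    path = [best]
    for ptrs in reversed(backpointers):
        path.append(ptrs[path[-1]])
    return path[::-1]
```

With zero transition scores the decoder reduces to a per-step argmax; strong transition scores can override the emissions, which is what the `allowed_transitions` constraints (as `-inf` transitions) rely on.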
@@ -1,225 +0,0 @@ | |||
import torch | |||
from fastNLP.models.base_model import BaseModel | |||
from fastNLP.modules import decoder, encoder | |||
from fastNLP.modules.decoder.CRF import allowed_transitions | |||
from fastNLP.modules.utils import seq_mask | |||
class SeqLabeling(BaseModel): | |||
""" | |||
PyTorch Network for sequence labeling | |||
""" | |||
def __init__(self, args): | |||
super(SeqLabeling, self).__init__() | |||
vocab_size = args["vocab_size"] | |||
word_emb_dim = args["word_emb_dim"] | |||
hidden_dim = args["rnn_hidden_units"] | |||
num_classes = args["num_classes"] | |||
self.Embedding = encoder.embedding.Embedding(vocab_size, word_emb_dim) | |||
self.Rnn = encoder.lstm.LSTM(word_emb_dim, hidden_dim) | |||
self.Linear = encoder.linear.Linear(hidden_dim, num_classes) | |||
self.Crf = decoder.CRF.ConditionalRandomField(num_classes) | |||
self.mask = None | |||
def forward(self, word_seq, word_seq_origin_len, truth=None): | |||
""" | |||
:param word_seq: LongTensor, [batch_size, max_len]
:param word_seq_origin_len: LongTensor, [batch_size,], the origin lengths of the sequences. | |||
:param truth: LongTensor, [batch_size, max_len] | |||
:return y: If truth is None, return list of [decode path(list)]. Used in testing and predicting. | |||
If truth is not None, return loss, a scalar. Used in training. | |||
""" | |||
assert word_seq.shape[0] == word_seq_origin_len.shape[0] | |||
if truth is not None: | |||
assert truth.shape == word_seq.shape | |||
self.mask = self.make_mask(word_seq, word_seq_origin_len) | |||
x = self.Embedding(word_seq) | |||
# [batch_size, max_len, word_emb_dim] | |||
x = self.Rnn(x) | |||
# [batch_size, max_len, hidden_size * direction] | |||
x = self.Linear(x) | |||
# [batch_size, max_len, num_classes] | |||
return {"loss": self._internal_loss(x, truth) if truth is not None else None, | |||
"predict": self.decode(x)} | |||
def loss(self, x, y): | |||
""" Since the loss has been computed in forward(), this function simply returns x.""" | |||
return x | |||
def _internal_loss(self, x, y): | |||
""" | |||
Negative log likelihood loss. | |||
:param x: Tensor, [batch_size, max_len, tag_size] | |||
:param y: Tensor, [batch_size, max_len] | |||
:return loss: a scalar Tensor | |||
""" | |||
x = x.float() | |||
y = y.long() | |||
assert x.shape[:2] == y.shape | |||
assert y.shape == self.mask.shape | |||
total_loss = self.Crf(x, y, self.mask) | |||
return torch.mean(total_loss) | |||
def make_mask(self, x, seq_len): | |||
batch_size, max_len = x.size(0), x.size(1) | |||
mask = seq_mask(seq_len, max_len) | |||
mask = mask.view(batch_size, max_len) | |||
mask = mask.to(x).float() | |||
return mask | |||
def decode(self, x, pad=True): | |||
""" | |||
:param x: FloatTensor, [batch_size, max_len, tag_size] | |||
:param pad: pad the output sequence to equal lengths | |||
:return prediction: list of [decode path(list)] | |||
""" | |||
max_len = x.shape[1] | |||
tag_seq = self.Crf.viterbi_decode(x, self.mask) | |||
# pad prediction to equal length | |||
if pad is True: | |||
for pred in tag_seq: | |||
if len(pred) < max_len: | |||
pred += [0] * (max_len - len(pred)) | |||
return tag_seq | |||
class AdvSeqLabel(SeqLabeling): | |||
""" | |||
Advanced Sequence Labeling Model | |||
""" | |||
def __init__(self, args, emb=None, id2words=None): | |||
super(AdvSeqLabel, self).__init__(args) | |||
vocab_size = args["vocab_size"] | |||
word_emb_dim = args["word_emb_dim"] | |||
hidden_dim = args["rnn_hidden_units"] | |||
num_classes = args["num_classes"] | |||
dropout = args['dropout'] | |||
self.Embedding = encoder.embedding.Embedding(vocab_size, word_emb_dim, init_emb=emb) | |||
self.norm1 = torch.nn.LayerNorm(word_emb_dim) | |||
# self.Rnn = encoder.lstm.LSTM(word_emb_dim, hidden_dim, num_layers=2, dropout=dropout, bidirectional=True) | |||
self.Rnn = torch.nn.LSTM(input_size=word_emb_dim, hidden_size=hidden_dim, num_layers=2, dropout=dropout, | |||
bidirectional=True, batch_first=True) | |||
self.Linear1 = encoder.Linear(hidden_dim * 2, hidden_dim * 2 // 3) | |||
self.norm2 = torch.nn.LayerNorm(hidden_dim * 2 // 3) | |||
# self.batch_norm = torch.nn.BatchNorm1d(hidden_dim * 2 // 3) | |||
self.relu = torch.nn.LeakyReLU() | |||
self.drop = torch.nn.Dropout(dropout) | |||
self.Linear2 = encoder.Linear(hidden_dim * 2 // 3, num_classes) | |||
if id2words is None: | |||
self.Crf = decoder.CRF.ConditionalRandomField(num_classes, include_start_end_trans=False) | |||
else: | |||
self.Crf = decoder.CRF.ConditionalRandomField(num_classes, include_start_end_trans=False, | |||
allowed_transitions=allowed_transitions(id2words, | |||
encoding_type="bmes")) | |||
def forward(self, word_seq, word_seq_origin_len, truth=None): | |||
""" | |||
:param word_seq: LongTensor, [batch_size, max_len]
:param word_seq_origin_len: LongTensor, [batch_size, ] | |||
:param truth: LongTensor, [batch_size, max_len] | |||
:return y: If truth is None, return list of [decode path(list)]. Used in testing and predicting. | |||
If truth is not None, return loss, a scalar. Used in training. | |||
""" | |||
word_seq = word_seq.long() | |||
word_seq_origin_len = word_seq_origin_len.long() | |||
self.mask = self.make_mask(word_seq, word_seq_origin_len) | |||
sent_len, idx_sort = torch.sort(word_seq_origin_len, descending=True) | |||
_, idx_unsort = torch.sort(idx_sort, descending=False) | |||
# word_seq_origin_len = word_seq_origin_len.long() | |||
truth = truth.long() if truth is not None else None | |||
batch_size = word_seq.size(0) | |||
max_len = word_seq.size(1) | |||
if next(self.parameters()).is_cuda: | |||
word_seq = word_seq.cuda() | |||
idx_sort = idx_sort.cuda() | |||
idx_unsort = idx_unsort.cuda() | |||
self.mask = self.mask.cuda() | |||
x = self.Embedding(word_seq) | |||
x = self.norm1(x) | |||
# [batch_size, max_len, word_emb_dim] | |||
sent_variable = x[idx_sort] | |||
sent_packed = torch.nn.utils.rnn.pack_padded_sequence(sent_variable, sent_len, batch_first=True) | |||
x, _ = self.Rnn(sent_packed) | |||
# print(x) | |||
# [batch_size, max_len, hidden_size * direction] | |||
sent_output = torch.nn.utils.rnn.pad_packed_sequence(x, batch_first=True)[0] | |||
x = sent_output[idx_unsort] | |||
x = x.contiguous() | |||
# x = x.view(batch_size * max_len, -1) | |||
x = self.Linear1(x) | |||
# x = self.batch_norm(x) | |||
x = self.norm2(x) | |||
x = self.relu(x) | |||
x = self.drop(x) | |||
x = self.Linear2(x) | |||
# x = x.view(batch_size, max_len, -1) | |||
# [batch_size, max_len, num_classes] | |||
# TODO using this key for seq_lens is not reasonable
return {"loss": self._internal_loss(x, truth) if truth is not None else None, | |||
"predict": self.decode(x), | |||
'word_seq_origin_len': word_seq_origin_len} | |||
def predict(self, **x): | |||
out = self.forward(**x) | |||
return {"predict": out["predict"]} | |||
def loss(self, **kwargs): | |||
assert 'loss' in kwargs | |||
return kwargs['loss'] | |||
if __name__ == '__main__':
args = {
'vocab_size': 20,
'word_emb_dim': 100,
'rnn_hidden_units': 100,
'num_classes': 10,
'dropout': 0.3,  # AdvSeqLabel.__init__ reads args['dropout']
}
model = AdvSeqLabel(args) | |||
data = [] | |||
for i in range(20): | |||
word_seq = torch.randint(20, (15,)).long() | |||
word_seq_len = torch.LongTensor([15]) | |||
truth = torch.randint(10, (15,)).long() | |||
data.append((word_seq, word_seq_len, truth)) | |||
optimizer = torch.optim.Adam(model.parameters(), lr=0.01) | |||
print(model) | |||
curidx = 0 | |||
for i in range(1000): | |||
endidx = min(len(data), curidx + 5) | |||
b_word, b_len, b_truth = [], [], [] | |||
for word_seq, word_seq_len, truth in data[curidx: endidx]: | |||
b_word.append(word_seq) | |||
b_len.append(word_seq_len) | |||
b_truth.append(truth) | |||
word_seq = torch.stack(b_word, dim=0) | |||
word_seq_len = torch.cat(b_len, dim=0) | |||
truth = torch.stack(b_truth, dim=0) | |||
res = model(word_seq, word_seq_len, truth) | |||
loss = res['loss'] | |||
pred = res['predict'] | |||
print('loss: {} acc {}'.format(loss.item(), | |||
((pred.data == truth).long().sum().float() / word_seq_len.sum().float()))) | |||
optimizer.zero_grad() | |||
loss.backward() | |||
optimizer.step() | |||
curidx = endidx | |||
if curidx == len(data): | |||
curidx = 0 |
@@ -1,114 +1,152 @@ | |||
__all__ = [ | |||
"ESIM" | |||
] | |||
import torch | |||
import torch.nn as nn | |||
import torch.nn.functional as F | |||
from .base_model import BaseModel | |||
from ..core.const import Const | |||
from ..modules import decoder as Decoder | |||
from ..modules import encoder as Encoder | |||
from ..modules import aggregator as Aggregator | |||
from ..core.utils import seq_len_to_mask | |||
my_inf = 10e12 | |||
class ESIM(BaseModel):
"""
Alias: :class:`fastNLP.models.ESIM` :class:`fastNLP.models.snli.ESIM`
A PyTorch implementation of the ESIM model.
ESIM paper: Enhanced LSTM for Natural Language Inference (arXiv: 1609.06038)
:param int vocab_size: vocabulary size
:param int embed_dim: dimension of the word embeddings
:param int hidden_size: hidden size of the LSTM
:param float dropout: dropout probability, 0 by default
:param int num_classes: number of labels, 3 by default
:param numpy.array init_embedding: initial embedding matrix of shape (vocab_size, embed_dim); None by default, i.e. the embedding matrix is randomly initialized
"""
def __init__(self, vocab_size, embed_dim, hidden_size, dropout=0.0, num_classes=3, init_embedding=None):
super(ESIM, self).__init__()
self.vocab_size = vocab_size
self.embed_dim = embed_dim
self.hidden_size = hidden_size
self.dropout = dropout
self.n_labels = num_classes
self.drop = nn.Dropout(self.dropout)
self.embedding = Encoder.Embedding(
(self.vocab_size, self.embed_dim), dropout=self.dropout,
)
self.embedding_layer = nn.Linear(self.embed_dim, self.hidden_size)
self.encoder = Encoder.LSTM(
input_size=self.embed_dim, hidden_size=self.hidden_size, num_layers=1, bias=True,
batch_first=True, bidirectional=True
)
self.bi_attention = Aggregator.BiAttention()
self.mean_pooling = Aggregator.AvgPoolWithMask()
self.max_pooling = Aggregator.MaxPoolWithMask()
self.inference_layer = nn.Linear(self.hidden_size * 4, self.hidden_size)
self.decoder = Encoder.LSTM(
input_size=self.hidden_size, hidden_size=self.hidden_size, num_layers=1, bias=True,
batch_first=True, bidirectional=True
)
self.output = Decoder.MLP([4 * self.hidden_size, self.hidden_size, self.n_labels], 'tanh', dropout=self.dropout)
def forward(self, words1, words2, seq_len1=None, seq_len2=None, target=None):
""" Forward function
:param torch.Tensor words1: [batch size(B), premise seq len(PL)] token indices of the premise
:param torch.Tensor words2: [B, hypothesis seq len(HL)] token indices of the hypothesis
:param torch.LongTensor seq_len1: [B] lengths of the premises
:param torch.LongTensor seq_len2: [B] lengths of the hypotheses
:param torch.LongTensor target: [B] ground-truth labels
:return: dict prediction: [B, n_labels(N)] predicted results
"""
premise0 = self.embedding_layer(self.embedding(words1))
hypothesis0 = self.embedding_layer(self.embedding(words2))
if seq_len1 is not None:
seq_len1 = seq_len_to_mask(seq_len1)
else:
seq_len1 = torch.ones(premise0.size(0), premise0.size(1))
seq_len1 = (seq_len1.long()).to(device=premise0.device)
if seq_len2 is not None:
seq_len2 = seq_len_to_mask(seq_len2)
else:
seq_len2 = torch.ones(hypothesis0.size(0), hypothesis0.size(1))
seq_len2 = (seq_len2.long()).to(device=hypothesis0.device)
_BP, _PSL, _HP = premise0.size()
_BH, _HSL, _HH = hypothesis0.size()
_BPL, _PLL = seq_len1.size()
_HPL, _HLL = seq_len2.size()
assert _BP == _BH and _BPL == _HPL and _BP == _BPL
assert _HP == _HH
assert _PSL == _PLL and _HSL == _HLL
B, PL, H = premise0.size()
B, HL, H = hypothesis0.size()
a0 = self.encoder(self.drop(premise0))  # a0: [B, PL, H * 2]
b0 = self.encoder(self.drop(hypothesis0))  # b0: [B, HL, H * 2]
a = torch.mean(a0.view(B, PL, -1, H), dim=2)  # a: [B, PL, H]
b = torch.mean(b0.view(B, HL, -1, H), dim=2)  # b: [B, HL, H]
ai, bi = self.bi_attention(a, b, seq_len1, seq_len2)
ma = torch.cat((a, ai, a - ai, a * ai), dim=2)  # ma: [B, PL, 4 * H]
mb = torch.cat((b, bi, b - bi, b * bi), dim=2)  # mb: [B, HL, 4 * H]
f_ma = self.inference_layer(ma)
f_mb = self.inference_layer(mb)
vat = self.decoder(self.drop(f_ma))
vbt = self.decoder(self.drop(f_mb))
va = torch.mean(vat.view(B, PL, -1, H), dim=2)  # va: [B, PL, H]
vb = torch.mean(vbt.view(B, HL, -1, H), dim=2)  # vb: [B, HL, H]
va_ave = self.mean_pooling(va, seq_len1, dim=1)  # va_ave: [B, H]
va_max, va_arg_max = self.max_pooling(va, seq_len1, dim=1)  # va_max: [B, H]
vb_ave = self.mean_pooling(vb, seq_len2, dim=1)  # vb_ave: [B, H]
vb_max, vb_arg_max = self.max_pooling(vb, seq_len2, dim=1)  # vb_max: [B, H]
v = torch.cat((va_ave, va_max, vb_ave, vb_max), dim=1)  # v: [B, 4 * H]
prediction = torch.tanh(self.output(v))  # prediction: [B, N]
if target is not None:
func = nn.CrossEntropyLoss()
loss = func(prediction, target)
return {Const.OUTPUT: prediction, Const.LOSS: loss}
return {Const.OUTPUT: prediction}
def predict(self, words1, words2, seq_len1=None, seq_len2=None, target=None):
""" Predict function
:param torch.Tensor words1: [batch size(B), premise seq len(PL)] token indices of the premise
:param torch.Tensor words2: [B, hypothesis seq len(HL)] token indices of the hypothesis
:param torch.LongTensor seq_len1: [B] lengths of the premises
:param torch.LongTensor seq_len2: [B] lengths of the hypotheses
:param torch.LongTensor target: [B] ground-truth labels
:return: dict prediction: [B] predicted label indices
"""
prediction = self.forward(words1, words2, seq_len1, seq_len2)[Const.OUTPUT]
return {Const.OUTPUT: torch.argmax(prediction, dim=-1)}
@@ -0,0 +1,307 @@ | |||
""" | |||
Star-Transformer 的 Pytorch 实现。 | |||
""" | |||
__all__ = [ | |||
"StarTransEnc", | |||
"STNLICls", | |||
"STSeqCls", | |||
"STSeqLabel", | |||
] | |||
import torch | |||
from torch import nn | |||
from ..modules.encoder.star_transformer import StarTransformer | |||
from ..core.utils import seq_len_to_mask | |||
from ..modules.utils import get_embeddings | |||
from ..core.const import Const | |||
class StarTransEnc(nn.Module): | |||
""" | |||
别名::class:`fastNLP.models.StarTransEnc` :class:`fastNLP.models.star_transformer.StarTransEnc` | |||
带word embedding的Star-Transformer Encoder | |||
:param init_embed: 单词词典, 可以是 tuple, 包括(num_embedings, embedding_dim), 即 | |||
embedding的大小和每个词的维度. 也可以传入 nn.Embedding 对象, | |||
此时就以传入的对象作为embedding | |||
:param hidden_size: 模型中特征维度. | |||
:param num_layers: 模型层数. | |||
:param num_head: 模型中multi-head的head个数. | |||
:param head_dim: 模型中multi-head中每个head特征维度. | |||
:param max_len: 模型能接受的最大输入长度. | |||
:param emb_dropout: 词嵌入的dropout概率. | |||
:param dropout: 模型除词嵌入外的dropout概率. | |||
""" | |||
def __init__(self, init_embed, | |||
hidden_size, | |||
num_layers, | |||
num_head, | |||
head_dim, | |||
max_len, | |||
emb_dropout, | |||
dropout): | |||
super(StarTransEnc, self).__init__() | |||
self.embedding = get_embeddings(init_embed) | |||
emb_dim = self.embedding.embedding_dim | |||
self.emb_fc = nn.Linear(emb_dim, hidden_size) | |||
self.emb_drop = nn.Dropout(emb_dropout) | |||
self.encoder = StarTransformer(hidden_size=hidden_size, | |||
num_layers=num_layers, | |||
num_head=num_head, | |||
head_dim=head_dim, | |||
dropout=dropout, | |||
max_len=max_len) | |||
def forward(self, x, mask): | |||
""" | |||
:param FloatTensor x: [batch, length, hidden] 输入的序列 | |||
:param ByteTensor mask: [batch, length] 输入序列的padding mask, 在没有内容(padding 部分) 为 0, | |||
否则为 1 | |||
:return: [batch, length, hidden] 编码后的输出序列 | |||
[batch, hidden] 全局 relay 节点, 详见论文 | |||
""" | |||
x = self.embedding(x) | |||
x = self.emb_fc(self.emb_drop(x)) | |||
nodes, relay = self.encoder(x, mask) | |||
return nodes, relay | |||
class _Cls(nn.Module): | |||
def __init__(self, in_dim, num_cls, hid_dim, dropout=0.1): | |||
super(_Cls, self).__init__() | |||
self.fc = nn.Sequential( | |||
nn.Linear(in_dim, hid_dim), | |||
nn.LeakyReLU(), | |||
nn.Dropout(dropout), | |||
nn.Linear(hid_dim, num_cls), | |||
) | |||
def forward(self, x): | |||
h = self.fc(x) | |||
return h | |||
class _NLICls(nn.Module): | |||
def __init__(self, in_dim, num_cls, hid_dim, dropout=0.1): | |||
super(_NLICls, self).__init__() | |||
self.fc = nn.Sequential( | |||
nn.Dropout(dropout), | |||
nn.Linear(in_dim * 4, hid_dim), # 4 | |||
nn.LeakyReLU(), | |||
nn.Dropout(dropout), | |||
nn.Linear(hid_dim, num_cls), | |||
) | |||
def forward(self, x1, x2): | |||
x = torch.cat([x1, x2, torch.abs(x1 - x2), x1 * x2], 1) | |||
h = self.fc(x) | |||
return h | |||
class STSeqLabel(nn.Module): | |||
""" | |||
别名::class:`fastNLP.models.STSeqLabel` :class:`fastNLP.models.star_transformer.STSeqLabel` | |||
用于序列标注的Star-Transformer模型 | |||
:param init_embed: 单词词典, 可以是 tuple, 包括(num_embedings, embedding_dim), 即 | |||
embedding的大小和每个词的维度. 也可以传入 nn.Embedding 对象, | |||
此时就以传入的对象作为embedding | |||
:param num_cls: 输出类别个数 | |||
:param hidden_size: 模型中特征维度. Default: 300 | |||
:param num_layers: 模型层数. Default: 4 | |||
:param num_head: 模型中multi-head的head个数. Default: 8 | |||
:param head_dim: 模型中multi-head中每个head特征维度. Default: 32 | |||
:param max_len: 模型能接受的最大输入长度. Default: 512 | |||
:param cls_hidden_size: 分类器隐层维度. Default: 600 | |||
:param emb_dropout: 词嵌入的dropout概率. Default: 0.1 | |||
:param dropout: 模型除词嵌入外的dropout概率. Default: 0.1 | |||
""" | |||
def __init__(self, init_embed, num_cls, | |||
hidden_size=300, | |||
num_layers=4, | |||
num_head=8, | |||
head_dim=32, | |||
max_len=512, | |||
cls_hidden_size=600, | |||
emb_dropout=0.1, | |||
dropout=0.1, ): | |||
super(STSeqLabel, self).__init__() | |||
self.enc = StarTransEnc(init_embed=init_embed, | |||
hidden_size=hidden_size, | |||
num_layers=num_layers, | |||
num_head=num_head, | |||
head_dim=head_dim, | |||
max_len=max_len, | |||
emb_dropout=emb_dropout, | |||
dropout=dropout) | |||
self.cls = _Cls(hidden_size, num_cls, cls_hidden_size) | |||
def forward(self, words, seq_len): | |||
""" | |||
:param words: [batch, seq_len] 输入序列 | |||
:param seq_len: [batch,] 输入序列的长度 | |||
:return output: [batch, num_cls, seq_len] 输出序列中每个元素的分类的概率 | |||
""" | |||
mask = seq_len_to_mask(seq_len) | |||
nodes, _ = self.enc(words, mask) | |||
output = self.cls(nodes) | |||
output = output.transpose(1, 2) # make hidden to be dim 1 | |||
return {Const.OUTPUT: output} # [bsz, n_cls, seq_len] | |||
def predict(self, words, seq_len): | |||
""" | |||
:param words: [batch, seq_len] 输入序列 | |||
:param seq_len: [batch,] 输入序列的长度 | |||
:return output: [batch, seq_len] 输出序列中每个元素的分类 | |||
""" | |||
y = self.forward(words, seq_len) | |||
_, pred = y[Const.OUTPUT].max(1) | |||
return {Const.OUTPUT: pred} | |||
class STSeqCls(nn.Module): | |||
""" | |||
别名::class:`fastNLP.models.STSeqCls` :class:`fastNLP.models.star_transformer.STSeqCls` | |||
用于分类任务的Star-Transformer | |||
:param init_embed: 单词词典, 可以是 tuple, 包括(num_embedings, embedding_dim), 即 | |||
embedding的大小和每个词的维度. 也可以传入 nn.Embedding 对象, | |||
此时就以传入的对象作为embedding | |||
:param num_cls: 输出类别个数 | |||
:param hidden_size: 模型中特征维度. Default: 300 | |||
:param num_layers: 模型层数. Default: 4 | |||
:param num_head: 模型中multi-head的head个数. Default: 8 | |||
:param head_dim: 模型中multi-head中每个head特征维度. Default: 32 | |||
:param max_len: 模型能接受的最大输入长度. Default: 512 | |||
:param cls_hidden_size: 分类器隐层维度. Default: 600 | |||
:param emb_dropout: 词嵌入的dropout概率. Default: 0.1 | |||
:param dropout: 模型除词嵌入外的dropout概率. Default: 0.1 | |||
""" | |||
def __init__(self, init_embed, num_cls, | |||
hidden_size=300, | |||
num_layers=4, | |||
num_head=8, | |||
head_dim=32, | |||
max_len=512, | |||
cls_hidden_size=600, | |||
emb_dropout=0.1, | |||
dropout=0.1, ): | |||
super(STSeqCls, self).__init__() | |||
self.enc = StarTransEnc(init_embed=init_embed, | |||
hidden_size=hidden_size, | |||
num_layers=num_layers, | |||
num_head=num_head, | |||
head_dim=head_dim, | |||
max_len=max_len, | |||
emb_dropout=emb_dropout, | |||
dropout=dropout) | |||
self.cls = _Cls(hidden_size, num_cls, cls_hidden_size) | |||
def forward(self, words, seq_len): | |||
""" | |||
:param words: [batch, seq_len] 输入序列 | |||
:param seq_len: [batch,] 输入序列的长度 | |||
:return output: [batch, num_cls] 输出序列的分类的概率 | |||
""" | |||
mask = seq_len_to_mask(seq_len) | |||
nodes, relay = self.enc(words, mask) | |||
y = 0.5 * (relay + nodes.max(1)[0]) | |||
output = self.cls(y) # [bsz, n_cls] | |||
return {Const.OUTPUT: output} | |||
def predict(self, words, seq_len): | |||
""" | |||
:param words: [batch, seq_len] 输入序列 | |||
:param seq_len: [batch,] 输入序列的长度 | |||
:return output: [batch, num_cls] 输出序列的分类 | |||
""" | |||
y = self.forward(words, seq_len) | |||
_, pred = y[Const.OUTPUT].max(1) | |||
return {Const.OUTPUT: pred} | |||
class STNLICls(nn.Module): | |||
""" | |||
别名::class:`fastNLP.models.STNLICls` :class:`fastNLP.models.star_transformer.STNLICls` | |||
用于自然语言推断(NLI)的Star-Transformer | |||
:param init_embed: 单词词典, 可以是 tuple, 包括(num_embedings, embedding_dim), 即 | |||
embedding的大小和每个词的维度. 也可以传入 nn.Embedding 对象, | |||
此时就以传入的对象作为embedding | |||
:param num_cls: 输出类别个数 | |||
:param hidden_size: 模型中特征维度. Default: 300 | |||
:param num_layers: 模型层数. Default: 4 | |||
:param num_head: 模型中multi-head的head个数. Default: 8 | |||
:param head_dim: 模型中multi-head中每个head特征维度. Default: 32 | |||
:param max_len: 模型能接受的最大输入长度. Default: 512 | |||
:param cls_hidden_size: 分类器隐层维度. Default: 600 | |||
:param emb_dropout: 词嵌入的dropout概率. Default: 0.1 | |||
:param dropout: 模型除词嵌入外的dropout概率. Default: 0.1 | |||
""" | |||
def __init__(self, init_embed, num_cls, | |||
hidden_size=300, | |||
num_layers=4, | |||
num_head=8, | |||
head_dim=32, | |||
max_len=512, | |||
cls_hidden_size=600, | |||
emb_dropout=0.1, | |||
dropout=0.1, ): | |||
super(STNLICls, self).__init__() | |||
self.enc = StarTransEnc(init_embed=init_embed, | |||
hidden_size=hidden_size, | |||
num_layers=num_layers, | |||
num_head=num_head, | |||
head_dim=head_dim, | |||
max_len=max_len, | |||
emb_dropout=emb_dropout, | |||
dropout=dropout) | |||
self.cls = _NLICls(hidden_size, num_cls, cls_hidden_size) | |||
def forward(self, words1, words2, seq_len1, seq_len2): | |||
""" | |||
:param words1: [batch, seq_len] 输入序列1 | |||
:param words2: [batch, seq_len] 输入序列2 | |||
:param seq_len1: [batch,] 输入序列1的长度 | |||
:param seq_len2: [batch,] 输入序列2的长度 | |||
:return output: [batch, num_cls] 输出分类的概率 | |||
""" | |||
mask1 = seq_len_to_mask(seq_len1) | |||
mask2 = seq_len_to_mask(seq_len2) | |||
def enc(seq, mask): | |||
nodes, relay = self.enc(seq, mask) | |||
return 0.5 * (relay + nodes.max(1)[0]) | |||
y1 = enc(words1, mask1) | |||
y2 = enc(words2, mask2) | |||
output = self.cls(y1, y2) # [bsz, n_cls] | |||
return {Const.OUTPUT: output} | |||
def predict(self, words1, words2, seq_len1, seq_len2): | |||
""" | |||
:param words1: [batch, seq_len] 输入序列1 | |||
:param words2: [batch, seq_len] 输入序列2 | |||
:param seq_len1: [batch,] 输入序列1的长度 | |||
:param seq_len2: [batch,] 输入序列2的长度 | |||
:return output: [batch, num_cls] 输出分类的概率 | |||
""" | |||
y = self.forward(words1, words2, seq_len1, seq_len2) | |||
_, pred = y[Const.OUTPUT].max(1) | |||
return {Const.OUTPUT: pred} |
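Both STSeqCls and STNLICls form their sentence representation as `0.5 * (relay + nodes.max(1)[0])`, averaging the global relay node with the element-wise max over token states. A list-based sketch of that readout (an illustrative helper, not part of fastNLP):

```python
def st_readout(nodes, relay):
    # nodes: [seq_len][hidden] token states; relay: [hidden] global relay node.
    # Average the relay state with the element-wise max over token states.
    hidden = len(relay)
    max_nodes = [max(tok[k] for tok in nodes) for k in range(hidden)]
    return [0.5 * (relay[k] + max_nodes[k]) for k in range(hidden)]
```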
@@ -1,3 +1,51 @@ | |||
""" | |||
大部分用于的 NLP 任务神经网络都可以看做由编码 :mod:`~fastNLP.modules.encoder` 、 | |||
聚合 :mod:`~fastNLP.modules.aggregator` 、解码 :mod:`~fastNLP.modules.decoder` 三种模块组成。 | |||
.. image:: figures/text_classification.png | |||
:mod:`~fastNLP.modules` 中实现了 fastNLP 提供的诸多模块组件,可以帮助用户快速搭建自己所需的网络。 | |||
三种模块的功能和常见组件如下: | |||
+-----------------------+-----------------------+-----------------------+ | |||
| module type | functionality | example | | |||
+=======================+=======================+=======================+ | |||
| encoder | 将输入编码为具有具 | embedding, RNN, CNN, | | |||
| | 有表示能力的向量 | transformer | | |||
+-----------------------+-----------------------+-----------------------+ | |||
| aggregator | 从多个向量中聚合信息 | self-attention, | | |||
| | | max-pooling | | |||
+-----------------------+-----------------------+-----------------------+ | |||
| decoder | 将具有某种表示意义的 | MLP, CRF | | |||
| | 向量解码为需要的输出 | | | |||
| | 形式 | | | |||
+-----------------------+-----------------------+-----------------------+ | |||
""" | |||
__all__ = [ | |||
# "BertModel", | |||
"ConvolutionCharEncoder", | |||
"LSTMCharEncoder", | |||
"ConvMaxpool", | |||
"Embedding", | |||
"LSTM", | |||
"StarTransformer", | |||
"TransformerEncoder", | |||
"VarRNN", | |||
"VarLSTM", | |||
"VarGRU", | |||
"MaxPool", | |||
"MaxPoolWithMask", | |||
"AvgPool", | |||
"MultiHeadAttention", | |||
"MLP", | |||
"ConditionalRandomField", | |||
"viterbi_decode", | |||
"allowed_transitions", | |||
] | |||
from . import aggregator | |||
from . import decoder | |||
from . import encoder | |||
@@ -5,9 +53,4 @@ from .aggregator import * | |||
from .decoder import * | |||
from .dropout import TimestepDropout | |||
from .encoder import * | |||
from .utils import get_embeddings |