Browse Source

Merge pull request #7 from fastnlp/dev0.5.0

Dev0.5.0 Update
tags/v0.4.10
Danqing Wang GitHub 5 years ago
parent
commit
2b9aab459b
No known key found for this signature in database GPG Key ID: 4AEE18F83AFDEB23
100 changed files with 3707 additions and 1431 deletions
  1. +3
    -0
      .travis.yml
  2. +22
    -7
      README.md
  3. +2
    -2
      docs/Makefile
  4. +65
    -0
      docs/count.py
  5. +5
    -3
      docs/source/conf.py
  6. +3
    -3
      docs/source/fastNLP.core.batch.rst
  7. +3
    -3
      docs/source/fastNLP.core.callback.rst
  8. +3
    -3
      docs/source/fastNLP.core.const.rst
  9. +3
    -3
      docs/source/fastNLP.core.dataset.rst
  10. +3
    -3
      docs/source/fastNLP.core.field.rst
  11. +3
    -3
      docs/source/fastNLP.core.instance.rst
  12. +3
    -3
      docs/source/fastNLP.core.losses.rst
  13. +3
    -3
      docs/source/fastNLP.core.metrics.rst
  14. +3
    -3
      docs/source/fastNLP.core.optimizer.rst
  15. +3
    -6
      docs/source/fastNLP.core.rst
  16. +3
    -3
      docs/source/fastNLP.core.sampler.rst
  17. +3
    -3
      docs/source/fastNLP.core.tester.rst
  18. +3
    -3
      docs/source/fastNLP.core.trainer.rst
  19. +3
    -3
      docs/source/fastNLP.core.utils.rst
  20. +3
    -3
      docs/source/fastNLP.core.vocabulary.rst
  21. +7
    -0
      docs/source/fastNLP.embeddings.bert_embedding.rst
  22. +7
    -0
      docs/source/fastNLP.embeddings.char_embedding.rst
  23. +7
    -0
      docs/source/fastNLP.embeddings.contextual_embedding.rst
  24. +7
    -0
      docs/source/fastNLP.embeddings.elmo_embedding.rst
  25. +7
    -0
      docs/source/fastNLP.embeddings.embedding.rst
  26. +20
    -0
      docs/source/fastNLP.embeddings.rst
  27. +7
    -0
      docs/source/fastNLP.embeddings.stack_embedding.rst
  28. +7
    -0
      docs/source/fastNLP.embeddings.static_embedding.rst
  29. +7
    -0
      docs/source/fastNLP.embeddings.utils.rst
  30. +0
    -7
      docs/source/fastNLP.io.base_loader.rst
  31. +7
    -0
      docs/source/fastNLP.io.data_bundle.rst
  32. +4
    -5
      docs/source/fastNLP.io.dataset_loader.rst
  33. +5
    -5
      docs/source/fastNLP.io.embed_loader.rst
  34. +7
    -0
      docs/source/fastNLP.io.file_utils.rst
  35. +7
    -0
      docs/source/fastNLP.io.loader.rst
  36. +5
    -5
      docs/source/fastNLP.io.model_io.rst
  37. +7
    -0
      docs/source/fastNLP.io.pipe.rst
  38. +8
    -8
      docs/source/fastNLP.io.rst
  39. +7
    -0
      docs/source/fastNLP.io.utils.rst
  40. +5
    -5
      docs/source/fastNLP.models.biaffine_parser.rst
  41. +5
    -5
      docs/source/fastNLP.models.cnn_text_classification.rst
  42. +3
    -6
      docs/source/fastNLP.models.rst
  43. +5
    -5
      docs/source/fastNLP.models.sequence_labeling.rst
  44. +3
    -3
      docs/source/fastNLP.models.snli.rst
  45. +5
    -5
      docs/source/fastNLP.models.star_transformer.rst
  46. +0
    -7
      docs/source/fastNLP.modules.decoder.crf.rst
  47. +0
    -7
      docs/source/fastNLP.modules.decoder.mlp.rst
  48. +2
    -13
      docs/source/fastNLP.modules.decoder.rst
  49. +0
    -7
      docs/source/fastNLP.modules.decoder.utils.rst
  50. +0
    -7
      docs/source/fastNLP.modules.encoder.bert.rst
  51. +0
    -7
      docs/source/fastNLP.modules.encoder.char_encoder.rst
  52. +0
    -7
      docs/source/fastNLP.modules.encoder.conv_maxpool.rst
  53. +0
    -7
      docs/source/fastNLP.modules.encoder.embedding.rst
  54. +0
    -7
      docs/source/fastNLP.modules.encoder.lstm.rst
  55. +2
    -18
      docs/source/fastNLP.modules.encoder.rst
  56. +0
    -7
      docs/source/fastNLP.modules.encoder.star_transformer.rst
  57. +0
    -7
      docs/source/fastNLP.modules.encoder.transformer.rst
  58. +0
    -7
      docs/source/fastNLP.modules.encoder.variational_rnn.rst
  59. +6
    -7
      docs/source/fastNLP.modules.rst
  60. +7
    -0
      docs/source/fastNLP.modules.utils.rst
  61. +11
    -14
      docs/source/fastNLP.rst
  62. BIN
      docs/source/figures/text_classification.png
  63. BIN
      docs/source/figures/workflow.png
  64. +16
    -48
      docs/source/index.rst
  65. +0
    -1
      docs/source/modules.rst
  66. +2
    -2
      docs/source/tutorials/tutorial_1_data_preprocess.rst
  67. +74
    -117
      docs/source/tutorials/tutorial_2_load_dataset.rst
  68. +29
    -76
      docs/source/tutorials/tutorial_3_embedding.rst
  69. +5
    -3
      docs/source/tutorials/tutorial_4_loss_optimizer.rst
  70. +5
    -3
      docs/source/tutorials/tutorial_5_datasetiter.rst
  71. +2
    -2
      docs/source/tutorials/tutorial_6_seq_labeling.rst
  72. +5
    -3
      docs/source/tutorials/tutorial_7_modules_models.rst
  73. +3
    -3
      docs/source/tutorials/tutorial_8_metrics.rst
  74. +5
    -5
      docs/source/tutorials/tutorial_9_callback.rst
  75. +15
    -13
      docs/source/user/tutorials.rst
  76. +18
    -10
      fastNLP/__init__.py
  77. +74
    -10
      fastNLP/core/__init__.py
  78. +155
    -0
      fastNLP/core/_logger.py
  79. +26
    -7
      fastNLP/core/_parallel_utils.py
  80. +28
    -9
      fastNLP/core/batch.py
  81. +252
    -47
      fastNLP/core/callback.py
  82. +36
    -12
      fastNLP/core/const.py
  83. +148
    -133
      fastNLP/core/dataset.py
  84. +356
    -0
      fastNLP/core/dist_trainer.py
  85. +147
    -119
      fastNLP/core/field.py
  86. +7
    -0
      fastNLP/core/instance.py
  87. +20
    -11
      fastNLP/core/losses.py
  88. +30
    -11
      fastNLP/core/metrics.py
  89. +26
    -18
      fastNLP/core/optimizer.py
  90. +15
    -13
      fastNLP/core/predictor.py
  91. +17
    -6
      fastNLP/core/sampler.py
  92. +57
    -27
      fastNLP/core/tester.py
  93. +414
    -350
      fastNLP/core/trainer.py
  94. +44
    -78
      fastNLP/core/utils.py
  95. +69
    -46
      fastNLP/core/vocabulary.py
  96. +27
    -0
      fastNLP/embeddings/__init__.py
  97. +471
    -0
      fastNLP/embeddings/bert_embedding.py
  98. +325
    -0
      fastNLP/embeddings/char_embedding.py
  99. +110
    -0
      fastNLP/embeddings/contextual_embedding.py
  100. +345
    -0
      fastNLP/embeddings/elmo_embedding.py

+ 3
- 0
.travis.yml View File

@@ -1,6 +1,9 @@
language: python
python:
- "3.6"

env:
- TRAVIS=1
# command to install dependencies
install:
- pip install --quiet -r requirements.txt


+ 22
- 7
README.md View File

@@ -6,11 +6,12 @@
![Hex.pm](https://img.shields.io/hexpm/l/plug.svg)
[![Documentation Status](https://readthedocs.org/projects/fastnlp/badge/?version=latest)](http://fastnlp.readthedocs.io/?badge=latest)

fastNLP 是一款轻量级的 NLP 处理套件。你既可以使用它快速地完成一个序列标注([NER](reproduction/seqence_labelling/ner)、POS-Tagging等)、中文分词、[文本分类](reproduction/text_classification)、[Matching](reproduction/matching)、[指代消解](reproduction/coreference_resolution)、[摘要](reproduction/Summarization)等任务; 也可以使用它构建许多复杂的网络模型,进行科研。它具有如下的特性:
fastNLP 是一款轻量级的 NLP 工具包。你既可以使用它快速地完成一个序列标注([NER](reproduction/seqence_labelling/ner)、POS-Tagging等)、中文分词、[文本分类](reproduction/text_classification)、[Matching](reproduction/matching)、[指代消解](reproduction/coreference_resolution)、[摘要](reproduction/Summarization)等任务; 也可以使用它快速构建许多复杂的网络模型,进行科研。它具有如下的特性:

- 统一的Tabular式数据容器,让数据预处理过程简洁明了。内置多种数据集的DataSet Loader,省去预处理代码;
- 统一的Tabular式数据容器,让数据预处理过程简洁明了。内置多种数据集的Loader和Pipe,省去预处理代码;
- 多种训练、测试组件,例如训练器Trainer;测试器Tester;以及各种评测metrics等等;
- 各种方便的NLP工具,例如预处理embedding加载(包括ELMo和BERT); 中间数据cache等;
- 部分[数据集与预训练模型](https://docs.qq.com/sheet/DVnpkTnF6VW9UeXdh?c=A1A0A0)的自动下载
- 详尽的中文[文档](https://fastnlp.readthedocs.io/)、[教程](https://fastnlp.readthedocs.io/zh/latest/user/tutorials.html)以供查阅;
- 提供诸多高级模块,例如Variational LSTM, Transformer, CRF等;
- 在序列标注、中文分词、文本分类、Matching、指代消解、摘要等任务上封装了各种模型可供直接使用,详细内容见 [reproduction](reproduction) 部分;
@@ -36,11 +37,15 @@ pip install fastNLP
python -m spacy download en
```

目前使用pypi安装fastNLP的版本是0.4.1,有较多功能仍未更新,最新内容以master分支为准。
fastNLP0.5.0版本将在近期推出,请密切关注。


## fastNLP教程

- [0. 快速入门](https://fastnlp.readthedocs.io/zh/latest/user/quickstart.html)
- [1. 使用DataSet预处理文本](https://fastnlp.readthedocs.io/zh/latest/tutorials/tutorial_1_data_preprocess.html)
- [2. 使用DataSetLoader加载数据集](https://fastnlp.readthedocs.io/zh/latest/tutorials/tutorial_2_load_dataset.html)
- [2. 使用Loader和Pipe加载并处理数据集](https://fastnlp.readthedocs.io/zh/latest/tutorials/tutorial_2_load_dataset.html)
- [3. 使用Embedding模块将文本转成向量](https://fastnlp.readthedocs.io/zh/latest/tutorials/tutorial_3_embedding.html)
- [4. 动手实现一个文本分类器I-使用Trainer和Tester快速训练和测试](https://fastnlp.readthedocs.io/zh/latest/tutorials/tutorial_4_loss_optimizer.html)
- [5. 动手实现一个文本分类器II-使用DataSetIter实现自定义训练过程](https://fastnlp.readthedocs.io/zh/latest/tutorials/tutorial_5_datasetiter.html)
@@ -48,17 +53,23 @@ python -m spacy download en
- [7. 使用Modules和Models快速搭建自定义模型](https://fastnlp.readthedocs.io/zh/latest/tutorials/tutorial_7_modules_models.html)
- [8. 使用Metric快速评测你的模型](https://fastnlp.readthedocs.io/zh/latest/tutorials/tutorial_8_metrics.html)
- [9. 使用Callback自定义你的训练过程](https://fastnlp.readthedocs.io/zh/latest/tutorials/tutorial_9_callback.html)
- [10. 使用fitlog 辅助 fastNLP 进行科研](https://fastnlp.readthedocs.io/zh/latest/tutorials/tutorial_10_fitlog.html)



## 内置组件

大部分用于 NLP 任务的神经网络都可以看做由编码器(encoder)、解码器(decoder)两种模块组成。
大部分用于 NLP 任务的神经网络都可以看做由词嵌入(embeddings)和两种模块:编码器(encoder)、解码器(decoder)组成。

以文本分类任务为例,下图展示了一个BiLSTM+Attention实现文本分类器的模型流程图:


![](./docs/source/figures/text_classification.png)

fastNLP 在 modules 模块中内置了两种模块的诸多组件,可以帮助用户快速搭建自己所需的网络。 两种模块的功能和常见组件如下:
fastNLP 在 embeddings 模块中内置了几种不同的embedding:静态embedding(GloVe、word2vec)、上下文相关embedding
(ELMo、BERT)、字符embedding(基于CNN或者LSTM的CharEmbedding)

与此同时,fastNLP 在 modules 模块中内置了两种模块的诸多组件,可以帮助用户快速搭建自己所需的网络。 两种模块的功能和常见组件如下:

<table>
<tr>
@@ -81,7 +92,7 @@ fastNLP 在 modules 模块中内置了两种模块的诸多组件,可以帮助

## 项目结构

![](./docs/source/figures/workflow.png)
<img src="./docs/source/figures/workflow.png" width="60%" height="60%">

fastNLP的大致工作流程如上图所示,而项目结构如下:

@@ -102,9 +113,13 @@ fastNLP的大致工作流程如上图所示,而项目结构如下:
<td><b> fastNLP.modules </b></td>
<td> 实现了用于搭建神经网络模型的诸多组件 </td>
</tr>
<tr>
<td><b> fastNLP.embeddings </b></td>
<td> 实现了将序列index转为向量序列的功能,包括读取预训练embedding等 </td>
</tr>
<tr>
<td><b> fastNLP.io </b></td>
<td> 实现了读写功能,包括数据读入,模型读写等 </td>
<td> 实现了读写功能,包括数据读入与预处理,模型读写,自动下载等 </td>
</tr>
</table>



+ 2
- 2
docs/Makefile View File

@@ -14,13 +14,13 @@ help:
@$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)

apidoc:
$(SPHINXAPIDOC) -efM -o source ../$(SPHINXPROJ)
$(SPHINXAPIDOC) -efM -o source ../$(SPHINXPROJ) && python3 format.py

server:
cd build/html && python -m http.server

dev:
rm -rf build/html && make html && make server
rm -rf build && make html && make server

.PHONY: help Makefile



+ 65
- 0
docs/count.py View File

@@ -0,0 +1,65 @@
import os
import sys


def find_all_modules():
    """Walk ``../fastNLP``, import every module, and collect documentation info.

    Returns a 3-tuple:
        modules: dict mapping dotted module name -> imported module object.
        to_doc: set of module names that should be documented (the module has
            both ``__all__`` and a ``__doc__`` not containing "undocumented").
        children: dict mapping each documented package name to the sorted list
            of its documented direct child modules.
    """
    modules = {}
    children = {}
    to_doc = set()
    root = '../fastNLP'
    for path, dirs, files in os.walk(root):
        for file in files:
            if not file.endswith('.py'):
                continue
            # Build the dotted module name from the path, dropping the leading
            # ".." component.  Use os.sep so this also works on Windows.
            name = ".".join(os.path.normpath(path).split(os.sep)[1:])
            if file.split('.')[0] != "__init__":
                name = name + '.' + file.split('.')[0]
            __import__(name)
            m = sys.modules[name]
            modules[name] = m
            # Only modules that declare a public API and carry a docstring
            # (not marked "undocumented") are selected for doc generation.
            if not hasattr(m, '__all__'):
                print(name, "__all__ missing")
                continue
            if m.__doc__ is None:
                print(name, "__doc__ missing")
                continue
            if "undocumented" not in m.__doc__:
                to_doc.add(name)
    # Record parent -> children links among the documented modules.
    for module in to_doc:
        t = ".".join(module.split('.')[:-1])
        if t in to_doc:
            if t not in children:
                children[t] = set()
            children[t].add(module)
    for m in children:
        children[m] = sorted(children[m])
    return modules, to_doc, children


def create_rst_file(modules, name, children, output_dir="./source"):
    """Write the Sphinx ``.rst`` stub for module *name* into *output_dir*.

    :param modules: mapping of dotted module name -> imported module object
    :param name: dotted name of the module to document
    :param children: mapping of package name -> iterable of documented
        child-module names (used to emit a toctree of submodules)
    :param output_dir: directory the ``.rst`` file is written to; defaults to
        ``"./source"``, matching the previous hard-coded path
    """
    m = modules[name]
    # Explicit UTF-8: the generated file contains Chinese section headers.
    with open(os.path.join(output_dir, name + ".rst"), "w", encoding="utf-8") as fout:
        # The rst title underline must be at least as long as the title.
        underline = "=" * len(name)
        fout.write(name + "\n")
        fout.write(underline + "\n")
        fout.write("\n")
        fout.write(".. automodule:: " + name + "\n")
        if len(m.__all__) > 0:
            fout.write(" :members: " + ", ".join(m.__all__) + "\n")
            fout.write(" :inherited-members:\n")
        fout.write("\n")
        if name in children:
            fout.write("子模块\n------\n\n.. toctree::\n\n")
            for module in children[name]:
                fout.write(" " + module + "\n")


def main():
    """Generate an .rst stub for every fastNLP module selected for the docs."""
    all_modules, documented, child_map = find_all_modules()
    for module_name in documented:
        create_rst_file(all_modules, module_name, child_map)


if __name__ == "__main__":
    main()

+ 5
- 3
docs/source/conf.py View File

@@ -48,12 +48,14 @@ extensions = [
autodoc_default_options = {
'member-order': 'bysource',
'special-members': '__init__',
'undoc-members': True,
'undoc-members': False,
}

autoclass_content = "class"

# Add any paths that contain templates here, relative to this directory.
templates_path = ['_templates']
# template_bridge
# The suffix(es) of source filenames.
# You can specify multiple suffix as a list of string:
#
@@ -113,7 +115,7 @@ html_static_path = ['_static']
# -- Options for HTMLHelp output ---------------------------------------------

# Output file base name for HTML help builder.
htmlhelp_basename = 'fastNLPdoc'
htmlhelp_basename = 'fastNLP doc'

# -- Options for LaTeX output ------------------------------------------------



+ 3
- 3
docs/source/fastNLP.core.batch.rst View File

@@ -2,6 +2,6 @@ fastNLP.core.batch
==================

.. automodule:: fastNLP.core.batch
:members:
:undoc-members:
:show-inheritance:
:members: BatchIter, DataSetIter, TorchLoaderIter
:inherited-members:

+ 3
- 3
docs/source/fastNLP.core.callback.rst View File

@@ -2,6 +2,6 @@ fastNLP.core.callback
=====================

.. automodule:: fastNLP.core.callback
:members:
:undoc-members:
:show-inheritance:
:members: Callback, GradientClipCallback, EarlyStopCallback, FitlogCallback, EvaluateCallback, LRScheduler, ControlC, LRFinder, TensorboardCallback, WarmupCallback, SaveModelCallback, EchoCallback, TesterCallback, CallbackException, EarlyStopError
:inherited-members:

+ 3
- 3
docs/source/fastNLP.core.const.rst View File

@@ -2,6 +2,6 @@ fastNLP.core.const
==================

.. automodule:: fastNLP.core.const
:members:
:undoc-members:
:show-inheritance:
:members: Const
:inherited-members:

+ 3
- 3
docs/source/fastNLP.core.dataset.rst View File

@@ -2,6 +2,6 @@ fastNLP.core.dataset
====================

.. automodule:: fastNLP.core.dataset
:members:
:undoc-members:
:show-inheritance:
:members: DataSet
:inherited-members:

+ 3
- 3
docs/source/fastNLP.core.field.rst View File

@@ -2,6 +2,6 @@ fastNLP.core.field
==================

.. automodule:: fastNLP.core.field
:members:
:undoc-members:
:show-inheritance:
:members: Padder, AutoPadder, EngChar2DPadder
:inherited-members:

+ 3
- 3
docs/source/fastNLP.core.instance.rst View File

@@ -2,6 +2,6 @@ fastNLP.core.instance
=====================

.. automodule:: fastNLP.core.instance
:members:
:undoc-members:
:show-inheritance:
:members: Instance
:inherited-members:

+ 3
- 3
docs/source/fastNLP.core.losses.rst View File

@@ -2,6 +2,6 @@ fastNLP.core.losses
===================

.. automodule:: fastNLP.core.losses
:members:
:undoc-members:
:show-inheritance:
:members: LossBase, LossFunc, LossInForward, CrossEntropyLoss, BCELoss, L1Loss, NLLLoss
:inherited-members:

+ 3
- 3
docs/source/fastNLP.core.metrics.rst View File

@@ -2,6 +2,6 @@ fastNLP.core.metrics
====================

.. automodule:: fastNLP.core.metrics
:members:
:undoc-members:
:show-inheritance:
:members: MetricBase, AccuracyMetric, SpanFPreRecMetric, ExtractiveQAMetric
:inherited-members:

+ 3
- 3
docs/source/fastNLP.core.optimizer.rst View File

@@ -2,6 +2,6 @@ fastNLP.core.optimizer
======================

.. automodule:: fastNLP.core.optimizer
:members:
:undoc-members:
:show-inheritance:
:members: Optimizer, SGD, Adam, AdamW
:inherited-members:

+ 3
- 6
docs/source/fastNLP.core.rst View File

@@ -2,15 +2,13 @@ fastNLP.core
============

.. automodule:: fastNLP.core
:members:
:undoc-members:
:show-inheritance:
:members: DataSet, Instance, FieldArray, Padder, AutoPadder, EngChar2DPadder, Vocabulary, DataSetIter, BatchIter, TorchLoaderIter, Const, Tester, Trainer, cache_results, seq_len_to_mask, get_seq_len, logger, Callback, GradientClipCallback, EarlyStopCallback, FitlogCallback, EvaluateCallback, LRScheduler, ControlC, LRFinder, TensorboardCallback, WarmupCallback, SaveModelCallback, EchoCallback, TesterCallback, CallbackException, EarlyStopError, LossFunc, CrossEntropyLoss, L1Loss, BCELoss, NLLLoss, LossInForward, AccuracyMetric, SpanFPreRecMetric, ExtractiveQAMetric, Optimizer, SGD, Adam, AdamW, SequentialSampler, BucketSampler, RandomSampler, Sampler
:inherited-members:

子模块
----------
------

.. toctree::
:titlesonly:

fastNLP.core.batch
fastNLP.core.callback
@@ -26,4 +24,3 @@ fastNLP.core
fastNLP.core.trainer
fastNLP.core.utils
fastNLP.core.vocabulary


+ 3
- 3
docs/source/fastNLP.core.sampler.rst View File

@@ -2,6 +2,6 @@ fastNLP.core.sampler
====================

.. automodule:: fastNLP.core.sampler
:members:
:undoc-members:
:show-inheritance:
:members: Sampler, BucketSampler, SequentialSampler, RandomSampler
:inherited-members:

+ 3
- 3
docs/source/fastNLP.core.tester.rst View File

@@ -2,6 +2,6 @@ fastNLP.core.tester
===================

.. automodule:: fastNLP.core.tester
:members:
:undoc-members:
:show-inheritance:
:members: Tester
:inherited-members:

+ 3
- 3
docs/source/fastNLP.core.trainer.rst View File

@@ -2,6 +2,6 @@ fastNLP.core.trainer
====================

.. automodule:: fastNLP.core.trainer
:members:
:undoc-members:
:show-inheritance:
:members: Trainer
:inherited-members:

+ 3
- 3
docs/source/fastNLP.core.utils.rst View File

@@ -2,6 +2,6 @@ fastNLP.core.utils
==================

.. automodule:: fastNLP.core.utils
:members:
:undoc-members:
:show-inheritance:
:members: cache_results, seq_len_to_mask, get_seq_len
:inherited-members:

+ 3
- 3
docs/source/fastNLP.core.vocabulary.rst View File

@@ -2,6 +2,6 @@ fastNLP.core.vocabulary
=======================

.. automodule:: fastNLP.core.vocabulary
:members:
:undoc-members:
:show-inheritance:
:members: Vocabulary, VocabularyOption
:inherited-members:

+ 7
- 0
docs/source/fastNLP.embeddings.bert_embedding.rst View File

@@ -0,0 +1,7 @@
fastNLP.embeddings.bert_embedding
=================================

.. automodule:: fastNLP.embeddings.bert_embedding
:members: BertEmbedding, BertWordPieceEncoder
:inherited-members:


+ 7
- 0
docs/source/fastNLP.embeddings.char_embedding.rst View File

@@ -0,0 +1,7 @@
fastNLP.embeddings.char_embedding
=================================

.. automodule:: fastNLP.embeddings.char_embedding
:members: CNNCharEmbedding, LSTMCharEmbedding
:inherited-members:


+ 7
- 0
docs/source/fastNLP.embeddings.contextual_embedding.rst View File

@@ -0,0 +1,7 @@
fastNLP.embeddings.contextual_embedding
=======================================

.. automodule:: fastNLP.embeddings.contextual_embedding
:members: ContextualEmbedding
:inherited-members:


+ 7
- 0
docs/source/fastNLP.embeddings.elmo_embedding.rst View File

@@ -0,0 +1,7 @@
fastNLP.embeddings.elmo_embedding
=================================

.. automodule:: fastNLP.embeddings.elmo_embedding
:members: ElmoEmbedding
:inherited-members:


+ 7
- 0
docs/source/fastNLP.embeddings.embedding.rst View File

@@ -0,0 +1,7 @@
fastNLP.embeddings.embedding
============================

.. automodule:: fastNLP.embeddings.embedding
:members: Embedding, TokenEmbedding
:inherited-members:


+ 20
- 0
docs/source/fastNLP.embeddings.rst View File

@@ -0,0 +1,20 @@
fastNLP.embeddings
==================

.. automodule:: fastNLP.embeddings
:members: Embedding, TokenEmbedding, StaticEmbedding, ElmoEmbedding, BertEmbedding, BertWordPieceEncoder, StackEmbedding, LSTMCharEmbedding, CNNCharEmbedding, get_embeddings
:inherited-members:

子模块
------

.. toctree::

fastNLP.embeddings.bert_embedding
fastNLP.embeddings.char_embedding
fastNLP.embeddings.contextual_embedding
fastNLP.embeddings.elmo_embedding
fastNLP.embeddings.embedding
fastNLP.embeddings.stack_embedding
fastNLP.embeddings.static_embedding
fastNLP.embeddings.utils

+ 7
- 0
docs/source/fastNLP.embeddings.stack_embedding.rst View File

@@ -0,0 +1,7 @@
fastNLP.embeddings.stack_embedding
==================================

.. automodule:: fastNLP.embeddings.stack_embedding
:members: StackEmbedding
:inherited-members:


+ 7
- 0
docs/source/fastNLP.embeddings.static_embedding.rst View File

@@ -0,0 +1,7 @@
fastNLP.embeddings.static_embedding
===================================

.. automodule:: fastNLP.embeddings.static_embedding
:members: StaticEmbedding
:inherited-members:


+ 7
- 0
docs/source/fastNLP.embeddings.utils.rst View File

@@ -0,0 +1,7 @@
fastNLP.embeddings.utils
========================

.. automodule:: fastNLP.embeddings.utils
:members: get_embeddings
:inherited-members:


+ 0
- 7
docs/source/fastNLP.io.base_loader.rst View File

@@ -1,7 +0,0 @@
fastNLP.io.base\_loader
=======================

.. automodule:: fastNLP.io.base_loader
:members:
:undoc-members:
:show-inheritance:

+ 7
- 0
docs/source/fastNLP.io.data_bundle.rst View File

@@ -0,0 +1,7 @@
fastNLP.io.data_bundle
======================

.. automodule:: fastNLP.io.data_bundle
:members: DataBundle
:inherited-members:


+ 4
- 5
docs/source/fastNLP.io.dataset_loader.rst View File

@@ -1,7 +1,6 @@
fastNLP.io.dataset\_loader
==========================
fastNLP.io.dataset_loader
=========================

.. automodule:: fastNLP.io.dataset_loader
:members:
:undoc-members:
:show-inheritance:
:members: CSVLoader, JsonLoader


+ 5
- 5
docs/source/fastNLP.io.embed_loader.rst View File

@@ -1,7 +1,7 @@
fastNLP.io.embed\_loader
========================
fastNLP.io.embed_loader
=======================

.. automodule:: fastNLP.io.embed_loader
:members:
:undoc-members:
:show-inheritance:
:members: EmbedLoader, EmbeddingOption
:inherited-members:

+ 7
- 0
docs/source/fastNLP.io.file_utils.rst View File

@@ -0,0 +1,7 @@
fastNLP.io.file_utils
=====================

.. automodule:: fastNLP.io.file_utils
:members: cached_path, get_filepath, get_cache_path, split_filename_suffix, get_from_cache
:inherited-members:


+ 7
- 0
docs/source/fastNLP.io.loader.rst View File

@@ -0,0 +1,7 @@
fastNLP.io.loader
=================

.. automodule:: fastNLP.io.loader
:members: Loader, YelpLoader, YelpFullLoader, YelpPolarityLoader, IMDBLoader, SSTLoader, SST2Loader, ConllLoader, Conll2003Loader, Conll2003NERLoader, OntoNotesNERLoader, CTBLoader, MsraNERLoader, PeopleDailyNERLoader, WeiboNERLoader, CSVLoader, JsonLoader, CWSLoader, MNLILoader, QuoraLoader, SNLILoader, QNLILoader, RTELoader
:inherited-members:


+ 5
- 5
docs/source/fastNLP.io.model_io.rst View File

@@ -1,7 +1,7 @@
fastNLP.io.model\_io
====================
fastNLP.io.model_io
===================

.. automodule:: fastNLP.io.model_io
:members:
:undoc-members:
:show-inheritance:
:members: ModelLoader, ModelSaver
:inherited-members:

+ 7
- 0
docs/source/fastNLP.io.pipe.rst View File

@@ -0,0 +1,7 @@
fastNLP.io.pipe
===============

.. automodule:: fastNLP.io.pipe
:members: Pipe, CWSPipe, YelpFullPipe, YelpPolarityPipe, SSTPipe, SST2Pipe, IMDBPipe, Conll2003NERPipe, OntoNotesNERPipe, MsraNERPipe, WeiboNERPipe, PeopleDailyPipe, Conll2003Pipe, MatchingBertPipe, RTEBertPipe, SNLIBertPipe, QuoraBertPipe, QNLIBertPipe, MNLIBertPipe, MatchingPipe, RTEPipe, SNLIPipe, QuoraPipe, QNLIPipe, MNLIPipe
:inherited-members:


+ 8
- 8
docs/source/fastNLP.io.rst View File

@@ -2,18 +2,18 @@ fastNLP.io
==========

.. automodule:: fastNLP.io
:members:
:undoc-members:
:show-inheritance:
:members: DataBundle, EmbedLoader, Loader, YelpLoader, YelpFullLoader, YelpPolarityLoader, IMDBLoader, SSTLoader, SST2Loader, ConllLoader, Conll2003Loader, Conll2003NERLoader, OntoNotesNERLoader, CTBLoader, MsraNERLoader, WeiboNERLoader, PeopleDailyNERLoader, CSVLoader, JsonLoader, CWSLoader, MNLILoader, QuoraLoader, SNLILoader, QNLILoader, RTELoader, Pipe, YelpFullPipe, YelpPolarityPipe, SSTPipe, SST2Pipe, IMDBPipe, Conll2003Pipe, Conll2003NERPipe, OntoNotesNERPipe, MsraNERPipe, PeopleDailyPipe, WeiboNERPipe, CWSPipe, MatchingBertPipe, RTEBertPipe, SNLIBertPipe, QuoraBertPipe, QNLIBertPipe, MNLIBertPipe, MatchingPipe, RTEPipe, SNLIPipe, QuoraPipe, QNLIPipe, MNLIPipe, ModelLoader, ModelSaver
:inherited-members:

子模块
----------
------

.. toctree::
:titlesonly:

fastNLP.io.base_loader
fastNLP.io.dataset_loader
fastNLP.io.data_bundle
fastNLP.io.embed_loader
fastNLP.io.file_utils
fastNLP.io.loader
fastNLP.io.model_io

fastNLP.io.pipe
fastNLP.io.utils

+ 7
- 0
docs/source/fastNLP.io.utils.rst View File

@@ -0,0 +1,7 @@
fastNLP.io.utils
================

.. automodule:: fastNLP.io.utils
:members: check_loader_paths
:inherited-members:


+ 5
- 5
docs/source/fastNLP.models.biaffine_parser.rst View File

@@ -1,7 +1,7 @@
fastNLP.models.biaffine\_parser
===============================
fastNLP.models.biaffine_parser
==============================

.. automodule:: fastNLP.models.biaffine_parser
:members:
:undoc-members:
:show-inheritance:
:members: BiaffineParser, GraphParser
:inherited-members:

+ 5
- 5
docs/source/fastNLP.models.cnn_text_classification.rst View File

@@ -1,7 +1,7 @@
fastNLP.models.cnn\_text\_classification
========================================
fastNLP.models.cnn_text_classification
======================================

.. automodule:: fastNLP.models.cnn_text_classification
:members:
:undoc-members:
:show-inheritance:
:members: CNNText
:inherited-members:

+ 3
- 6
docs/source/fastNLP.models.rst View File

@@ -2,19 +2,16 @@ fastNLP.models
==============

.. automodule:: fastNLP.models
:members:
:undoc-members:
:show-inheritance:
:members: CNNText, SeqLabeling, AdvSeqLabel, ESIM, StarTransEnc, STSeqLabel, STNLICls, STSeqCls, BiaffineParser, GraphParser
:inherited-members:

子模块
----------
------

.. toctree::
:titlesonly:

fastNLP.models.biaffine_parser
fastNLP.models.cnn_text_classification
fastNLP.models.sequence_labeling
fastNLP.models.snli
fastNLP.models.star_transformer


+ 5
- 5
docs/source/fastNLP.models.sequence_labeling.rst View File

@@ -1,7 +1,7 @@
fastNLP.models.sequence\_labeling
=================================
fastNLP.models.sequence_labeling
================================

.. automodule:: fastNLP.models.sequence_labeling
:members:
:undoc-members:
:show-inheritance:
:members: SeqLabeling, AdvSeqLabel
:inherited-members:

+ 3
- 3
docs/source/fastNLP.models.snli.rst View File

@@ -2,6 +2,6 @@ fastNLP.models.snli
===================

.. automodule:: fastNLP.models.snli
:members:
:undoc-members:
:show-inheritance:
:members: ESIM
:inherited-members:

+ 5
- 5
docs/source/fastNLP.models.star_transformer.rst View File

@@ -1,7 +1,7 @@
fastNLP.models.star\_transformer
================================
fastNLP.models.star_transformer
===============================

.. automodule:: fastNLP.models.star_transformer
:members:
:undoc-members:
:show-inheritance:
:members: StarTransEnc, STNLICls, STSeqCls, STSeqLabel
:inherited-members:

+ 0
- 7
docs/source/fastNLP.modules.decoder.crf.rst View File

@@ -1,7 +0,0 @@
fastNLP.modules.decoder.CRF
===========================

.. automodule:: fastNLP.modules.decoder.crf
:members:
:undoc-members:
:show-inheritance:

+ 0
- 7
docs/source/fastNLP.modules.decoder.mlp.rst View File

@@ -1,7 +0,0 @@
fastNLP.modules.decoder.MLP
===========================

.. automodule:: fastNLP.modules.decoder.mlp
:members:
:undoc-members:
:show-inheritance:

+ 2
- 13
docs/source/fastNLP.modules.decoder.rst View File

@@ -2,17 +2,6 @@ fastNLP.modules.decoder
=======================

.. automodule:: fastNLP.modules.decoder
:members:
:undoc-members:
:show-inheritance:

子模块
----------

.. toctree::
:titlesonly:

fastNLP.modules.decoder.crf
fastNLP.modules.decoder.mlp
fastNLP.modules.decoder.utils
:members: MLP, ConditionalRandomField, viterbi_decode, allowed_transitions
:inherited-members:


+ 0
- 7
docs/source/fastNLP.modules.decoder.utils.rst View File

@@ -1,7 +0,0 @@
fastNLP.modules.decoder.utils
=============================

.. automodule:: fastNLP.modules.decoder.utils
:members:
:undoc-members:
:show-inheritance:

+ 0
- 7
docs/source/fastNLP.modules.encoder.bert.rst View File

@@ -1,7 +0,0 @@
fastNLP.modules.encoder.bert
============================

.. automodule:: fastNLP.modules.encoder.bert
:members:
:undoc-members:
:show-inheritance:

+ 0
- 7
docs/source/fastNLP.modules.encoder.char_encoder.rst View File

@@ -1,7 +0,0 @@
fastNLP.modules.encoder.char\_encoder
=====================================

.. automodule:: fastNLP.modules.encoder.char_encoder
:members:
:undoc-members:
:show-inheritance:

+ 0
- 7
docs/source/fastNLP.modules.encoder.conv_maxpool.rst View File

@@ -1,7 +0,0 @@
fastNLP.modules.encoder.conv\_maxpool
=====================================

.. automodule:: fastNLP.modules.encoder.conv_maxpool
:members:
:undoc-members:
:show-inheritance:

+ 0
- 7
docs/source/fastNLP.modules.encoder.embedding.rst View File

@@ -1,7 +0,0 @@
fastNLP.modules.encoder.embedding
=================================

.. automodule:: fastNLP.modules.encoder.embedding
:members:
:undoc-members:
:show-inheritance:

+ 0
- 7
docs/source/fastNLP.modules.encoder.lstm.rst View File

@@ -1,7 +0,0 @@
fastNLP.modules.encoder.lstm
============================

.. automodule:: fastNLP.modules.encoder.lstm
:members:
:undoc-members:
:show-inheritance:

+ 2
- 18
docs/source/fastNLP.modules.encoder.rst View File

@@ -2,22 +2,6 @@ fastNLP.modules.encoder
=======================

.. automodule:: fastNLP.modules.encoder
:members:
:undoc-members:
:show-inheritance:

子模块
----------

.. toctree::
:titlesonly:

fastNLP.modules.encoder.bert
fastNLP.modules.encoder.char_encoder
fastNLP.modules.encoder.conv_maxpool
fastNLP.modules.encoder.embedding
fastNLP.modules.encoder.lstm
fastNLP.modules.encoder.star_transformer
fastNLP.modules.encoder.transformer
fastNLP.modules.encoder.variational_rnn
:members: ConvolutionCharEncoder, LSTMCharEncoder, ConvMaxpool, LSTM, StarTransformer, TransformerEncoder, VarRNN, VarLSTM, VarGRU, MaxPool, MaxPoolWithMask, AvgPool, AvgPoolWithMask, MultiHeadAttention
:inherited-members:


+ 0
- 7
docs/source/fastNLP.modules.encoder.star_transformer.rst View File

@@ -1,7 +0,0 @@
fastNLP.modules.encoder.star\_transformer
=========================================

.. automodule:: fastNLP.modules.encoder.star_transformer
:members:
:undoc-members:
:show-inheritance:

+ 0
- 7
docs/source/fastNLP.modules.encoder.transformer.rst View File

@@ -1,7 +0,0 @@
fastNLP.modules.encoder.transformer
===================================

.. automodule:: fastNLP.modules.encoder.transformer
:members:
:undoc-members:
:show-inheritance:

+ 0
- 7
docs/source/fastNLP.modules.encoder.variational_rnn.rst View File

@@ -1,7 +0,0 @@
fastNLP.modules.encoder.variational\_rnn
========================================

.. automodule:: fastNLP.modules.encoder.variational_rnn
:members:
:undoc-members:
:show-inheritance:

+ 6
- 7
docs/source/fastNLP.modules.rst View File

@@ -2,15 +2,14 @@ fastNLP.modules
===============

.. automodule:: fastNLP.modules
:members:
:undoc-members:
:show-inheritance:
:members: ConvolutionCharEncoder, LSTMCharEncoder, ConvMaxpool, LSTM, StarTransformer, TransformerEncoder, VarRNN, VarLSTM, VarGRU, MaxPool, MaxPoolWithMask, AvgPool, AvgPoolWithMask, MultiHeadAttention, MLP, ConditionalRandomField, viterbi_decode, allowed_transitions, TimestepDropout
:inherited-members:

子模块
-----------
------

.. toctree::
:titlesonly:

fastNLP.modules.decoder
fastNLP.modules.encoder
fastNLP.modules.decoder
fastNLP.modules.encoder
fastNLP.modules.utils

+ 7
- 0
docs/source/fastNLP.modules.utils.rst View File

@@ -0,0 +1,7 @@
fastNLP.modules.utils
=====================

.. automodule:: fastNLP.modules.utils
:members: initial_parameter, summary
:inherited-members:


+ 11
- 14
docs/source/fastNLP.rst View File

@@ -1,20 +1,17 @@
API 文档
===============
fastNLP
=======

.. automodule:: fastNLP
:members:
:undoc-members:
:show-inheritance:
:members: Instance, FieldArray, DataSetIter, BatchIter, TorchLoaderIter, Vocabulary, DataSet, Const, Trainer, Tester, Callback, GradientClipCallback, EarlyStopCallback, TensorboardCallback, LRScheduler, ControlC, LRFinder, Padder, AutoPadder, EngChar2DPadder, AccuracyMetric, SpanFPreRecMetric, ExtractiveQAMetric, Optimizer, SGD, Adam, AdamW, Sampler, SequentialSampler, BucketSampler, RandomSampler, LossFunc, CrossEntropyLoss, L1Loss, BCELoss, NLLLoss, LossInForward, cache_results, logger
:inherited-members:

内部模块
-----------
模块
------

.. toctree::
:titlesonly:
:maxdepth: 3

fastNLP.core
fastNLP.io
fastNLP.modules
fastNLP.models

fastNLP.core
fastNLP.embeddings
fastNLP.io
fastNLP.models
fastNLP.modules

BIN
docs/source/figures/text_classification.png View File

Before After
Width: 1699  |  Height: 722  |  Size: 74 kB Width: 3200  |  Height: 1438  |  Size: 322 kB

BIN
docs/source/figures/workflow.png View File

Before After
Width: 2078  |  Height: 840  |  Size: 336 kB Width: 2400  |  Height: 1798  |  Size: 250 kB

+ 16
- 48
docs/source/index.rst View File

@@ -1,60 +1,28 @@
fastNLP 中文文档
=====================

fastNLP 是一款轻量级的 NLP 处理套件。你既可以使用它快速地完成一个命名实体识别(NER)、中文分词或文本分类任务;
也可以使用他构建许多复杂的网络模型,进行科研。它具有如下的特性:
`fastNLP <https://github.com/fastnlp/fastNLP/>`_ 是一款轻量级的 NLP 处理套件。你既可以使用它快速地完成一个序列标注
(NER、POS-Tagging等)、中文分词、文本分类、Matching、指代消解、摘要等任务
(详见 `reproduction <https://github.com/fastnlp/fastNLP/tree/master/reproduction>`_ );
也可以使用它构建许多复杂的网络模型,进行科研。它具有如下的特性:

- 统一的Tabular式数据容器,让数据预处理过程简洁明了。内置多种数据集的DataSet Loader,省去预处理代码。
- 各种方便的NLP工具,例如预处理embedding加载; 中间数据cache等;
- 详尽的中文文档以供查阅;
- 提供诸多高级模块,例如Variational LSTM, Transformer, CRF等;
- 封装CNNText,Biaffine等模型可供直接使用;
- 便捷且具有扩展性的训练器; 提供多种内置callback函数,方便实验记录、异常捕获等。
- 统一的Tabular式数据容器,让数据预处理过程简洁明了。内置多种数据集的 :mod:`~fastNLP.io.data_loader` ,省去预处理代码;
- 多种训练、测试组件,例如训练器 :class:`~fastNLP.Trainer` ;测试器 :class:`~fastNLP.Tester` ;以及各种评测 :mod:`~fastNLP.core.metrics` 等等;
- 各种方便的NLP工具,例如预处理 :mod:`embedding<fastNLP.embeddings>` 加载(包括ELMo和BERT); 中间数据存储 :func:`cache <fastNLP.cache_results>` 等;
- 提供诸多高级模块 :mod:`~fastNLP.modules`,例如 :class:`~fastNLP.modules.VarLSTM` , :class:`Transformer<fastNLP.modules.TransformerEncoder>` , :class:`CRF<fastNLP.modules.ConditionalRandomField>` 等;
- 在序列标注、中文分词、文本分类、Matching、指代消解、摘要等任务上封装了各种 :mod:`~fastNLP.models` 可供直接使用;
- 训练器便捷且具有扩展性,提供多种内置 :mod:`~fastNLP.core.callback` 函数,方便实验记录、异常捕获等。


内置组件
------------

大部分用于 NLP 任务的神经网络都可以看做由编码(encoder)、聚合(aggregator)、解码(decoder)三种模块组成。

.. image:: figures/text_classification.png

fastNLP 在 :mod:`~fastNLP.modules` 模块中内置了三种模块的诸多组件,可以帮助用户快速搭建自己所需的网络。
三种模块的功能和常见组件如下:

+-----------------------+-----------------------+-----------------------+
| module type | functionality | example |
+=======================+=======================+=======================+
| encoder | 将输入编码为具有具 | embedding, RNN, CNN, |
| | 有表示能力的向量 | transformer |
+-----------------------+-----------------------+-----------------------+
| aggregator | 从多个向量中聚合信息 | self-attention, |
| | | max-pooling |
+-----------------------+-----------------------+-----------------------+
| decoder | 将具有某种表示意义的 | MLP, CRF |
| | 向量解码为需要的输出 | |
| | 形式 | |
+-----------------------+-----------------------+-----------------------+


内置模型
----------------

fastNLP 在 :mod:`~fastNLP.models` 模块中内置了如 :class:`~fastNLP.models.CNNText` 、
:class:`~fastNLP.models.SeqLabeling` 等完整的模型,以供用户直接使用。

.. todo::
这些模型的介绍如下表所示:(模型名称 + 介绍 + 任务上的结果)

用户手册
----------------

.. toctree::
:maxdepth: 1
:maxdepth: 2

安装指南 </user/installation>
快速入门 </user/quickstart>
详细指南 </user/tutorials>
详细教程 </user/tutorials>

API 文档
-------------
@@ -67,11 +35,11 @@ API 文档
fastNLP

fitlog
------
fitlog文档
----------

用户可以 `点此 <https://fitlog.readthedocs.io/zh/latest/>`_ 查看fitlog的文档。
fitlog 是由我们团队开发,用于帮助用户记录日志并管理代码的工具
可以 `点此 <https://fitlog.readthedocs.io/zh/latest/>`_ 查看fitlog的文档。
fitlog 是由我们团队开发的日志记录+代码管理的工具。

索引与搜索
==================


+ 0
- 1
docs/source/modules.rst View File

@@ -2,7 +2,6 @@ fastNLP
=======

.. toctree::
:titlesonly:
:maxdepth: 4

fastNLP

+ 2
- 2
docs/source/tutorials/tutorial_1_data_preprocess.rst View File

@@ -1,5 +1,5 @@
==============================
数据格式及预处理教程
使用DataSet预处理文本
==============================

:class:`~fastNLP.DataSet` 是fastNLP中用于承载数据的容器。可以将DataSet看做是一个表格,
@@ -60,7 +60,7 @@
seq_len=3)
])

在初步构建完数据集之后,我们可以通过 `for` 循环遍历 :class:`~fastNLP.DataSet` 中的内容。
在初步构建完数据集之后,我们可以通过 `for` 循环遍历 :class:`~fastNLP.DataSet` 中的内容。

.. code-block:: python



+ 74
- 117
docs/source/tutorials/tutorial_2_load_dataset.rst View File

@@ -1,57 +1,53 @@
=========================
数据集加载教程
=========================
=======================================
使用Loader和Pipe加载并处理数据集
=======================================

这一部分是一个关于如何加载数据集的教程

教程目录:

- `Part I: 数据集信息`_
- `Part II: 数据集的使用方式`_
- `Part III: 不同数据类型的DataSetLoader`_
- `Part IV: DataSetLoader举例`_
- `Part V: fastNLP封装好的数据集加载器`_
- `Part I: 数据集容器DataBundle`_
- `Part II: 加载数据集的基类Loader`_
- `Part III: 不同格式类型的基础Loader`_
- `Part IV: 使用Pipe对数据集进行预处理`_
- `Part V: fastNLP封装好的Loader和Pipe`_


----------------------------
Part I: 数据集信息
----------------------------
------------------------------------
Part I: 数据集容器DataBundle
------------------------------------

在fastNLP中,我们使用 :class:`~fastNLP.io.base_loader.DataInfo` 来存储数据集信息。 :class:`~fastNLP.io.base_loader.DataInfo`
类包含了两个重要内容: `datasets` 和 `vocabs` 。
在fastNLP中,我们使用 :class:`~fastNLP.io.data_bundle.DataBundle` 来存储数据集信息。
:class:`~fastNLP.io.data_bundle.DataBundle` 类包含了两个重要内容: `datasets` 和 `vocabs` 。

`datasets` 是一个 `key` 为数据集名称(如 `train` , `dev` ,和 `test` 等), `value` 为 :class:`~fastNLP.DataSet` 的字典。

`vocabs` 是一个 `key` 为词表名称(如 :attr:`fastNLP.Const.INPUT` 表示输入文本的词表名称, :attr:`fastNLP.Const.TARGET` 表示目标
的真实标签词表的名称,等等), `value` 为词表内容( :class:`~fastNLP.Vocabulary` )的字典。

----------------------------
Part II: 数据集的使用方式
----------------------------
-------------------------------------
Part II: 加载数据集的基类Loader
-------------------------------------

在fastNLP中,我们采用 :class:`~fastNLP.io.base_loader.DataSetLoader` 来作为加载数据集的基类。
:class:`~fastNLP.io.base_loader.DataSetLoader` 定义了各种DataSetLoader所需的API接口,开发者应该继承它实现各种的DataSetLoader。
在各种数据集的DataSetLoader当中,至少应该编写如下内容:
在fastNLP中,我们采用 :class:`~fastNLP.io.loader.Loader` 来作为加载数据集的基类。
:class:`~fastNLP.io.loader.Loader` 定义了各种Loader所需的API接口,开发者应该继承它实现各种的Loader。
在各种数据集的Loader当中,至少应该编写如下内容:

- _load 函数:从一个数据文件中读取数据到一个 :class:`~fastNLP.DataSet`
- load 函数(可以使用基类的方法):从一个或多个数据文件中读取数据到一个或多个 :class:`~fastNLP.DataSet`
- process 函数:一个或多个从数据文件中读取数据,并处理成可以训练的 :class:`~fastNLP.io.DataInfo`
- _load 函数:从一个数据文件中读取数据,返回一个 :class:`~fastNLP.DataSet`
- load 函数:从文件或者文件夹中读取数据并组装成 :class:`~fastNLP.io.data_bundle.DataBundle`

**\*process函数中可以调用load函数或_load函数**

DataSetLoader的_load或者load函数返回的 :class:`~fastNLP.DataSet` 当中,内容为数据集的文本信息,process函数返回的
:class:`~fastNLP.io.DataInfo` 当中, `datasets` 的内容为已经index好的、可以直接被 :class:`~fastNLP.Trainer`
接受的内容。
Loader的load函数返回的 :class:`~fastNLP.io.data_bundle.DataBundle` 里面包含了数据集的原始数据。

--------------------------------------------------------
Part III: 不同数据类型的DataSetLoader
Part III: 不同格式类型的基础Loader
--------------------------------------------------------

:class:`~fastNLP.io.dataset_loader.CSVLoader`
:class:`~fastNLP.io.loader.CSVLoader`
读取CSV类型的数据集文件。例子如下:

.. code-block:: python

from fastNLP.io.loader import CSVLoader
data_set_loader = CSVLoader(
headers=('words', 'target'), sep='\t'
)
@@ -67,17 +63,18 @@ Part III: 不同数据类型的DataSetLoader
The performances are an absolute joy . 4


:class:`~fastNLP.io.dataset_loader.JsonLoader`
:class:`~fastNLP.io.loader.JsonLoader`
读取Json类型的数据集文件,数据必须按行存储,每行是一个包含各类属性的Json对象。例子如下:

.. code-block:: python

data_set_loader = JsonLoader(
from fastNLP.io.loader import JsonLoader
loader = JsonLoader(
fields={'sentence1': 'words1', 'sentence2': 'words2', 'gold_label': 'target'}
)
# 表示将Json对象中'sentence1'、'sentence2'和'gold_label'对应的值赋给'words1'、'words2'、'target'这三个fields

data_set = data_set_loader._load('path/to/your/file')
data_set = loader._load('path/to/your/file')

数据集内容样例如下 ::

@@ -86,108 +83,68 @@ Part III: 不同数据类型的DataSetLoader
{"annotator_labels": ["entailment"], "captionID": "3416050480.jpg#4", "gold_label": "entailment", "pairID": "3416050480.jpg#4r1e", "sentence1": "A person on a horse jumps over a broken down airplane.", "sentence1_binary_parse": "( ( ( A person ) ( on ( a horse ) ) ) ( ( jumps ( over ( a ( broken ( down airplane ) ) ) ) ) . ) )", "sentence1_parse": "(ROOT (S (NP (NP (DT A) (NN person)) (PP (IN on) (NP (DT a) (NN horse)))) (VP (VBZ jumps) (PP (IN over) (NP (DT a) (JJ broken) (JJ down) (NN airplane)))) (. .)))", "sentence2": "A person is outdoors, on a horse.", "sentence2_binary_parse": "( ( A person ) ( ( ( ( is outdoors ) , ) ( on ( a horse ) ) ) . ) )", "sentence2_parse": "(ROOT (S (NP (DT A) (NN person)) (VP (VBZ is) (ADVP (RB outdoors)) (, ,) (PP (IN on) (NP (DT a) (NN horse)))) (. .)))"}

------------------------------------------
Part IV: DataSetLoader举例
Part IV: 使用Pipe对数据集进行预处理
------------------------------------------

以Matching任务为例子:

:class:`~fastNLP.io.data_loader.matching.MatchingLoader`
我们在fastNLP当中封装了一个Matching任务数据集的数据加载类: :class:`~fastNLP.io.data_loader.matching.MatchingLoader` .

在MatchingLoader类当中我们封装了一个对数据集中的文本内容进行进一步的预处理的函数:
:meth:`~fastNLP.io.data_loader.matching.MatchingLoader.process`
这个函数具有各种预处理option,如:
- 是否将文本转成全小写
- 是否需要序列长度信息,需要什么类型的序列长度信息
- 是否需要用BertTokenizer来获取序列的WordPiece信息
- 等等

具体内容参见 :meth:`fastNLP.io.MatchingLoader.process` 。

:class:`~fastNLP.io.data_loader.matching.SNLILoader`
一个关于SNLI数据集的DataSetLoader。SNLI数据集来自
`SNLI Data Set <https://nlp.stanford.edu/projects/snli/snli_1.0.zip>`_ .

在 :class:`~fastNLP.io.data_loader.matching.SNLILoader` 的 :meth:`~fastNLP.io.data_loader.matching.SNLILoader._load`
函数中,我们用以下代码将数据集内容从文本文件读入内存
在fastNLP中,我们采用 :class:`~fastNLP.io.pipe.Pipe` 来作为加载数据集的基类。
:class:`~fastNLP.io.pipe.Pipe` 定义了各种Pipe所需的API接口,开发者应该继承它实现各种的Pipe。
在各种数据集的Pipe当中,至少应该编写如下内容:

.. code-block:: python

def _load(self, path):
ds = JsonLoader._load(self, path) # SNLI数据集原始文件为Json格式,可以采用JsonLoader来读取数据集文件

parentheses_table = str.maketrans({'(': None, ')': None})
# 字符串匹配格式:SNLI数据集的文本中由括号分割开的,组成树结构,因此
# 我们将这些括号去除。

ds.apply(lambda ins: ins[Const.INPUTS(0)].translate(parentheses_table).strip().split(),
new_field_name=Const.INPUTS(0))
# 把第一句话的内容用上面的字符串匹配格式进行替换,并将句子分割为一个由单词组成的list
ds.apply(lambda ins: ins[Const.INPUTS(1)].translate(parentheses_table).strip().split(),
new_field_name=Const.INPUTS(1))
# 对第二句话的内容进行同样的预处理
ds.drop(lambda x: x[Const.TARGET] == '-') # 将标签为'-'的样本丢掉
return ds

------------------------------------------
Part V: fastNLP封装好的数据集加载器
------------------------------------------
- process 函数:对输入的 :class:`~fastNLP.io.data_bundle.DataBundle` 进行处理(如构建词表、
将dataset的文本内容转成index等等),然后返回该 :class:`~fastNLP.io.data_bundle.DataBundle`
- process_from_file 函数:输入数据集所在文件夹,读取内容并组装成 :class:`~fastNLP.io.data_bundle.DataBundle` ,
然后调用相对应的process函数对数据进行预处理

fastNLP封装好的数据集加载器可以适用于多种类型的任务
以SNLI数据集为例,写一个自定义Pipe的例子如下:

- `文本分类任务`_
- `序列标注任务`_
- `Matching任务`_
- `指代消解任务`_
- `摘要任务`_
.. code-block:: python

from fastNLP.io.loader import SNLILoader
from fastNLP.io.pipe import MatchingPipe

文本分类任务
-------------------
class MySNLIPipe(MatchingPipe):

文本分类任务
def process(self, data_bundle):
data_bundle = super(MySNLIPipe, self).process(data_bundle)
# MatchingPipe类里封装了一个关于matching任务的process函数,可以直接继承使用
# 如果有需要进行额外的预处理操作可以在这里加入您的代码
return data_bundle

def process_from_file(self, paths=None):
data_bundle = SNLILoader().load(paths) # 使用SNLILoader读取原始数据集
# SNLILoader的load函数中,paths如果为None则会自动下载
return self.process(data_bundle) # 调用相对应的process函数对data_bundle进行处理

调用Pipe示例:

序列标注任务
-------------------
.. code-block:: python

序列标注任务
from fastNLP.io.pipe import SNLIBertPipe
data_bundle = SNLIBertPipe(lower=True, tokenizer=arg.tokenizer).process_from_file()
print(data_bundle)

输出的内容是::

Matching任务
-------------------
In total 3 datasets:
train has 549367 instances.
dev has 9842 instances.
test has 9824 instances.
In total 2 vocabs:
words has 34184 entries.
target has 3 entries.

:class:`~fastNLP.io.data_loader.matching.SNLILoader`
一个关于SNLI数据集的DataSetLoader。SNLI数据集来自
`SNLI Data Set <https://nlp.stanford.edu/projects/snli/snli_1.0.zip>`_ .
这里表示一共有3个数据集和2个词表。其中:

:class:`~fastNLP.io.data_loader.matching.MNLILoader`
一个关于MultiNLI数据集的DataSetLoader。MultiNLI数据集来自 `GLUE benchmark <https://gluebenchmark.com/tasks>`_
- 3个数据集分别为train、dev、test数据集,分别有549367、9842、9824个instance
- 2个词表分别为words词表与target词表。其中words词表为句子文本所构建的词表,一共有34184个单词;
target词表为目标标签所构建的词表,一共有3种标签。(注:如果有多个输入,则句子文本所构建的词表将
会被命名为words1以对应相对应的列名)

:class:`~fastNLP.io.data_loader.matching.QNLILoader`
一个关于QNLI数据集的DataSetLoader。QNLI数据集来自 `GLUE benchmark <https://gluebenchmark.com/tasks>`_

:class:`~fastNLP.io.data_loader.matching.RTELoader`
一个关于Recognizing Textual Entailment数据集(RTE)的DataSetLoader。RTE数据集来自
`GLUE benchmark <https://gluebenchmark.com/tasks>`_

:class:`~fastNLP.io.data_loader.matching.QuoraLoader`
一个关于Quora数据集的DataSetLoader。




指代消解任务
-------------------

指代消解任务



摘要任务
-------------------
------------------------------------------
Part V: fastNLP封装好的Loader和Pipe
------------------------------------------

摘要任务
fastNLP封装了多种任务/数据集的Loader和Pipe并提供自动下载功能,具体参见文档

`fastNLP可加载的embedding与数据集 <https://docs.qq.com/sheet/DVnpkTnF6VW9UeXdh?c=A1A0A0>`_


+ 29
- 76
docs/source/tutorials/tutorial_3_embedding.rst View File

@@ -12,6 +12,7 @@
- `Part IV: 使用预训练的Contextual Embedding(ELMo & BERT)`_
- `Part V: 使用character-level的embedding`_
- `Part VI: 叠加使用多个embedding`_
- `Part VII: fastNLP支持的预训练Embedding`_



@@ -29,18 +30,20 @@ fastNLP的embedding包括了预训练embedding和随机初始化embedding。
Part II: 使用随机初始化的embedding
---------------------------------------

使用随机初始化的embedding参见 :class:`~fastNLP.modules.encoder.embedding.Embedding` 。
使用随机初始化的embedding参见 :class:`~fastNLP.embeddings.embedding.Embedding` 。

可以传入词表大小和embedding维度:

.. code-block:: python

from fastNLP import Embedding
embed = Embedding(10000, 50)

也可以传入一个初始化的参数矩阵:

.. code-block:: python

from fastNLP import Embedding
embed = Embedding(init_embed)

其中的init_embed可以是torch.FloatTensor、torch.nn.Embedding或者numpy.ndarray。
@@ -53,12 +56,13 @@ Part III: 使用预训练的静态embedding
在使用预训练的embedding之前,需要根据数据集的内容构建一个词表 :class:`~fastNLP.core.vocabulary.Vocabulary` ,在
预训练embedding类初始化的时候需要将这个词表作为参数传入。

在fastNLP中,我们提供了 :class:`~fastNLP.modules.encoder.embedding.StaticEmbedding` 这一个类。
通过 :class:`~fastNLP.modules.encoder.embedding.StaticEmbedding` 可以加载预训练好的静态
在fastNLP中,我们提供了 :class:`~fastNLP.embeddings.StaticEmbedding` 这一个类。
通过 :class:`~fastNLP.embeddings.StaticEmbedding` 可以加载预训练好的静态
Embedding,例子如下:

.. code-block:: python

from fastNLP import StaticEmbedding
embed = StaticEmbedding(vocab, model_dir_or_name='en-glove-6b-50', requires_grad=True)

vocab为根据数据集构建的词表,model_dir_or_name可以是一个路径,也可以是embedding模型的名称:
@@ -67,112 +71,50 @@ vocab为根据数据集构建的词表,model_dir_or_name可以是一个路径
和word2vec类型的权重文件都支持)

2 如果传入的是模型名称,那么fastNLP将会根据名称查找embedding模型,如果在cache目录下找到模型则会
自动加载;如果找不到则会自动下载。可以通过环境变量 ``FASTNLP_CACHE_DIR`` 来自定义cache目录,如::
自动加载;如果找不到则会自动下载到cache目录。默认的cache目录为 `~/.fastNLP` 文件夹。可以通过环境
变量 ``FASTNLP_CACHE_DIR`` 来自定义cache目录,如::

$ FASTNLP_CACHE_DIR=~/fastnlp_cache_dir python your_python_file.py

这个命令表示fastNLP将会在 `~/fastnlp_cache_dir` 这个目录下寻找模型,找不到则会自动将模型下载到这个目录

目前支持的静态embedding模型有:

========================== ================================
模型名称 模型
-------------------------- --------------------------------
en glove.840B.300d
-------------------------- --------------------------------
en-glove-840d-300 glove.840B.300d
-------------------------- --------------------------------
en-glove-6b-50 glove.6B.50d
-------------------------- --------------------------------
en-word2vec-300 谷歌word2vec 300维
-------------------------- --------------------------------
en-fasttext 英文fasttext 300维
-------------------------- --------------------------------
cn 腾讯中文词向量 200维
-------------------------- --------------------------------
cn-fasttext 中文fasttext 300维
========================== ================================



-----------------------------------------------------------
Part IV: 使用预训练的Contextual Embedding(ELMo & BERT)
-----------------------------------------------------------

在fastNLP中,我们提供了ELMo和BERT的embedding: :class:`~fastNLP.modules.encoder.embedding.ElmoEmbedding`
和 :class:`~fastNLP.modules.encoder.embedding.BertEmbedding` 。
在fastNLP中,我们提供了ELMo和BERT的embedding: :class:`~fastNLP.embeddings.ElmoEmbedding`
和 :class:`~fastNLP.embeddings.BertEmbedding` 。

与静态embedding类似,ELMo的使用方法如下:

.. code-block:: python

from fastNLP import ElmoEmbedding
embed = ElmoEmbedding(vocab, model_dir_or_name='small', requires_grad=False)

目前支持的ElmoEmbedding模型有:

========================== ================================
模型名称 模型
-------------------------- --------------------------------
small allennlp ELMo的small
-------------------------- --------------------------------
medium allennlp ELMo的medium
-------------------------- --------------------------------
original allennlp ELMo的original
-------------------------- --------------------------------
5.5b-original allennlp ELMo的5.5B original
========================== ================================

BERT-embedding的使用方法如下:

.. code-block:: python

from fastNLP import BertEmbedding
embed = BertEmbedding(
vocab, model_dir_or_name='en-base-cased', requires_grad=False, layers='4,-2,-1'
)

其中layers变量表示需要取哪几层的encode结果。

目前支持的BertEmbedding模型有:

========================== ====================================
模型名称 模型
-------------------------- ------------------------------------
en bert-base-cased
-------------------------- ------------------------------------
en-base-uncased bert-base-uncased
-------------------------- ------------------------------------
en-base-cased bert-base-cased
-------------------------- ------------------------------------
en-large-uncased bert-large-uncased
-------------------------- ------------------------------------
en-large-cased bert-large-cased
-------------------------- ------------------------------------
-------------------------- ------------------------------------
en-large-cased-wwm bert-large-cased-whole-word-mask
-------------------------- ------------------------------------
en-large-uncased-wwm bert-large-uncased-whole-word-mask
-------------------------- ------------------------------------
en-base-cased-mrpc bert-base-cased-finetuned-mrpc
-------------------------- ------------------------------------
-------------------------- ------------------------------------
multilingual bert-base-multilingual-cased
-------------------------- ------------------------------------
multilingual-base-uncased bert-base-multilingual-uncased
-------------------------- ------------------------------------
multilingual-base-cased bert-base-multilingual-cased
========================== ====================================

-----------------------------------------------------
Part V: 使用character-level的embedding
-----------------------------------------------------

除了预训练的embedding以外,fastNLP还提供了CharEmbedding: :class:`~fastNLP.modules.encoder.embedding.CNNCharEmbedding` 和
:class:`~fastNLP.modules.encoder.embedding.LSTMCharEmbedding` 。
除了预训练的embedding以外,fastNLP还提供了CharEmbedding: :class:`~fastNLP.embeddings.CNNCharEmbedding` 和
:class:`~fastNLP.embeddings.LSTMCharEmbedding` 。

CNNCharEmbedding的使用例子如下:

.. code-block:: python

from fastNLP import CNNCharEmbedding
embed = CNNCharEmbedding(vocab, embed_size=100, char_emb_size=50)

这表示这个CNNCharEmbedding当中character的embedding维度大小为50,返回的embedding结果维度大小为100。
@@ -181,22 +123,23 @@ CNNCharEmbedding的使用例子如下:

.. code-block:: python

from fastNLP import LSTMCharEmbedding
embed = LSTMCharEmbedding(vocab, embed_size=100, char_emb_size=50)

这表示这个LSTMCharEmbedding当中character的embedding维度大小为50,返回的embedding结果维度大小为100。



-----------------------------------------------------
Part VI: 叠加使用多个embedding
-----------------------------------------------------

在fastNLP中,我们使用 :class:`~fastNLP.modules.encoder.embedding.StackEmbedding` 来叠加多个embedding
在fastNLP中,我们使用 :class:`~fastNLP.embeddings.StackEmbedding` 来叠加多个embedding

例子如下:

.. code-block:: python

from fastNLP import StaticEmbedding, StackEmbedding
embed_1 = StaticEmbedding(vocab, model_dir_or_name='en-glove-6b-50', requires_grad=True)
embed_2 = StaticEmbedding(vocab, model_dir_or_name='en-word2vec-300', requires_grad=True)

@@ -208,7 +151,17 @@ StackEmbedding会把多个embedding的结果拼接起来,如上面例子的sta

.. code-block:: python

from fastNLP import StaticEmbedding, StackEmbedding, ElmoEmbedding
elmo_embedding = ElmoEmbedding(vocab, model_dir_or_name='medium', layers='0,1,2', requires_grad=False)
glove_embedding = StaticEmbedding(vocab, model_dir_or_name='en-glove-6b-50', requires_grad=True)

stack_embed = StackEmbedding([elmo_embedding, glove_embedding])

------------------------------------------
Part VII: fastNLP支持的预训练Embedding
------------------------------------------

fastNLP支持多种预训练Embedding并提供自动下载功能,具体参见文档

`fastNLP可加载的embedding与数据集 <https://docs.qq.com/sheet/DVnpkTnF6VW9UeXdh?c=A1A0A0>`_


+ 5
- 3
docs/source/tutorials/tutorial_4_loss_optimizer.rst View File

@@ -1,8 +1,9 @@
==============================================================================
Loss 和 optimizer 教程 ———— 以文本分类为例
动手实现一个文本分类器I-使用Trainer和Tester快速训练和测试
==============================================================================

我们使用和 :doc:`/user/quickstart` 中一样的任务来进行详细的介绍。给出一段评价性文字,预测其情感倾向是积极(label=1)、消极(label=0)还是中性(label=2),使用 :class:`~fastNLP.Trainer` 和 :class:`~fastNLP.Tester` 来进行快速训练和测试,损失函数之前的内容与 :doc:`/tutorials/tutorial_5_datasetiter` 中的完全一样,如已经阅读过可以跳过。
我们使用和 :doc:`/user/quickstart` 中一样的任务来进行详细的介绍。给出一段评价性文字,预测其情感倾向是积极(label=1)、
消极(label=0)还是中性(label=2),使用 :class:`~fastNLP.Trainer` 和 :class:`~fastNLP.Tester` 来进行快速训练和测试。

--------------
数据处理
@@ -157,6 +158,7 @@ Vocabulary 的使用
损失函数
训练模型需要提供一个损失函数
,fastNLP中提供了直接可以导入使用的四种loss,分别为:
* :class:`~fastNLP.CrossEntropyLoss`:包装了torch.nn.functional.cross_entropy()函数,返回交叉熵损失(可以运用于多分类场景)
* :class:`~fastNLP.BCELoss`:包装了torch.nn.functional.binary_cross_entropy()函数,返回二分类的交叉熵
* :class:`~fastNLP.L1Loss`:包装了torch.nn.functional.l1_loss()函数,返回L1 损失
@@ -208,7 +210,7 @@ Vocabulary 的使用

#使用CNNText的时候第一个参数输入一个tuple,作为模型定义embedding的参数
#还可以传入 kernel_nums, kernel_sizes, padding, dropout的自定义值
model_cnn = CNNText((len(vocab),EMBED_DIM), num_classes=3, padding=2, dropout=0.1)
model_cnn = CNNText((len(vocab),EMBED_DIM), num_classes=3, dropout=0.1)

#如果在定义trainer的时候没有传入optimizer参数,模型默认的优化器为torch.optim.Adam且learning rate为lr=4e-3
#这里只使用了optimizer_1作为优化器输入,感兴趣可以尝试optimizer_2或者其他优化器作为输入


+ 5
- 3
docs/source/tutorials/tutorial_5_datasetiter.rst View File

@@ -1,8 +1,10 @@
==============================================================================
DataSetIter 教程 ———— 以文本分类为例
动手实现一个文本分类器II-使用DataSetIter实现自定义训练过程
==============================================================================

我们使用和 :doc:`/user/quickstart` 中一样的任务来进行详细的介绍。给出一段评价性文字,预测其情感倾向是积极(label=1)、消极(label=0)还是中性(label=2),使用:class:`~fastNLP.DataSetIter` 类来编写自己的训练过程。自己编写训练过程之前的内容与 :doc:`/tutorials/tutorial_4_loss_optimizer` 中的完全一样,如已经阅读过可以跳过。
我们使用和 :doc:`/user/quickstart` 中一样的任务来进行详细的介绍。给出一段评价性文字,预测其情感倾向是积极(label=1)、
消极(label=0)还是中性(label=2),使用 :class:`~fastNLP.DataSetIter` 类来编写自己的训练过程。
自己编写训练过程之前的内容与 :doc:`/tutorials/tutorial_4_loss_optimizer` 中的完全一样,如已经阅读过可以跳过。

--------------
数据处理
@@ -190,7 +192,7 @@ sampler
import time

embed_dim = 100
model = CNNText((len(vocab),embed_dim), num_classes=3, padding=2, dropout=0.1)
model = CNNText((len(vocab),embed_dim), num_classes=3, dropout=0.1)

def train(epoch, data, devdata):
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)


+ 2
- 2
docs/source/tutorials/tutorial_6_seq_labeling.rst View File

@@ -1,5 +1,5 @@
=====================
序列标注教程
快速实现序列标注模型
=====================

这一部分的内容主要展示如何使用fastNLP 实现序列标注任务。你可以使用fastNLP的各个组件快捷,方便地完成序列标注任务,达到出色的效果。
@@ -45,7 +45,7 @@ fastNLP可以方便地载入各种类型的数据。同时,针对常见的数

数据处理
----------------------------
我们进一步处理数据。将数据和词表封装在 :class:`~fastNLP.DataInfo` 类中。data是DataInfo的实例。
我们进一步处理数据。将数据和词表封装在 :class:`~fastNLP.DataBundle` 类中。data是DataBundle的实例。
我们输入模型的数据包括char embedding,以及word embedding。在数据处理部分,我们尝试完成词表的构建。
使用fastNLP中的Vocabulary类来构建词表。



+ 5
- 3
docs/source/tutorials/tutorial_7_modules_models.rst View File

@@ -1,5 +1,5 @@
======================================
Modules 和 models 的教程
使用Modules和Models快速搭建自定义模型
======================================

:mod:`~fastNLP.modules` 和 :mod:`~fastNLP.models` 用于构建 fastNLP 所需的神经网络模型,它可以和 torch.nn 中的模型一起使用。
@@ -181,7 +181,7 @@ FastNLP 完全支持使用 pyTorch 编写的模型,但与 pyTorch 中编写模
)
)

FastNLP 中包含的各种模块如下表,您可以点击具体的名称查看详细的 API:
FastNLP 中包含的各种模块如下表,您可以点击具体的名称查看详细的 API,也可以通过 :doc:`/fastNLP.modules` 进行了解。

.. csv-table::
:header: 名称, 介绍
@@ -189,7 +189,6 @@ FastNLP 中包含的各种模块如下表,您可以点击具体的名称查看
:class:`~fastNLP.modules.ConvolutionCharEncoder` , char级别的卷积 encoder
:class:`~fastNLP.modules.LSTMCharEncoder` , char级别基于LSTM的 encoder
:class:`~fastNLP.modules.ConvMaxpool` , 结合了Convolution和Max-Pooling于一体的模块
:class:`~fastNLP.modules.Embedding` , 基础的Embedding模块
:class:`~fastNLP.modules.LSTM` , LSTM模块, 轻量封装了PyTorch的LSTM
:class:`~fastNLP.modules.StarTransformer` , Star-Transformer 的encoder部分
:class:`~fastNLP.modules.TransformerEncoder` , Transformer的encoder模块,不包含embedding层
@@ -198,8 +197,11 @@ FastNLP 中包含的各种模块如下表,您可以点击具体的名称查看
:class:`~fastNLP.modules.VarGRU` , Variational Dropout GRU 模块
:class:`~fastNLP.modules.MaxPool` , Max-pooling模块
:class:`~fastNLP.modules.MaxPoolWithMask` , 带mask矩阵的max pooling。在做 max-pooling的时候不会考虑mask值为0的位置。
:class:`~fastNLP.modules.AvgPool` , Average-pooling模块
:class:`~fastNLP.modules.AvgPoolWithMask` , 带mask矩阵的average pooling。在做 average-pooling的时候不会考虑mask值为0的位置。
:class:`~fastNLP.modules.MultiHeadAttention` , MultiHead Attention 模块
:class:`~fastNLP.modules.MLP` , 简单的多层感知器模块
:class:`~fastNLP.modules.ConditionalRandomField` , 条件随机场模块
:class:`~fastNLP.modules.viterbi_decode` , 给定一个特征矩阵以及转移分数矩阵,计算出最佳的路径以及对应的分数 (与 :class:`~fastNLP.modules.ConditionalRandomField` 配合使用)
:class:`~fastNLP.modules.allowed_transitions` , 给定一个id到label的映射表,返回所有可以跳转的列表(与 :class:`~fastNLP.modules.ConditionalRandomField` 配合使用)
:class:`~fastNLP.modules.TimestepDropout` , 简单包装过的Dropout 组件

+ 3
- 3
docs/source/tutorials/tutorial_8_metrics.rst View File

@@ -1,6 +1,6 @@
=====================
Metric 教程
=====================
===============================
使用Metric快速评测你的模型
===============================

在进行训练时,fastNLP提供了各种各样的 :mod:`~fastNLP.core.metrics` 。
如 :doc:`/user/quickstart` 中所介绍的,:class:`~fastNLP.AccuracyMetric` 类的对象被直接传到 :class:`~fastNLP.Trainer` 中用于训练


+ 5
- 5
docs/source/tutorials/tutorial_9_callback.rst View File

@@ -1,6 +1,6 @@
==============================================================================
Callback 教程
==============================================================================
===================================================
使用Callback自定义你的训练过程
===================================================

在训练时,我们常常要使用trick来提高模型的性能(如调节学习率),或者要打印训练中的信息。
这里我们提供Callback类,在Trainer中插入代码,完成一些自定义的操作。
@@ -44,10 +44,10 @@ Callback的构建和使用

这里,:class:`~fastNLP.Callback` 中所有以 ``on_`` 开头的类方法会在 :class:`~fastNLP.Trainer` 的训练中在特定时间调用。
如 on_train_begin() 会在训练开始时被调用,on_epoch_end() 会在每个 epoch 结束时调用。
具体有哪些类方法,参见文档。
具体有哪些类方法,参见文档 :class:`~fastNLP.Callback`

另外,为了使用方便,可以在 :class:`~fastNLP.Callback` 内部访问 :class:`~fastNLP.Trainer` 中的属性,如 optimizer, epoch, step,分别对应训练时的优化器,当前epoch数,和当前的总step数。
具体可访问的属性,参见文档。
具体可访问的属性,参见文档 :class:`~fastNLP.Callback`

使用Callback
在定义好 :class:`~fastNLP.Callback` 之后,就能将它传入Trainer的 ``callbacks`` 参数,在实际训练时使用。


+ 15
- 13
docs/source/user/tutorials.rst View File

@@ -1,18 +1,20 @@
===================
fastNLP详细使用教程
===================
========================
fastNLP 详细使用教程
========================

这里是更详细的使用教程。对于大部分的用户,我们建议你从第一篇开始顺序阅读;如果你只想了解其中的一部分,也可以进行选读。

.. toctree::
:maxdepth: 1

1. 使用DataSet预处理文本 </tutorials/tutorial_1_data_preprocess>
2. 使用DataSetLoader加载数据集 </tutorials/tutorial_2_load_dataset>
3. 使用Embedding模块将文本转成向量 </tutorials/tutorial_3_embedding>
4. 动手实现一个文本分类器I-使用Trainer和Tester快速训练和测试 </tutorials/tutorial_4_loss_optimizer>
5. 动手实现一个文本分类器II-使用DataSetIter实现自定义训练过程 </tutorials/tutorial_5_datasetiter>
6. 快速实现序列标注模型 </tutorials/tutorial_6_seq_labeling>
7. 使用Modules和Models快速搭建自定义模型 </tutorials/tutorial_7_modules_models>
8. 使用Metric快速评测你的模型 </tutorials/tutorial_8_metrics>
9. 使用Callback自定义你的训练过程 </tutorials/tutorial_9_callback>
10. 使用fitlog 辅助 fastNLP 进行科研 </tutorials/tutorial_10_fitlog>
使用DataSet预处理文本 </tutorials/tutorial_1_data_preprocess>
使用Loader和Pipe加载并处理数据集 </tutorials/tutorial_2_load_dataset>
使用Embedding模块将文本转成向量 </tutorials/tutorial_3_embedding>
动手实现一个文本分类器I-使用Trainer和Tester快速训练和测试 </tutorials/tutorial_4_loss_optimizer>
动手实现一个文本分类器II-使用DataSetIter实现自定义训练过程 </tutorials/tutorial_5_datasetiter>
快速实现序列标注模型 </tutorials/tutorial_6_seq_labeling>
使用Modules和Models快速搭建自定义模型 </tutorials/tutorial_7_modules_models>
使用Metric快速评测你的模型 </tutorials/tutorial_8_metrics>
使用Callback自定义你的训练过程 </tutorials/tutorial_9_callback>
使用fitlog 辅助 fastNLP 进行科研 </tutorials/tutorial_10_fitlog>


+ 18
- 10
fastNLP/__init__.py View File

@@ -1,22 +1,24 @@
"""
fastNLP 由 :mod:`~fastNLP.core` 、 :mod:`~fastNLP.io` 、:mod:`~fastNLP.modules`、:mod:`~fastNLP.models`
等子模块组成,你可以点进去查看每个模块的文档。
fastNLP 由 :mod:`~fastNLP.core` 、 :mod:`~fastNLP.io` 、:mod:`~fastNLP.embeddings` 、 :mod:`~fastNLP.modules`、
:mod:`~fastNLP.models` 等子模块组成,你可以查看每个模块的文档。

- :mod:`~fastNLP.core` 是fastNLP 的核心模块,包括 DataSet、 Trainer、 Tester 等组件。详见文档 :doc:`/fastNLP.core`
- :mod:`~fastNLP.io` 是实现输入输出的模块,包括了数据集的读取,模型的存取等功能。详见文档 :doc:`/fastNLP.io`
- :mod:`~fastNLP.embeddings` 提供用于构建复杂网络模型所需的各种embedding。详见文档 :doc:`/fastNLP.embeddings`
- :mod:`~fastNLP.modules` 包含了用于搭建神经网络模型的诸多组件,可以帮助用户快速搭建自己所需的网络。详见文档 :doc:`/fastNLP.modules`
- :mod:`~fastNLP.models` 包含了一些使用 fastNLP 实现的完整网络模型,包括CNNText、SeqLabeling等常见模型。详见文档 :doc:`/fastNLP.models`
- :mod:`~fastNLP.models` 包含了一些使用 fastNLP 实现的完整网络模型,包括 :class:`~fastNLP.models.CNNText` :class:`~fastNLP.models.SeqLabeling` 等常见模型。详见文档 :doc:`fastNLP.models`

fastNLP 中最常用的组件可以直接从 fastNLP 包中 import ,他们的文档如下:
"""
__all__ = [
"Instance",
"FieldArray",

"DataSetIter",
"BatchIter",
"TorchLoaderIter",
"Vocabulary",
"DataSet",
"Const",
@@ -30,6 +32,7 @@ __all__ = [
"TensorboardCallback",
"LRScheduler",
"ControlC",
"LRFinder",
"Padder",
"AutoPadder",
@@ -42,7 +45,8 @@ __all__ = [
"Optimizer",
"SGD",
"Adam",
"AdamW",

"Sampler",
"SequentialSampler",
"BucketSampler",
@@ -50,15 +54,19 @@ __all__ = [
"LossFunc",
"CrossEntropyLoss",
"L1Loss", "BCELoss",
"L1Loss",
"BCELoss",
"NLLLoss",
"LossInForward",
"cache_results"
"cache_results",

'logger'
]
__version__ = '0.4.5'

from .core import *
from . import embeddings
from . import models
from . import modules
from .io import data_loader
from .core import *
from .io import loader, pipe

+ 74
- 10
fastNLP/core/__init__.py View File

@@ -1,30 +1,94 @@
"""
core 模块里实现了 fastNLP 的核心框架,常用的功能都可以从 fastNLP 包中直接 import。当然你也同样可以从 core 模块的子模块中 import,
例如 Batch 组件有两种 import 的方式::
例如 :class:`~fastNLP.DataSetIter` 组件有两种 import 的方式::
# 直接从 fastNLP 中 import
from fastNLP import Batch
from fastNLP import DataSetIter
# 从 core 模块的子模块 batch 中 import
from fastNLP.core.batch import Batch
# 从 core 模块的子模块 batch 中 import DataSetIter
from fastNLP.core.batch import DataSetIter

对于常用的功能,你只需要在 :doc:`fastNLP` 中查看即可。如果想了解各个子模块的具体作用,您可以在下面找到每个子模块的具体文档。

.. todo::
介绍core 的子模块的分工,好像必要性不大
"""
__all__ = [
"DataSet",
"Instance",
"FieldArray",
"Padder",
"AutoPadder",
"EngChar2DPadder",
"Vocabulary",
"DataSetIter",
"BatchIter",
"TorchLoaderIter",
"Const",
"Tester",
"Trainer",
"cache_results",
"seq_len_to_mask",
"get_seq_len",
"logger",
"Callback",
"GradientClipCallback",
"EarlyStopCallback",
"FitlogCallback",
"EvaluateCallback",
"LRScheduler",
"ControlC",
"LRFinder",
"TensorboardCallback",
"WarmupCallback",
'SaveModelCallback',
"EchoCallback",
"TesterCallback",
"CallbackException",
"EarlyStopError",
"LossFunc",
"CrossEntropyLoss",
"L1Loss",
"BCELoss",
"NLLLoss",
"LossInForward",
"AccuracyMetric",
"SpanFPreRecMetric",
"ExtractiveQAMetric",
"Optimizer",
"SGD",
"Adam",
"AdamW",
"SequentialSampler",
"BucketSampler",
"RandomSampler",
"Sampler",
]

from ._logger import logger
from .batch import DataSetIter, BatchIter, TorchLoaderIter
from .callback import Callback, GradientClipCallback, EarlyStopCallback, TensorboardCallback, LRScheduler, ControlC
from .callback import Callback, GradientClipCallback, EarlyStopCallback, FitlogCallback, EvaluateCallback, \
LRScheduler, ControlC, LRFinder, TensorboardCallback, WarmupCallback, SaveModelCallback, EchoCallback, \
TesterCallback, CallbackException, EarlyStopError
from .const import Const
from .dataset import DataSet
from .field import FieldArray, Padder, AutoPadder, EngChar2DPadder
from .instance import Instance
from .losses import LossFunc, CrossEntropyLoss, L1Loss, BCELoss, NLLLoss, LossInForward
from .metrics import AccuracyMetric, SpanFPreRecMetric, ExtractiveQAMetric
from .optimizer import Optimizer, SGD, Adam
from .optimizer import Optimizer, SGD, Adam, AdamW
from .sampler import SequentialSampler, BucketSampler, RandomSampler, Sampler
from .tester import Tester
from .trainer import Trainer
from .utils import cache_results, seq_len_to_mask
from .utils import cache_results, seq_len_to_mask, get_seq_len
from .vocabulary import Vocabulary

+ 155
- 0
fastNLP/core/_logger.py View File

@@ -0,0 +1,155 @@
"""undocumented"""

__all__ = [
'logger',
]

import logging
import logging.config
import os
import sys
import warnings

ROOT_NAME = 'fastNLP'

try:
import fitlog
except ImportError:
fitlog = None
try:
from tqdm.auto import tqdm
except ImportError:
tqdm = None

# Choose the console handler implementation once at import time: when tqdm is
# available, route records through tqdm.write so log lines do not corrupt an
# active progress bar; otherwise fall back to a plain stdout StreamHandler.
if tqdm is not None:
    class TqdmLoggingHandler(logging.Handler):
        """Logging handler that emits records via ``tqdm.write``."""

        def __init__(self, level=logging.INFO):
            super().__init__(level)

        def emit(self, record):
            # Format and print through tqdm; re-raise only fatal interrupts and
            # delegate any other failure to logging's standard error handling.
            try:
                msg = self.format(record)
                tqdm.write(msg)
                self.flush()
            except (KeyboardInterrupt, SystemExit):
                raise
            except:
                self.handleError(record)
else:
    class TqdmLoggingHandler(logging.StreamHandler):
        """Fallback console handler: a plain StreamHandler bound to sys.stdout."""

        def __init__(self, level=logging.INFO):
            super().__init__(sys.stdout)
            self.setLevel(level)


def _get_level(level):
if isinstance(level, int):
pass
else:
level = level.lower()
level = {'info': logging.INFO, 'debug': logging.DEBUG,
'warn': logging.WARN, 'warning': logging.WARN,
'error': logging.ERROR}[level]
return level


def _add_file_handler(logger, path, level='INFO'):
    """Attach a ``FileHandler`` writing to *path* (append mode) to *logger*.

    Does nothing if the logger already logs to that file. An existing file
    triggers a warning; the parent directory is created on demand.

    :param logger: the logger to configure.
    :param str path: log file location.
    :param level: minimum level for the file handler, int or name.
    """
    abs_path = os.path.abspath(path)
    for handler in logger.handlers:
        # The same file is already attached -> nothing to do.
        if isinstance(handler, logging.FileHandler) and abs_path == handler.baseFilename:
            return

    if os.path.exists(path):
        assert os.path.isfile(path)
        warnings.warn('log already exists in {}'.format(path))
    os.makedirs(os.path.abspath(os.path.dirname(path)), exist_ok=True)

    new_handler = logging.FileHandler(path, mode='a')
    new_handler.setLevel(_get_level(level))
    new_handler.setFormatter(logging.Formatter(
        fmt='%(asctime)s - %(module)s - [%(levelname)s] - %(message)s',
        datefmt='%Y/%m/%d %H:%M:%S'))
    logger.addHandler(new_handler)


def _set_stdout_handler(logger, stdout='tqdm', level='INFO'):
    """Replace *logger*'s console handler according to *stdout*.

    'plain' installs a ``StreamHandler`` on sys.stdout, 'tqdm' installs a
    ``TqdmLoggingHandler``, and 'none' leaves the logger with no console
    handler at all. Any previously attached console handler is removed first
    so the logger ends up with at most one.

    :param logger: the logger to configure.
    :param str stdout: one of 'none', 'plain', 'tqdm'.
    :param level: minimum level for the console handler, int or name.
    """
    level = _get_level(level)
    allowed = ['none', 'plain', 'tqdm']
    if stdout not in allowed:
        raise ValueError('stdout must in one of {}'.format(allowed))

    # Drop the first existing console handler, if any, so the logger is only
    # ever configured once.
    for existing in logger.handlers:
        if isinstance(existing, (logging.StreamHandler, TqdmLoggingHandler)):
            logger.removeHandler(existing)
            break

    if stdout == 'plain':
        new_handler = logging.StreamHandler(sys.stdout)
    elif stdout == 'tqdm':
        new_handler = TqdmLoggingHandler(level)
    else:
        new_handler = None

    if new_handler is not None:
        new_handler.setLevel(level)
        new_handler.setFormatter(logging.Formatter('%(message)s'))
        logger.addHandler(new_handler)


class FastNLPLogger(logging.getLoggerClass()):
    """Logger subclass used by fastNLP; adds helpers to (re)configure output."""

    def __init__(self, name):
        super().__init__(name)

    def add_file(self, path='./log.txt', level='INFO'):
        """Attach an additional log file at *path*, filtered at *level*."""
        _add_file_handler(self, path, level)

    def set_stdout(self, stdout='tqdm', level='INFO'):
        """Reconfigure console output: 'tqdm', 'plain' or 'none', at *level*."""
        _set_stdout_handler(self, stdout, level)


logging.setLoggerClass(FastNLPLogger)


# print(logging.getLoggerClass())
# print(logging.getLogger())

def _init_logger(path=None, stdout='tqdm', level='INFO'):
    """Create and configure the root fastNLP logger.

    :param path: optional log file to attach in addition to the console.
    :param str stdout: console style, one of 'none'/'plain'/'tqdm'.
    :param level: minimum level, int or name.
    :return: the configured logger named ``ROOT_NAME``.
    """
    numeric_level = _get_level(level)
    root = logging.getLogger(ROOT_NAME)
    # Keep fastNLP messages out of any handlers the application may have
    # installed on the global root logger.
    root.propagate = False
    root.setLevel(numeric_level)
    _set_stdout_handler(root, stdout, numeric_level)
    # Optional file output.
    if path is not None:
        _add_file_handler(root, path, numeric_level)
    return root


def _get_logger(name=None, level='INFO'):
    """Return a logger under the fastNLP namespace, set to *level*.

    :param name: logger name; None means the fastNLP root itself. Names not
        already under ``ROOT_NAME`` are qualified so the child inherits the
        fastNLP root configuration.
    :param level: minimum level, int or name.
    """
    numeric_level = _get_level(level)
    if name is None:
        name = ROOT_NAME
    assert isinstance(name, str)
    if not name.startswith(ROOT_NAME):
        name = '{}.{}'.format(ROOT_NAME, name)
    child = logging.getLogger(name)
    child.setLevel(numeric_level)
    return child


logger = _init_logger(path=None)

+ 26
- 7
fastNLP/core/_parallel_utils.py View File

@@ -1,10 +1,14 @@
"""undocumented"""

__all__ = []

import threading

import torch
from torch import nn
from torch.nn.parallel.parallel_apply import get_a_var

from torch.nn.parallel.scatter_gather import scatter_kwargs, gather
from torch.nn.parallel.replicate import replicate
from torch.nn.parallel.scatter_gather import scatter_kwargs, gather


def parallel_apply(modules, func_name, inputs, kwargs_tup=None, devices=None):
@@ -26,11 +30,11 @@ def parallel_apply(modules, func_name, inputs, kwargs_tup=None, devices=None):
assert len(modules) == len(devices)
else:
devices = [None] * len(modules)
lock = threading.Lock()
results = {}
grad_enabled = torch.is_grad_enabled()
def _worker(i, module, input, kwargs, device=None):
torch.set_grad_enabled(grad_enabled)
if device is None:
@@ -46,20 +50,20 @@ def parallel_apply(modules, func_name, inputs, kwargs_tup=None, devices=None):
except Exception as e:
with lock:
results[i] = e
if len(modules) > 1:
threads = [threading.Thread(target=_worker,
args=(i, module, input, kwargs, device))
for i, (module, input, kwargs, device) in
enumerate(zip(modules, inputs, kwargs_tup, devices))]
for thread in threads:
thread.start()
for thread in threads:
thread.join()
else:
_worker(0, modules[0], inputs[0], kwargs_tup[0], devices[0])
outputs = []
for i in range(len(inputs)):
output = results[i]
@@ -78,6 +82,7 @@ def _data_parallel_wrapper(func_name, device_ids, output_device):
:param output_device: nn.DataParallel中的output_device
:return:
"""
def wrapper(network, *inputs, **kwargs):
inputs, kwargs = scatter_kwargs(inputs, kwargs, device_ids, dim=0)
if len(device_ids) == 1:
@@ -85,4 +90,18 @@ def _data_parallel_wrapper(func_name, device_ids, output_device):
replicas = replicate(network, device_ids[:len(inputs)])
outputs = parallel_apply(replicas, func_name, inputs, kwargs, device_ids[:len(replicas)])
return gather(outputs, output_device)
return wrapper


def _model_contains_inner_module(model):
"""

:param nn.Module model: 模型文件,判断是否内部包含model.module, 多用于check模型是否是nn.DataParallel,
nn.parallel.DistributedDataParallel。主要是在做形参匹配的时候需要使用最内部的model的function。
:return: bool
"""
if isinstance(model, nn.Module):
if isinstance(model, (nn.DataParallel, nn.parallel.DistributedDataParallel)):
return True
return False

+ 28
- 9
fastNLP/core/batch.py View File

@@ -1,24 +1,23 @@
"""
batch 模块实现了 fastNLP 所需的 Batch 类。
batch 模块实现了 fastNLP 所需的 :class:`~fastNLP.core.batch.DataSetIter` 类。

"""
__all__ = [
"BatchIter",
"DataSetIter",
"TorchLoaderIter",
]

import atexit
from queue import Empty, Full

import numpy as np
import torch
import torch.multiprocessing as mp
import torch.utils.data
from numbers import Number

from .sampler import SequentialSampler
from .dataset import DataSet
from ._logger import logger
_python_is_exit = False


@@ -49,6 +48,11 @@ class DataSetGetter:
return len(self.dataset)

def collate_fn(self, batch: list):
"""

:param batch: [[idx1, x_dict1, y_dict1], [idx2, x_dict2, y_dict2], [xx, xx, xx]]
:return:
"""
# TODO 支持在DataSet中定义collate_fn,因为有时候可能需要不同的field之间融合,比如BERT的场景
batch_x = {n:[] for n in self.inputs.keys()}
batch_y = {n:[] for n in self.targets.keys()}
@@ -71,7 +75,7 @@ class DataSetGetter:
try:
data, flag = _to_tensor(data, f.dtype)
except TypeError as e:
print(f"Field {n} cannot be converted to torch.tensor.")
logger.error(f"Field {n} cannot be converted to torch.tensor.")
raise e
batch_dict[n] = data
return batch_dict
@@ -94,9 +98,13 @@ class DataSetGetter:

class SamplerAdapter(torch.utils.data.Sampler):
    """Adapt a fastNLP sampler (a callable producing indices from a dataset)
    to the ``torch.utils.data.Sampler`` interface expected by ``DataLoader``."""

    def __init__(self, sampler, dataset):
        super().__init__(dataset)
        # The fastNLP sampler is invoked lazily, once per epoch, in __iter__.
        self.sampler = sampler
        self.dataset = dataset

    def __len__(self):
        return len(self.dataset)

    def __iter__(self):
        indices = self.sampler(self.dataset)
        return iter(indices)

@@ -166,15 +174,19 @@ class DataSetIter(BatchIter):
timeout=0, worker_init_fn=None):
super().__init__()
assert isinstance(dataset, DataSet)
sampler = SamplerAdapter(sampler=sampler or SequentialSampler(), dataset=dataset)
if not isinstance(sampler, torch.utils.data.Sampler):
self.sampler = SamplerAdapter(sampler=sampler or SequentialSampler(), dataset=dataset)
else:
self.sampler = sampler
dataset = DataSetGetter(dataset, as_numpy)
collate_fn = dataset.collate_fn if hasattr(dataset, 'collate_fn') else None
self.dataiter = torch.utils.data.DataLoader(
dataset=dataset, batch_size=batch_size, sampler=sampler,
dataset=dataset, batch_size=batch_size, sampler=self.sampler,
collate_fn=collate_fn, num_workers=num_workers,
pin_memory=pin_memory, drop_last=drop_last,
timeout=timeout, worker_init_fn=worker_init_fn)
self.num_batches = self.get_num_batches(len(dataset), batch_size, drop_last)
# 以sampler的数量为准,因为DistributedSampler的时候每个进程上并不是所有的数据都用上了
self.num_batches = self.get_num_batches(len(self.dataiter.sampler), batch_size, drop_last)
self.batch_size = batch_size


@@ -183,7 +195,7 @@ class TorchLoaderIter(BatchIter):
super().__init__()
assert isinstance(dataset, torch.utils.data.DataLoader)
self.dataiter = dataset
self.num_batches = self.get_num_batches(len(dataset), dataset.batch_size, dataset.drop_last)
self.num_batches = self.get_num_batches(len(dataset.sampler), dataset.batch_size, dataset.drop_last)
self.batch_size = dataset.batch_size


@@ -201,6 +213,13 @@ class OnlineDataIter(BatchIter):


def _to_tensor(batch, field_dtype):
"""

:param batch: np.array()
:param field_dtype: 数据类型
:return: batch, flag. 如果传入的数据支持转为tensor,返回的batch就是tensor,且flag为True;如果传入的数据不支持转为tensor,
返回的batch就是原来的数据,且flag为False
"""
try:
if field_dtype is not None and isinstance(field_dtype, type)\
and issubclass(field_dtype, Number) \


+ 252
- 47
fastNLP/core/callback.py View File

@@ -2,11 +2,11 @@ r"""
callback模块实现了 fastNLP 中的许多 callback 类,用于增强 :class:`~fastNLP.Trainer` 类。

虽然Trainer本身已经集成了一些功能,但仍然不足以囊括训练过程中可能需要到的功能,
比如负采样,learning rate decay, Early Stop等。
为了解决这个问题fastNLP引入了callback的机制,Callback 是一种在Trainer训练过程中特定阶段会运行的函数集合。
关于Trainer的详细文档,请参见 :doc:`trainer 模块<fastNLP.core.trainer>`
比如负采样,learning rate decay 和 early stop等。
为了解决这个问题fastNLP引入了callback的机制,:class:`~fastNLP.Callback` 是一种在Trainer训练过程中特定阶段会运行的函数集合。
关于 :class:`~fastNLP.Trainer` 的详细文档,请参见 :doc:`trainer 模块<fastNLP.core.trainer>`

我们将 :meth:`~fastNLP.Train.train` 这个函数内部分为以下的阶段,在对应阶段会触发相应的调用::
我们将 :meth:`~fastNLP.Trainer.train` 这个函数内部分为以下的阶段,在对应阶段会触发相应的调用::

callback.on_train_begin() # 开始进行训练
for i in range(1, n_epochs+1):
@@ -31,8 +31,8 @@ callback模块实现了 fastNLP 中的许多 callback 类,用于增强 :class:
callback.on_train_end() # 训练结束
callback.on_exception() # 这是一个特殊的步骤,在训练过程中遭遇exception会跳转到这里。

如下面的例子所示,我们可以使用内置的 callback ,或者继承 :class:`~fastNLP.core.callback.Callback`
定义自己的 callback ::
如下面的例子所示,我们可以使用内置的 callback 组件,或者继承 :class:`~fastNLP.core.callback.Callback`
定义自己的 callback 组件::
from fastNLP import Callback, EarlyStopCallback, Trainer, CrossEntropyLoss, AccuracyMetric
from fastNLP.models import CNNText
@@ -51,12 +51,19 @@ callback模块实现了 fastNLP 中的许多 callback 类,用于增强 :class:
"""
__all__ = [
"Callback",

"GradientClipCallback",
"EarlyStopCallback",
"TensorboardCallback",
"FitlogCallback",
"EvaluateCallback",
"LRScheduler",
"ControlC",
"LRFinder",
"TensorboardCallback",
"WarmupCallback",
"SaveModelCallback",
"EchoCallback",
"TesterCallback",
"CallbackException",
"EarlyStopError"
@@ -76,9 +83,9 @@ try:
except:
tensorboardX_flag = False

from ..io.model_io import ModelSaver, ModelLoader
from .dataset import DataSet
from .tester import Tester
from ._logger import logger

try:
import fitlog
@@ -100,7 +107,8 @@ class Callback(object):
def __init__(self):
super(Callback, self).__init__()
self._trainer = None # 在Trainer内部被重新赋值
self._disabled = False

@property
def trainer(self):
"""
@@ -158,7 +166,19 @@ class Callback(object):
def batch_per_epoch(self):
"""每个epoch一共有多少个batch,只有在on_epoch_begin之后才能调用该属性。"""
return self._trainer.batch_per_epoch

@property
def is_master(self):
return self._trainer.is_master

@property
def disabled(self):
return self._disabled

@property
def logger(self):
return getattr(self._trainer, 'logger', logger)

def on_train_begin(self):
"""
在Train过程开始之前调用。
@@ -250,6 +270,14 @@ class Callback(object):
:return:
"""
pass

def on_validation(self):
"""
如果Trainer中设置了验证,则会在每次需要验证时调用该函数

:return:
"""
pass
def on_epoch_end(self):
"""
@@ -281,6 +309,8 @@ def _transfer(func):
def wrapper(manager, *arg):
returns = []
for callback in manager.callbacks:
if callback.disabled:
continue
returns.append(getattr(callback, func.__name__)(*arg))
return returns
@@ -297,22 +327,28 @@ class CallbackManager(Callback):
"""
super(CallbackManager, self).__init__()
# set attribute of trainer environment
self._env = env
self.callbacks = []
if callbacks is not None:
if isinstance(callbacks, list):
if all([isinstance(cb, Callback) for cb in callbacks]) is True:
self.callbacks.extend(callbacks)
else:
obj = [not isinstance(cb, Callback) for cb in callbacks][0]
raise TypeError(f"Expect sub-classes of Callback. Got {type(obj)}")
if callbacks:
self.callbacks = self.prepare_callbacks(callbacks)

def prepare_callbacks(self, callbacks):
if not callbacks:
return []
if isinstance(callbacks, list):
if all([isinstance(cb, Callback) for cb in callbacks]) is True:
pass
else:
raise TypeError(f"Expect callbacks in CallbackManager(callbacks) to be list. Got {type(callbacks)}.")
for env_name, env_val in env.items():
for callback in self.callbacks:
obj = [not isinstance(cb, Callback) for cb in callbacks][0]
raise TypeError(f"Expect sub-classes of Callback. Got {type(obj)}")
else:
raise TypeError(f"Expect callbacks in CallbackManager(callbacks) to be list. Got {type(callbacks)}.")

for env_name, env_val in self._env.items():
for callback in callbacks:
setattr(callback, '_' + env_name, env_val) # Callback.trainer
return callbacks

@_transfer
def on_train_begin(self):
pass
@@ -352,6 +388,10 @@ class CallbackManager(Callback):
@_transfer
def on_valid_end(self, eval_result, metric_key, optimizer, is_better_eval):
pass

@_transfer
def on_validation(self):
pass
@_transfer
def on_epoch_end(self):
@@ -366,6 +406,33 @@ class CallbackManager(Callback):
pass


class DistCallbackManager(CallbackManager):
    """CallbackManager variant for distributed training.

    Keeps two callback groups: ``callbacks_all`` run on every process, while
    ``callbacks_master`` are disabled on non-master processes.
    """

    def __init__(self, env, callbacks_all=None, callbacks_master=None):
        super(DistCallbackManager, self).__init__(env)
        assert 'trainer' in env
        self._trainer = env['trainer']
        self.callbacks_master = []
        self.callbacks_all = []
        self.add_callback(callbacks_all, master=False)
        self.add_callback(callbacks_master, master=True)

    def patch_callback(self, callbacks, disabled):
        """Set the ``_disabled`` flag on each callback in *callbacks*.

        Accepts a single callback, a list/tuple, or None (no-op).
        """
        if not callbacks:
            return
        if not isinstance(callbacks, (list, tuple)):
            callbacks = [callbacks]
        for cb in callbacks:
            cb._disabled = disabled

    def add_callback(self, cb, master=False):
        """Register callback(s) *cb*; master-only ones are disabled on workers."""
        if master:
            # On non-master processes these callbacks exist but never fire.
            self.patch_callback(cb, not self.is_master)
            self.callbacks_master += self.prepare_callbacks(cb)
        else:
            self.callbacks_all += self.prepare_callbacks(cb)
        # Keep the combined list (used by the dispatching machinery) in sync.
        self.callbacks = self.callbacks_all + self.callbacks_master


class GradientClipCallback(Callback):
"""
别名::class:`fastNLP.GradientClipCallback` :class:`fastNLP.core.callback.GradientClipCallback`
@@ -403,6 +470,9 @@ class GradientClipCallback(Callback):
def on_backward_end(self):
if self.step%self.update_every==0:
if self.parameters is None:
if getattr(self.trainer, 'fp16', ''):
from apex import amp
self.clip_fun(amp.master_params(self.optimizer), self.clip_value)
self.clip_fun(self.model.parameters(), self.clip_value)
else:
self.clip_fun(self.parameters, self.clip_value)
@@ -434,7 +504,7 @@ class EarlyStopCallback(Callback):
def on_exception(self, exception):
if isinstance(exception, EarlyStopError):
print("Early Stopping triggered in epoch {}!".format(self.epoch))
logger.info("Early Stopping triggered in epoch {}!".format(self.epoch))
else:
raise exception # 抛出陌生Error

@@ -448,10 +518,9 @@ class FitlogCallback(Callback):
并将验证结果写入到fitlog中。这些数据集的结果是根据dev上最好的结果报道的,即如果dev在第3个epoch取得了最佳,则
fitlog中记录的关于这些数据集的结果就是来自第三个epoch的结果。

:param ~fastNLP.DataSet,dict(~fastNLP.DataSet) data: 传入DataSet对象,会使用多个Trainer中的metric对数据进行验证。如果需要传入多个
DataSet请通过dict的方式传入,dict的key将作为对应dataset的name传递给fitlog。若tester不为None时,data需要通过
dict的方式传入。如果仅传入DataSet, 则被命名为test
:param ~fastNLP.Tester tester: Tester对象,将在on_valid_end时调用。tester中的DataSet会被称为为`test`
:param ~fastNLP.DataSet,Dict[~fastNLP.DataSet] data: 传入DataSet对象,会使用多个Trainer中的metric对数据进行验证。如果需要
传入多个DataSet请通过dict的方式传入,dict的key将作为对应dataset的name传递给fitlog。data的结果的名称以'data'开头。
:param ~fastNLP.Tester,Dict[~fastNLP.Tester] tester: Tester对象,将在on_valid_end时调用。tester的结果的名称以'tester'开头
:param int log_loss_every: 多少个step记录一次loss(记录的是这几个batch的loss平均值),如果数据集较大建议将该值设置得
大一些,不然会导致log文件巨大。默认为0, 即不要记录loss。
:param int verbose: 是否在终端打印evaluation的结果,0不打印。
@@ -465,21 +534,24 @@ class FitlogCallback(Callback):
self._log_exception = log_exception
assert isinstance(log_loss_every, int) and log_loss_every>=0
if tester is not None:
assert isinstance(tester, Tester), "Only fastNLP.Tester allowed."
assert isinstance(data, dict) or data is None, "If tester is not None, only dict[DataSet] allowed for data."
if data is not None:
assert 'test' not in data, "Cannot use `test` as DataSet key, when tester is passed."
setattr(tester, 'verbose', 0)
self.testers['test'] = tester
if isinstance(tester, dict):
for name, test in tester.items():
if not isinstance(test, Tester):
raise TypeError(f"{name} in tester is not a valid fastNLP.Tester.")
self.testers['tester-' + name] = test
if isinstance(tester, Tester):
self.testers['tester-test'] = tester
for tester in self.testers.values():
setattr(tester, 'verbose', 0)

if isinstance(data, dict):
for key, value in data.items():
assert isinstance(value, DataSet), f"Only DataSet object is allowed, not {type(value)}."
for key, value in data.items():
self.datasets[key] = value
self.datasets['data-' + key] = value
elif isinstance(data, DataSet):
self.datasets['test'] = data
else:
self.datasets['data-test'] = data
elif data is not None:
raise TypeError("data receives dict[DataSet] or DataSet object.")
self.verbose = verbose
@@ -492,8 +564,11 @@ class FitlogCallback(Callback):
if len(self.datasets) > 0:
for key, data in self.datasets.items():
tester = Tester(data=data, model=self.model, batch_size=self.batch_size, metrics=self.trainer.metrics,
verbose=0)
tester = Tester(data=data, model=self.model,
batch_size=self.trainer.kwargs.get('dev_batch_size', self.batch_size),
metrics=self.trainer.metrics,
verbose=0,
use_tqdm=self.trainer.test_use_tqdm)
self.testers[key] = tester
fitlog.add_progress(total_steps=self.n_steps)
@@ -533,6 +608,68 @@ class FitlogCallback(Callback):
fitlog.add_other(repr(exception), name='except_info')


class EvaluateCallback(Callback):
    """
    alias: :class:`fastNLP.EvaluateCallback` :class:`fastNLP.core.callback.EvaluateCallback`

    Extends the Trainer, which by itself only validates on its dev data, so
    that extra datasets and/or ready-made Testers are evaluated as well each
    time the Trainer validates.

    :param ~fastNLP.DataSet,Dict[~fastNLP.DataSet] data: DataSet object(s) evaluated with the
        Trainer's own metrics; pass a dict to evaluate several, keyed by name.
    :param ~fastNLP.Tester,Dict[~fastNLP.Tester] tester: Tester object(s) invoked at on_valid_end.
    """

    def __init__(self, data=None, tester=None):
        super().__init__()
        # name -> DataSet to be evaluated with the trainer's metrics
        self.datasets = {}
        # name -> Tester; user-supplied ones here, trainer-built ones added later
        self.testers = {}
        if tester is not None:
            if isinstance(tester, dict):
                for name, test in tester.items():
                    if not isinstance(test, Tester):
                        raise TypeError(f"{name} in tester is not a valid fastNLP.Tester.")
                    self.testers['tester-' + name] = test
            if isinstance(tester, Tester):
                self.testers['tester-test'] = tester
            # Silence the testers; results are reported through self.logger.
            for tester in self.testers.values():
                setattr(tester, 'verbose', 0)

        if isinstance(data, dict):
            # Validate all values before registering any of them.
            for key, value in data.items():
                assert isinstance(value, DataSet), f"Only DataSet object is allowed, not {type(value)}."
            for key, value in data.items():
                self.datasets['data-' + key] = value
        elif isinstance(data, DataSet):
            self.datasets['data-test'] = data
        elif data is not None:
            raise TypeError("data receives dict[DataSet] or DataSet object.")

    def on_train_begin(self):
        # Extra datasets are only evaluated when the trainer validates, which
        # requires the trainer to have dev data in the first place.
        if len(self.datasets) > 0 and self.trainer.dev_data is None:
            raise RuntimeError("Trainer has no dev data, you cannot pass extra DataSet to do evaluation.")

        # Build one Tester per extra dataset, reusing the trainer's metrics
        # and its dev batch size when one was configured.
        if len(self.datasets) > 0:
            for key, data in self.datasets.items():
                tester = Tester(data=data, model=self.model,
                                batch_size=self.trainer.kwargs.get('dev_batch_size', self.batch_size),
                                metrics=self.trainer.metrics, verbose=0,
                                use_tqdm=self.trainer.test_use_tqdm)
                self.testers[key] = tester

    def on_valid_end(self, eval_result, metric_key, optimizer, better_result):
        # Run every registered tester after each regular validation and log
        # its results; a failing dataset is reported but does not stop training.
        if len(self.testers) > 0:
            for key, tester in self.testers.items():
                try:
                    eval_result = tester.test()
                    self.logger.info("Evaluation on {}:".format(key))
                    self.logger.info(tester._format_eval_results(eval_result))
                except Exception:
                    self.logger.info("Exception happens when evaluate on DataSet named `{}`.".format(key))


class LRScheduler(Callback):
"""
别名::class:`fastNLP.LRScheduler` :class:`fastNLP.core.callback.LRScheduler`
@@ -586,7 +723,7 @@ class SmoothValue(object):
self.smooth = None
def add_value(self, val: float) -> None:
"Add `val` to calculate updated smoothed value."
"""Add `val` to calculate updated smoothed value."""
self.n += 1
self.mov_avg = self.beta * self.mov_avg + (1 - self.beta) * val
self.smooth = self.mov_avg / (1 - self.beta ** self.n)
@@ -614,8 +751,7 @@ class LRFinder(Callback):
self.smooth_value = SmoothValue(0.8)
self.opt = None
self.find = None
self.loader = ModelLoader()

@property
def lr_gen(self):
scale = (self.end_lr - self.start_lr) / self.batch_per_epoch
@@ -630,7 +766,7 @@ class LRFinder(Callback):
self.opt = self.trainer.optimizer # pytorch optimizer
self.opt.param_groups[0]["lr"] = self.start_lr
# save model
ModelSaver("tmp").save_pytorch(self.trainer.model, param_only=True)
torch.save(self.model.state_dict(), 'tmp')
self.find = True
def on_backward_begin(self, loss):
@@ -659,7 +795,9 @@ class LRFinder(Callback):
self.opt.param_groups[0]["lr"] = self.best_lr
self.find = False
# reset model
ModelLoader().load_pytorch(self.trainer.model, "tmp")
states = torch.load('tmp')
self.model.load_state_dict(states)
os.remove('tmp')
self.pbar.write("Model reset. \nFind best lr={}".format(self.best_lr))


@@ -850,14 +988,14 @@ class SaveModelCallback(Callback):
try:
_save_model(self.model, model_name=name, save_dir=self.save_dir, only_param=self.only_param)
except Exception as e:
print(f"The following exception:{e} happens when save model to {self.save_dir}.")
logger.error(f"The following exception:{e} happens when save model to {self.save_dir}.")
if delete_pair:
try:
delete_model_path = os.path.join(self.save_dir, delete_pair[1])
if os.path.exists(delete_model_path):
os.remove(delete_model_path)
except Exception as e:
print(f"Fail to delete model {name} at {self.save_dir} caused by exception:{e}.")
logger.error(f"Fail to delete model {name} at {self.save_dir} caused by exception:{e}.")

def on_exception(self, exception):
if self.save_on_exception:
@@ -884,3 +1022,70 @@ class EarlyStopError(CallbackException):
def __init__(self, msg):
super(EarlyStopError, self).__init__(msg)


class EchoCallback(Callback):
    """Debugging callback that reports every ``on_*`` hook access.

    Each time one of the callback hooks is looked up (and hence about to be
    invoked by the trainer), a line naming the hook and the current pid is
    written to *out* — useful for tracing callback dispatch across processes.

    :param str name: label included in every echoed line.
    :param out: writable stream the messages go to, default ``sys.stdout``.
    """

    def __init__(self, name, out=sys.stdout):
        super(EchoCallback, self).__init__()
        self.name = name
        self.out = out

    def __getattribute__(self, item):
        if item.startswith('on_'):
            # Bug fix: Logger.info() accepts no ``file`` keyword, so the old
            # ``logger.info(..., file=self.out)`` raised TypeError on every
            # hook access. Write straight to the configured stream instead.
            print('{}.{} has been called at pid: {}'.format(self.name, item, os.getpid()),
                  file=self.out)
        return super(EchoCallback, self).__getattribute__(item)


class TesterCallback(Callback):
    """Callback that evaluates with its own :class:`Tester` whenever the
    trainer validates, tracking the best result seen so far.

    :param data: dataset(s) handed to the internal Tester.
    :param model: the model to evaluate.
    :param metrics: metric object(s) for the Tester.
    :param str metric_key: indicator used to compare results. A leading '-'
        means lower is better, '+' (or no prefix) means higher is better.
        When None, the first indicator of the first metric is used.
    :param int batch_size: evaluation batch size.
    :param num_workers: DataLoader workers for evaluation.
    """

    def __init__(self, data, model, metrics, metric_key=None, batch_size=16, num_workers=None):
        super(TesterCallback, self).__init__()
        self.tester = Tester(data, model,
                             metrics=metrics, batch_size=batch_size,
                             num_workers=num_workers, verbose=0)
        # parse metric_key
        # increase_better is True. It means the exp result gets better if the indicator increases.
        # It is true by default.
        self.increase_better = True
        if metric_key is not None:
            self.increase_better = False if metric_key[0] == "-" else True
            self.metric_key = metric_key[1:] if metric_key[0] == "+" or metric_key[0] == "-" else metric_key
        else:
            self.metric_key = None
        # Best evaluation result seen so far; None until the first run.
        self.score = None

    def on_validation(self):
        """Run the tester, log the result and remember it when it is the best."""
        cur_score = self.tester.test()
        eval_str = "Evaluation at Epoch {}/{}. Step:{}/{}. - {}".format(
            self.epoch, self.n_epochs, self.step, self.n_steps,
            self.tester._format_eval_results(cur_score))
        self.logger.info(eval_str)
        is_better = self.compare_better(cur_score)
        if is_better:
            self.score = cur_score
        return cur_score, is_better

    def _get_score(self, metric_dict, key):
        """Look up indicator *key* inside the per-metric result dicts.

        ``metric_dict`` maps metric name -> {indicator: value}. Returns the
        value, or None when no metric reports *key*.
        """
        # Bug fix: the original iterated ``metric_dict.items()``, yielding
        # (name, dict) tuples that can never be indexed with a string key —
        # so no score was ever found. Iterate the inner dicts instead.
        for metric in metric_dict.values():
            if key in metric:
                return metric[key]
        return None

    def compare_better(self, a):
        """Return True if result *a* beats the best score recorded so far."""
        if self.score is None:
            return True
        if self.metric_key is None:
            # Default to the first indicator of the first metric.
            self.metric_key = list(list(self.score.values())[0].keys())[0]
        k = self.metric_key
        score = self._get_score(self.score, k)
        new_score = self._get_score(a, k)
        if score is None or new_score is None:
            return False
        if self.increase_better:
            return score <= new_score
        else:
            return score >= new_score

    def on_train_end(self):
        self.logger.info('Evaluate on training ends.')
        self.on_validation()

+ 36
- 12
fastNLP/core/const.py View File

@@ -1,3 +1,13 @@
"""
.. todo::
doc
"""

__all__ = [
"Const"
]


class Const:
"""
fastNLP中field命名常量。
@@ -7,12 +17,14 @@ class Const:
具体列表::

INPUT 模型的序列输入 words(复数words1, words2)
CHAR_INPUT 模型character输入 chars(复数chars1, chars2)
INPUT_LEN 序列长度 seq_len(复数seq_len1,seq_len2)
OUTPUT 模型输出 pred(复数pred1, pred2)
TARGET 真实目标 target(复数target1,target2)
LOSS 损失函数 loss (复数loss1,loss2)
INPUT 模型的序列输入 words(具有多列words时,依次使用words1, words2, )
CHAR_INPUT 模型character输入 chars(具有多列chars时,依次使用chars1, chars2)
INPUT_LEN 序列长度 seq_len(具有多列seq_len时,依次使用seq_len1,seq_len2)
OUTPUT 模型输出 pred(具有多列pred时,依次使用pred1, pred2)
TARGET 真实目标 target(具有多列target时,依次使用target1,target2)
LOSS 损失函数 loss (具有多列loss时,依次使用loss1,loss2)
RAW_WORD 原文的词 raw_words (具有多列raw_words时,依次使用raw_words1, raw_words2)
RAW_CHAR 原文的字 raw_chars (具有多列raw_chars时,依次使用raw_chars1, raw_chars2)

"""
INPUT = 'words'
@@ -21,37 +33,49 @@ class Const:
OUTPUT = 'pred'
TARGET = 'target'
LOSS = 'loss'

RAW_WORD = 'raw_words'
RAW_CHAR = 'raw_chars'
@staticmethod
def INPUTS(i):
"""得到第 i 个 ``INPUT`` 的命名"""
i = int(i) + 1
return Const.INPUT + str(i)
@staticmethod
def CHAR_INPUTS(i):
"""得到第 i 个 ``CHAR_INPUT`` 的命名"""
i = int(i) + 1
return Const.CHAR_INPUT + str(i)

@staticmethod
def RAW_WORDS(i):
i = int(i) + 1
return Const.RAW_WORD + str(i)
@staticmethod
def RAW_CHARS(i):
i = int(i) + 1
return Const.RAW_CHAR + str(i)
@staticmethod
def INPUT_LENS(i):
"""得到第 i 个 ``INPUT_LEN`` 的命名"""
i = int(i) + 1
return Const.INPUT_LEN + str(i)

@staticmethod
def OUTPUTS(i):
"""得到第 i 个 ``OUTPUT`` 的命名"""
i = int(i) + 1
return Const.OUTPUT + str(i)
@staticmethod
def TARGETS(i):
"""得到第 i 个 ``TARGET`` 的命名"""
i = int(i) + 1
return Const.TARGET + str(i)
@staticmethod
def LOSSES(i):
"""得到第 i 个 ``LOSS`` 的命名"""


+ 148
- 133
fastNLP/core/dataset.py View File

@@ -1,7 +1,7 @@
"""
:class:`~fastNLP.core.dataset.DataSet` 是fastNLP中用于承载数据的容器。可以将DataSet看做是一个表格,
每一行是一个sample (在fastNLP中被称为 :mod:`~.instance` ),
每一列是一个feature (在fastNLP中称为 :mod:`.field` )。
每一行是一个sample (在fastNLP中被称为 :mod:`~fastNLP.core.instance` ),
每一列是一个feature (在fastNLP中称为 :mod:`~fastNLP.core.field` )。

.. csv-table:: Following is a demo layout of DataSet
:header: "sentence", "words", "seq_len"
@@ -13,57 +13,64 @@

在fastNLP内部每一行是一个 :class:`~fastNLP.Instance` 对象; 每一列是一个 :class:`~fastNLP.FieldArray` 对象。

1 DataSet的创建
创建DataSet主要有以下的3种方式
----------------------------
1.DataSet的创建
----------------------------

1.1 传入dict
创建DataSet主要有以下的3种方式

Example::
1.1 传入dict
----------------------------

from fastNLP import DataSet
data = {'sentence':["This is the first instance .", "Second instance .", "Third instance ."],
'words': [['this', 'is', 'the', 'first', 'instance', '.'], ['Second', 'instance', '.'], ['Third', 'instance', '.'],
'seq_len': [6, 3, 3]}
dataset = DataSet(data)
# 传入的dict的每个key的value应该为具有相同长度的list
.. code-block::

1.2 通过构建Instance
from fastNLP import DataSet
data = {'sentence':["This is the first instance .", "Second instance .", "Third instance ."],
'words': [['this', 'is', 'the', 'first', 'instance', '.'], ['Second', 'instance', '.'], ['Third', 'instance', '.']],
'seq_len': [6, 3, 3]}
dataset = DataSet(data)
# 传入的dict的每个key的value应该为具有相同长度的list

Example::
1.2 通过 Instance 构建
----------------------------

from fastNLP import DataSet
from fastNLP import Instance
dataset = DataSet()
instance = Instance(sentence="This is the first instance",
words=['this', 'is', 'the', 'first', 'instance', '.'],
seq_len=6)
dataset.append(instance)
# 可以继续append更多内容,但是append的instance应该和第一个instance拥有完全相同的field
.. code-block::

1.3 通过list(Instance)
from fastNLP import DataSet
from fastNLP import Instance
dataset = DataSet()
instance = Instance(sentence="This is the first instance",
words=['this', 'is', 'the', 'first', 'instance', '.'],
seq_len=6)
dataset.append(instance)
# 可以继续append更多内容,但是append的instance应该和第一个instance拥有完全相同的field

Example::
1.3 通过 List[Instance] 构建
--------------------------------------

from fastNLP import DataSet
from fastNLP import Instance
instances = []
instances.append(Instance(sentence="This is the first instance",
words=['this', 'is', 'the', 'first', 'instance', '.'],
seq_len=6))
instances.append(Instance(sentence="Second instance .",
words=['Second', 'instance', '.'],
seq_len=3))
dataset = DataSet(instances)
.. code-block::

2 DataSet与预处理
常见的预处理有如下几种
from fastNLP import DataSet
from fastNLP import Instance
instances = []
instances.append(Instance(sentence="This is the first instance",
words=['this', 'is', 'the', 'first', 'instance', '.'],
seq_len=6))
instances.append(Instance(sentence="Second instance .",
words=['Second', 'instance', '.'],
seq_len=3))
dataset = DataSet(instances)
--------------------------------------
2.DataSet与预处理
--------------------------------------

2.1 从某个文本文件读取内容 #
常见的预处理有如下几种

.. todo::
引用DataLoader
2.1 从某个文本文件读取内容
--------------------------------------

Example::
.. code-block::

from fastNLP import DataSet
from fastNLP import Instance
@@ -78,9 +85,13 @@
sent, label = line.strip().split('\t')
dataset.append(Instance(sentence=sent, label=label))

.. note::
直接读取特定数据集的数据请参考 :doc:`/tutorials/tutorial_2_load_dataset`

2.2 对DataSet中的内容处理
--------------------------------------

Example::
.. code-block::

from fastNLP import DataSet
data = {'sentence':["This is the first instance .", "Second instance .", "Third instance ."]}
@@ -97,8 +108,9 @@
dataset.apply(get_words, new_field_name='words')

2.3 删除DataSet的内容
--------------------------------------

Example::
.. code-block::

from fastNLP import DataSet
dataset = DataSet({'a': list(range(-5, 5))})
@@ -113,15 +125,17 @@


2.4 遍历DataSet的内容
--------------------------------------

Example::
.. code-block::

for instance in dataset:
# do something

2.5 一些其它操作
--------------------------------------

Example::
.. code-block::

# 检查是否存在名为'a'的field
dataset.has_field('a') # 或 ('a' in dataset)
@@ -129,21 +143,25 @@
dataset.rename_field('a', 'b')
# DataSet的长度
len(dataset)
--------------------------------------
3.DataSet与自然语言处理(NLP)
--------------------------------------

3 DataSet与自然语言处理(NLP)
在目前深度学习的模型中,大都依赖于随机梯度下降法(SGD)进行模型的优化。随机梯度下降需要将数据切分成一个一个的Batch,
一个Batch进行一次前向计算(forward)与梯度后向传播(backward)。在自然语言处理的场景下,往往还需要对数据进行pad。这是
由于句子的长度一般是不同的,但是一次Batch中的每个field都必须是一个tensor,所以需要将所有句子都补齐到相同的长度。
在目前深度学习的模型中,大都依赖于随机梯度下降法(SGD)进行模型的优化。随机梯度下降需要将数据切分成一个个的 batch,
一个batch进行一次前向计算(forward)与梯度后向传播(backward)。在自然语言处理的场景下,往往还需要对数据进行pad。这是
由于句子的长度一般是不同的,但是一次batch中的每个field都必须是一个tensor,所以需要将所有句子都补齐到相同的长度。

3.1 DataSet与Batch
3.1 DataSet与DataSetIter
--------------------------------------

我们先看fastNLP中如何将数据分成一个一个的Batch的例子, 这里我们使用随机生成的数据来模拟一个二分类文本分类任务,
我们先看fastNLP中如何将数据分成一个一个的batch的例子, 这里我们使用随机生成的数据来模拟一个二分类文本分类任务,
words和characters是输入,labels是文本类别

Example::
.. code-block::

from fastNLP import DataSet
from fastNLP import Batch
from fastNLP import DataSetIter
from fastNLP import SequentialSampler
from fastNLP import EngChar2DPadder

@@ -163,7 +181,7 @@
d.set_target('label')
d.set_input('words', 'chars')

for batch_x, batch_y in Batch(d, sampler=SequentialSampler(), batch_size=2):
for batch_x, batch_y in DataSetIter(d, sampler=SequentialSampler(), batch_size=2):
print("batch_x:", batch_x)
print("batch_y:", batch_y)
break
@@ -182,23 +200,26 @@
# [ 0, 0, 0, 0, 0]]])}
# {'label': tensor([0, 0])}

其中 :class:`~fastNLP.Batch` 是用于从DataSet中按照batch_size为大小取出batch的迭代器,
:class:`~fastNLP.SequentialSampler` 用于指示 Batch 以怎样的
其中 :class:`~fastNLP.DataSetIter` 是用于从DataSet中按照batch_size为大小取出batch的迭代器,
:class:`~fastNLP.SequentialSampler` 用于指示 :class:`~fastNLP.DataSetIter` 以怎样的
顺序从DataSet中取出instance以组成一个batch,
更详细的说明请参照 :class:`~fastNLP.Batch` 和 :class:`~fastNLP.SequentialSampler` 文档。
更详细的说明请参照 :class:`~fastNLP.DataSetIter` 和 :class:`~fastNLP.SequentialSampler` 文档。

通过DataSet.set_input('words', 'chars'), fastNLP将认为'words'和'chars'这两个field都是input,并将它们都放入迭代器
生成的第一个dict中; DataSet.set_target('labels'), fastNLP将认为'labels'这个field是target,并将其放入到迭代器的第
通过 ``DataSet.set_input('words', 'chars')`` , fastNLP将认为 `words` 和 `chars` 这两个field都是input,并将它们都放入迭代器
生成的第一个dict中; ``DataSet.set_target('labels')`` , fastNLP将认为 `labels` 这个field是target,并将其放入到迭代器的第
二个dict中。如上例中所打印结果。分为input和target的原因是由于它们在被 :class:`~fastNLP.Trainer` 所使用时会有所差异,
详见 :class:`~fastNLP.Trainer`

当把某个field设置为'target'或者'input'的时候(两者不是互斥的,可以同时设为input和target),fastNLP不仅仅只是将其放
置到不同的dict中,而还会对被设置为input或target的field进行类型检查。类型检查的目的是为了看能否把该field转为
pytorch的torch.LongTensor或torch.FloatTensor类型(也可以在Batch中设置输出numpy类型,参考 :class:`~fastNLP.Batch` ),如上例所示,
fastNLP已将words,chars和label转为了Tensor类型。如果field在每个instance都拥有相同的维度(不能超过两维),且最内层
的元素都为相同的type(int, float, np.int*, np.float*),则fastNLP默认将对该field进行pad。也支持全为str的field作为
target和input,这种情况下,fastNLP默认不进行pad。另外,当某个field已经被设置为了target或者input后,之后append的
instance对应的field必须要和前面已有的内容一致,否则会报错。
当把某个field设置为 `target` 或者 `input` 的时候(两者不是互斥的,可以同时设为两种),fastNLP不仅仅只是将其放
置到不同的dict中,而还会对被设置为 `input` 或 `target` 的 field 进行类型检查。类型检查的目的是为了看能否把该 field 转为
pytorch的 :class:`torch.LongTensor` 或 :class:`torch.FloatTensor` 类型
(也可以在 :class:`~fastNLP.DataSetIter` 中设置输出numpy类型,参考 :class:`~fastNLP.DataSetIter` )。
如上例所示,fastNLP已将 `words` ,`chars` 和 `label` 转为了 :class:`Tensor` 类型。
如果 field 在每个 `instance` 都拥有相同的维度(不能超过两维),且最内层的元素都为相同的 type(int, float, np.int*, np.float*),
则fastNLP默认将对该 field 进行pad。也支持全为str的field作为target和input,这种情况下,fastNLP默认不进行pad。
另外,当某个 field 已经被设置为了 target 或者 input 后,之后 `append` 的
`instance` 对应的 field 必须要和前面已有的内容一致,否则会报错。

可以查看field的dtype::
@@ -217,6 +238,7 @@
错误::

from fastNLP import DataSet
d = DataSet({'data': [1, 'a']})
d.set_input('data')
>> RuntimeError: Mixed data types in Field data: [<class 'str'>, <class 'int'>]
@@ -231,6 +253,7 @@
当某个field被设置为忽略type之后,fastNLP将不对其进行pad。

3.2 DataSet与pad
--------------------------------------

在fastNLP里,pad是与一个field绑定的。即不同的field可以使用不同的pad方式,比如在英文任务中word需要的pad和
character的pad方式往往是不同的。fastNLP是通过一个叫做 :class:`~fastNLP.Padder` 的子类来完成的。
@@ -240,7 +263,7 @@
如果 :class:`~fastNLP.AutoPadder` 或 :class:`~fastNLP.EngChar2DPadder` 无法满足需求,
也可以自己写一个 :class:`~fastNLP.Padder` 。

Example::
.. code-block::

from fastNLP import DataSet
from fastNLP import EngChar2DPadder
@@ -268,6 +291,7 @@ import _pickle as pickle
import warnings

import numpy as np
from copy import deepcopy

from .field import AutoPadder
from .field import FieldArray
@@ -275,6 +299,8 @@ from .instance import Instance
from .utils import _get_func_signature
from .field import AppendToTargetOrInputException
from .field import SetInputOrTargetException
from .const import Const
from ._logger import logger

class DataSet(object):
"""
@@ -326,7 +352,11 @@ class DataSet(object):
self.idx])
assert self.idx < len(self.dataset.field_arrays[item]), "index:{} out of range".format(self.idx)
return self.dataset.field_arrays[item][self.idx]

def items(self):
ins = self.dataset[self.idx]
return ins.items()

def __repr__(self):
return self.dataset[self.idx].__repr__()
@@ -405,7 +435,7 @@ class DataSet(object):
"""
将一个instance对象append到DataSet后面。

:param instance: :class:`~fastNLP.Instance` 类型。若DataSet不为空,则instance应该拥有和DataSet完全一样的field。
:param ~fastNLP.Instance instance: 若DataSet不为空,则instance应该拥有和DataSet完全一样的field。

"""
if len(self.field_arrays) == 0:
@@ -423,7 +453,7 @@ class DataSet(object):
try:
self.field_arrays[name].append(field)
except AppendToTargetOrInputException as e:
print(f"Cannot append to field:{name}.")
logger.error(f"Cannot append to field:{name}.")
raise e
def add_fieldarray(self, field_name, fieldarray):
@@ -431,7 +461,7 @@ class DataSet(object):
将fieldarray添加到DataSet中.

:param str field_name: 新加入的field的名称
:param fieldarray: :class:`~fastNLP.FieldArray` 类型。需要加入DataSet的field的内容
:param ~fastNLP.core.FieldArray fieldarray: 需要加入DataSet的field的内容
:return:
"""
if not isinstance(fieldarray, FieldArray):
@@ -447,8 +477,7 @@ class DataSet(object):
:param str field_name: 新增的field的名称
:param list fields: 需要新增的field的内容
:param None, padder: :class:`~fastNLP.Padder` 类型,
如果为None,则不进行pad,默认使用 :class:`~fastNLP.AutoPadder` 自动判断是否需要做pad。
:param None,~fastNLP.Padder padder: 如果为None,则不进行pad,默认使用 :class:`~fastNLP.AutoPadder` 自动判断是否需要做pad。
:param bool is_input: 新加入的field是否是input
:param bool is_target: 新加入的field是否是target
:param bool ignore_type: 是否忽略对新加入的field的类型检查
@@ -465,7 +494,7 @@ class DataSet(object):
"""
删除第index个instance

:param int index: 需要删除的instance的index,从0开始
:param int index: 需要删除的instance的index,序号从0开始
"""
assert isinstance(index, int), "Only integer supported."
if len(self) <= index:
@@ -475,6 +504,7 @@ class DataSet(object):
else:
for field in self.field_arrays.values():
field.pop(index)
return self
def delete_field(self, field_name):
"""
@@ -483,7 +513,22 @@ class DataSet(object):
:param str field_name: 需要删除的field的名称.
"""
self.field_arrays.pop(field_name)
return self

def copy_field(self, field_name, new_field_name):
"""
深度copy名为field_name的field到new_field_name

:param str field_name: 需要copy的field。
:param str new_field_name: copy生成的field名称
:return: self
"""
if not self.has_field(field_name):
raise KeyError(f"Field:{field_name} not found in DataSet.")
fieldarray = deepcopy(self.get_field(field_name))
self.add_fieldarray(field_name=new_field_name, fieldarray=fieldarray)
return self

def has_field(self, field_name):
"""
判断DataSet中是否有名为field_name这个field
@@ -510,7 +555,7 @@ class DataSet(object):
"""
返回一个dict,key为field_name, value为对应的 :class:`~fastNLP.FieldArray`

:return: dict: 返回如上所述的字典
:return dict: 返回如上所述的字典
"""
return self.field_arrays
@@ -518,7 +563,7 @@ class DataSet(object):
"""
返回一个list,包含所有 field 的名字

:return: list: 返回如上所述的列表
:return list: 返回如上所述的列表
"""
return sorted(self.field_arrays.keys())
@@ -544,7 +589,7 @@ class DataSet(object):
raise KeyError("DataSet has no field named {}.".format(old_name))
return self
def set_target(self, *field_names, flag=True):
def set_target(self, *field_names, flag=True, use_1st_ins_infer_dim_type=True):
"""
将field_names的field设置为target

@@ -555,19 +600,23 @@ class DataSet(object):

:param str field_names: field的名称
:param bool flag: 将field_name的target状态设置为flag
:param bool use_1st_ins_infer_dim_type: 如果为True,将不会check该列是否所有数据都是同样的维度,同样的类型。将直接使用第一
行的数据进行类型和维度推断本列的数据的类型和维度。
"""
assert isinstance(flag, bool), "Only bool type supported."
for name in field_names:
if name in self.field_arrays:
try:
self.field_arrays[name]._use_1st_ins_infer_dim_type = bool(use_1st_ins_infer_dim_type)
self.field_arrays[name].is_target = flag
except SetInputOrTargetException as e:
print(f"Cannot set field:{name} as target.")
logger.error(f"Cannot set field:{name} as target.")
raise e
else:
raise KeyError("{} is not a valid field name.".format(name))
return self
def set_input(self, *field_names, flag=True):
def set_input(self, *field_names, flag=True, use_1st_ins_infer_dim_type=True):
"""
将field_names的field设置为input::

@@ -576,16 +625,20 @@ class DataSet(object):

:param str field_names: field的名称
:param bool flag: 将field_name的input状态设置为flag
:param bool use_1st_ins_infer_dim_type: 如果为True,将不会check该列是否所有数据都是同样的维度,同样的类型。将直接使用第一
行的数据进行类型和维度推断本列的数据的类型和维度。
"""
for name in field_names:
if name in self.field_arrays:
try:
self.field_arrays[name]._use_1st_ins_infer_dim_type = bool(use_1st_ins_infer_dim_type)
self.field_arrays[name].is_input = flag
except SetInputOrTargetException as e:
print(f"Cannot set field:{name} as input, exception happens at the {e.index} value.")
logger.error(f"Cannot set field:{name} as input, exception happens at the {e.index} value.")
raise e
else:
raise KeyError("{} is not a valid field name.".format(name))
return self
def set_ignore_type(self, *field_names, flag=True):
"""
@@ -602,6 +655,7 @@ class DataSet(object):
self.field_arrays[name].ignore_type = flag
else:
raise KeyError("{} is not a valid field name.".format(name))
return self
def set_padder(self, field_name, padder):
"""
@@ -612,11 +666,12 @@ class DataSet(object):
dataset.set_padder('chars', padder) # 则chars这个field会使用EngChar2DPadder进行pad操作

:param str field_name: 设置field的padding方式为padder
:param None, Padder padder: 设置为None即删除padder, 即对该field不进行pad操作。
:param None,~fastNLP.Padder padder: 设置为None即删除padder, 即对该field不进行pad操作。
"""
if field_name not in self.field_arrays:
raise KeyError("There is no field named {}.".format(field_name))
self.field_arrays[field_name].set_padder(padder)
return self
def set_pad_val(self, field_name, pad_val):
"""
@@ -628,6 +683,7 @@ class DataSet(object):
if field_name not in self.field_arrays:
raise KeyError("There is no field named {}.".format(field_name))
self.field_arrays[field_name].set_pad_val(pad_val)
return self
def get_input_name(self):
"""
@@ -660,7 +716,7 @@ class DataSet(object):
2. is_target: bool, 如果为True则将名为 `new_field_name` 的field设置为target

3. ignore_type: bool, 如果为True则将名为 `new_field_name` 的field的ignore_type设置为true, 忽略其类型
:return: list(Any), 里面的元素为func的返回值,所以list长度为DataSet的长度
:return List[Any]: 里面的元素为func的返回值,所以list长度为DataSet的长度

"""
assert len(self) != 0, "Null DataSet cannot use apply_field()."
@@ -673,7 +729,7 @@ class DataSet(object):
results.append(func(ins[field_name]))
except Exception as e:
if idx != -1:
print("Exception happens at the `{}`th instance.".format(idx))
logger.error("Exception happens at the `{}`th(from 1) instance.".format(idx+1))
raise e
if not (new_field_name is None) and len(list(filter(lambda x: x is not None, results))) == 0: # all None
raise ValueError("{} always return None.".format(_get_func_signature(func=func)))
@@ -687,7 +743,7 @@ class DataSet(object):
"""
将results作为加入到新的field中,field名称为new_field_name

:param list(str) results: 一般是apply*()之后的结果
:param List[str] results: 一般是apply*()之后的结果
:param str new_field_name: 新加入的field的名称
:param dict kwargs: 用户apply*()时传入的自定义参数
:return:
@@ -730,7 +786,7 @@ class DataSet(object):

3. ignore_type: bool, 如果为True则将 `new_field_name` 的field的ignore_type设置为true, 忽略其类型
:return: list(Any), 里面的元素为func的返回值,所以list长度为DataSet的长度
:return List[Any]: 里面的元素为func的返回值,所以list长度为DataSet的长度
"""
assert len(self) != 0, "Null DataSet cannot use apply()."
idx = -1
@@ -738,10 +794,11 @@ class DataSet(object):
results = []
for idx, ins in enumerate(self._inner_iter()):
results.append(func(ins))
except Exception as e:
except BaseException as e:
if idx != -1:
print("Exception happens at the `{}`th instance.".format(idx))
logger.error("Exception happens at the `{}`th instance.".format(idx))
raise e

# results = [func(ins) for ins in self._inner_iter()]
if not (new_field_name is None) and len(list(filter(lambda x: x is not None, results))) == 0: # all None
raise ValueError("{} always return None.".format(_get_func_signature(func=func)))
@@ -751,7 +808,7 @@ class DataSet(object):
return results

def add_seq_len(self, field_name:str, new_field_name='seq_len'):
def add_seq_len(self, field_name:str, new_field_name=Const.INPUT_LEN):
"""
将使用len()直接对field_name中每个元素作用,将其结果作为seqence length, 并放入seq_len这个field。

@@ -795,7 +852,7 @@ class DataSet(object):

:param float ratio: 0<ratio<1, 返回的第一个DataSet拥有 `(1-ratio)` 这么多数据,第二个DataSet拥有`ratio`这么多数据
:param bool shuffle: 在split前是否shuffle一下
:return: [DataSet, DataSet]
:return: [ :class:`~fastNLP.DataSet` , :class:`~fastNLP.DataSet` ]
"""
assert isinstance(ratio, float)
assert 0 < ratio < 1
@@ -817,48 +874,6 @@ class DataSet(object):
return train_set, dev_set
@classmethod
def read_csv(cls, csv_path, headers=None, sep=",", dropna=True):
"""
.. warning::
此方法会在下个版本移除,请使用 :class:`fastNLP.io.CSVLoader`
从csv_path路径下以csv的格式读取数据。

:param str csv_path: 从哪里读取csv文件
:param list[str] headers: 如果为None,则使用csv文件的第一行作为header; 如果传入list(str), 则元素的个数必须
与csv文件中每行的元素个数相同。
:param str sep: 分割符
:param bool dropna: 是否忽略与header数量不一致行。
:return: 一个 :class:`~fastNLP.DataSet` 类型的对象
"""
warnings.warn('DataSet.read_csv is deprecated, use CSVLoader instead',
category=DeprecationWarning)
with open(csv_path, "r", encoding='utf-8') as f:
start_idx = 0
if headers is None:
headers = f.readline().rstrip('\r\n')
headers = headers.split(sep)
start_idx += 1
else:
assert isinstance(headers, (list, tuple)), "headers should be list or tuple, not {}.".format(
type(headers))
_dict = {}
for col in headers:
_dict[col] = []
for line_idx, line in enumerate(f, start_idx):
contents = line.rstrip('\r\n').split(sep)
if len(contents) != len(headers):
if dropna:
continue
else:
# TODO change error type
raise ValueError("Line {} has {} parts, while header has {} parts." \
.format(line_idx, len(contents), len(headers)))
for header, content in zip(headers, contents):
_dict[header].append(content)
return cls(_dict)
def save(self, path):
"""
保存DataSet.
@@ -870,11 +885,11 @@ class DataSet(object):
@staticmethod
def load(path):
"""
r"""
从保存的DataSet pickle文件的路径中读取DataSet

:param str path: 从哪里读取DataSet
:return: 一个 :class:`~fastNLP.DataSet` 类型的对象
:return: 读取后的 :class:`~fastNLP.DataSet` 。
"""
with open(path, 'rb') as f:
d = pickle.load(f)


+ 356
- 0
fastNLP/core/dist_trainer.py View File

@@ -0,0 +1,356 @@
"""undocumented
正在开发中的分布式训练代码
"""
import logging
import os
import time
from datetime import datetime

import torch
import torch.cuda
import torch.distributed as dist
import torch.optim
from pkg_resources import parse_version
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data.distributed import DistributedSampler
from tqdm import tqdm

from ._logger import logger
from .batch import DataSetIter, BatchIter
from .callback import DistCallbackManager, CallbackException, TesterCallback
from .dataset import DataSet
from .losses import _prepare_losser
from .optimizer import Optimizer
from .utils import _build_args
from .utils import _get_func_signature
from .utils import _move_dict_value_to_device

__all__ = [
'get_local_rank',
'DistTrainer',
]


def get_local_rank():
    """
    Return this process's local rank for distributed training.

    Order of precedence: the ``LOCAL_RANK`` environment variable, then the
    ``--local_rank`` argument injected by ``torch.distributed.launch``
    (which is cached back into the environment for later calls).

    :return int: the local rank of this process.
    :raises RuntimeError: when neither source provides a rank.
    """
    if 'LOCAL_RANK' in os.environ:
        return int(os.environ['LOCAL_RANK'])
    from argparse import ArgumentParser
    parser = ArgumentParser()
    parser.add_argument('--local_rank', type=int)
    args, _ = parser.parse_known_args()
    # FIX: the original tested `args.local_rank` for truthiness, which wrongly
    # rejected the legitimate rank 0. Compare against None instead.
    if getattr(args, 'local_rank', None) is not None:
        os.environ['LOCAL_RANK'] = str(args.local_rank)  # for multiple calls for this function
        return args.local_rank
    raise RuntimeError('Please use "python -m torch.distributed.launch --nproc_per_node=N train_script.py"')


class DistTrainer():
    """
    Distributed Trainer that support distributed and mixed precision training.

    Wraps the model in ``DistributedDataParallel`` and drives the fastNLP
    training loop (callbacks, loss, optional master-only validation) across
    the processes launched with ``python -m torch.distributed.launch``.
    """
    def __init__(self, train_data, model, optimizer=None, loss=None,
                 callbacks_all=None, callbacks_master=None,
                 batch_size_per_gpu=8, n_epochs=1,
                 num_workers=1, drop_last=False,
                 dev_data=None, metrics=None, metric_key=None,
                 update_every=1, print_every=10, validate_every=-1,
                 save_every=-1, save_path=None, device='auto',
                 fp16='', backend=None, init_method=None):
        """
        :param train_data: training data, a :class:`DataSet` or a ``BatchIter``.
        :param model: the model to train; will be wrapped in DDP.
        :param optimizer: ``torch.optim.Optimizer``, fastNLP ``Optimizer`` or None (Adam, lr=4e-3).
        :param loss: loss to optimize, converted through ``_prepare_losser``.
        :param callbacks_all: callbacks executed on every worker.
        :param callbacks_master: callbacks executed only on the master worker.
        :param int batch_size_per_gpu: batch size used by each process.
        :param int n_epochs: number of training epochs.
        :param int num_workers: workers used for data loading.
        :param bool drop_last: drop the last incomplete batch if True.
        :param dev_data: validation data; evaluation runs on the master only.
        :param metrics: metrics used for validation.
        :param metric_key: key used to decide which evaluation is better.
        :param int update_every: gradient-accumulation steps between optimizer updates.
        :param int print_every: steps between progress-bar refreshes.
        :param int validate_every: validate every N steps; negative means once per epoch.
        :param int save_every: checkpoint every N steps; negative means once per epoch.
        :param str save_path: root directory for checkpoints (master only); None disables saving.
        :param str device: 'auto', 'cuda' or 'cpu'.
        :param str fp16: Apex AMP opt level ('O0'..'O3'); empty string disables fp16.
        :param backend: distributed backend; defaults to 'nccl' on cuda, 'gloo' on cpu.
        :param init_method: forwarded to ``dist.init_process_group``.
        """
        # FIX: the original assert message had malformed quoting ("[auto', ...").
        assert device in ['auto', 'cuda', 'cpu'], "Please set correct device in ['auto', 'cuda', 'cpu']"
        if device == 'auto':
            device = 'cuda' if torch.cuda.is_available() else 'cpu'
        if backend is None:
            backend = 'nccl' if device == 'cuda' else 'gloo'

        # init distributed
        if device == 'cuda':
            torch.cuda.set_device(get_local_rank())
            self.device = torch.device("cuda", get_local_rank())
        else:
            self.device = torch.device(device)

        dist.init_process_group(backend=backend, init_method=init_method)
        self.world_size = dist.get_world_size()
        self.rank = dist.get_rank()  # unique id for each process

        self.model = model
        self.train_data = train_data
        self.batch_size_per_gpu = int(batch_size_per_gpu)
        self.n_epochs = int(n_epochs)
        self.num_data_workers = int(num_workers)
        self.drop_last = drop_last
        self.update_every = int(update_every)
        self.print_every = int(print_every)
        self.validate_every = int(validate_every)
        self.save_every = int(save_every)
        self.save_path = save_path
        self.losser = _prepare_losser(loss)
        self.fp16 = fp16
        self.init_method = init_method
        self.backend = backend
        self.local_rank = get_local_rank()
        self._forward_func = model.forward
        self.callback_manager = DistCallbackManager(
            env={"trainer": self}, callbacks_all=callbacks_all,
            callbacks_master=callbacks_master)
        self.metric_key = metric_key

        model.to(self.device)
        optimizer = self._get_optimizer(optimizer)

        # init fp16, must before DataParallel init
        if len(self.fp16):
            assert isinstance(self.fp16, str), "Please set Apex AMP optimization level selected in ['O0', 'O1', 'O2', 'O3']"
            try:
                from apex import amp
            except ImportError:
                raise ImportError("Please install apex from https://www.github.com/nvidia/apex to use fp16 training.")
            assert torch.backends.cudnn.enabled, "Amp requires cudnn backend to be enabled."
            assert device == 'cuda', "Amp requires cuda device"
            model, optimizer = amp.initialize(model, optimizer, opt_level=self.fp16)

        # init DataParallel; find_unused_parameters is only supported from torch 1.1 on
        if parse_version(torch.__version__) >= parse_version('1.1'):
            self.model = DDP(model, device_ids=[self.local_rank],
                             output_device=self.local_rank, find_unused_parameters=True)
        else:
            self.model = DDP(model, device_ids=[self.local_rank],
                             output_device=self.local_rank)

        self.optimizer = optimizer
        self.sampler = DistributedSampler(self.train_data)
        self.data_iterator = self._get_data_iter(self.train_data)
        self.n_steps = self._get_n_steps()

        # for evaluation, only run eval on master proc
        if dev_data and metrics:
            cb = TesterCallback(
                dev_data, model, metrics,
                batch_size=batch_size_per_gpu, num_workers=num_workers)
            self.callback_manager.add_callback([cb], master=True)

        # Setup logging
        dist.barrier()
        self.start_time = datetime.now().strftime('%m_%d_%Y-%H_%M')
        if self.save_path:
            self.cp_save_path = os.path.join(self.save_path, 'checkpoints', self.start_time)
        else:
            self.cp_save_path = None

        # use INFO in the master, WARN for others
        logger.setLevel(logging.INFO if self.is_master else logging.WARNING)
        self.logger = logger
        self.logger.info("Setup Distributed Trainer")
        self.logger.warning("Process pid: {}, rank: {}, local rank: {}, device: {}, fp16: {}".format(
            os.getpid(), self.rank, self.local_rank, self.device, self.fp16 if self.fp16 else False))
        self.logger.info("Num of processes: {}".format(self.world_size))
        self.logger.info("Use device: {}".format(device))
        self.logger.info("Training with fp16: {}, optimization level: {}".format(
            len(self.fp16) > 0, self.fp16 if self.fp16 else None))

    def _get_n_steps(self):
        """Total number of training steps across all epochs (per process).

        FIX: the original multiplied by ``int(self.drop_last == 0)``, which
        made the total 0 whenever ``drop_last=True``. With drop_last the last
        partial batch is dropped, not the whole schedule.
        """
        batch_size = self.world_size * self.batch_size_per_gpu
        steps_per_epoch = len(self.train_data) // batch_size
        if not self.drop_last and len(self.train_data) % batch_size != 0:
            # A trailing partial batch still counts as one step.
            steps_per_epoch += 1
        return steps_per_epoch * self.n_epochs

    def _get_data_iter(self, dataset):
        """Build a batch iterator over ``dataset`` using the distributed sampler."""
        if isinstance(dataset, DataSet):
            return DataSetIter(
                dataset=dataset, batch_size=self.batch_size_per_gpu,
                num_workers=self.num_data_workers, sampler=self.sampler,
                drop_last=self.drop_last
            )
        elif isinstance(dataset, BatchIter):
            return dataset
        else:
            raise TypeError("train_data type {} not support".format(type(dataset)))

    def _get_optimizer(self, optimizer):
        """Normalize the optimizer argument to a ``torch.optim.Optimizer`` instance."""
        if isinstance(optimizer, torch.optim.Optimizer):
            return optimizer
        elif isinstance(optimizer, Optimizer):
            return optimizer.construct_from_pytorch(self.model.parameters())
        elif optimizer is None:
            return torch.optim.Adam(self.model.parameters(), lr=4e-3)
        else:
            raise TypeError("optimizer can only be torch.optim.Optimizer type, not {}.".format(type(optimizer)))

    @property
    def is_master(self):
        """True for the rank-0 process; checkpointing and validation run here."""
        return self.rank == 0

    def train(self, on_exception='auto'):
        """
        Run the full training loop.

        :param str on_exception: 'auto' re-raises everything except
            ``CallbackException``/``KeyboardInterrupt``; 'raise' re-raises all.
        :return dict: contains 'seconds', the wall-clock training time.
        """
        try:
            self.logger.info("###### Training epochs started ######")
            self.logger.info('Total epochs: %d' % self.n_epochs)
            self.logger.info('Total steps: %d' % self.n_steps)
            self.logger.info('Num instances per GPU %d' % self.batch_size_per_gpu)
            # FIX: '%' binds tighter than '*'; the original formatted the string
            # first and then repeated it world_size times.
            self.logger.info('Total batch_size: %d' % (self.batch_size_per_gpu * dist.get_world_size()))
            self.logger.info('Total num of samples: %d' % len(self.train_data))
            self.logger.info("Num of callbacks for all workers: {}".format(
                len(self.callback_manager.callbacks_all)))
            self.logger.info("Num of callbacks for master workers: {}".format(
                len(self.callback_manager.callbacks_master)))
            self.logger.info("Callbacks for all workers: {}".format(
                [repr(cb) for cb in self.callback_manager.callbacks_all]))
            self.logger.info("Callbacks for master workers: {}".format(
                [repr(cb) for cb in self.callback_manager.callbacks_master]))

            start_time = time.time()
            results = {}
            if self.n_epochs <= 0:
                self.logger.info("Training epoch is {}, nothing was done.".format(self.n_epochs))
                results['seconds'] = 0.
                return results

            try:
                self.callback_manager.on_train_begin()
                self._train()
                self.callback_manager.on_train_end()

            except BaseException as e:
                self.callback_manager.on_exception(e)
                if on_exception == 'auto':
                    if not isinstance(e, (CallbackException, KeyboardInterrupt)):
                        raise e
                    else:
                        self.logger.info('Catch {}, ignored.'.format(e.__class__.__name__))
                elif on_exception == 'raise':
                    raise e

            results['seconds'] = round(time.time() - start_time, 2)
            self.logger.info("###### Train finished ######")
            self.logger.info('Total train time: {} seconds.'. format(results['seconds']))
            return results
        finally:
            # Always tear down the process group, even on failure.
            self.close()

    def _train(self):
        """Inner epoch/step loop; assumes fp16 was validated in __init__."""
        if self.fp16:
            # skip check, done in __init__()
            from apex import amp
        self.step = 0
        self.epoch = 0
        self.pbar = tqdm(total=self.n_steps, postfix='loss:{0:<6.5f}',
                         leave=False, dynamic_ncols=True, disable=not self.is_master)
        pbar = self.pbar
        avg_loss = 0
        data_iterator = self.data_iterator
        self.model.zero_grad()
        for epoch in range(1, self.n_epochs + 1):
            self.epoch = epoch
            pbar.set_description_str(desc="Epoch {}/{}".format(epoch, self.n_epochs))
            # early stopping
            self.callback_manager.on_epoch_begin()
            for batch_x, batch_y in data_iterator:
                self.model.train()
                self.step += 1
                _move_dict_value_to_device(batch_x, batch_y, device=self.device)
                indices = data_iterator.get_batch_indices()
                # negative sampling; replace unknown; re-weight batch_y
                self.callback_manager.on_batch_begin(batch_x, batch_y, indices)
                prediction = self._data_forward(self.model, batch_x)

                # edit prediction
                self.callback_manager.on_loss_begin(batch_y, prediction)
                loss = self._compute_loss(prediction, batch_y)
                avg_loss += loss.item()

                # Is loss NaN or inf? requires_grad = False
                self.callback_manager.on_backward_begin(loss)

                if self.fp16:
                    with amp.scale_loss(loss, self.optimizer) as scale_loss:
                        scale_loss.backward()
                else:
                    loss.backward()

                self.callback_manager.on_backward_end()

                self._update()
                self.callback_manager.on_step_end()

                if self.step % self.print_every == 0:
                    avg_loss = float(avg_loss) / self.print_every
                    print_output = "loss:{:<6.5f}".format(avg_loss)
                    pbar.update(self.print_every)
                    pbar.set_postfix_str(print_output)
                    avg_loss = 0

                self.callback_manager.on_batch_end()

                if (self.validate_every > 0 and self.step % self.validate_every == 0):
                    self._do_validation()

                if self.cp_save_path and \
                        self.save_every > 0 and \
                        self.step % self.save_every == 0:
                    self.save_check_point()

            # ================= mini-batch end ==================== #
            if self.validate_every < 0:
                self._do_validation()

            if self.save_every < 0 and self.cp_save_path:
                self.save_check_point()
            # lr decay; early stopping
            self.callback_manager.on_epoch_end()
        # =============== epochs end =================== #
        pbar.close()
        self.pbar = None
        # ============ tqdm end ============== #

    def _update(self):
        """Perform weight update on a model, honoring gradient accumulation."""
        if self.step % self.update_every == 0:
            self.optimizer.step()
            self.model.zero_grad()

    def _data_forward(self, network, x):
        """Call the model's forward with only the arguments it accepts."""
        x = _build_args(self._forward_func, **x)
        y = network(**x)
        if not isinstance(y, dict):
            raise TypeError(
                f"The return value of {_get_func_signature(self._forward_func)} should be dict, got {type(y)}.")
        return y

    def _compute_loss(self, predict, truth):
        """Compute loss given prediction and ground truth.

        :param predict: prediction dict, produced by model.forward
        :param truth: ground truth dict, produced by batch_y
        :return: a scalar
        """
        loss = self.losser(predict, truth)
        if self.update_every > 1:
            # Scale down so accumulated gradients match a single big batch.
            loss = loss / self.update_every
        return loss.mean()

    def save_check_point(self, only_params=False):
        """Save the (unwrapped) model, or its state_dict, on the master process only."""
        # only master save models
        if self.is_master:
            os.makedirs(self.cp_save_path, exist_ok=True)
            path = os.path.join(self.cp_save_path, 'checkpoint-{}.bin'.format(self.step))
            self.logger.info("Save checkpoint to {}".format(path))
            model_to_save = self.model.module
            if only_params:
                model_to_save = model_to_save.state_dict()
            torch.save(model_to_save, path)

    def _do_validation(self):
        """Run validation via callbacks and synchronize all workers afterwards."""
        self.callback_manager.on_valid_begin()
        eval_res = self.callback_manager.on_validation()
        eval_res = list(filter(lambda x: x is not None, eval_res))
        if len(eval_res):
            eval_res, is_better = list(zip(*eval_res))
        else:
            eval_res, is_better = None, None
        self.callback_manager.on_valid_end(
            eval_res, self.metric_key, self.optimizer, is_better)
        dist.barrier()

    def close(self):
        """Tear down the distributed process group."""
        dist.destroy_process_group()

+ 147
- 119
fastNLP/core/field.py View File

@@ -1,36 +1,53 @@
"""
.. todo::
doc
"""

__all__ = [
"Padder",
"AutoPadder",
"EngChar2DPadder",
]

from numbers import Number
import torch
import numpy as np
from typing import Any
from abc import abstractmethod
from copy import deepcopy
from collections import Counter
from copy import deepcopy
from numbers import Number
from typing import Any

import numpy as np
import torch

from ._logger import logger
from .utils import _is_iterable


class SetInputOrTargetException(Exception):
    """
    Raised when a field cannot be marked as input or target,
    e.g. because its cells fail type or dimension checking.
    """
    def __init__(self, msg, index=None, field_name=None):
        super().__init__(msg)
        self.msg = msg
        self.index = index  # index of the offending cell within the field, if known
        # FIX: the original assigned field_name twice (merge residue); once is enough.
        self.field_name = field_name  # name of the field where the problem occurred


class AppendToTargetOrInputException(Exception):
    """
    Raised when appending a value to a field already marked as input or
    target fails the field's type/dimension consistency check.
    """
    def __init__(self, msg, index=None, field_name=None):
        super().__init__(msg)
        self.msg = msg
        self.index = index  # index of the offending cell within the field, if known
        # FIX: the original assigned field_name twice (merge residue); once is enough.
        self.field_name = field_name  # name of the field where the problem occurred


class FieldArray:
def __init__(self, name, content, is_target=False, is_input=False, padder=None, ignore_type=False):
if len(content)==0:
def __init__(self, name, content, is_target=False, is_input=False, padder=None, ignore_type=False,
use_1st_ins_infer_dim_type=True):
if len(content) == 0:
raise RuntimeError("Empty fieldarray is not allowed.")
_content = content
try:
_content = list(_content)
except BaseException as e:
print(f"Cannot convert content(of type:{type(content)}) into list.")
logger.error(f"Cannot convert content(of type:{type(content)}) into list.")
raise e
self.name = name
self.content = _content
@@ -38,36 +55,37 @@ class FieldArray:
# 根据input的情况设置input,target等
self._cell_ndim = None # 多少维度
self.dtype = None # 最内层的element都是什么类型的
self._use_1st_ins_infer_dim_type = bool(use_1st_ins_infer_dim_type)
self._is_input = False
self._is_target = False
if is_input:
self.is_input = is_input
if is_target:
self.is_target = is_target
if padder is None:
padder = AutoPadder(pad_val=0)
else:
assert isinstance(padder, Padder), "padder must be of type fastNLP.Padder."
padder = deepcopy(padder)
self.set_padder(padder)
@property
def ignore_type(self):
return self._ignore_type
@ignore_type.setter
def ignore_type(self, value):
if value:
self._cell_ndim = None
self.dtype = None
self._ignore_type = value
@property
def is_input(self):
return self._is_input
@is_input.setter
def is_input(self, value):
"""
@@ -77,16 +95,16 @@ class FieldArray:
if value is True and \
self._is_target is False and \
self._ignore_type is False:
self._check_dtype_and_ndim()
self._check_dtype_and_ndim(only_check_1st_ins_dim_type=self._use_1st_ins_infer_dim_type)
if value is False and self._is_target is False:
self.dtype = None
self._cell_ndim = None
self._is_input = value
@property
def is_target(self):
return self._is_target
@is_target.setter
def is_target(self, value):
"""
@@ -95,70 +113,82 @@ class FieldArray:
if value is True and \
self._is_input is False and \
self._ignore_type is False:
self._check_dtype_and_ndim()
self._check_dtype_and_ndim(only_check_1st_ins_dim_type=self._use_1st_ins_infer_dim_type)
if value is False and self._is_input is False:
self.dtype = None
self._cell_ndim = None
self._is_target = value
def _check_dtype_and_ndim(self):
def _check_dtype_and_ndim(self, only_check_1st_ins_dim_type=True):
"""
检查当前content所有的element是否是同一个类型,且是否每个元素具有相同的维度。通过的话,设置_cell_ndim与_ele_type属性;没有
通过将直接报错.

:param bool only_check_1st_ins_dim_type: 是否只检查第一个元素的type和dim
:return:
"""
cell_0 = self.content[0]
index = 0
try:
type_0, dim_0 = _get_ele_type_and_dim(cell_0)
for cell in self.content[1:]:
index += 1
type_i, dim_i = _get_ele_type_and_dim(cell)
if type_i!=type_0:
raise SetInputOrTargetException("Type:{} in index {} is different from the first element with type:{}."
".".format(type_i, index, type_0))
if dim_0!=dim_i:
raise SetInputOrTargetException("Dimension:{} in index {} is different from the first element with "
"dimension:{}.".format(dim_i, index, dim_0))
if not only_check_1st_ins_dim_type:
for cell in self.content[1:]:
index += 1
type_i, dim_i = _get_ele_type_and_dim(cell)
if type_i != type_0:
raise SetInputOrTargetException(
"Type:{} in index {} is different from the first element with type:{}."
".".format(type_i, index, type_0))
if dim_0 != dim_i:
raise SetInputOrTargetException(
"Dimension:{} in index {} is different from the first element with "
"dimension:{}.".format(dim_i, index, dim_0))
self._cell_ndim = dim_0
self.dtype = type_0
except SetInputOrTargetException as e:
e.index = index
raise e
def append(self, val:Any):
def append(self, val: Any):
"""
:param val: 把该val append到fieldarray。
:return:
"""
if (self._is_target or self._is_input) and self._ignore_type is False:
if (self._is_target or self._is_input) and self._ignore_type is False and not self._use_1st_ins_infer_dim_type:
type_, dim_ = _get_ele_type_and_dim(val)
if self.dtype!=type_:
if self.dtype != type_:
raise AppendToTargetOrInputException(f"Value(type:{type_}) are of different types with "
f"previous values(type:{self.dtype}).")
if self._cell_ndim!=dim_:
if self._cell_ndim != dim_:
raise AppendToTargetOrInputException(f"Value(dim:{dim_}) are of different dimensions with "
f"previous values(dim:{self._cell_ndim}).")
self.content.append(val)
else:
self.content.append(val)

def pop(self, index):
"""
删除该field中index处的元素
:param int index: 从0开始的数据下标。
:return:
"""
self.content.pop(index)
def __getitem__(self, indices):
return self.get(indices, pad=False)

def __setitem__(self, idx, val):
    """
    Replace the element at position ``idx`` with ``val``.

    When the field is input or target and type checking is enabled, the new
    value must match the field's established dtype and cell dimensionality.

    :param int idx: index to overwrite (must be an int)
    :param val: the new value
    :raises RuntimeError: if ``val``'s type or dimension differs from the
        values already stored
    """
    # Diff residue reconciled: duplicated old/new comparison and message lines
    # collapsed into the single (new) version.
    assert isinstance(idx, int)
    if (self._is_target or self._is_input) and self.ignore_type is False:  # type check required
        type_, dim_ = _get_ele_type_and_dim(val)
        if self.dtype != type_:
            raise RuntimeError(f"Value(type:{type_}) are of different types with "
                               f"other values(type:{self.dtype}).")
        if self._cell_ndim != dim_:
            raise RuntimeError(f"Value(dim:{dim_}) are of different dimensions with "
                               f"previous values(dim:{self._cell_ndim}).")
    self.content[idx] = val
def get(self, indices, pad=True):
"""
根据给定的indices返回内容
@@ -171,16 +201,16 @@ class FieldArray:
return self.content[indices]
if self.is_input is False and self.is_target is False:
raise RuntimeError("Please specify either is_input or is_target to True for {}".format(self.name))
contents = [self.content[i] for i in indices]
if self.padder is None or pad is False:
return np.array(contents)
else:
return self.pad(contents)
def pad(self, contents):
    """Pad ``contents`` with this field's padder, using the field's name, dtype and cell dim."""
    return self.padder(contents, field_name=self.name, field_ele_dtype=self.dtype, dim=self._cell_ndim)
def set_padder(self, padder):
"""
设置padder,在这个field进行pad的时候用这个padder进行pad,如果为None则不进行pad。
@@ -192,7 +222,7 @@ class FieldArray:
self.padder = deepcopy(padder)
else:
self.padder = None
def set_pad_val(self, pad_val):
    """
    Change the value this field's padder uses for padding (no-op if padder is None).

    :param int pad_val: the new padding value
    :return: self
    """
    if self.padder is not None:
        self.padder.set_pad_val(pad_val)
    return self
def __len__(self):
    """
    Return the number of elements stored in this FieldArray.

    :return int: length of ``self.content``
    """
    return len(self.content)
def to(self, other):
    """
    Copy ``other``'s attributes (ignore_type, is_input, is_target, padder)
    onto this FieldArray; ``other`` must itself be a FieldArray.

    :param other: the FieldArray whose attributes are copied
    :return: :class:`~fastNLP.FieldArray` self
    """
    assert isinstance(other, FieldArray), "Only supports fastNLP.FieldArray type, not {}.".format(type(other))
    self.ignore_type = other.ignore_type
    self.is_input = other.is_input
    self.is_target = other.is_target
    self.padder = other.padder
    return self
def split(self, sep:str=None, inplace:bool=True):
def split(self, sep: str = None, inplace: bool = True):
"""
依次对自身的元素使用.split()方法,应该只有当本field的元素为str时,该方法才有用。将返回值

@@ -241,11 +271,11 @@ class FieldArray:
try:
new_contents.append(cell.split(sep))
except Exception as e:
print(f"Exception happens when process value in index {index}.")
logger.error(f"Exception happens when process value in index {index}.")
raise e
return self._after_process(new_contents, inplace=inplace)
def int(self, inplace:bool=True):
def int(self, inplace: bool = True):
"""
将本field中的值调用int(cell). 支持field中内容为以下两种情况(1)['1', '2', ...](即field中每个值为str的),
(2) [['1', '2', ..], ['3', ..], ...](即field中每个值为一个list,list中的值会被依次转换。)
@@ -261,10 +291,10 @@ class FieldArray:
else:
new_contents.append(int(cell))
except Exception as e:
print(f"Exception happens when process value in index {index}.")
print(e)
logger.error(f"Exception happens when process value in index {index}.")
raise e
return self._after_process(new_contents, inplace=inplace)
def float(self, inplace=True):
"""
将本field中的值调用float(cell). 支持field中内容为以下两种情况(1)['1', '2', ...](即field中每个值为str的),
@@ -281,10 +311,10 @@ class FieldArray:
else:
new_contents.append(float(cell))
except Exception as e:
print(f"Exception happens when process value in index {index}.")
logger.error(f"Exception happens when process value in index {index}.")
raise e
return self._after_process(new_contents, inplace=inplace)
def bool(self, inplace=True):
"""
将本field中的值调用bool(cell). 支持field中内容为以下两种情况(1)['1', '2', ...](即field中每个值为str的),
@@ -301,11 +331,11 @@ class FieldArray:
else:
new_contents.append(bool(cell))
except Exception as e:
print(f"Exception happens when process value in index {index}.")
logger.error(f"Exception happens when process value in index {index}.")
raise e
return self._after_process(new_contents, inplace=inplace)
def lower(self, inplace=True):
"""
将本field中的值调用cell.lower(). 支持field中内容为以下两种情况(1)['1', '2', ...](即field中每个值为str的),
@@ -322,10 +352,10 @@ class FieldArray:
else:
new_contents.append(cell.lower())
except Exception as e:
print(f"Exception happens when process value in index {index}.")
logger.error(f"Exception happens when process value in index {index}.")
raise e
return self._after_process(new_contents, inplace=inplace)
def upper(self, inplace=True):
"""
将本field中的值调用cell.lower(). 支持field中内容为以下两种情况(1)['1', '2', ...](即field中每个值为str的),
@@ -342,10 +372,10 @@ class FieldArray:
else:
new_contents.append(cell.upper())
except Exception as e:
print(f"Exception happens when process value in index {index}.")
logger.error(f"Exception happens when process value in index {index}.")
raise e
return self._after_process(new_contents, inplace=inplace)
def value_count(self):
    """
    Count how often each distinct value occurs in this field, recursing into
    nested (non-string) iterables. Mostly used to count labels.

    :return: Counter mapping value -> occurrence count
    """
    counter = Counter()

    def _accumulate(cell):
        # Recurse into non-string iterables; count leaf values.
        if _is_iterable(cell) and not isinstance(cell, str):
            for inner in cell:
                _accumulate(inner)
        else:
            counter[cell] += 1

    for cell in self.content:
        _accumulate(cell)
    return counter
def _after_process(self, new_contents, inplace):
"""
当调用处理函数之后,决定是否要替换field。
@@ -378,14 +409,14 @@ class FieldArray:
self.is_input = self.is_input
self.is_target = self.is_input
except SetInputOrTargetException as e:
print("The newly generated field cannot be set as input or target.")
logger.error("The newly generated field cannot be set as input or target.")
raise e
return self
else:
return new_contents


def _get_ele_type_and_dim(cell:Any, dim=0):
def _get_ele_type_and_dim(cell: Any, dim=0):
"""
识别cell的类别与dimension的数量

@@ -401,13 +432,13 @@ def _get_ele_type_and_dim(cell:Any, dim=0):
elif isinstance(cell, list):
dim += 1
res = [_get_ele_type_and_dim(cell_i, dim) for cell_i in cell]
types = set([i for i,j in res])
dims = set([j for i,j in res])
if len(types)>1:
types = set([i for i, j in res])
dims = set([j for i, j in res])
if len(types) > 1:
raise SetInputOrTargetException("Mixed types detected: {}.".format(list(types)))
elif len(types)==0:
elif len(types) == 0:
raise SetInputOrTargetException("Empty value encountered.")
if len(dims)>1:
if len(dims) > 1:
raise SetInputOrTargetException("Mixed dimension detected: {}.".format(list(dims)))
return types.pop(), dims.pop()
elif isinstance(cell, torch.Tensor):
@@ -418,28 +449,19 @@ def _get_ele_type_and_dim(cell:Any, dim=0):
# 否则需要继续往下iterate
dim += 1
res = [_get_ele_type_and_dim(cell_i, dim) for cell_i in cell]
types = set([i for i,j in res])
dims = set([j for i,j in res])
if len(types)>1:
types = set([i for i, j in res])
dims = set([j for i, j in res])
if len(types) > 1:
raise SetInputOrTargetException("Mixed types detected: {}.".format(list(types)))
elif len(types)==0:
elif len(types) == 0:
raise SetInputOrTargetException("Empty value encountered.")
if len(dims)>1:
if len(dims) > 1:
raise SetInputOrTargetException("Mixed dimension detected: {}.".format(list(dims)))
return types.pop(), dims.pop()
else: # 包含tuple, set, dict以及其它的类型
else: # 包含tuple, set, dict以及其它的类型
raise SetInputOrTargetException(f"Cannot process type:{type(cell)}.")


def _is_iterable(value):
    """Return True if ``value`` is iterable, determined by duck typing via ``iter()``."""
    try:
        iter(value)
    except BaseException:
        # Deliberately broad: any failure to obtain an iterator means "not iterable".
        # (Unused `as e` binding from the original removed.)
        return False
    return True


class Padder:
"""
别名::class:`fastNLP.Padder` :class:`fastNLP.core.field.Padder`
@@ -448,28 +470,29 @@ class Padder:
用于对batch进行padding操作。传入的element是inplace的,即直接修改element可能导致数据变化,建议inplace修改之前deepcopy一份。

.. py:function:: __call__(self, contents, field_name, field_ele_dtype):
传入的是List内容。假设有以下的DataSet。

:param list(Any) contents: 传入的element是inplace的,即直接修改element可能导致数据变化,建议inplace修改之前
:param List[Any] contents: 传入的element是inplace的,即直接修改element可能导致数据变化,建议inplace修改之前
deepcopy一份。
:param str, field_name: field的名称。
:param np.int64,np.float64,np.str,None, field_ele_dtype: 该field的内层元素的类型。如果该field的ignore_type为True,该这个值为None。
:return: np.array([padded_element])

"""
def __init__(self, pad_val=0, **kwargs):
    """
    :param pad_val: the value used for padding
    :param kwargs: extra options accepted by subclasses (ignored here)
    """
    self.pad_val = pad_val
def set_pad_val(self, pad_val):
    """Update the padding value used by this padder."""
    self.pad_val = pad_val
@abstractmethod
def __call__(self, contents, field_name, field_ele_dtype, dim:int):
def __call__(self, contents, field_name, field_ele_dtype, dim: int):
"""
传入的是List内容。假设有以下的DataSet。

:param list(Any) contents: 传入的element是inplace的,即直接修改element可能导致数据变化,建议inplace修改之前
:param List[Any] contents: 传入的element是inplace的,即直接修改element可能导致数据变化,建议inplace修改之前
deepcopy一份。
:param str, field_name: field的名称。
:param np.int64,np.float64,np.str,None, field_ele_dtype: 该field的内层元素的类型。如果该field的ignore_type为True,
@@ -532,23 +555,24 @@ class AutoPadder(Padder):

3 其它情况不进行处理,返回一个np.array类型。
"""
def __init__(self, pad_val=0):
    """
    :param pad_val: the value used for padding
    """
    super().__init__(pad_val=pad_val)
def __call__(self, contents, field_name, field_ele_dtype, dim):
if field_ele_dtype:
if dim>3:
if dim > 3:
return np.array(contents)
if isinstance(field_ele_dtype, type) and \
(issubclass(field_ele_dtype, np.number) or issubclass(field_ele_dtype, Number)):
if dim==0:
if dim == 0:
array = np.array(contents, dtype=field_ele_dtype)
elif dim==1:
elif dim == 1:
max_len = max(map(len, contents))
array = np.full((len(contents), max_len), self.pad_val, dtype=field_ele_dtype)
for i, content_i in enumerate(contents):
array[i, :len(content_i)] = content_i
elif dim==2:
elif dim == 2:
max_len = max(map(len, contents))
max_word_len = max([max([len(content_ii) for content_ii in content_i]) for
content_i in contents])
@@ -558,20 +582,21 @@ class AutoPadder(Padder):
array[i, j, :len(content_ii)] = content_ii
else:
shape = np.shape(contents)
if len(shape)==4: # 说明各dimension是相同的大小
if len(shape) == 4: # 说明各dimension是相同的大小
array = np.array(contents, dtype=field_ele_dtype)
else:
raise RuntimeError(f"Field:{field_name} has 3 dimensions, every sample should have the same shape.")
raise RuntimeError(
f"Field:{field_name} has 3 dimensions, every sample should have the same shape.")
return array
elif str(field_ele_dtype).startswith('torch'):
if dim==0:
if dim == 0:
tensor = torch.tensor(contents).to(field_ele_dtype)
elif dim==1:
elif dim == 1:
max_len = max(map(len, contents))
tensor = torch.full((len(contents), max_len), fill_value=self.pad_val, dtype=field_ele_dtype)
for i, content_i in enumerate(contents):
tensor[i, :len(content_i)] = torch.tensor(content_i)
elif dim==2:
elif dim == 2:
max_len = max(map(len, contents))
max_word_len = max([max([len(content_ii) for content_ii in content_i]) for
content_i in contents])
@@ -582,15 +607,18 @@ class AutoPadder(Padder):
tensor[i, j, :len(content_ii)] = torch.tensor(content_ii)
else:
shapes = set([np.shape(content_i) for content_i in contents])
if len(shapes)>1:
raise RuntimeError(f"Field:{field_name} has 3 dimensions, every sample should have the same shape.")
if len(shapes) > 1:
raise RuntimeError(
f"Field:{field_name} has 3 dimensions, every sample should have the same shape.")
shape = shapes.pop()
if len(shape)==3:
tensor = torch.full([len(contents)]+list(shape), fill_value=self.pad_val, dtype=field_ele_dtype)
if len(shape) == 3:
tensor = torch.full([len(contents)] + list(shape), fill_value=self.pad_val,
dtype=field_ele_dtype)
for i, content_i in enumerate(contents):
tensor[i] = torch.tensor(content_i, dtype=field_ele_dtype)
else:
raise RuntimeError(f"Field:{field_name} has 3 dimensions, every sample should have the same shape.")
raise RuntimeError(
f"Field:{field_name} has 3 dimensions, every sample should have the same shape.")
return tensor
else:
return np.array(contents) # 不进行任何操作
@@ -621,7 +649,7 @@ class EngChar2DPadder(Padder):
dataset.set_padder('chars', padder) # chars这个field的设置为了EnChar2DPadder

"""
def __init__(self, pad_val=0, pad_length=0):
"""
:param pad_val: int, pad的位置使用该index
@@ -629,9 +657,9 @@ class EngChar2DPadder(Padder):
都pad或截取到该长度.
"""
super().__init__(pad_val=pad_val)
self.pad_length = pad_length
def __call__(self, contents, field_name, field_ele_dtype, dim):
"""
期望输入类似于
@@ -650,7 +678,7 @@ class EngChar2DPadder(Padder):
raise TypeError('dtype of Field:{} should be np.int64 or np.float64 to do 2D padding, get {}.'.format(
field_name, field_ele_dtype
))
assert dim==2, f"Field:{field_name} has {dim}, EngChar2DPadder only supports input with 2 dimensions."
assert dim == 2, f"Field:{field_name} has {dim}, EngChar2DPadder only supports input with 2 dimensions."
if self.pad_length < 1:
max_char_length = max([max(len(char_lst) for char_lst in word_lst) for word_lst in contents])
else:
@@ -658,12 +686,12 @@ class EngChar2DPadder(Padder):
max_sent_length = max(len(word_lst) for word_lst in contents)
batch_size = len(contents)
dtype = type(contents[0][0][0])
padded_array = np.full((batch_size, max_sent_length, max_char_length), fill_value=self.pad_val,
dtype=dtype)
for b_idx, word_lst in enumerate(contents):
for c_idx, char_lst in enumerate(word_lst):
chars = char_lst[:max_char_length]
padded_array[b_idx, c_idx, :len(chars)] = chars
return padded_array

+ 7
- 0
fastNLP/core/instance.py View File

@@ -35,6 +35,13 @@ class Instance(object):
:param Any field: 新增field的内容
"""
self.fields[field_name] = field

def items(self):
    """
    Return an iterator over (field_name, field_value) pairs.

    :return: items view of the underlying fields dict
    """
    return self.fields.items()
def __getitem__(self, name):
if name in self.fields:


+ 20
- 11
fastNLP/core/losses.py View File

@@ -28,6 +28,7 @@ from .utils import _check_arg_dict_list
from .utils import _check_function_or_method
from .utils import _get_func_signature
from .utils import seq_len_to_mask
import warnings


class LossBase(object):
@@ -205,10 +206,14 @@ class CrossEntropyLoss(LossBase):
:param pred: 参数映射表中 `pred` 的映射关系,None表示映射关系为 `pred` -> `pred`
:param target: 参数映射表中 `target` 的映射关系,None表示映射关系为 `target` -> `target`
:param seq_len: 句子的长度, 长度之外的token不会计算loss。。
:param seq_len: 句子的长度, 长度之外的token不会计算loss。
:param int class_in_dim: 在序列标注的场景中,pred可能的shape为(batch_size, max_len, num_classes)
或(batch_size, num_classes, max_len), CrossEntropyLoss需要知道哪一维是class的维度以计算loss。如果为-1,就根据pred的第
二维是否等于target的第二维来判断是否需要交换pred的第二维和第三维,因为target的第二维是length的维度,如果这一维度上和pred相等,
那么pred可能第二维也是长度维(存在误判的可能,如果有误判的情况,请显示设置该值)。其它大于0的值则认为该维度是class的维度。
:param padding_idx: padding的index,在计算loss时将忽略target中标号为padding_idx的内容, 可以通过该值代替
传入seq_len.
:param str reduction: 支持'mean','sum'和'none'.
:param str reduction: 支持 `mean` ,`sum` 和 `none` .

Example::

@@ -216,17 +221,21 @@ class CrossEntropyLoss(LossBase):
"""
def __init__(self, pred=None, target=None, seq_len=None, class_in_dim=-1, padding_idx=-100, reduction='mean'):
    """
    :param pred: key mapping for `pred` in the param map; None means `pred` -> `pred`
    :param target: key mapping for `target` in the param map; None means `target` -> `target`
    :param seq_len: key mapping for sequence length; tokens beyond it are excluded from the loss
    :param int class_in_dim: which dimension of ``pred`` holds the classes; -1 lets
        get_loss() infer it by comparing pred's and target's second dimensions
    :param padding_idx: target entries equal to this index are ignored in the loss
    :param str reduction: one of `mean`, `sum`, `none`
    """
    # Diff residue reconciled: the merged text carried both the old signature
    # (without class_in_dim) and the new one; the body requires class_in_dim.
    super(CrossEntropyLoss, self).__init__()
    self._init_param_map(pred=pred, target=target, seq_len=seq_len)
    self.padding_idx = padding_idx
    assert reduction in ('mean', 'sum', 'none')
    self.reduction = reduction
    self.class_in_dim = class_in_dim
def get_loss(self, pred, target, seq_len=None):
if pred.dim() > 2:
if pred.size(1) != target.size(1):
pred = pred.transpose(1, 2)
if self.class_in_dim == -1:
if pred.size(1) != target.size(1): # 有可能顺序替换了
pred = pred.transpose(1, 2)
else:
pred = pred.tranpose(-1, pred)
pred = pred.reshape(-1, pred.size(-1))
target = target.reshape(-1)
if seq_len is not None:
@@ -265,9 +274,9 @@ class BCELoss(LossBase):

二分类交叉熵损失函数
:param pred: 参数映射表中`pred`的映射关系,None表示映射关系为`pred`->`pred`
:param target: 参数映射表中`target`的映射关系,None表示映射关系为`target`->`target`
:param str reduction: 支持'mean','sum'和'none'.
:param pred: 参数映射表中 `pred` 的映射关系,None表示映射关系为 `pred` -> `pred`
:param target: 参数映射表中 `target` 的映射关系,None表示映射关系为 `target` -> `target`
:param str reduction: 支持 `mean` ,`sum` 和 `none` .
"""
def __init__(self, pred=None, target=None, reduction='mean'):
@@ -286,11 +295,11 @@ class NLLLoss(LossBase):
负对数似然损失函数
:param pred: 参数映射表中`pred`的映射关系,None表示映射关系为`pred`->`pred`
:param target: 参数映射表中`target`的映射关系,None表示映射关系为`target`->`target`
:param pred: 参数映射表中 `pred` 的映射关系,None表示映射关系为 `pred` -> `pred`
:param target: 参数映射表中 `target` 的映射关系,None表示映射关系为 `target` -> `target`
:param ignore_idx: ignore的index,在计算loss时将忽略target中标号为ignore_idx的内容, 可以通过该值代替
传入seq_len.
:param str reduction: 支持'mean','sum'和'none'.
:param str reduction: 支持 `mean` ,`sum` 和 `none` .
"""
def __init__(self, pred=None, target=None, ignore_idx=-100, reduction='mean'):


+ 30
- 11
fastNLP/core/metrics.py View File

@@ -27,14 +27,14 @@ from abc import abstractmethod

class MetricBase(object):
"""
所有metrics的基类,所有的传入到Trainer, Tester的Metric需要继承自该对象,需要覆盖写入evaluate(), get_metric()方法。
所有metrics的基类,所有的传入到Trainer, Tester的Metric需要继承自该对象,需要覆盖写入evaluate(), get_metric()方法。
evaluate(xxx)中传入的是一个batch的数据。
get_metric(xxx)当所有数据处理完毕,调用该方法得到最终的metric值
以分类问题中,Accuracy计算为例
假设model的forward返回dict中包含'pred'这个key, 并且该key需要用于Accuracy::
假设model的forward返回dict中包含 `pred` 这个key, 并且该key需要用于Accuracy::
class Model(nn.Module):
def __init__(xxx):
@@ -43,7 +43,7 @@ class MetricBase(object):
# do something
return {'pred': pred, 'other_keys':xxx} # pred's shape: batch_size x num_classes
假设dataset中'label'这个field是需要预测的值,并且该field被设置为了target
假设dataset中 `label` 这个field是需要预测的值,并且该field被设置为了target
对应的AccMetric可以按如下的定义, version1, 只使用这一次::
class AccMetric(MetricBase):
@@ -118,6 +118,7 @@ class MetricBase(object):
def __init__(self):
self._param_map = {} # key is param in function, value is input param.
self._checked = False
self._metric_name = self.__class__.__name__

@property
def param_map(self):
@@ -135,6 +136,23 @@ class MetricBase(object):
@abstractmethod
def get_metric(self, reset=True):
    """
    Compute and return the final metric once all batches have been evaluated.

    :param bool reset: whether to reset the accumulated state afterwards
    :return: dict mapping metric names to values
    :raises NotImplementedError: always; subclasses must override
    """
    # Bug fix: the original raised the `NotImplemented` constant, which is not
    # an exception and would produce a TypeError at raise time.
    raise NotImplementedError

def set_metric_name(self, name: str):
    """
    Set this metric's name; it defaults to the Metric's class name.

    :param str name: the new metric name
    :return: self
    """
    self._metric_name = name
    return self

def get_metric_name(self):
    """
    Return this metric's name.

    :return: str
    """
    return self._metric_name
def _init_param_map(self, key_map=None, **kwargs):
"""检查key_map和其他参数map,并将这些映射关系添加到self._param_map
@@ -358,6 +376,7 @@ def _bmes_tag_to_spans(tags, ignore_labels=None):
"""
给定一个tags的lis,比如['S-song', 'B-singer', 'M-singer', 'E-singer', 'S-moive', 'S-actor']。
返回[('song', (0, 1)), ('singer', (1, 4)), ('moive', (4, 5)), ('actor', (5, 6))] (左闭右开区间)
也可以是单纯的['S', 'B', 'M', 'E', 'B', 'M', 'M',...]序列

:param tags: List[str],
:param ignore_labels: List[str], 在该list中的label将被忽略
@@ -478,7 +497,7 @@ class SpanFPreRecMetric(MetricBase):
别名::class:`fastNLP.SpanFPreRecMetric` :class:`fastNLP.core.metrics.SpanFPreRecMetric`

在序列标注问题中,以span的方式计算F, pre, rec.
比如中文Part of speech中,会以character的方式进行标注,句子'中国在亚洲'对应的POS可能为(以BMES为例)
比如中文Part of speech中,会以character的方式进行标注,句子 `中国在亚洲` 对应的POS可能为(以BMES为例)
['B-NN', 'E-NN', 'S-DET', 'B-NN', 'E-NN']。该metric就是为类似情况下的F1计算。
最后得到的metric结果为::
@@ -502,15 +521,15 @@ class SpanFPreRecMetric(MetricBase):

:param tag_vocab: 标签的 :class:`~fastNLP.Vocabulary` 。支持的标签为"B"(没有label);或"B-xxx"(xxx为某种label,比如POS中的NN),
在解码时,会将相同xxx的认为是同一个label,比如['B-NN', 'E-NN']会被合并为一个'NN'.
:param str pred: 用该key在evaluate()时从传入dict中取出prediction数据。 为None,则使用'pred'取数据
:param str target: 用该key在evaluate()时从传入dict中取出target数据。 为None,则使用'target'取数据
:param str seq_len: 用该key在evaluate()时从传入dict中取出sequence length数据。为None,则使用'seq_len'取数据。
:param str pred: 用该key在evaluate()时从传入dict中取出prediction数据。 为None,则使用 `pred` 取数据
:param str target: 用该key在evaluate()时从传入dict中取出target数据。 为None,则使用 `target` 取数据
:param str seq_len: 用该key在evaluate()时从传入dict中取出sequence length数据。为None,则使用 `seq_len` 取数据。
:param str encoding_type: 目前支持bio, bmes, bmeso, bioes
:param list ignore_labels: str 组成的list. 这个list中的class不会被用于计算。例如在POS tagging时传入['NN'],则不会计算'NN'这
个label
:param bool only_gross: 是否只计算总的f1, precision, recall的值;如果为False,不仅返回总的f1, pre, rec, 还会返回每个
label的f1, pre, rec
:param str f_type: 'micro'或'macro'. 'micro':通过先计算总体的TP,FN和FP的数量,再计算f, precision, recall; 'macro':
:param str f_type: `micro` 或 `macro` . `micro` :通过先计算总体的TP,FN和FP的数量,再计算f, precision, recall; `macro` :
分布计算每个类别的f, precision, recall,然后做平均(各类别f的权重相同)
:param float beta: f_beta分数, :math:`f_{beta} = \frac{(1 + {beta}^{2})*(pre*rec)}{({beta}^{2}*pre + rec)}` .
常用为beta=0.5, 1, 2. 若为0.5则精确率的权重高于召回率;若为1,则两者平等;若为2,则召回率权重高于精确率。
@@ -624,7 +643,7 @@ class SpanFPreRecMetric(MetricBase):
f, pre, rec = self._compute_f_pre_rec(tp, fn, fp)
f_sum += f
pre_sum += pre
rec_sum + rec
rec_sum += rec
if not self.only_gross and tag != '': # tag!=''防止无tag的情况
f_key = 'f-{}'.format(tag)
pre_key = 'pre-{}'.format(tag)
@@ -814,8 +833,8 @@ class ExtractiveQAMetric(MetricBase):
if not self.right_open:
e += 1
te += 1
if ts == 0 and te == int(not self.right_open):
if s == 0 and e == int(not self.right_open):
if ts == 0 and te == 1:
if s == 0 and e == 1:
self.no_ans_correct += 1
self.no2no += 1
else:


+ 26
- 18
fastNLP/core/optimizer.py View File

@@ -5,7 +5,8 @@ optimizer 模块定义了 fastNLP 中所需的各种优化器,一般做为 :cl
__all__ = [
"Optimizer",
"SGD",
"Adam"
"Adam",
"AdamW"
]

import torch
@@ -48,7 +49,7 @@ class NullOptimizer(Optimizer):
super().__init__(None)

def construct_from_pytorch(self, model_params):
    """
    No-op for NullOptimizer: ignore ``model_params`` and return self so calls can chain.

    :param model_params: ignored
    :return: self
    """
    # Diff residue removed: a dead `pass` statement preceded the return.
    return self

def __getattr__(self, item):
def pass_func(*args, **kwargs):
@@ -103,21 +104,28 @@ class Adam(Optimizer):


class AdamW(TorchOptimizer):
r"""对AdamW的实现,该实现应该会在pytorch更高版本中出现,https://github.com/pytorch/pytorch/pull/21250。这里提前加入
r"""
别名::class:`fastNLP.AdamW` :class:`fastNLP.core.optimizer.AdamW`

对AdamW的实现,该实现应该会在pytorch更高版本中出现,https://github.com/pytorch/pytorch/pull/21250。这里提前加入
.. todo::
翻译成中文
The original Adam algorithm was proposed in `Adam: A Method for Stochastic Optimization`_.
The AdamW variant was proposed in `Decoupled Weight Decay Regularization`_.
Arguments:
params (iterable): iterable of parameters to optimize or dicts defining
parameter groups
lr (float, optional): learning rate (default: 1e-3)
betas (Tuple[float, float], optional): coefficients used for computing
running averages of gradient and its square (default: (0.9, 0.99))
eps (float, optional): term added to the denominator to improve
numerical stability (default: 1e-8)
weight_decay (float, optional): weight decay coefficient (default: 1e-2)
amsgrad (boolean, optional): whether to use the AMSGrad variant of this
algorithm from the paper `On the Convergence of Adam and Beyond`_
(default: False)
:param params (iterable): iterable of parameters to optimize or dicts defining
parameter groups
:param lr (float, optional): learning rate (default: 1e-3)
:param betas (Tuple[float, float], optional): coefficients used for computing
running averages of gradient and its square (default: (0.9, 0.99))
:param eps (float, optional): term added to the denominator to improve
numerical stability (default: 1e-8)
:param weight_decay (float, optional): weight decay coefficient (default: 1e-2)
algorithm from the paper `On the Convergence of Adam and Beyond`_
(default: False)
.. _Adam\: A Method for Stochastic Optimization:
https://arxiv.org/abs/1412.6980
.. _Decoupled Weight Decay Regularization:
@@ -147,9 +155,9 @@ class AdamW(TorchOptimizer):

def step(self, closure=None):
"""Performs a single optimization step.
Arguments:
closure (callable, optional): A closure that reevaluates the model
and returns the loss.
:param closure: (callable, optional) A closure that reevaluates the model
and returns the loss.
"""
loss = None
if closure is not None:


+ 15
- 13
fastNLP/core/predictor.py View File

@@ -1,13 +1,15 @@
"""
..todo::
检查这个类是否需要
"""
"""undocumented"""

__all__ = [
"Predictor"
]

from collections import defaultdict

import torch

from . import DataSetIter
from . import DataSet
from . import DataSetIter
from . import SequentialSampler
from .utils import _build_args, _move_dict_value_to_device, _get_model_device

@@ -21,7 +23,7 @@ class Predictor(object):

:param torch.nn.Module network: 用来完成预测任务的模型
"""
def __init__(self, network):
if not isinstance(network, torch.nn.Module):
raise ValueError(
@@ -29,7 +31,7 @@ class Predictor(object):
self.network = network
self.batch_size = 1
self.batch_output = []
def predict(self, data: DataSet, seq_len_field_name=None):
"""用已经训练好的模型进行inference.

@@ -41,27 +43,27 @@ class Predictor(object):
raise ValueError("Only Dataset class is allowed, not {}.".format(type(data)))
if seq_len_field_name is not None and seq_len_field_name not in data.field_arrays:
raise ValueError("Field name {} not found in DataSet {}.".format(seq_len_field_name, data))
prev_training = self.network.training
self.network.eval()
network_device = _get_model_device(self.network)
batch_output = defaultdict(list)
data_iterator = DataSetIter(data, batch_size=self.batch_size, sampler=SequentialSampler(), as_numpy=False)
if hasattr(self.network, "predict"):
predict_func = self.network.predict
else:
predict_func = self.network.forward
with torch.no_grad():
for batch_x, _ in data_iterator:
_move_dict_value_to_device(batch_x, _, device=network_device)
refined_batch_x = _build_args(predict_func, **batch_x)
prediction = predict_func(**refined_batch_x)
if seq_len_field_name is not None:
seq_lens = batch_x[seq_len_field_name].tolist()
for key, value in prediction.items():
value = value.cpu().numpy()
if len(value.shape) == 1 or (len(value.shape) == 2 and value.shape[1] == 1):
@@ -74,6 +76,6 @@ class Predictor(object):
batch_output[key].extend(tmp_batch)
else:
batch_output[key].append(value)
self.network.train(prev_training)
return batch_output

+ 17
- 6
fastNLP/core/sampler.py View File

@@ -25,9 +25,9 @@ class Sampler(object):
def __call__(self, data_set):
    """
    :param DataSet data_set: the `DataSet` to sample from
    :return result: list(int) index order in which elements of ``data_set`` are taken
    """
    # Diff residue reconciled: the merged text contained the same docstring
    # twice (old and new copies); one copy kept, translated to English.
    raise NotImplementedError


@@ -62,16 +62,27 @@ class BucketSampler(Sampler):
带Bucket的 `Random Sampler`. 可以随机地取出长度相似的元素

:param int num_buckets: bucket的数量
:param int batch_size: batch的大小
:param int batch_size: batch的大小. 默认为None,Trainer在调用BucketSampler时,会将该值正确设置,如果是非Trainer场景使用,需
要显示传递该值
:param str seq_len_field_name: 对应序列长度的 `field` 的名字
"""
def __init__(self, num_buckets=10, batch_size=32, seq_len_field_name='seq_len'):
def __init__(self, num_buckets=10, batch_size=None, seq_len_field_name='seq_len'):
self.num_buckets = num_buckets
self.batch_size = batch_size
self.seq_len_field_name = seq_len_field_name

def set_batch_size(self, batch_size):
    """
    Set the batch size this sampler buckets into (per the class docstring,
    Trainer sets this value when it invokes the BucketSampler).

    :param int batch_size: the size of each batch
    :return:
    """
    self.batch_size = batch_size

def __call__(self, data_set):
if self.batch_size is None:
raise RuntimeError("batch_size is None.")
seq_lens = data_set.get_all_fields()[self.seq_len_field_name].content
total_sample_num = len(seq_lens)


+ 57
- 27
fastNLP/core/tester.py View File

@@ -1,7 +1,7 @@
"""
tester模块实现了 fastNLP 所需的Tester类,能在提供数据、模型以及metric的情况下进行性能测试。

Example::
.. code-block::

import numpy as np
import torch
@@ -32,9 +32,16 @@ Tester在验证进行之前会调用model.eval()提示当前进入了evaluation


"""
import time

import torch
import torch.nn as nn

try:
from tqdm.auto import tqdm
except:
from .utils import _pseudo_tqdm as tqdm

from .batch import BatchIter, DataSetIter
from .dataset import DataSet
from .metrics import _prepare_metrics
@@ -47,7 +54,9 @@ from .utils import _get_func_signature
from .utils import _get_model_device
from .utils import _move_model_to_device
from ._parallel_utils import _data_parallel_wrapper
from ._parallel_utils import _model_contains_inner_module
from functools import partial
from ._logger import logger

__all__ = [
"Tester"
@@ -60,15 +69,14 @@ class Tester(object):

Tester是在提供数据,模型以及metric的情况下进行性能测试的类。需要传入模型,数据以及metric进行验证。

:param data: 需要测试的数据集, :class:`~fastNLP.DataSet` 类型
:param ~fastNLP.DataSet data: 需要测试的数据集
:param torch.nn.module model: 使用的模型
:param metrics: :class:`~fastNLP.core.metrics.MetricBase` 或者一个列表的 :class:`~fastNLP.core.metrics.MetricBase`
:param ~fastNLP.core.metrics.MetricBase,List[~fastNLP.core.metrics.MetricBase] metrics: 测试时使用的metrics
:param int batch_size: evaluation时使用的batch_size有多大。
:param str,int,torch.device,list(int) device: 将模型load到哪个设备。默认为None,即Trainer不对模型
的计算位置进行管理。支持以下的输入:

1. str: ['cpu', 'cuda', 'cuda:0', 'cuda:1', ...] 依次为'cpu'中, 可见的第一个GPU中, 可见的第一个GPU中,
可见的第二个GPU中;
1. str: ['cpu', 'cuda', 'cuda:0', 'cuda:1', ...] 依次为'cpu'中, 可见的第一个GPU中,可见的第一个GPU中,可见的第二个GPU中;

2. torch.device:将模型装载到torch.device上。

@@ -80,13 +88,12 @@ class Tester(object):

如果模型是通过predict()进行预测的话,那么将不能使用多卡(DataParallel)进行验证,只会使用第一张卡上的模型。
:param int verbose: 如果为0不输出任何信息; 如果为1,打印出验证结果。
:param bool use_tqdm: 是否使用tqdm来显示测试进度; 如果为False,则不会显示任何内容。
"""
def __init__(self, data, model, metrics, batch_size=16, num_workers=0, device=None, verbose=1):
def __init__(self, data, model, metrics, batch_size=16, num_workers=0, device=None, verbose=1, use_tqdm=True):
super(Tester, self).__init__()
if not isinstance(data, DataSet):
raise TypeError(f"The type of data must be `fastNLP.DataSet`, got `{type(data)}`.")

if not isinstance(model, nn.Module):
raise TypeError(f"The type of model must be `torch.nn.Module`, got `{type(model)}`.")
@@ -96,6 +103,8 @@ class Tester(object):
self._model = _move_model_to_device(model, device=device)
self.batch_size = batch_size
self.verbose = verbose
self.use_tqdm = use_tqdm
self.logger = logger

if isinstance(data, DataSet):
self.data_iterator = DataSetIter(
@@ -107,19 +116,22 @@ class Tester(object):

# check predict
if (hasattr(self._model, 'predict') and callable(self._model.predict)) or \
(isinstance(self._model, nn.DataParallel) and hasattr(self._model.module, 'predict') and
callable(self._model.module.predict)):
(_model_contains_inner_module(self._model) and hasattr(self._model.module, 'predict') and
callable(self._model.module.predict)):
if isinstance(self._model, nn.DataParallel):
self._predict_func_wrapper = partial(_data_parallel_wrapper('predict',
self._model.device_ids,
self._model.output_device),
network=self._model.module)
self._predict_func = self._model.module.predict # 用于匹配参数
elif isinstance(self._model, nn.parallel.DistributedDataParallel):
self._predict_func = self._model.module.predict
self._predict_func_wrapper = self._model.module.predict # 用于调用
else:
self._predict_func = self._model.predict
self._predict_func_wrapper = self._model.predict
else:
if isinstance(self._model, nn.DataParallel):
if _model_contains_inner_module(model):
self._predict_func_wrapper = self._model.forward
self._predict_func = self._model.module.forward
else:
@@ -140,21 +152,39 @@ class Tester(object):
eval_results = {}
try:
with torch.no_grad():
for batch_x, batch_y in data_iterator:
_move_dict_value_to_device(batch_x, batch_y, device=self._model_device)
pred_dict = self._data_forward(self._predict_func, batch_x)
if not isinstance(pred_dict, dict):
raise TypeError(f"The return value of {_get_func_signature(self._predict_func)} "
f"must be `dict`, got {type(pred_dict)}.")
if not self.use_tqdm:
from .utils import _pseudo_tqdm as inner_tqdm
else:
inner_tqdm = tqdm
with inner_tqdm(total=len(data_iterator), leave=False, dynamic_ncols=True) as pbar:
pbar.set_description_str(desc="Test")

start_time = time.time()

for batch_x, batch_y in data_iterator:
_move_dict_value_to_device(batch_x, batch_y, device=self._model_device)
pred_dict = self._data_forward(self._predict_func, batch_x)
if not isinstance(pred_dict, dict):
raise TypeError(f"The return value of {_get_func_signature(self._predict_func)} "
f"must be `dict`, got {type(pred_dict)}.")
for metric in self.metrics:
metric(pred_dict, batch_y)

if self.use_tqdm:
pbar.update()

for metric in self.metrics:
metric(pred_dict, batch_y)
for metric in self.metrics:
eval_result = metric.get_metric()
if not isinstance(eval_result, dict):
raise TypeError(f"The return value of {_get_func_signature(metric.get_metric)} must be "
f"`dict`, got {type(eval_result)}")
metric_name = metric.__class__.__name__
eval_results[metric_name] = eval_result
eval_result = metric.get_metric()
if not isinstance(eval_result, dict):
raise TypeError(f"The return value of {_get_func_signature(metric.get_metric)} must be "
f"`dict`, got {type(eval_result)}")
metric_name = metric.get_metric_name()
eval_results[metric_name] = eval_result
pbar.close()
end_time = time.time()
test_str = f'Evaluate data in {round(end_time - start_time, 2)} seconds!'
# pbar.write(test_str)
self.logger.info(test_str)
except _CheckError as e:
prev_func_signature = _get_func_signature(self._predict_func)
_check_loss_evaluate(prev_func_signature=prev_func_signature, func_signature=e.func_signature,
@@ -162,7 +192,7 @@ class Tester(object):
dataset=self.data, check_level=0)
if self.verbose >= 1:
print("[tester] \n{}".format(self._format_eval_results(eval_results)))
logger.info("[tester] \n{}".format(self._format_eval_results(eval_results)))
self._mode(network, is_test=False)
return eval_results


+ 414
- 350
fastNLP/core/trainer.py
File diff suppressed because it is too large
View File


+ 44
- 78
fastNLP/core/utils.py View File

@@ -4,7 +4,7 @@ utils模块实现了 fastNLP 内部和外部所需的很多工具。其中用户
__all__ = [
"cache_results",
"seq_len_to_mask",
"Option",
"get_seq_len"
]

import _pickle
@@ -17,6 +17,7 @@ import numpy as np
import torch
import torch.nn as nn
from typing import List
from ._logger import logger

_CheckRes = namedtuple('_CheckRes', ['missing', 'unused', 'duplicated', 'required', 'all_needed',
'varargs'])
@@ -24,26 +25,27 @@ _CheckRes = namedtuple('_CheckRes', ['missing', 'unused', 'duplicated', 'require

class Option(dict):
    """A dict subclass whose keys can also be read/written as attributes."""

    def __getattr__(self, item):
        # Attribute reads fall back to item lookup; missing keys surface as
        # AttributeError so getattr()/hasattr() behave as callers expect.
        if item in self:
            return self.__getitem__(item)
        raise AttributeError(item)

    def __setattr__(self, key, value):
        # Refuse dunder names so special attributes never end up as items.
        is_dunder = key.startswith('__') and key.endswith('__')
        if is_dunder:
            raise AttributeError(key)
        self.__setitem__(key, value)

    def __delattr__(self, item):
        if item not in self:
            raise AttributeError(item)
        self.pop(item)

    def __getstate__(self):
        # The mapping itself is the pickled state.
        return self

    def __setstate__(self, state):
        self.update(state)

@@ -62,7 +64,6 @@ def _prepare_cache_filepath(filepath):
os.makedirs(cache_dir)


# TODO 可以保存下缓存时的参数,如果load的时候发现参数不一致,发出警告。
def cache_results(_cache_fp, _refresh=False, _verbose=1):
"""
别名::class:`fastNLP.cache_results` :class:`fastNLP.core.uitls.cache_results`
@@ -144,7 +145,7 @@ def cache_results(_cache_fp, _refresh=False, _verbose=1):
with open(cache_filepath, 'rb') as f:
results = _pickle.load(f)
if verbose == 1:
print("Read cache from {}.".format(cache_filepath))
logger.info("Read cache from {}.".format(cache_filepath))
refresh_flag = False
if refresh_flag:
@@ -155,7 +156,7 @@ def cache_results(_cache_fp, _refresh=False, _verbose=1):
_prepare_cache_filepath(cache_filepath)
with open(cache_filepath, 'wb') as f:
_pickle.dump(results, f)
print("Save cache to {}.".format(cache_filepath))
logger.info("Save cache to {}.".format(cache_filepath))
return results
@@ -163,6 +164,7 @@ def cache_results(_cache_fp, _refresh=False, _verbose=1):
return wrapper_


def _save_model(model, model_name, save_dir, only_param=False):
""" 存储不含有显卡信息的state_dict或model
:param model:
@@ -187,50 +189,6 @@ def _save_model(model, model_name, save_dir, only_param=False):
torch.save(model, model_path)
model.to(_model_device)


# def save_pickle(obj, pickle_path, file_name):
# """Save an object into a pickle file.
#
# :param obj: an object
# :param pickle_path: str, the directory where the pickle file is to be saved
# :param file_name: str, the name of the pickle file. In general, it should be ended by "pkl".
# """
# if not os.path.exists(pickle_path):
# os.mkdir(pickle_path)
# print("make dir {} before saving pickle file".format(pickle_path))
# with open(os.path.join(pickle_path, file_name), "wb") as f:
# _pickle.dump(obj, f)
# print("{} saved in {}".format(file_name, pickle_path))
#
#
# def load_pickle(pickle_path, file_name):
# """Load an object from a given pickle file.
#
# :param pickle_path: str, the directory where the pickle file is.
# :param file_name: str, the name of the pickle file.
# :return obj: an object stored in the pickle
# """
# with open(os.path.join(pickle_path, file_name), "rb") as f:
# obj = _pickle.load(f)
# print("{} loaded from {}".format(file_name, pickle_path))
# return obj
#
#
# def pickle_exist(pickle_path, pickle_name):
# """Check if a given pickle file exists in the directory.
#
# :param pickle_path: the directory of target pickle file
# :param pickle_name: the filename of target pickle file
# :return: True if file exists else False
# """
# if not os.path.exists(pickle_path):
# os.makedirs(pickle_path)
# file_name = os.path.join(pickle_path, pickle_name)
# if os.path.exists(file_name):
# return True
# else:
# return False

def _move_model_to_device(model, device):
"""
将model移动到device
@@ -253,8 +211,8 @@ def _move_model_to_device(model, device):

:return: torch.nn.DataParallel or torch.nn.Module
"""
if isinstance(model, torch.nn.parallel.DistributedDataParallel):
raise RuntimeError("model of `torch.nn.parallel.DistributedDataParallel` is not supported right now.")
# if isinstance(model, torch.nn.parallel.DistributedDataParallel):
# raise RuntimeError("model of `torch.nn.parallel.DistributedDataParallel` is not supported right now.")
if device is None:
if isinstance(model, torch.nn.DataParallel):
@@ -351,7 +309,6 @@ def _map_args(maps: dict, **kwargs):
output.update({name: val})
for keys in maps.keys():
if keys not in output.keys():
# TODO: add UNUSED warning.
pass
return output

@@ -569,18 +526,6 @@ def _check_loss_evaluate(prev_func_signature: str, func_signature: str, check_re
else:
_tmp = f'Provide `{_miss}` in DataSet or output of {prev_func_signature}.'
suggestions.append(_tmp)
# for _miss in unmapped_missing:
# if _miss in dataset:
# suggestions.append(f"Set `{_miss}` as target.")
# else:
# _tmp = ''
# if check_res.unused:
# _tmp = f"Specify your assignment for `{input_func_map.get(_miss, _miss)}` when initialize {module_name}."
# if _tmp:
# _tmp += f' Or provide `{_miss}` in DataSet or output of {prev_func_signature}.'
# else:
# _tmp = f'Provide `{_miss}` in output of {prev_func_signature} or DataSet.'
# suggestions.append(_tmp)
if check_res.duplicated:
errs.append(f"\tduplicated param: {check_res.duplicated}.")
@@ -673,7 +618,7 @@ def seq_len_to_mask(seq_len, max_len=None):
将一个表示sequence length的一维数组转换为二维的mask,不包含的位置为0。
转变 1-d seq_len到2-d mask.

Example::
.. code-block::
>>> seq_len = torch.arange(2, 16)
>>> mask = seq_len_to_mask(seq_len)
@@ -691,7 +636,7 @@ def seq_len_to_mask(seq_len, max_len=None):
:param np.ndarray,torch.LongTensor seq_len: shape将是(B,)
:param int max_len: 将长度pad到这个长度。默认(None)使用的是seq_len中最长的长度。但在nn.DataParallel的场景下可能不同卡的seq_len会有
区别,所以需要传入一个max_len使得mask的长度是pad到该长度。
:return: np.ndarray or torch.Tensor, shape将是(B, max_length)。 元素类似为bool或torch.uint8
:return: np.ndarray, torch.Tensor 。shape将是(B, max_length), 元素类似为bool或torch.uint8
"""
if isinstance(seq_len, np.ndarray):
assert len(np.shape(seq_len)) == 1, f"seq_len can only have one dimension, got {len(np.shape(seq_len))}."
@@ -715,15 +660,14 @@ class _pseudo_tqdm:
"""
当无法引入tqdm,或者Trainer中设置use_tqdm为false的时候,用该方法打印数据
"""
def __init__(self, **kwargs):
pass
self.logger = logger
def write(self, info):
print(info)
self.logger.info(info)
def set_postfix_str(self, info):
print(info)
self.logger.info(info)
def __getattr__(self, item):
def pass_func(*args, **kwargs):
@@ -737,7 +681,8 @@ class _pseudo_tqdm:
def __exit__(self, exc_type, exc_val, exc_tb):
del self

def iob2(tags:List[str])->List[str]:

def iob2(tags: List[str]) -> List[str]:
"""
检查数据是否是合法的IOB数据,如果是IOB1会被自动转换为IOB2。两者的差异见
https://datascience.stackexchange.com/questions/37824/difference-between-iob-and-iob2-format
@@ -760,7 +705,8 @@ def iob2(tags:List[str])->List[str]:
tags[i] = "B" + tag[1:]
return tags

def iob2bioes(tags:List[str])->List[str]:

def iob2bioes(tags: List[str]) -> List[str]:
"""
将iob的tag转换为bioes编码
:param tags: List[str]. 编码需要是大写的。
@@ -773,15 +719,35 @@ def iob2bioes(tags:List[str])->List[str]:
else:
split = tag.split('-')[0]
if split == 'B':
if i+1!=len(tags) and tags[i+1].split('-')[0] == 'I':
if i + 1 != len(tags) and tags[i + 1].split('-')[0] == 'I':
new_tags.append(tag)
else:
new_tags.append(tag.replace('B-', 'S-'))
elif split == 'I':
if i + 1<len(tags) and tags[i+1].split('-')[0] == 'I':
if i + 1 < len(tags) and tags[i + 1].split('-')[0] == 'I':
new_tags.append(tag)
else:
new_tags.append(tag.replace('I-', 'E-'))
else:
raise TypeError("Invalid IOB format.")
return new_tags
return new_tags


def _is_iterable(value):
# 检查是否是iterable的, duck typing
try:
iter(value)
return True
except BaseException as e:
return False


def get_seq_len(words, pad_value=0):
    """
    Given a batch_size x max_len matrix of word ids, return each sentence's length.

    :param words: batch_size x max_len tensor of token ids
    :param pad_value: the id that marks padding positions (default 0)
    :return: tensor of shape (batch_size,) counting non-pad tokens per row
    """
    non_pad = words.ne(pad_value)
    return non_pad.sum(dim=-1)

+ 69
- 46
fastNLP/core/vocabulary.py View File

@@ -1,14 +1,22 @@
"""
.. todo::
doc
"""

__all__ = [
"Vocabulary",
"VocabularyOption",
]

from collections import Counter
from functools import partial
from functools import wraps
from collections import Counter, defaultdict

from ._logger import logger
from .dataset import DataSet
from .utils import Option
from functools import partial
import numpy as np
from .utils import _is_iterable

class VocabularyOption(Option):
def __init__(self,
@@ -48,8 +56,8 @@ def _check_build_status(func):
if self.rebuild is False:
self.rebuild = True
if self.max_size is not None and len(self.word_count) >= self.max_size:
print("[Warning] Vocabulary has reached the max size {} when calling {} method. "
"Adding more words may cause unexpected behaviour of Vocabulary. ".format(
logger.info("[Warning] Vocabulary has reached the max size {} when calling {} method. "
"Adding more words may cause unexpected behaviour of Vocabulary. ".format(
self.max_size, func.__name__))
return func(self, *args, **kwargs)
@@ -92,7 +100,7 @@ class Vocabulary(object):
self.rebuild = True
# 用于承载不需要单独创建entry的词语,具体见from_dataset()方法
self._no_create_word = Counter()
@_check_build_status
def update(self, word_lst, no_create_entry=False):
"""依次增加序列中词在词典中的出现频率
@@ -107,6 +115,7 @@ class Vocabulary(object):
"""
self._add_no_create_entry(word_lst, no_create_entry)
self.word_count.update(word_lst)
return self
@_check_build_status
def add(self, word, no_create_entry=False):
@@ -123,23 +132,24 @@ class Vocabulary(object):
"""
self._add_no_create_entry(word, no_create_entry)
self.word_count[word] += 1

return self
def _add_no_create_entry(self, word, no_create_entry):
"""
在新加入word时,检查_no_create_word的设置。

:param str, List[str] word:
:param str List[str] word:
:param bool no_create_entry:
:return:
"""
if isinstance(word, str):
if isinstance(word, str) or not _is_iterable(word):
word = [word]
for w in word:
if no_create_entry and self.word_count.get(w, 0) == self._no_create_word.get(w, 0):
self._no_create_word[w] += 1
elif not no_create_entry and w in self._no_create_word:
self._no_create_word.pop(w)
@_check_build_status
def add_word(self, word, no_create_entry=False):
"""
@@ -169,6 +179,7 @@ class Vocabulary(object):
则这个词将认为是需要创建单独的vector的。
"""
self.update(word_lst, no_create_entry=no_create_entry)
return self
def build_vocab(self):
"""
@@ -193,13 +204,15 @@ class Vocabulary(object):
self.word2idx.update({w: i + start_idx for i, (w, _) in enumerate(words)})
self.build_reverse_vocab()
self.rebuild = False

return self
def build_reverse_vocab(self):
"""
基于 "word to index" dict, 构建 "index to word" dict.
基于 `word to index` dict, 构建 `index to word` dict.

"""
self.idx2word = {i: w for w, i in self.word2idx.items()}
return self
@_check_build_vocab
def __len__(self):
@@ -250,46 +263,57 @@ class Vocabulary(object):
# remember to use `field_name`
vocab.index_dataset(train_data, dev_data, test_data, field_name='words')

:param datasets: 需要转index的 class:`~fastNLP.DataSet` , 支持一个或多个(list)
:param str field_name: 需要转index的field, 若有多个 DataSet, 每个DataSet都必须有此 field.
目前仅支持 ``str`` , ``list(str)`` , ``list(list(str))``
:param str new_field_name: 保存结果的field_name. 若为 ``None`` , 将覆盖原field.
Default: ``None``
:param ~fastNLP.DataSet,List[~fastNLP.DataSet] datasets: 需要转index的一个或多个数据集
:param list,str field_name: 需要转index的field, 若有多个 DataSet, 每个DataSet都必须有此 field.
目前支持 ``str`` , ``List[str]``
:param list,str new_field_name: 保存结果的field_name. 若为 ``None`` , 将覆盖原field.
Default: ``None``.
"""
def index_instance(ins):
def index_instance(field):
"""
有几种情况, str, 1d-list, 2d-list
:param ins:
:return:
"""
field = ins[field_name]
if isinstance(field, str):
if isinstance(field, str) or not _is_iterable(field):
return self.to_index(field)
elif isinstance(field, list):
if not isinstance(field[0], list):
else:
if isinstance(field[0], str) or not _is_iterable(field[0]):
return [self.to_index(w) for w in field]
else:
if isinstance(field[0][0], list):
if not isinstance(field[0][0], str) and _is_iterable(field[0][0]):
raise RuntimeError("Only support field with 2 dimensions.")
return [[self.to_index(c) for c in w] for w in field]
if new_field_name is None:
new_field_name = field_name
new_field_name = new_field_name or field_name
if type(new_field_name) == type(field_name):
if isinstance(new_field_name, list):
assert len(new_field_name) == len(field_name), "new_field_name should have same number elements with " \
"field_name."
elif isinstance(new_field_name, str):
field_name = [field_name]
new_field_name = [new_field_name]
else:
raise TypeError("field_name and new_field_name can only be str or List[str].")
for idx, dataset in enumerate(datasets):
if isinstance(dataset, DataSet):
try:
dataset.apply(index_instance, new_field_name=new_field_name)
for f_n, n_f_n in zip(field_name, new_field_name):
dataset.apply_field(index_instance, field_name=f_n, new_field_name=n_f_n)
except Exception as e:
print("When processing the `{}` dataset, the following error occurred.".format(idx))
logger.info("When processing the `{}` dataset, the following error occurred.".format(idx))
raise e
else:
raise RuntimeError("Only DataSet type is allowed.")

return self
@property
def _no_create_word_length(self):
return len(self._no_create_word)
def from_dataset(self, *datasets, field_name, no_create_entry_dataset=None):
"""
使用dataset的对应field中词构建词典::
@@ -297,11 +321,10 @@ class Vocabulary(object):
# remember to use `field_name`
vocab.from_dataset(train_data1, train_data2, field_name='words')

:param datasets: 需要转index的 class:`~fastNLP.DataSet` , 支持一个或多个(list)
:param field_name: 可为 ``str`` 或 ``list(str)`` .
构建词典所使用的 field(s), 支持一个或多个field
若有多个 DataSet, 每个DataSet都必须有这些field.
目前仅支持的field结构: ``str`` , ``list(str)`` , ``list(list(str))``
:param ~fastNLP.DataSet,List[~fastNLP.DataSet] datasets: 需要转index的一个或多个数据集
:param str,List[str] field_name: 可为 ``str`` 或 ``List[str]`` .
构建词典所使用的 field(s), 支持一个或多个field,若有多个 DataSet, 每个DataSet都必须有这些field. 目前支持的field结构
: ``str`` , ``List[str]``
:param no_create_entry_dataset: 可以传入DataSet, List[DataSet]或者None(默认),该选项用在接下来的模型会使用pretrain
的embedding(包括glove, word2vec, elmo与bert)且会finetune的情况。如果仅使用来自于train的数据建立vocabulary,会导致test与dev
中的数据无法充分利用到来自于预训练embedding的信息,所以在建立词表的时候将test与dev考虑进来会使得最终的结果更好。
@@ -319,29 +342,29 @@ class Vocabulary(object):
def construct_vocab(ins, no_create_entry=False):
for fn in field_name:
field = ins[fn]
if isinstance(field, str):
if isinstance(field, str) or not _is_iterable(field):
self.add_word(field, no_create_entry=no_create_entry)
elif isinstance(field, (list, np.ndarray)):
if not isinstance(field[0], (list, np.ndarray)):
else:
if isinstance(field[0], str) or not _is_iterable(field[0]):
for word in field:
self.add_word(word, no_create_entry=no_create_entry)
else:
if isinstance(field[0][0], (list, np.ndarray)):
if not isinstance(field[0][0], str) and _is_iterable(field[0][0]):
raise RuntimeError("Only support field with 2 dimensions.")
for words in field:
for word in words:
self.add_word(word, no_create_entry=no_create_entry)
for idx, dataset in enumerate(datasets):
if isinstance(dataset, DataSet):
try:
dataset.apply(construct_vocab)
except Exception as e:
print("When processing the `{}` dataset, the following error occurred.".format(idx))
except BaseException as e:
log("When processing the `{}` dataset, the following error occurred:".format(idx))
raise e
else:
raise TypeError("Only DataSet type is allowed.")
if no_create_entry_dataset is not None:
partial_construct_vocab = partial(construct_vocab, no_create_entry=True)
if isinstance(no_create_entry_dataset, DataSet):
@@ -352,7 +375,7 @@ class Vocabulary(object):
raise TypeError("Only DataSet type is allowed.")
dataset.apply(partial_construct_vocab)
return self
def _is_word_no_create_entry(self, word):
"""
判断当前的word是否是不需要创建entry的,具体参见from_dataset的说明
@@ -360,11 +383,10 @@ class Vocabulary(object):
:return: bool
"""
return word in self._no_create_word
def to_index(self, w):
"""
将词转为数字. 若词不再词典中被记录, 将视为 unknown, 若 ``unknown=None`` , 将抛出
``ValueError``::
将词转为数字. 若词不再词典中被记录, 将视为 unknown, 若 ``unknown=None`` , 将抛出``ValueError``::

index = vocab.to_index('abc')
# equals to
@@ -416,6 +438,7 @@ class Vocabulary(object):
self.idx2word = None
self.rebuild = True
self._no_create_word.clear()
return self
def __getstate__(self):
"""Use to prepare data for pickle.


+ 27
- 0
fastNLP/embeddings/__init__.py View File

@@ -0,0 +1,27 @@
"""
embeddings 模块主要用于从各种预训练的模型中获取词语的分布式表示,目前支持的预训练模型包括word2vec, glove, ELMO, BERT等。这里所有
embedding的forward输入都是形状为 ``(batch_size, max_len)`` 的torch.LongTensor,输出都是 ``(batch_size, max_len, embedding_dim)`` 的
torch.FloatTensor。所有的embedding都可以使用 `self.num_embedding` 获取最大的输入index范围, 用 `self.embedding_dim` 或 `self.embed_size` 获取embedding的
输出维度。
"""

__all__ = [
"Embedding",
"TokenEmbedding",
"StaticEmbedding",
"ElmoEmbedding",
"BertEmbedding",
"BertWordPieceEncoder",
"StackEmbedding",
"LSTMCharEmbedding",
"CNNCharEmbedding",
"get_embeddings",
]

from .embedding import Embedding, TokenEmbedding
from .static_embedding import StaticEmbedding
from .elmo_embedding import ElmoEmbedding
from .bert_embedding import BertEmbedding, BertWordPieceEncoder
from .char_embedding import CNNCharEmbedding, LSTMCharEmbedding
from .stack_embedding import StackEmbedding
from .utils import get_embeddings

+ 471
- 0
fastNLP/embeddings/bert_embedding.py View File

@@ -0,0 +1,471 @@
"""
.. todo::
doc
"""

__all__ = [
"BertEmbedding",
"BertWordPieceEncoder"
]

import os
import collections

from torch import nn
import torch
import numpy as np
from itertools import chain

from ..core.vocabulary import Vocabulary
from ..io.file_utils import _get_embedding_url, cached_path, PRETRAINED_BERT_MODEL_DIR
from ..modules.encoder.bert import _WordPieceBertModel, BertModel, BertTokenizer
from .contextual_embedding import ContextualEmbedding
import warnings
from ..core import logger


class BertEmbedding(ContextualEmbedding):
    """
    Alias :class:`fastNLP.embeddings.BertEmbedding` :class:`fastNLP.embeddings.bert_embedding.BertEmbedding`

    Embedding that encodes words with BERT. It is recommended to keep input word
    sequences under roughly 430 words rather than 512 (may vary with the
    pretrained model): the pretrained bert model is limited to 512 tokens, and
    because the input words are not yet word-piece segmented (BertEmbedding
    splits them on the fly), the sequence may exceed the limit after splitting.

    BertEmbedding supports automatic weight download; currently supported model
    names are listed elsewhere (to be completed).

    Example::

        >>> import torch
        >>> from fastNLP import Vocabulary
        >>> from fastNLP.embeddings import BertEmbedding
        >>> vocab = Vocabulary().add_word_lst("The whether is good .".split())
        >>> embed = BertEmbedding(vocab, model_dir_or_name='en-base-uncased', requires_grad=False, layers='4,-2,-1')
        >>> words = torch.LongTensor([[vocab.to_index(word) for word in "The whether is good .".split()]])
        >>> outputs = embed(words)
        >>> outputs.size()
        >>> # torch.Size([1, 5, 2304])

    :param ~fastNLP.Vocabulary vocab: the vocabulary
    :param str model_dir_or_name: a model directory or a model name. A directory
        must contain a vocab file (``.txt``), a weight file (``.bin``) and a
        config file (``.json``).
    :param str layers: which layers' outputs to use; results of the listed
        layers are concatenated on the last dim in the given order.
        Comma-separated layer indices starting from 0; negative indices count
        from the end.
    :param str pool_method: how to pool a word's multiple word pieces into one
        representation. One of ``last``, ``first``, ``avg``, ``max``.
    :param float word_dropout: probability of replacing a word by unk, both
        training unk and acting as a regularizer.
    :param float dropout: dropout probability applied to the embedding output;
        0.1 zeroes 10% of the values at random.
    :param bool include_cls_sep: whether to keep the [CLS]/[SEP] positions that
        bert adds around the sentence, making the output two tokens longer than
        the input. If True, stacking with other embeddings via StackEmbedding
        may produce mismatched lengths.
    :param bool pooled_cls: whether the returned [CLS] goes through the
        pretrained BertPool; only effective with include_cls_sep. Usually True
        when downstream tasks predict from [CLS] only.
    :param bool requires_grad: whether bert weights receive gradients.
    :param bool auto_truncate: when word pieces exceed bert's max length
        (usually 512), truncate to 510 pieces and place [SEP] at position 512;
        encodings of the truncated part are zeroed. Usually only set True for
        [CLS]-only classification tasks.
    """

    def __init__(self, vocab: Vocabulary, model_dir_or_name: str = 'en-base-uncased', layers: str = '-1',
                 pool_method: str = 'first', word_dropout=0, dropout=0, include_cls_sep: bool = False,
                 pooled_cls=True, requires_grad: bool = False, auto_truncate: bool = False):
        super(BertEmbedding, self).__init__(vocab, word_dropout=word_dropout, dropout=dropout)

        # Resolve model_dir_or_name: a known name is downloaded/cached, a path
        # is normalized and used directly.
        if model_dir_or_name.lower() in PRETRAINED_BERT_MODEL_DIR:
            if 'cn' in model_dir_or_name.lower() and pool_method not in ('first', 'last'):
                warnings.warn("For Chinese bert, pooled_method should choose from 'first', 'last' in order to achieve"
                              " faster speed.")
            model_url = _get_embedding_url('bert', model_dir_or_name.lower())
            model_dir = cached_path(model_url, name='embedding')
            # the cached path now exists locally
        elif os.path.isdir(os.path.abspath(os.path.expanduser(model_dir_or_name))):
            model_dir = os.path.abspath(os.path.expanduser(model_dir_or_name))
        else:
            raise ValueError(f"Cannot recognize {model_dir_or_name}.")

        # Remember the vocab index of [SEP] (if present) so drop_word never
        # replaces it. NOTE(review): the later `if self._word_sep_index:` test
        # is truthiness-based, so a [SEP] at index 0 would be treated as absent
        # — confirm index 0 can never be [SEP].
        self._word_sep_index = None
        if '[SEP]' in vocab:
            self._word_sep_index = vocab['[SEP]']

        self.model = _WordBertModel(model_dir=model_dir, vocab=vocab, layers=layers,
                                    pool_method=pool_method, include_cls_sep=include_cls_sep,
                                    pooled_cls=pooled_cls, auto_truncate=auto_truncate, min_freq=2)

        self.requires_grad = requires_grad
        # Output dim = hidden size times the number of concatenated layers.
        self._embed_size = len(self.model.layers) * self.model.encoder.hidden_size

    def _delete_model_weights(self):
        # Free the underlying bert model (used when caching sentence reprs).
        del self.model

    def forward(self, words):
        """
        Compute the bert embedding of words. [CLS]/[SEP] are added at the start
        and end before encoding; include_cls_sep decides whether their
        representations are kept in the output.

        :param torch.LongTensor words: [batch_size, max_len]
        :return: torch.FloatTensor. batch_size x max_len x (768*len(self.layers))
        """
        words = self.drop_word(words)
        # If sentence representations were precomputed/cached, use them.
        outputs = self._get_sent_reprs(words)
        if outputs is not None:
            return self.dropout(outputs)
        outputs = self.model(words)
        # Concatenate the selected layers along the feature dimension.
        outputs = torch.cat([*outputs], dim=-1)
        return self.dropout(outputs)

    def drop_word(self, words):
        """
        Randomly replace words by the unknown index with probability
        self.word_dropout (training mode only).

        :param torch.LongTensor words: batch_size x max_len
        :return:
        """
        if self.word_dropout > 0 and self.training:
            with torch.no_grad():
                if self._word_sep_index:  # never drop [SEP]; remember where it is
                    sep_mask = words.eq(self._word_sep_index)
                mask = torch.full_like(words, fill_value=self.word_dropout, dtype=torch.float, device=words.device)
                mask = torch.bernoulli(mask).eq(1)  # larger word_dropout -> more 1s
                # assumes pad id is 0 — TODO confirm against vocab.padding_idx
                pad_mask = words.ne(0)
                mask = pad_mask.__and__(mask)  # pad positions are never replaced by unk
                # NOTE(review): self._word_unk_index is presumably set by the
                # TokenEmbedding/ContextualEmbedding base class — not visible here.
                words = words.masked_fill(mask, self._word_unk_index)
                if self._word_sep_index:
                    words.masked_fill_(sep_mask, self._word_sep_index)
        return words

    @property
    def requires_grad(self):
        """
        Whether the embedding parameters are optimized. True: all parameters
        trainable; False: none; None: mixed.
        :return:
        """
        # word_pieces_lengths is a bookkeeping buffer, excluded from the check.
        requires_grads = set([param.requires_grad for name, param in self.named_parameters()
                              if 'word_pieces_lengths' not in name])
        if len(requires_grads) == 1:
            return requires_grads.pop()
        else:
            return None

    @requires_grad.setter
    def requires_grad(self, value):
        for name, param in self.named_parameters():
            if 'word_pieces_lengths' in name:  # must never be trainable
                continue
            param.requires_grad = value


class BertWordPieceEncoder(nn.Module):
    """
    Loads a bert model; afterwards call index_datasets() to generate a
    `word_pieces` column in the datasets.

    :param str model_dir_or_name: a model directory or a model name.
        Default ``en-base-uncased``.
    :param str layers: which layers to use in the final representation;
        comma-separated indices, negative indices count from the end.
    :param bool pooled_cls: whether the leading [CLS] goes through the
        pretrained BertPool; usually True when downstream tasks predict from
        [CLS] only.
    :param float word_dropout: probability of replacing a word piece by unk,
        both training unk and acting as a regularizer.
    :param float dropout: dropout probability applied to the output;
        0.1 zeroes 10% of the values at random.
    :param bool requires_grad: whether gradients are required.
    """

    def __init__(self, model_dir_or_name: str = 'en-base-uncased', layers: str = '-1', pooled_cls: bool = False,
                 word_dropout=0, dropout=0, requires_grad: bool = False):
        super().__init__()

        # Resolve model_dir_or_name: a known name is downloaded/cached, a path
        # is normalized and used directly.
        if model_dir_or_name.lower() in PRETRAINED_BERT_MODEL_DIR:
            model_url = _get_embedding_url('bert', model_dir_or_name.lower())
            model_dir = cached_path(model_url, name='embedding')
            # the cached path now exists locally
        # Fix: expanduser must run before abspath (abspath would prefix '~'
        # with the cwd, defeating expansion); same order as BertEmbedding.
        elif os.path.isdir(os.path.abspath(os.path.expanduser(model_dir_or_name))):
            model_dir = os.path.abspath(os.path.expanduser(model_dir_or_name))
        else:
            raise ValueError(f"Cannot recognize {model_dir_or_name}.")

        self.model = _WordPieceBertModel(model_dir=model_dir, layers=layers, pooled_cls=pooled_cls)
        # Cache the special word-piece indices the encoder needs at runtime.
        self._sep_index = self.model._sep_index
        self._wordpiece_pad_index = self.model._wordpiece_pad_index
        self._wordpiece_unk_index = self.model._wordpiece_unknown_index
        # Output dim = hidden size times the number of concatenated layers.
        self._embed_size = len(self.model.layers) * self.model.encoder.hidden_size
        self.requires_grad = requires_grad
        self.word_dropout = word_dropout
        self.dropout_layer = nn.Dropout(dropout)

    @property
    def requires_grad(self):
        """
        Whether the parameters are optimized. True: all parameters trainable;
        False: none; None: mixed.
        :return:
        """
        requires_grads = set([param.requires_grad for name, param in self.named_parameters()])
        if len(requires_grads) == 1:
            return requires_grads.pop()
        else:
            return None

    @requires_grad.setter
    def requires_grad(self, value):
        for name, param in self.named_parameters():
            param.requires_grad = value

    @property
    def embed_size(self):
        return self._embed_size

    @property
    def embedding_dim(self):
        return self._embed_size

    @property
    def num_embedding(self):
        return self.model.encoder.config.vocab_size

    def index_datasets(self, *datasets, field_name, add_cls_sep=True):
        """
        Use bert's tokenizer to generate a word_pieces column in the datasets,
        set it as input, and set its pad value to bert's pad value.

        :param ~fastNLP.DataSet datasets: DataSet objects
        :param str field_name: which column to generate word_pieces from; each
            entry should be a List[str].
        :param bool add_cls_sep: add [CLS]/[SEP] at the ends if not present.
        :return:
        """
        self.model.index_dataset(*datasets, field_name=field_name, add_cls_sep=add_cls_sep)

    def forward(self, word_pieces, token_type_ids=None):
        """
        Compute the bert embedding of word pieces. The input is expected to
        already contain the [CLS] and [SEP] tags.

        :param word_pieces: batch_size x max_len
        :param token_type_ids: batch_size x max_len, distinguishes the first
            and second sentence. If not given, it is generated automatically
            (sufficient in most cases): 0 up to and including the first [SEP],
            1 up to the second, 0 up to the third, and so on.
        :return: torch.FloatTensor. batch_size x max_len x (768*len(self.layers))
        """
        with torch.no_grad():
            sep_mask = word_pieces.eq(self._sep_index)  # batch_size x max_len
            if token_type_ids is None:
                # Cumulative [SEP] count from the right gives alternating
                # segment parity; fmod(2) turns it into 0/1 segment ids.
                sep_mask_cumsum = sep_mask.flip(dims=[-1]).cumsum(dim=-1).flip(dims=[-1])
                token_type_ids = sep_mask_cumsum.fmod(2)
                if token_type_ids[0, 0].item():  # the first segment must be 0; flip if it starts odd
                    token_type_ids = token_type_ids.eq(0).long()

        word_pieces = self.drop_word(word_pieces)
        outputs = self.model(word_pieces, token_type_ids)
        # Concatenate the selected layers along the feature dimension.
        outputs = torch.cat([*outputs], dim=-1)

        return self.dropout_layer(outputs)

    def drop_word(self, words):
        """
        Randomly replace word pieces by the unknown index with probability
        self.word_dropout (training mode only). [SEP] and pad positions are
        never replaced.

        :param torch.LongTensor words: batch_size x max_len
        :return:
        """
        if self.word_dropout > 0 and self.training:
            with torch.no_grad():
                # Fix: the original referenced self._word_sep_index and
                # self._word_unk_index, which this class never defines
                # (AttributeError at runtime), built the sep mask from the unk
                # index, and "restored" [SEP] positions with the unk id. Use
                # the indices this class actually sets in __init__.
                sep_mask = words.eq(self._sep_index)  # never drop [SEP]
                mask = torch.full_like(words, fill_value=self.word_dropout, dtype=torch.float, device=words.device)
                mask = torch.bernoulli(mask).eq(1)  # larger word_dropout -> more 1s
                pad_mask = words.ne(self._wordpiece_pad_index)
                mask = pad_mask.__and__(mask)  # pad positions are never replaced by unk
                words = words.masked_fill(mask, self._wordpiece_unk_index)
                words.masked_fill_(sep_mask, self._sep_index)
        return words


class _WordBertModel(nn.Module):
    def __init__(self, model_dir: str, vocab: Vocabulary, layers: str = '-1', pool_method: str = 'first',
                 include_cls_sep: bool = False, pooled_cls: bool = False, auto_truncate: bool = False, min_freq=2):
        # Wraps a pretrained bert so it can encode word-level input: rebuilds
        # the word-piece vocabulary/embedding around `vocab` and precomputes
        # each word's word-piece ids and lengths.
        super().__init__()

        # NOTE: attribute name 'tokenzier' (sic) is used consistently below.
        self.tokenzier = BertTokenizer.from_pretrained(model_dir)
        self.encoder = BertModel.from_pretrained(model_dir)
        self._max_position_embeddings = self.encoder.config.max_position_embeddings
        # Validate that every requested layer index is within range.
        encoder_layer_number = len(self.encoder.encoder.layer)
        self.layers = list(map(int, layers.split(',')))
        for layer in self.layers:
            if layer < 0:
                assert -layer <= encoder_layer_number, f"The layer index:{layer} is out of scope for " \
                    f"a bert model with {encoder_layer_number} layers."
            else:
                assert layer < encoder_layer_number, f"The layer index:{layer} is out of scope for " \
                    f"a bert model with {encoder_layer_number} layers."

        assert pool_method in ('avg', 'max', 'first', 'last')
        self.pool_method = pool_method
        self.include_cls_sep = include_cls_sep
        self.pooled_cls = pooled_cls
        self.auto_truncate = auto_truncate

        # Compute word pieces for every word in vocab; [CLS]/[SEP] are needed
        # in addition.
        logger.info("Start to generating word pieces for word.")
        # Step 1: collect the needed word pieces, then build a new embedding
        # and word-piece vocab and fill in the values.
        word_piece_dict = {'[CLS]': 1, '[SEP]': 1}  # word pieces in use, plus newly added entries
        found_count = 0
        self._has_sep_in_vocab = '[SEP]' in vocab  # decides later whether token_type_ids must be generated
        if '[sep]' in vocab:
            warnings.warn("Lower cased [sep] detected, it cannot be correctly recognized as [SEP] by BertEmbedding.")
        if "[CLS]" in vocab:
            warnings.warn("[CLS] detected in your vocabulary. BertEmbedding will add [CSL] and [SEP] to the begin "
                          "and end of the input automatically, make sure you don't add [CLS] and [SEP] at the begin"
                          " and end.")
        for word, index in vocab:
            if index == vocab.padding_idx:  # pad is a special symbol
                word = '[PAD]'
            elif index == vocab.unknown_idx:
                word = '[UNK]'
            word_pieces = self.tokenzier.wordpiece_tokenizer.tokenize(word)
            if len(word_pieces) == 1:
                if not vocab._is_word_no_create_entry(word):  # a train-set word that was not found
                    if index != vocab.unknown_idx and word_pieces[0] == '[UNK]':  # i.e. the word is not in bert's original vocab
                        if vocab.word_count[word] >= min_freq and not vocab._is_word_no_create_entry(
                                word):  # only add words occurring at least min_freq times
                            word_piece_dict[word] = 1  # add a new entry
                        continue
            for word_piece in word_pieces:
                word_piece_dict[word_piece] = 1
            found_count += 1

        original_embed = self.encoder.embeddings.word_embeddings.weight.data
        # Special tokens need special handling: build the new (smaller) embed.
        embed = nn.Embedding(len(word_piece_dict), original_embed.size(1))  # the new embed
        new_word_piece_vocab = collections.OrderedDict()
        for index, token in enumerate(['[PAD]', '[UNK]']):
            # Pin [PAD]/[UNK] at indices 0/1 and copy their original vectors.
            word_piece_dict.pop(token, None)
            embed.weight.data[index] = original_embed[self.tokenzier.vocab[token]]
            new_word_piece_vocab[token] = index
        for token in word_piece_dict.keys():
            # Known pieces reuse their pretrained vector; new words start from
            # the [UNK] vector.
            if token in self.tokenzier.vocab:
                embed.weight.data[len(new_word_piece_vocab)] = original_embed[self.tokenzier.vocab[token]]
            else:
                embed.weight.data[len(new_word_piece_vocab)] = original_embed[self.tokenzier.vocab['[UNK]']]
            new_word_piece_vocab[token] = len(new_word_piece_vocab)
        self.tokenzier._reinit_on_new_vocab(new_word_piece_vocab)
        self.encoder.embeddings.word_embeddings = embed

        # Step 2: precompute, for every vocab word, its word-piece id sequence
        # and its length (second tokenize pass uses the re-initialized vocab).
        word_to_wordpieces = []
        word_pieces_lengths = []
        for word, index in vocab:
            if index == vocab.padding_idx:  # pad is a special symbol
                word = '[PAD]'
            elif index == vocab.unknown_idx:
                word = '[UNK]'
            word_pieces = self.tokenzier.wordpiece_tokenizer.tokenize(word)
            word_pieces = self.tokenzier.convert_tokens_to_ids(word_pieces)
            word_to_wordpieces.append(word_pieces)
            word_pieces_lengths.append(len(word_pieces))
        self._cls_index = self.tokenzier.vocab['[CLS]']
        self._sep_index = self.tokenzier.vocab['[SEP]']
        self._word_pad_index = vocab.padding_idx
        self._wordpiece_pad_index = self.tokenzier.vocab['[PAD]']  # needed when generating word_piece batches
        logger.info("Found(Or segment into word pieces) {} words out of {}.".format(found_count, len(vocab)))
        self.word_to_wordpieces = np.array(word_to_wordpieces)
        # Stored as a non-trainable Parameter so it moves with the module's device.
        self.word_pieces_lengths = nn.Parameter(torch.LongTensor(word_pieces_lengths), requires_grad=False)
        logger.debug("Successfully generate word pieces.")
def forward(self, words):
    """
    Map word ids to BERT representations, pooling each word's word-piece outputs back to word level.

    :param words: torch.LongTensor, batch_size x max_len
    :return: num_layers x batch_size x max_len x hidden_size, or
        num_layers x batch_size x (max_len+2) x hidden_size when [CLS]/[SEP] positions are included
    """
    with torch.no_grad():
        batch_size, max_word_len = words.size()
        word_mask = words.ne(self._word_pad_index)  # positions holding a real word are 1
        seq_len = word_mask.sum(dim=-1)
        batch_word_pieces_length = self.word_pieces_lengths[words].masked_fill(word_mask.eq(0),
                                                                               0)  # batch_size x max_len
        word_pieces_lengths = batch_word_pieces_length.sum(dim=-1)  # batch_size
        word_piece_length = batch_word_pieces_length.sum(dim=-1).max().item()  # word-piece length (incl. padding)
        if word_piece_length + 2 > self._max_position_embeddings:
            if self.auto_truncate:
                word_pieces_lengths = word_pieces_lengths.masked_fill(
                    word_pieces_lengths + 2 > self._max_position_embeddings,
                    self._max_position_embeddings - 2)
            else:
                raise RuntimeError(
                    "After split words into word pieces, the lengths of word pieces are longer than the "
                    f"maximum allowed sequence length:{self._max_position_embeddings} of bert.")
        # the +2 reserves room for [CLS] and [SEP]
        word_pieces = words.new_full((batch_size, min(word_piece_length + 2, self._max_position_embeddings)),
                                     fill_value=self._wordpiece_pad_index)
        attn_masks = torch.zeros_like(word_pieces)
        # 1. gather the word-piece ids of each word, together with the spans they cover
        word_indexes = words.cpu().numpy()
        for i in range(batch_size):
            word_pieces_i = list(chain(*self.word_to_wordpieces[word_indexes[i, :seq_len[i]]]))
            if self.auto_truncate and len(word_pieces_i) > self._max_position_embeddings - 2:
                word_pieces_i = word_pieces_i[:self._max_position_embeddings - 2]
            word_pieces[i, 1:word_pieces_lengths[i] + 1] = torch.LongTensor(word_pieces_i)
            attn_masks[i, :word_pieces_lengths[i] + 2].fill_(1)
        # insert [cls] and [sep]
        word_pieces[:, 0].fill_(self._cls_index)
        batch_indexes = torch.arange(batch_size).to(words)
        word_pieces[batch_indexes, word_pieces_lengths + 1] = self._sep_index
        if self._has_sep_in_vocab:  # token_type_ids are only needed when [SEP] can occur inside a sentence
            sep_mask = word_pieces.eq(self._sep_index)  # batch_size x max_len
            sep_mask_cumsum = sep_mask.flip(dims=[-1]).cumsum(dim=-1).flip(dims=[-1])
            token_type_ids = sep_mask_cumsum.fmod(2)
            if token_type_ids[0, 0].item():  # if the first segment comes out odd, flip so it starts at 0
                token_type_ids = token_type_ids.eq(0).long()
        else:
            token_type_ids = torch.zeros_like(word_pieces)
    # 2. run the encoder and pool the word-piece hidden states back onto words
    # all_outputs: [batch_size x max_len x hidden_size, batch_size x max_len x hidden_size, ...]
    bert_outputs, pooled_cls = self.encoder(word_pieces, token_type_ids=token_type_ids, attention_mask=attn_masks,
                                            output_all_encoded_layers=True)
    # output_layers = [self.layers]  # len(self.layers) x batch_size x real_word_piece_length x hidden_size
    if self.include_cls_sep:
        outputs = bert_outputs[-1].new_zeros(len(self.layers), batch_size, max_word_len + 2,
                                             bert_outputs[-1].size(-1))
        s_shift = 1
    else:
        outputs = bert_outputs[-1].new_zeros(len(self.layers), batch_size, max_word_len,
                                             bert_outputs[-1].size(-1))
        s_shift = 0
    batch_word_pieces_cum_length = batch_word_pieces_length.new_zeros(batch_size, max_word_len + 1)
    batch_word_pieces_cum_length[:, 1:] = batch_word_pieces_length.cumsum(dim=-1)  # batch_size x max_len
    for l_index, l in enumerate(self.layers):
        output_layer = bert_outputs[l]
        real_word_piece_length = output_layer.size(1) - 2
        if word_piece_length > real_word_piece_length:  # the layer output was truncated; pad it back out
            paddings = output_layer.new_zeros(batch_size,
                                              word_piece_length - real_word_piece_length,
                                              output_layer.size(2))
            output_layer = torch.cat((output_layer, paddings), dim=1).contiguous()
        # collapse from word-piece level to word level
        truncate_output_layer = output_layer[:, 1:-1]  # drop [CLS] and [SEP]; batch_size x len x hidden_size
        outputs_seq_len = seq_len + s_shift
        if self.pool_method == 'first':
            for i in range(batch_size):
                i_word_pieces_cum_length = batch_word_pieces_cum_length[i, :seq_len[i]]  # start of each word
                outputs[l_index, i, s_shift:outputs_seq_len[i]] = truncate_output_layer[
                    i, i_word_pieces_cum_length]  # num_layer x batch_size x len x hidden_size
        elif self.pool_method == 'last':
            for i in range(batch_size):
                i_word_pieces_cum_length = batch_word_pieces_cum_length[i, 1:seq_len[i] + 1] - 1  # end of each word
                outputs[l_index, i, s_shift:outputs_seq_len[i]] = truncate_output_layer[i, i_word_pieces_cum_length]
        elif self.pool_method == 'max':
            for i in range(batch_size):
                for j in range(seq_len[i]):
                    start, end = batch_word_pieces_cum_length[i, j], batch_word_pieces_cum_length[i, j + 1]
                    outputs[l_index, i, j + s_shift], _ = torch.max(truncate_output_layer[i, start:end], dim=-2)
        else:
            for i in range(batch_size):
                for j in range(seq_len[i]):
                    start, end = batch_word_pieces_cum_length[i, j], batch_word_pieces_cum_length[i, j + 1]
                    outputs[l_index, i, j + s_shift] = torch.mean(truncate_output_layer[i, start:end], dim=-2)
        if self.include_cls_sep:
            if l in (len(bert_outputs) - 1, -1) and self.pooled_cls:
                outputs[l_index, :, 0] = pooled_cls
            else:
                outputs[l_index, :, 0] = output_layer[:, 0]
            outputs[l_index, batch_indexes, seq_len + s_shift] = output_layer[batch_indexes, seq_len + s_shift]
    # 3. the final embedding result
    return outputs

+ 325
- 0
fastNLP/embeddings/char_embedding.py View File

@@ -0,0 +1,325 @@
"""
该文件中主要包含的是character的Embedding,包括基于CNN与LSTM的character Embedding。与其它Embedding一样,这里的Embedding输入也是
词的index而不需要使用词语中的char的index来获取表达。
"""

# Public API of this module: only the two char-embedding classes are exported.
__all__ = [
    "CNNCharEmbedding",
    "LSTMCharEmbedding"
]

import torch
import torch.nn as nn
import torch.nn.functional as F
from typing import List

from .static_embedding import StaticEmbedding
from ..modules.encoder.lstm import LSTM
from ..core.vocabulary import Vocabulary
from .embedding import TokenEmbedding
from .utils import _construct_char_vocab_from_vocab
from .utils import get_embeddings
from ..core import logger


class CNNCharEmbedding(TokenEmbedding):
    """
    Alias :class:`fastNLP.embeddings.CNNCharEmbedding` :class:`fastNLP.embeddings.char_embedding.CNNCharEmbedding`

    Character-level word embedding built with a CNN. Structure:
    embed(x) -> Dropout(x) -> CNN(x) -> activation(x) -> pool -> fc -> Dropout.
    The outputs of the different kernel sizes are concatenated and passed through one
    fully connected layer to form the word representation.

    Example::

        >>> import torch
        >>> from fastNLP import Vocabulary
        >>> from fastNLP.embeddings import CNNCharEmbedding
        >>> vocab = Vocabulary().add_word_lst("The whether is good .".split())
        >>> embed = CNNCharEmbedding(vocab, embed_size=50)
        >>> words = torch.LongTensor([[vocab.to_index(word) for word in "The whether is good .".split()]])
        >>> outputs = embed(words)
        >>> outputs.size()
        >>> # torch.Size([1, 5,50])

    :param vocab: word vocabulary
    :param embed_size: output dimension of this CNNCharEmbedding, default 50.
    :param char_emb_size: dimension of the character embeddings; characters are collected from ``vocab``. Default 50.
    :param float word_dropout: probability of replacing a word with unk, which both trains unk and regularizes.
    :param float dropout: dropout probability applied to the char embedding output.
    :param filter_nums: number of filters per kernel; must have the same length as ``kernel_sizes``. Default [40, 30, 20].
    :param kernel_sizes: kernel sizes, default [5, 3, 1].
    :param pool_method: pooling used to merge character representations into one vector; 'avg' or 'max'.
    :param activation: activation applied after the CNN; 'relu', 'sigmoid', 'tanh' or a custom callable.
    :param min_char_freq: minimum frequency for a character to be kept. Default 2.
    :param pre_train_char_embed: two ways to load a pretrained character embedding: either a folder
        (containing exactly one .txt file) or a file path; or the name of a known embedding, in which case
        the cache is checked and the model is downloaded if absent. If None, a random embedding of
        ``char_emb_size`` dimensions is used.
    """

    def __init__(self, vocab: Vocabulary, embed_size: int = 50, char_emb_size: int = 50, word_dropout: float = 0,
                 dropout: float = 0, filter_nums: List[int] = (40, 30, 20), kernel_sizes: List[int] = (5, 3, 1),
                 pool_method: str = 'max', activation='relu', min_char_freq: int = 2, pre_train_char_embed: str = None):
        super(CNNCharEmbedding, self).__init__(vocab, word_dropout=word_dropout, dropout=dropout)

        # odd kernels keep the char sequence length unchanged with padding = kernel // 2
        for kernel in kernel_sizes:
            assert kernel % 2 == 1, "Only odd kernel is allowed."

        assert pool_method in ('max', 'avg')
        self.pool_method = pool_method
        # activation function
        if isinstance(activation, str):
            if activation.lower() == 'relu':
                self.activation = F.relu
            elif activation.lower() == 'sigmoid':
                self.activation = F.sigmoid
            elif activation.lower() == 'tanh':
                self.activation = F.tanh
        elif activation is None:
            self.activation = lambda x: x
        elif callable(activation):
            self.activation = activation
        else:
            raise Exception(
                "Undefined activation function: choose from: [relu, tanh, sigmoid, or a callable function]")

        logger.info("Start constructing character vocabulary.")
        # build the character vocabulary
        self.char_vocab = _construct_char_vocab_from_vocab(vocab, min_freq=min_char_freq)
        self.char_pad_index = self.char_vocab.padding_idx
        logger.info(f"In total, there are {len(self.char_vocab)} distinct characters.")
        # index every word in vocab as a fixed-length character-id row
        max_word_len = max(map(lambda x: len(x[0]), vocab))
        self.words_to_chars_embedding = nn.Parameter(torch.full((len(vocab), max_word_len),
                                                                fill_value=self.char_pad_index, dtype=torch.long),
                                                     requires_grad=False)
        self.word_lengths = nn.Parameter(torch.zeros(len(vocab)).long(), requires_grad=False)
        for word, index in vocab:
            # pad is NOT special-cased here, so every <pad> shares the same learned embedding
            self.words_to_chars_embedding[index, :len(word)] = \
                torch.LongTensor([self.char_vocab.to_index(c) for c in word])
            self.word_lengths[index] = len(word)
        if pre_train_char_embed:
            self.char_embedding = StaticEmbedding(self.char_vocab, model_dir_or_name=pre_train_char_embed)
        else:
            self.char_embedding = get_embeddings((len(self.char_vocab), char_emb_size))

        self.convs = nn.ModuleList([nn.Conv1d(
            char_emb_size, filter_nums[i], kernel_size=kernel_sizes[i], bias=True, padding=kernel_sizes[i] // 2)
            for i in range(len(kernel_sizes))])
        self._embed_size = embed_size
        self.fc = nn.Linear(sum(filter_nums), embed_size)
        self.reset_parameters()

    def forward(self, words):
        """
        Given word indices, produce the corresponding word representations.

        :param words: [batch_size, max_len]
        :return: [batch_size, max_len, embed_size]
        """
        words = self.drop_word(words)
        batch_size, max_len = words.size()
        chars = self.words_to_chars_embedding[words]  # batch_size x max_len x max_word_len
        word_lengths = self.word_lengths[words]  # batch_size x max_len
        max_word_len = word_lengths.max()
        chars = chars[:, :, :max_word_len]
        # mask is 1 at padding positions
        chars_masks = chars.eq(self.char_pad_index)  # batch_size x max_len x max_word_len
        chars = self.char_embedding(chars)  # batch_size x max_len x max_word_len x embed_size
        chars = self.dropout(chars)
        reshaped_chars = chars.reshape(batch_size * max_len, max_word_len, -1)
        reshaped_chars = reshaped_chars.transpose(1, 2)  # B' x E x M
        conv_chars = [conv(reshaped_chars).transpose(1, 2).reshape(batch_size, max_len, max_word_len, -1)
                      for conv in self.convs]
        conv_chars = torch.cat(conv_chars, dim=-1).contiguous()  # B x max_len x max_word_len x sum(filters)
        conv_chars = self.activation(conv_chars)
        if self.pool_method == 'max':
            # -inf at padded positions so max ignores them
            conv_chars = conv_chars.masked_fill(chars_masks.unsqueeze(-1), float('-inf'))
            chars, _ = torch.max(conv_chars, dim=-2)  # batch_size x max_len x sum(filters)
        else:
            # zero out padded positions and average over the real characters
            conv_chars = conv_chars.masked_fill(chars_masks.unsqueeze(-1), 0)
            chars = torch.sum(conv_chars, dim=-2) / chars_masks.eq(0).sum(dim=-1, keepdim=True).float()
        chars = self.fc(chars)
        return self.dropout(chars)

    @property
    def requires_grad(self):
        """
        Whether the embedding parameters are trainable. True: all trainable; False: none; None: mixed.
        :return:
        """
        params = []
        for name, param in self.named_parameters():
            if 'words_to_chars_embedding' not in name and 'word_lengths' not in name:
                params.append(param.requires_grad)
        requires_grads = set(params)
        if len(requires_grads) == 1:
            return requires_grads.pop()
        else:
            return None

    @requires_grad.setter
    def requires_grad(self, value):
        for name, param in self.named_parameters():
            if 'words_to_chars_embedding' in name or 'word_lengths' in name:  # lookup tables must stay frozen
                continue
            param.requires_grad = value

    def reset_parameters(self):
        for name, param in self.named_parameters():
            if 'words_to_chars_embedding' in name or 'word_lengths' in name:  # lookup tables must not be reset
                continue
            if 'char_embedding' in name:
                continue
            if param.data.dim() > 1:
                nn.init.xavier_uniform_(param, 1)
            else:
                nn.init.uniform_(param, -1, 1)


class LSTMCharEmbedding(TokenEmbedding):
    """
    Alias :class:`fastNLP.embeddings.LSTMCharEmbedding` :class:`fastNLP.embeddings.char_embedding.LSTMCharEmbedding`

    Character-level word embedding built with an LSTM:
    embed(x) -> Dropout(x) -> LSTM(x) -> activation(x) -> pool -> Dropout

    Example::

        >>> import torch
        >>> from fastNLP import Vocabulary
        >>> from fastNLP.embeddings import LSTMCharEmbedding
        >>> vocab = Vocabulary().add_word_lst("The whether is good .".split())
        >>> embed = LSTMCharEmbedding(vocab, embed_size=50)
        >>> words = torch.LongTensor([[vocab.to_index(word) for word in "The whether is good .".split()]])
        >>> outputs = embed(words)
        >>> outputs.size()
        >>> # torch.Size([1, 5,50])

    :param vocab: word vocabulary
    :param embed_size: output dimension of this LSTMCharEmbedding. Default 50.
    :param char_emb_size: dimension of the character embeddings. Default 50.
    :param float word_dropout: probability of replacing a word with unk, which both trains unk and regularizes.
    :param dropout: dropout probability applied to the character embedding output and the final word output.
    :param hidden_size: hidden size of the LSTM; halved per direction when bidirectional. Default 50.
    :param pool_method: 'max' or 'avg'.
    :param activation: 'relu', 'sigmoid', 'tanh', or a custom callable.
    :param min_char_freq: minimum frequency for a character to be kept. Default 2.
    :param bidirectional: whether the LSTM is bidirectional. Default True.
    :param pre_train_char_embed: two ways to load a pretrained character embedding: either a folder
        (containing exactly one .txt file) or a file path; or the name of a known embedding, in which case
        the cache is checked and the model is downloaded if absent. If None, a random embedding of
        ``char_emb_size`` dimensions is used.
    """

    def __init__(self, vocab: Vocabulary, embed_size: int = 50, char_emb_size: int = 50, word_dropout: float = 0,
                 dropout: float = 0, hidden_size=50, pool_method: str = 'max', activation='relu',
                 min_char_freq: int = 2,
                 bidirectional=True, pre_train_char_embed: str = None):
        super(LSTMCharEmbedding, self).__init__(vocab, word_dropout=word_dropout, dropout=dropout)

        # the hidden size is split across the two directions, so it must be even
        assert hidden_size % 2 == 0, "Only even hidden_size is allowed."

        assert pool_method in ('max', 'avg')
        self.pool_method = pool_method
        # activation function
        if isinstance(activation, str):
            if activation.lower() == 'relu':
                self.activation = F.relu
            elif activation.lower() == 'sigmoid':
                self.activation = F.sigmoid
            elif activation.lower() == 'tanh':
                self.activation = F.tanh
        elif activation is None:
            self.activation = lambda x: x
        elif callable(activation):
            self.activation = activation
        else:
            raise Exception(
                "Undefined activation function: choose from: [relu, tanh, sigmoid, or a callable function]")

        logger.info("Start constructing character vocabulary.")
        # build the character vocabulary
        self.char_vocab = _construct_char_vocab_from_vocab(vocab, min_freq=min_char_freq)
        self.char_pad_index = self.char_vocab.padding_idx
        logger.info(f"In total, there are {len(self.char_vocab)} distinct characters.")
        # index every word in vocab as a fixed-length character-id row
        self.max_word_len = max(map(lambda x: len(x[0]), vocab))
        self.words_to_chars_embedding = nn.Parameter(torch.full((len(vocab), self.max_word_len),
                                                                fill_value=self.char_pad_index, dtype=torch.long),
                                                     requires_grad=False)
        self.word_lengths = nn.Parameter(torch.zeros(len(vocab)).long(), requires_grad=False)
        for word, index in vocab:
            # pad is NOT special-cased here, so every <pad> shares the same learned embedding
            self.words_to_chars_embedding[index, :len(word)] = \
                torch.LongTensor([self.char_vocab.to_index(c) for c in word])
            self.word_lengths[index] = len(word)
        if pre_train_char_embed:
            self.char_embedding = StaticEmbedding(self.char_vocab, pre_train_char_embed)
        else:
            self.char_embedding = nn.Embedding(len(self.char_vocab), char_emb_size)

        self.fc = nn.Linear(hidden_size, embed_size)
        hidden_size = hidden_size // 2 if bidirectional else hidden_size

        self.lstm = LSTM(char_emb_size, hidden_size, bidirectional=bidirectional, batch_first=True)
        self._embed_size = embed_size
        self.bidirectional = bidirectional

    def forward(self, words):
        """
        Given word indices, produce the corresponding word representations.

        :param words: [batch_size, max_len]
        :return: [batch_size, max_len, embed_size]
        """
        words = self.drop_word(words)
        batch_size, max_len = words.size()
        chars = self.words_to_chars_embedding[words]  # batch_size x max_len x max_word_len
        word_lengths = self.word_lengths[words]  # batch_size x max_len
        max_word_len = word_lengths.max()
        chars = chars[:, :, :max_word_len]
        # mask is 1 at padding positions
        chars_masks = chars.eq(self.char_pad_index)  # batch_size x max_len x max_word_len
        chars = self.char_embedding(chars)  # batch_size x max_len x max_word_len x embed_size
        chars = self.dropout(chars)
        reshaped_chars = chars.reshape(batch_size * max_len, max_word_len, -1)
        char_seq_len = chars_masks.eq(0).sum(dim=-1).reshape(batch_size * max_len)
        lstm_chars = self.lstm(reshaped_chars, char_seq_len)[0].reshape(batch_size, max_len, max_word_len, -1)
        # B x M x M x H
        lstm_chars = self.activation(lstm_chars)
        if self.pool_method == 'max':
            # -inf at padded positions so max ignores them
            lstm_chars = lstm_chars.masked_fill(chars_masks.unsqueeze(-1), float('-inf'))
            chars, _ = torch.max(lstm_chars, dim=-2)  # batch_size x max_len x H
        else:
            # zero out padded positions and average over the real characters
            lstm_chars = lstm_chars.masked_fill(chars_masks.unsqueeze(-1), 0)
            chars = torch.sum(lstm_chars, dim=-2) / chars_masks.eq(0).sum(dim=-1, keepdim=True).float()

        chars = self.fc(chars)

        return self.dropout(chars)

    @property
    def requires_grad(self):
        """
        Whether the embedding parameters are trainable. True: all trainable; False: none; None: mixed.
        :return:
        """
        params = []
        for name, param in self.named_parameters():
            if 'words_to_chars_embedding' not in name and 'word_lengths' not in name:
                # BUGFIX: collect the flag, not the parameter tensor itself, otherwise the set
                # below contains one distinct object per parameter and the property always
                # returned None (cf. the identical property on CNNCharEmbedding).
                params.append(param.requires_grad)
        requires_grads = set(params)
        if len(requires_grads) == 1:
            return requires_grads.pop()
        else:
            return None

    @requires_grad.setter
    def requires_grad(self, value):
        for name, param in self.named_parameters():
            if 'words_to_chars_embedding' in name or 'word_lengths' in name:  # lookup tables must stay frozen
                continue
            param.requires_grad = value

+ 110
- 0
fastNLP/embeddings/contextual_embedding.py View File

@@ -0,0 +1,110 @@
"""
.. todo::
doc
"""

# Public API of this module: only the abstract base class is exported.
__all__ = [
    "ContextualEmbedding"
]

from abc import abstractmethod

import torch

from .embedding import TokenEmbedding
from ..core import logger
from ..core.batch import DataSetIter
from ..core.dataset import DataSet
from ..core.sampler import SequentialSampler
from ..core.utils import _move_model_to_device, _get_model_device
from ..core.vocabulary import Vocabulary


class ContextualEmbedding(TokenEmbedding):
    """
    Base class for contextualized embeddings (e.g. ELMo-/BERT-style models). It adds optional
    caching of per-sentence representations so the expensive encoder only runs once per sentence.
    """

    def __init__(self, vocab: Vocabulary, word_dropout: float = 0.0, dropout: float = 0.0):
        super(ContextualEmbedding, self).__init__(vocab, word_dropout=word_dropout, dropout=dropout)

    def add_sentence_cache(self, *datasets, batch_size=32, device='cpu', delete_weights: bool = True):
        """
        Because generating contextual embeddings is slow, the representation of every sentence can be
        cached, so the generation process does not need to run on every call.

        :param datasets: DataSet objects
        :param batch_size: int, batch size used when computing the cached sentence representations
        :param device: see the ``device`` argument of :class::fastNLP.Trainer
        :param delete_weights: whether to delete the model weights after the cache is built; when the
            dynamic model does not need fine-tuning, deleting the weights saves a large amount of memory.
        :return:
        """
        for index, dataset in enumerate(datasets):
            try:
                assert isinstance(dataset, DataSet), "Only fastNLP.DataSet object is allowed."
                assert 'words' in dataset.get_input_name(), "`words` field has to be set as input."
            except Exception as e:
                logger.error(f"Exception happens at {index} dataset.")
                raise e

        sent_embeds = {}
        _move_model_to_device(self, device=device)
        device = _get_model_device(self)
        pad_index = self._word_vocab.padding_idx
        logger.info("Start to calculate sentence representations.")
        with torch.no_grad():
            for index, dataset in enumerate(datasets):
                try:
                    batch = DataSetIter(dataset, batch_size=batch_size, sampler=SequentialSampler())
                    for batch_x, batch_y in batch:
                        words = batch_x['words'].to(device)
                        words_list = words.tolist()
                        seq_len = words.ne(pad_index).sum(dim=-1)
                        max_len = words.size(1)
                        # inputs may contain CLS/SEP, so counting padding from the end is safer
                        seq_len_from_behind = (max_len - seq_len).tolist()
                        word_embeds = self(words).detach().cpu().numpy()
                        for b in range(words.size(0)):
                            length = seq_len_from_behind[b]
                            if length == 0:
                                sent_embeds[tuple(words_list[b][:seq_len[b]])] = word_embeds[b]
                            else:
                                sent_embeds[tuple(words_list[b][:seq_len[b]])] = word_embeds[b, :-length]
                except Exception as e:
                    logger.error(f"Exception happens at {index} dataset.")
                    raise e
        logger.info("Finish calculating sentence representations.")
        self.sent_embeds = sent_embeds
        if delete_weights:
            self._delete_model_weights()

    def _get_sent_reprs(self, words):
        """
        Return the cached representations for the sentences in ``words``; return None when no cache exists.

        :param words: torch.LongTensor
        :return:
        """
        if hasattr(self, 'sent_embeds'):
            words_list = words.tolist()
            seq_len = words.ne(self._word_pad_index).sum(dim=-1)
            _embeds = []
            for b in range(len(words)):
                words_i = tuple(words_list[b][:seq_len[b]])
                embed = self.sent_embeds[words_i]
                _embeds.append(embed)
            max_sent_len = max(map(len, _embeds))
            embeds = words.new_zeros(len(_embeds), max_sent_len, self.embed_size, dtype=torch.float,
                                     device=words.device)
            for i, embed in enumerate(_embeds):
                embeds[i, :len(embed)] = torch.FloatTensor(embed).to(words.device)
            return embeds
        return None

    @abstractmethod
    def _delete_model_weights(self):
        """Delete the underlying encoder model to free resources."""
        raise NotImplementedError

    def remove_sentence_cache(self):
        """
        Delete the cached sentence representations. Afterwards, if the model weights have not been
        deleted, representations will be computed dynamically again.

        :return:
        """
        del self.sent_embeds

+ 345
- 0
fastNLP/embeddings/elmo_embedding.py View File

@@ -0,0 +1,345 @@
"""
.. todo::
doc
"""

# Public API of this module: only the user-facing embedding class is exported.
__all__ = [
    "ElmoEmbedding"
]

import os
import torch
import torch.nn as nn
import torch.nn.functional as F
import json
import codecs

from ..core.vocabulary import Vocabulary
from ..io.file_utils import cached_path, _get_embedding_url, PRETRAINED_ELMO_MODEL_DIR
from ..modules.encoder._elmo import ElmobiLm, ConvTokenEmbedder
from .contextual_embedding import ContextualEmbedding
from ..core import logger

class ElmoEmbedding(ContextualEmbedding):
    """
    Alias :class:`fastNLP.embeddings.ElmoEmbedding` :class:`fastNLP.embeddings.elmo_embedding.ElmoEmbedding`

    ELMo embedding. After initialization, simply pass word ids to obtain the corresponding
    embedding. The model names currently supported for initialization-by-name are (to be completed).

    Example::
        >>> import torch
        >>> from fastNLP import Vocabulary
        >>> from fastNLP.embeddings import ElmoEmbedding
        >>> vocab = Vocabulary().add_word_lst("The whether is good .".split())
        >>> # concatenate the outputs of several layers
        >>> embed = ElmoEmbedding(vocab, model_dir_or_name='en', layers='1,2', requires_grad=False)
        >>> words = torch.LongTensor([[vocab.to_index(word) for word in "The whether is good .".split()]])
        >>> outputs = embed(words)
        >>> outputs.size()
        >>> # torch.Size([1, 5, 2048])

        >>> # weighted sum over the layers
        >>> embed = ElmoEmbedding(vocab, model_dir_or_name='en', layers='mix', requires_grad=False)
        >>> embed.set_mix_weights_requires_grad()  # make the mixing weights trainable while the ELMo LSTM stays frozen

    :param vocab: word vocabulary
    :param model_dir_or_name: two ways to load a pretrained ELMo embedding: either a folder containing
        a .json config file and a .pkl weight file; or the name of an ELMo release, in which case the
        cache is checked and the model is downloaded and cached if absent.
    :param layers: str, which layers to return (0-based), comma separated. '2' returns the second layer,
        '1,2' concatenates the last two layers in that order; default '2'. 'mix' uses a learnable weighted
        combination of all layers (trainability follows ``requires_grad``; weights are initialized to
        mean-pool the three layers; use ElmoEmbedding.set_mix_weights_requires_grad() to make only the
        mix weights trainable).
    :param requires_grad: bool, whether this layer needs gradients. Default False.
    :param float word_dropout: probability of replacing a word with unk, which both trains unk and regularizes.
    :param float dropout: dropout probability on the embedding output; 0.1 zeroes 10% of the values.
    :param cache_word_reprs: if True, compute an embedding for every word at initialization and delete the
        character encoder, using the cached embeddings from then on. Default False.
    """

    def __init__(self, vocab: Vocabulary, model_dir_or_name: str = 'en', layers: str = '2', requires_grad: bool = False,
                 word_dropout=0.0, dropout=0.0, cache_word_reprs: bool = False):
        super(ElmoEmbedding, self).__init__(vocab, word_dropout=word_dropout, dropout=dropout)

        # resolve model_dir_or_name: known name -> download/cache; otherwise it must be a directory
        if model_dir_or_name.lower() in PRETRAINED_ELMO_MODEL_DIR:
            model_url = _get_embedding_url('elmo', model_dir_or_name.lower())
            model_dir = cached_path(model_url, name='embedding')
            # the path is verified below
        elif os.path.isdir(os.path.abspath(os.path.expanduser(model_dir_or_name))):
            model_dir = model_dir_or_name
        else:
            raise ValueError(f"Cannot recognize {model_dir_or_name}.")
        self.model = _ElmoModel(model_dir, vocab, cache_word_reprs=cache_word_reprs)

        if layers == 'mix':
            self.layer_weights = nn.Parameter(torch.zeros(self.model.config['lstm']['n_layers'] + 1),
                                              requires_grad=requires_grad)
            self.gamma = nn.Parameter(torch.ones(1), requires_grad=requires_grad)
            self._get_outputs = self._get_mixed_outputs
            self._embed_size = self.model.config['lstm']['projection_dim'] * 2
        else:
            layers = list(map(int, layers.split(',')))
            assert len(layers) > 0, "Must choose one output"
            for layer in layers:
                assert 0 <= layer <= 2, "Layer index should be in range [0, 2]."
            self.layers = layers
            self._get_outputs = self._get_layer_outputs
            self._embed_size = len(self.layers) * self.model.config['lstm']['projection_dim'] * 2

        self.requires_grad = requires_grad

    def _get_mixed_outputs(self, outputs):
        # outputs: num_layers x batch_size x max_len x hidden_size
        # return: batch_size x max_len x hidden_size
        weights = F.softmax(self.layer_weights + 1 / len(outputs), dim=0).to(outputs)
        outputs = torch.einsum('l,lbij->bij', weights, outputs)
        return self.gamma.to(outputs) * outputs

    def set_mix_weights_requires_grad(self, flag=True):
        """
        When the ElmoEmbedding was initialized with ``layers='mix'``, this method controls whether the
        mixing weights are trainable. It has no effect when ``layers`` is not 'mix'.

        :param bool flag: whether the layer-mixing weights are trainable.
        :return:
        """
        if hasattr(self, 'layer_weights'):
            self.layer_weights.requires_grad = flag
            self.gamma.requires_grad = flag

    def _get_layer_outputs(self, outputs):
        if len(self.layers) == 1:
            outputs = outputs[self.layers[0]]
        else:
            outputs = torch.cat(tuple([*outputs[self.layers]]), dim=-1)

        return outputs

    def forward(self, words: torch.LongTensor):
        """
        Compute the ELMo embedding of ``words``. As in the ELMo paper there are 2L+1 layers of output,
        but to make the result easy to split, the token embedding is duplicated: layer 0 is
        [token_embedding;token_embedding] and layer 1 is [forward_hiddens;backward_hiddens].

        :param words: batch_size x max_len
        :return: torch.FloatTensor. batch_size x max_len x (512*len(self.layers))
        """
        words = self.drop_word(words)
        outputs = self._get_sent_reprs(words)
        if outputs is not None:
            return self.dropout(outputs)
        outputs = self.model(words)
        outputs = self._get_outputs(outputs)
        return self.dropout(outputs)

    def _delete_model_weights(self):
        for name in ['layers', 'model', 'layer_weights', 'gamma']:
            if hasattr(self, name):
                delattr(self, name)

    @property
    def requires_grad(self):
        """
        Whether the embedding parameters are trainable. True: all trainable; False: none; None: mixed.

        :return:
        """
        requires_grads = set([param.requires_grad for name, param in self.named_parameters()
                              if 'words_to_chars_embedding' not in name and 'words_to_words' not in name])
        if len(requires_grads) == 1:
            return requires_grads.pop()
        else:
            return None

    @requires_grad.setter
    def requires_grad(self, value):
        for name, param in self.named_parameters():
            if 'words_to_chars_embedding' in name or 'words_to_words' in name:  # lookup tables must stay frozen
                continue
            param.requires_grad = value


class _ElmoModel(nn.Module):
    """
    The module where ElmoEmbedding does all of its heavy lifting. Responsibilities:
    (1) load the model according to the config;
    (2) adapt the model's embedding to ``vocab`` and initialize it correctly;
    (3) keep a word -> chars mapping and convert automatically on lookup;
    (4) optionally precompute and cache per-word representations.
    """

    def __init__(self, model_dir: str, vocab: Vocabulary = None, cache_word_reprs: bool = False):
        super(_ElmoModel, self).__init__()
        self.model_dir = model_dir
        # locate exactly one config (*.json) and one weight (*.pkl) file under model_dir
        # (renamed from `dir`, which shadowed the builtin)
        walker = os.walk(self.model_dir)
        config_file = None
        weight_file = None
        config_count = 0
        weight_count = 0

        for path, dir_list, file_list in walker:
            for file_name in file_list:
                if ".json" in file_name:
                    config_file = file_name
                    config_count += 1
                elif ".pkl" in file_name:
                    weight_file = file_name
                    weight_count += 1

        if config_count > 1 or weight_count > 1:
            # BUGFIX: the message previously said "*.hdf5" although the code matches "*.pkl" files
            raise Exception(f"Multiple config files(*.json) or weight files(*.pkl) detected in {model_dir}.")
        elif config_count == 0 or weight_count == 0:
            raise Exception(f"No config file or weight file found in {model_dir}")
        with open(os.path.join(model_dir, config_file), 'r') as config_f:
            config = json.load(config_f)
        self.weight_file = os.path.join(model_dir, weight_file)
        self.config = config

        OOV_TAG = '<oov>'
        PAD_TAG = '<pad>'
        BOS_TAG = '<bos>'
        EOS_TAG = '<eos>'
        BOW_TAG = '<bow>'
        EOW_TAG = '<eow>'

        # For the model trained with character-based word encoder.
        char_lexicon = {}
        with codecs.open(os.path.join(model_dir, 'char.dic'), 'r', encoding='utf-8') as fpi:
            for line in fpi:
                tokens = line.strip().split('\t')
                if len(tokens) == 1:
                    tokens.insert(0, '\u3000')

                token, i = tokens
                char_lexicon[token] = int(i)

        # sanity check: the special tokens must exist in the pretrained lexicon
        for special_word in [PAD_TAG, OOV_TAG, BOW_TAG, EOW_TAG]:
            assert special_word in char_lexicon, f"{special_word} not found in char.dic."

        # build the char vocabulary from `vocab`
        char_vocab = Vocabulary(unknown=OOV_TAG, padding=PAD_TAG)
        # make sure <bow> and <eow> are present
        char_vocab.add_word_lst([BOW_TAG, EOW_TAG, BOS_TAG, EOS_TAG])

        for word, index in vocab:
            char_vocab.add_word_lst(list(word))

        self.bos_index, self.eos_index, self._pad_index = len(vocab), len(vocab) + 1, vocab.padding_idx
        # one extra slot reserved for word padding (its char representation is all zeros)
        char_emb_layer = nn.Embedding(len(char_vocab) + 1, int(config['char_cnn']['embedding']['dim']),
                                      padding_idx=len(char_vocab))

        # load pretrained weights; elmo_model holds the state_dicts of char_cnn and lstm
        elmo_model = torch.load(os.path.join(self.model_dir, weight_file), map_location='cpu')

        char_embed_weights = elmo_model["char_cnn"]['char_emb_layer.weight']

        found_char_count = 0
        for char, index in char_vocab:  # adjust the character embedding to our char vocab
            if char in char_lexicon:
                index_in_pre = char_lexicon.get(char)
                found_char_count += 1
            else:
                index_in_pre = char_lexicon[OOV_TAG]
            char_emb_layer.weight.data[index] = char_embed_weights[index_in_pre]

        logger.info(f"{found_char_count} out of {len(char_vocab)} characters were found in pretrained elmo embedding.")
        # build the words -> chars mapping
        max_chars = config['char_cnn']['max_characters_per_token']

        self.words_to_chars_embedding = nn.Parameter(torch.full((len(vocab) + 2, max_chars),
                                                                fill_value=len(char_vocab),
                                                                dtype=torch.long),
                                                     requires_grad=False)
        for word, index in list(iter(vocab)) + [(BOS_TAG, len(vocab)), (EOS_TAG, len(vocab) + 1)]:
            if len(word) + 2 > max_chars:
                word = word[:max_chars - 2]
            if index == self._pad_index:
                continue
            elif word == BOS_TAG or word == EOS_TAG:
                char_ids = [char_vocab.to_index(BOW_TAG)] + [char_vocab.to_index(word)] + [
                    char_vocab.to_index(EOW_TAG)]
                char_ids += [char_vocab.to_index(PAD_TAG)] * (max_chars - len(char_ids))
            else:
                char_ids = [char_vocab.to_index(BOW_TAG)] + [char_vocab.to_index(c) for c in word] + [
                    char_vocab.to_index(EOW_TAG)]
                char_ids += [char_vocab.to_index(PAD_TAG)] * (max_chars - len(char_ids))
            self.words_to_chars_embedding[index] = torch.LongTensor(char_ids)

        self.char_vocab = char_vocab

        self.token_embedder = ConvTokenEmbedder(
            config, self.weight_file, None, char_emb_layer)
        elmo_model["char_cnn"]['char_emb_layer.weight'] = char_emb_layer.weight
        self.token_embedder.load_state_dict(elmo_model["char_cnn"])

        self.output_dim = config['lstm']['projection_dim']

        # lstm encoder
        self.encoder = ElmobiLm(config)
        self.encoder.load_state_dict(elmo_model["lstm"])

        if cache_word_reprs:
            if config['char_cnn']['embedding']['dim'] > 0:  # only useful when characters are in use
                logger.info("Start to generate cache word representations.")
                batch_size = 320
                # bos eos
                word_size = self.words_to_chars_embedding.size(0)
                num_batches = word_size // batch_size + \
                              int(word_size % batch_size != 0)

                self.cached_word_embedding = nn.Embedding(word_size,
                                                          config['lstm']['projection_dim'])
                with torch.no_grad():
                    for i in range(num_batches):
                        words = torch.arange(i * batch_size,
                                             min((i + 1) * batch_size, word_size)).long()
                        chars = self.words_to_chars_embedding[words].unsqueeze(1)  # batch_size x 1 x max_chars
                        word_reprs = self.token_embedder(words.unsqueeze(1),
                                                         chars).detach()  # batch_size x 1 x config['encoder']['projection_dim']
                        self.cached_word_embedding.weight.data[words] = word_reprs.squeeze(1)

                    logger.info("Finish generating cached word representations. Going to delete the character encoder.")
                del self.token_embedder, self.words_to_chars_embedding
            else:
                logger.info("There is no need to cache word representations, since no character information is used.")

    def forward(self, words):
        """
        :param words: batch_size x max_len
        :return: num_layers x batch_size x max_len x hidden_size
        """
        # extend with <bos>, <eos>
        batch_size, max_len = words.size()
        expanded_words = words.new_zeros(batch_size, max_len + 2)  # relies on pad being 0
        seq_len = words.ne(self._pad_index).sum(dim=-1)
        expanded_words[:, 1:-1] = words
        expanded_words[:, 0].fill_(self.bos_index)
        expanded_words[torch.arange(batch_size).to(words), seq_len + 1] = self.eos_index
        seq_len = seq_len + 2
        zero_tensor = expanded_words.new_zeros(expanded_words.shape)
        mask = (expanded_words == zero_tensor).unsqueeze(-1)
        if hasattr(self, 'cached_word_embedding'):
            token_embedding = self.cached_word_embedding(expanded_words)
        else:
            if hasattr(self, 'words_to_chars_embedding'):
                chars = self.words_to_chars_embedding[expanded_words]
            else:
                chars = None
            token_embedding = self.token_embedder(expanded_words, chars)  # batch_size x max_len x embed_dim

        encoder_output = self.encoder(token_embedding, seq_len)
        if encoder_output.size(2) < max_len + 2:
            num_layers, _, output_len, hidden_size = encoder_output.size()
            dummy_tensor = encoder_output.new_zeros(num_layers, batch_size,
                                                    max_len + 2 - output_len, hidden_size)
            encoder_output = torch.cat((encoder_output, dummy_tensor), 2)
        sz = encoder_output.size()  # 2, batch_size, max_len, hidden_size
        token_embedding = token_embedding.masked_fill(mask, 0)
        token_embedding = torch.cat((token_embedding, token_embedding), dim=2).view(1, sz[1], sz[2], sz[3])
        encoder_output = torch.cat((token_embedding, encoder_output), dim=0)

        # strip <eos>/<bos>; not an exact removal, but it should not affect the final result
        encoder_output = encoder_output[:, :, 1:-1]
        return encoder_output

Some files were not shown because too many files changed in this diff

Loading…
Cancel
Save