diff --git a/tutorials/fastnlp_tutorial_2.ipynb b/tutorials/fastnlp_tutorial_2.ipynb index 3aa27c86..4ee9579f 100644 --- a/tutorials/fastnlp_tutorial_2.ipynb +++ b/tutorials/fastnlp_tutorial_2.ipynb @@ -867,7 +867,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.7.4" + "version": "3.7.13" }, "pycharm": { "stem_cell": { diff --git a/tutorials/fastnlp_tutorial_3.ipynb b/tutorials/fastnlp_tutorial_3.ipynb index 353e4645..172e1232 100644 --- a/tutorials/fastnlp_tutorial_3.ipynb +++ b/tutorials/fastnlp_tutorial_3.ipynb @@ -29,13 +29,7 @@ "\n", "### 1.1 dataloader 的职责描述\n", "\n", - "在`fastNLP 0.8`中,在数据加载模块`DataLoader`之前,还存在其他的一些模块,负责例如对文本数据\n", - "\n", - "  进行补零对齐,即 **核对器`collator`模块**,进行分词标注,即 **分词器`tokenizer`模块**\n", - "\n", - "  本节将对`fastNLP`中的核对器`collator`等展开介绍,分词器`tokenizer`将在下一节中详细介绍\n", - "\n", - "在`fastNLP 0.8`中,**核对器`collator`模块负责文本序列的补零对齐**,通过" + "在`fastNLP 0.8`中,在数据加载模块`DataLoader`之前" ] }, { @@ -45,13 +39,7 @@ "source": [ "### 1.2 dataloader 的基本使用\n", "\n", - "在`fastNLP 0.8`中,在数据加载模块`DataLoader`之前,还存在其他的一些模块,负责例如对文本数据\n", - "\n", - "  进行补零对齐,即 **核对器`collator`模块**,进行分词标注,即 **分词器`tokenizer`模块**\n", - "\n", - "  本节将对`fastNLP`中的核对器`collator`等展开介绍,分词器`tokenizer`将在下一节中详细介绍\n", - "\n", - "在`fastNLP 0.8`中,**核对器`collator`模块负责文本序列的补零对齐**,通过" + "在`fastNLP 0.8`中,在数据加载模块`DataLoader`之前," ] }, { diff --git a/tutorials/fastnlp_tutorial_4.ipynb b/tutorials/fastnlp_tutorial_4.ipynb index 532118b0..3e148bf3 100644 --- a/tutorials/fastnlp_tutorial_4.ipynb +++ b/tutorials/fastnlp_tutorial_4.ipynb @@ -5,21 +5,21 @@ "id": "fdd7ff16", "metadata": {}, "source": [ - "# T4. model 的搭建与 driver 的概念\n", + "# T4. trainer 和 evaluator 的深入介绍(一)\n", "\n", - "  1   fastNLP 中预训练模型的使用\n", + "  1   fastNLP 结合 pytorch 搭建模型\n", " \n", "    1.1   \n", "\n", "    1.2   \n", "\n", - "  2   fastNLP 中使用 Pytorch 搭建模型\n", + "  2   fastNLP 中的 driver 与 device\n", "\n", "    2.1   \n", "\n", "    2.2   \n", "\n", - "  3   fastNLP 中的 driver\n", + "  3   fastNLP 中 trainer 的补充介绍\n", "\n", "    3.1   \n", "\n", @@ -51,7 +51,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.7.4" + "version": "3.7.13" } }, "nbformat": 4, diff --git a/tutorials/fastnlp_tutorial_5.ipynb b/tutorials/fastnlp_tutorial_5.ipynb new file mode 100644 index 00000000..1e41a36e --- /dev/null +++ b/tutorials/fastnlp_tutorial_5.ipynb @@ -0,0 +1,59 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "fdd7ff16", + "metadata": {}, + "source": [ + "# T5. 
fastNLP 与 paddle 或 jittor 的结合\n", + "\n", + "  1   fastNLP 结合 paddle 训练模型\n", + " \n", + "    1.1   \n", + "\n", + "    1.2   \n", + "\n", + "  2   fastNLP 结合 jittor 训练模型\n", + "\n", + "    2.1   \n", + "\n", + "    2.2   \n", + "\n", + "  3   fastNLP 实现 paddle 与 pytorch 互转\n", + "\n", + "    3.1   \n", + "\n", + "    3.2   " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "08752c5a", + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.7.13" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/tutorials/fastnlp_tutorial_6.ipynb b/tutorials/fastnlp_tutorial_6.ipynb new file mode 100644 index 00000000..bd4b37ed --- /dev/null +++ b/tutorials/fastnlp_tutorial_6.ipynb @@ -0,0 +1,59 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "fdd7ff16", + "metadata": {}, + "source": [ + "# T6. trainer 和 evaluator 的深入介绍(二)\n", + "\n", + "  1   fastNLP 中预定义模型 models\n", + " \n", + "    1.1   \n", + "\n", + "    1.2   \n", + "\n", + "  2   fastNLP 中预定义模型 modules\n", + " \n", + "    2.1   \n", + "\n", + "    2.2   \n", + "\n", + "  3   fastNLP 中的更多 metric 类型\n", + "\n", + "    3.1   \n", + "\n", + "    3.2   " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "08752c5a", + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.7.13" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/tutorials/fastnlp_tutorial_7.ipynb b/tutorials/fastnlp_tutorial_7.ipynb new file mode 100644 index 00000000..0a7d6922 --- /dev/null +++ b/tutorials/fastnlp_tutorial_7.ipynb @@ -0,0 +1,59 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "fdd7ff16", + "metadata": {}, + "source": [ + "# T7. callback 自定义训练过程\n", + "\n", + "  1   \n", + " \n", + "    1.1   \n", + "\n", + "    1.2   \n", + "\n", + "  2   \n", + "\n", + "    2.1   \n", + "\n", + "    2.2   \n", + "\n", + "  3   \n", + "\n", + "    3.1   \n", + "\n", + "    3.2   " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "08752c5a", + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.7.13" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/tutorials/fastnlp_tutorial_8.ipynb b/tutorials/fastnlp_tutorial_8.ipynb new file mode 100644 index 00000000..0664bc41 --- /dev/null +++ b/tutorials/fastnlp_tutorial_8.ipynb @@ -0,0 +1,59 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "fdd7ff16", + "metadata": {}, + "source": [ + "# T8. 
fastNLP 中的文件读取模块\n", + "\n", + "  1   fastNLP 中的 EmbedLoader 模块\n", + " \n", + "    1.1   \n", + "\n", + "    1.2   \n", + "\n", + "  2   fastNLP 中的 Loader 模块\n", + "\n", + "    2.1   \n", + "\n", + "    2.2   \n", + "\n", + "  3   fastNLP 中的 Pipe 模块\n", + "\n", + "    3.1   \n", + "\n", + "    3.2   " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "08752c5a", + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.7.13" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/tutorials/fastnlp_tutorial_e1.ipynb b/tutorials/fastnlp_tutorial_e1.ipynb index 6ec04cb4..8897c800 100644 --- a/tutorials/fastnlp_tutorial_e1.ipynb +++ b/tutorials/fastnlp_tutorial_e1.ipynb @@ -6,7 +6,7 @@ "source": [ "  从这篇开始,我们将开启**`fastNLP v0.8 tutorial`的`example`系列**,在接下来的\n", "\n", - "  每篇`tutorial`里,我们将会介绍`fastNLP v0.8`在一些自然语言处理任务上的应用" + "  每篇`tutorial`里,我们将会介绍`fastNLP v0.8`在自然语言处理任务上的应用实例" ] }, { @@ -82,9 +82,9 @@ "\n", "  包含9个数据集,各语料的语言均为英语,涉及多个自然语言理解`NLU`任务,包括\n", "\n", - "    **`CoLA`**,文本分类任务,预测单句语法正误分类;**`SST2`**,文本分类任务,预测单句情感二分类\n", + "    **`CoLA`**,文本分类任务,预测单句语法正误分类;**`SST-2`**,文本分类任务,预测单句情感二分类\n", "\n", - "    **`MRPC`**,句对分类任务,预测句对语义一致性;**`STSB`**,相似度打分任务,预测句对语义相似度回归\n", + "    **`MRPC`**,句对分类任务,预测句对语义一致性;**`STS-B`**,相似度打分任务,预测句对语义相似度回归\n", "\n", "    **`QQP`**,句对分类任务,预测问题对语义一致性;**`MNLI`**,文本推理任务,预测句对蕴含/矛盾/中立预测\n", "\n", @@ -216,15 +216,15 @@ "\n", "  即使用较小的、不区分大小写的数据集,**对`bert-base`进行知识蒸馏后的版本**,结构上\n", "\n", - "  模型包含1个编码层、6个自注意力层,详解见本篇末尾,更多细节请参考[DistilBert论文](https://arxiv.org/pdf/1910.01108.pdf)\n", + "  包含**1个编码层**、**6个自注意力层**,**参数量`66M`**,详解见本篇末尾,更多请参考[DistilBert论文](https://arxiv.org/pdf/1910.01108.pdf)\n", "\n", - "首先,通过从`transformers`库中导入`AutoTokenizer`模块,使用`from_pretrained`函数初始化\n", + "首先,通过从`transformers`库中导入**`AutoTokenizer`模块**,**使用`from_pretrained`函数初始化**\n", "\n", "  此处的`use_fast`表示是否使用`tokenizer`的快速版本;尝试序列化示例数据,检查加载结果\n", "\n", - "  需要注意的是,处理后返回的两个键值,`'input_ids'`表示原始文本对应的词素编号序列\n", + "  需要注意的是,处理后返回的两个键值,**`'input_ids'`**表示原始文本对应的词素编号序列\n", "\n", - "    `'attention_mask'`表示自注意力运算时的掩模(标上`0`的部分对应`padding`的内容" + "    **`'attention_mask'`**表示自注意力运算时的掩模(标上`0`的部分对应`padding`的内容" ] }, { diff --git a/tutorials/fastnlp_tutorial_e2.ipynb b/tutorials/fastnlp_tutorial_e2.ipynb index 93143090..c8f4b7dc 100644 --- a/tutorials/fastnlp_tutorial_e2.ipynb +++ b/tutorials/fastnlp_tutorial_e2.ipynb @@ -25,31 +25,53 @@ "\n", "    将首先简单介绍提示学习模型的研究,以及与`fastNLP v0.8`结合的优势\n", "\n", - "**`prompt`**,**提示词、提词器**,最早出自**`PET`**,\n", + "**`prompt`**,**提示词**,最早出自论文[Exploiting Cloze Questions for Few Shot TC and NLI](https://arxiv.org/pdf/2001.07676.pdf)中的**`PET`模型**\n", "\n", - "  \n", + "    全称 **`Pattern-Exploiting Training`**,虽然文中并没有提到**`prompt`的说法,但仍视为其开山之作\n", "\n", - "**`prompt-based tuning`**,**基于提示的微调**,描述\n", + "  其大致思路包括,对于文本分类任务,假定输入文本为,后来被称`prompt`,后来被称`verbalizer`,\n", "\n", - "  **`prompt-based model`**,**基于提示的模型**\n", + "  其主要贡献在于,\n", "\n", - "**`prompt-based model`**,**基于提示的模型**,举例\n", + "\n", "\n", - "  案例一:**`P-Tuning v1`**\n", + "**`prompt-based tuning`**,**基于提示的微调**,\n", "\n", - "  案例二:**`PromptTuning`**\n", + "  xxxx,更多参考[prompt综述](https://arxiv.org/pdf/2107.13586.pdf)\n", "\n", - "  
案例三:**`PrefixTuning`**\n", + "    以下列举些经典的`prompt-based tuning`案例,简单地介绍下`prompt-based tuning`的脉络\n", "\n", - "  案例四:**`SoftPrompt`**\n", + "  案例一:**`P-Tuning v1`**,详细内容参考[P-Tuning-v1论文](https://arxiv.org/pdf/2103.10385.pdf)\n", "\n", - "使用`fastNLP v0.8`实现`prompt-based model`的优势\n", + "    其主要贡献在于,\n", "\n", - "  \n", + "    其方法大致包括,\n", "\n", - "  本示例仍使用了`tutorial-E1`的`SST2`数据集,将`bert-base-uncased`作为基础模型\n", + "  案例二:**`PromptTuning`**,详细内容参考[PromptTuning论文](https://arxiv.org/pdf/2104.08691.pdf)\n", "\n", - "    在后续实现中,意图通过将连续的`prompt`与`model`拼接,解决`SST2`二分类任务" + "    其主要贡献在于,\n", + "\n", + "    其方法大致包括,\n", + "\n", + "  案例三:**`PrefixTuning`**,详细内容参考[PrefixTuning论文](https://arxiv.org/pdf/2101.00190.pdf)\n", + "\n", + "    其主要贡献在于,\n", + "\n", + "    其方法大致包括,\n", + "\n", + "通过上述介绍可以发现`prompt-based tuning`只是模型微调方式,独立于预训练模型基础`backbone`\n", + "\n", + "  目前,加载预训练模型的主流方法是使用`transformers`模块,而实现微调的框架则\n", + "\n", + "    可以是`pytorch`、`paddle`、`jittor`等,而不同框架间又存在不兼容的问题\n", + "\n", + "  因此,**使用`fastNLP v0.8`实现`prompt-based tuning`**,可以**很好地解决`paddle`等框架**\n", + "\n", + "    **和`transformers`模块之间的桥接**(`transformers`模块基于`pytorch`实现)\n", + "\n", + "本示例仍使用了`tutorial-E1`的`SST2`数据集、`distilbert-base-uncased`模型(便于比较\n", + "\n", + "  使用`pytorch`框架,通过将连续的`prompt`与`model`拼接,解决`SST2`二分类任务" ] }, { @@ -98,7 +120,7 @@ "print(transformers.__version__)\n", "\n", "task = 'sst2'\n", - "model_checkpoint = 'bert-base-uncased'" + "model_checkpoint = 'distilbert-base-uncased' # 'bert-base-uncased'" ] }, { @@ -111,20 +133,32 @@ "\n", "    以下首先简述`P-Tuning v2`的论文原理,并由此引出`fastNLP v0.8`的代码实践\n", "\n", - "`P-Tuning v2`出自论文 [Prompt Tuning Can Be Comparable to Fine-tuning Universally Across Scales and Tasks](https://arxiv.org/pdf/2110.07602.pdf)\n", + "**`P-Tuning v2`**出自论文[Prompt Tuning Can Be Comparable to Fine-tuning Universally Across Scales and Tasks](https://arxiv.org/pdf/2110.07602.pdf)\n", + "\n", + "  其主要贡献在于,**在`PrefixTuning`等深度提示学习基础上**,**提升了其在分类标注等`NLU`任务的表现**\n", + "\n", + "    并使之在中等规模模型,主要是**参数量在`100M-1B`区间的模型上**,**获得与全参数微调相同的效果**\n", + "\n", + "  其结构如图所示,通过**在输入序列的分类符`[CLS]`之前**,**加入前缀序列**(**序号对应嵌入是待训练的连续值向量**\n", + "\n", + "    **刺激模型在新任务下**,从`[CLS]`对应位置,**输出符合微调任务的输出**,从而达到适应微调任务的目的\n", + "\n", + "\n", "\n", - "  其主要贡献在于,在`PrefixTuning`等深度提示学习基础上,提升了其在分类标注等`NLU`任务的表现\n", + "本示例使用`bert-base-uncased`模型,作为`P-Tuning v2`的基础`backbone`,设置`requires_grad=False`\n", "\n", - "    并使之在中等规模模型,主要是参数量在`100M-1B`区间的模型上,获得与全参数微调相同的效果\n", + "    固定其参数不参与训练,**设置`pre_seq_len`长的`prefix_tokens`作为输入的提示前缀序列**\n", "\n", - "  其结构如图所示,\n", + "  **使用基于`nn.Embedding`的`prefix_encoder`为提示前缀嵌入**,通过`get_prompt`函数获取,再将之\n", "\n", - "" + "    拼接至批量内每笔数据前得到`inputs_embeds`,同时更新自注意力掩模`attention_mask`\n", + "\n", + "  将`inputs_embeds`、`attention_mask`和`labels`输入`backbone`,**得到输出包括`loss`和`logits`**" ] }, { "cell_type": "code", - "execution_count": 3, + "execution_count": 2, "metadata": {}, "outputs": [], "source": [ @@ -178,24 +212,24 @@ "source": [ "接着,通过确定分类数量初始化模型实例,同时调用`torch.optim.AdamW`模块初始化优化器\n", "\n", - "  根据`P-Tuning v2`论文:*Generally, simple classification tasks prefer shorter prompts (less than 20)*\n", + "  根据`P-Tuning v2`论文:*`Generally, simple classification tasks prefer shorter prompts (less than 20)`*\n", "\n", "  此处`pre_seq_len`参数设定为`20`,学习率相应做出调整,其他内容和`tutorial-E1`中的内容一致" ] }, { "cell_type": "code", - "execution_count": 4, + "execution_count": 3, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ - "Some weights of the model checkpoint at bert-base-uncased were not used when initializing 
BertForSequenceClassification: ['cls.predictions.transform.dense.bias', 'cls.predictions.decoder.weight', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.seq_relationship.weight', 'cls.predictions.transform.dense.weight', 'cls.predictions.bias', 'cls.seq_relationship.bias']\n", - "- This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).\n", - "- This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).\n", - "Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']\n", + "Some weights of the model checkpoint at distilbert-base-uncased were not used when initializing DistilBertForSequenceClassification: ['vocab_layer_norm.bias', 'vocab_layer_norm.weight', 'vocab_projector.weight', 'vocab_transform.bias', 'vocab_transform.weight', 'vocab_projector.bias']\n", + "- This IS expected if you are initializing DistilBertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).\n", + "- This IS NOT expected if you are initializing DistilBertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).\n", + "Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert-base-uncased and are newly initialized: ['pre_classifier.weight', 'classifier.weight', 'pre_classifier.bias', 'classifier.bias']\n", "You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.\n" ] } @@ -225,7 +259,7 @@ }, { "cell_type": "code", - "execution_count": 5, + "execution_count": 4, "metadata": { "scrolled": false }, @@ -240,7 +274,7 @@ { "data": { "application/vnd.jupyter.widget-view+json": { - "model_id": "b72eeebd34354a88a99b2e07ec9a86df", + "model_id": "21cbd92c3397497d84dc10f017ec96f4", "version_major": 2, "version_minor": 0 }, @@ -262,30 +296,17 @@ }, { "cell_type": "code", - "execution_count": 6, + "execution_count": 5, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ - "Loading cached processed dataset at /remote-home/xrliu/.cache/huggingface/datasets/glue/sst2/1.0.0/dacbe3125aa31d7f70367a07a8a9e72a5a0bfeb5fc42e75c9db75b96da6053ad/cache-18ec0e709f05e61e.arrow\n", - "Loading cached processed dataset at /remote-home/xrliu/.cache/huggingface/datasets/glue/sst2/1.0.0/dacbe3125aa31d7f70367a07a8a9e72a5a0bfeb5fc42e75c9db75b96da6053ad/cache-e2f02ee7442ad73e.arrow\n" + "Loading cached processed dataset at /remote-home/xrliu/.cache/huggingface/datasets/glue/sst2/1.0.0/dacbe3125aa31d7f70367a07a8a9e72a5a0bfeb5fc42e75c9db75b96da6053ad/cache-294e481a713c5754.arrow\n", + "Loading cached processed dataset at /remote-home/xrliu/.cache/huggingface/datasets/glue/sst2/1.0.0/dacbe3125aa31d7f70367a07a8a9e72a5a0bfeb5fc42e75c9db75b96da6053ad/cache-ed9d9258aaf0fb54.arrow\n", + "Loading cached 
processed dataset at /remote-home/xrliu/.cache/huggingface/datasets/glue/sst2/1.0.0/dacbe3125aa31d7f70367a07a8a9e72a5a0bfeb5fc42e75c9db75b96da6053ad/cache-f44c5576b89f9e6b.arrow\n" ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "d15505d825b34f649b719f1ff0d56114", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - " 0%| | 0/2 [00:00[22:53:00] INFO Running evaluator sanity check for 2 batches. trainer.py:592\n", + "\n" + ], + "text/plain": [ + "\u001b[2;36m[22:53:00]\u001b[0m\u001b[2;36m \u001b[0m\u001b[34mINFO \u001b[0m Running evaluator sanity check for \u001b[1;36m2\u001b[0m batches. \u001b]8;id=406635;file://../fastNLP/core/controllers/trainer.py\u001b\\\u001b[2mtrainer.py\u001b[0m\u001b]8;;\u001b\\\u001b[2m:\u001b[0m\u001b]8;id=951504;file://../fastNLP/core/controllers/trainer.py#592\u001b\\\u001b[2m592\u001b[0m\u001b]8;;\u001b\\\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "Output()" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "
<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\"></pre>\n"
+      ],
+      "text/plain": []
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    },
+    {
+     "data": {
+      "application/vnd.jupyter.widget-view+json": {
+       "model_id": "",
+       "version_major": 2,
+       "version_minor": 0
+      },
+      "text/plain": [
+       "Output()"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    },
+    {
+     "data": {
+      "text/html": [
+       "<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\">\n",
+       "</pre>\n"
+      ],
+      "text/plain": [
+       "\n"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    },
+    {
+     "data": {
+      "text/html": [
+       "<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\">----------------------------- Eval. results on Epoch:1, Batch:0 -----------------------------\n",
+       "</pre>\n"
+      ],
+      "text/plain": [
+       "----------------------------- Eval. results on Epoch:\u001b[1;36m1\u001b[0m, Batch:\u001b[1;36m0\u001b[0m -----------------------------\n"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    },
+    {
+     "data": {
+      "text/html": [
+       "<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\">{\n",
+       "  \"acc#acc\": 0.540625,\n",
+       "  \"total#acc\": 320.0,\n",
+       "  \"correct#acc\": 173.0\n",
+       "}\n",
+       "</pre>\n"
+      ],
+      "text/plain": [
+       "\u001b[1m{\u001b[0m\n",
+       "  \u001b[1;34m\"acc#acc\"\u001b[0m: \u001b[1;36m0.540625\u001b[0m,\n",
+       "  \u001b[1;34m\"total#acc\"\u001b[0m: \u001b[1;36m320.0\u001b[0m,\n",
+       "  \u001b[1;34m\"correct#acc\"\u001b[0m: \u001b[1;36m173.0\u001b[0m\n",
+       "\u001b[1m}\u001b[0m\n"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    },
+    {
+     "data": {
+      "text/html": [
+       "<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\">\n",
+       "</pre>\n"
+      ],
+      "text/plain": [
+       "\n"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    },
+    {
+     "data": {
+      "text/html": [
+       "<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\">----------------------------- Eval. results on Epoch:2, Batch:0 -----------------------------\n",
+       "</pre>\n"
+      ],
+      "text/plain": [
+       "----------------------------- Eval. results on Epoch:\u001b[1;36m2\u001b[0m, Batch:\u001b[1;36m0\u001b[0m -----------------------------\n"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    },
+    {
+     "data": {
+      "text/html": [
+       "<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\">{\n",
+       "  \"acc#acc\": 0.5,\n",
+       "  \"total#acc\": 320.0,\n",
+       "  \"correct#acc\": 160.0\n",
+       "}\n",
+       "</pre>\n"
+      ],
+      "text/plain": [
+       "\u001b[1m{\u001b[0m\n",
+       "  \u001b[1;34m\"acc#acc\"\u001b[0m: \u001b[1;36m0.5\u001b[0m,\n",
+       "  \u001b[1;34m\"total#acc\"\u001b[0m: \u001b[1;36m320.0\u001b[0m,\n",
+       "  \u001b[1;34m\"correct#acc\"\u001b[0m: \u001b[1;36m160.0\u001b[0m\n",
+       "\u001b[1m}\u001b[0m\n"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    },
+    {
+     "data": {
+      "text/html": [
+       "<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\">\n",
+       "</pre>\n"
+      ],
+      "text/plain": [
+       "\n"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    },
+    {
+     "data": {
+      "text/html": [
+       "<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\">----------------------------- Eval. results on Epoch:3, Batch:0 -----------------------------\n",
+       "</pre>\n"
+      ],
+      "text/plain": [
+       "----------------------------- Eval. results on Epoch:\u001b[1;36m3\u001b[0m, Batch:\u001b[1;36m0\u001b[0m -----------------------------\n"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    },
+    {
+     "data": {
+      "text/html": [
+       "<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\">{\n",
+       "  \"acc#acc\": 0.509375,\n",
+       "  \"total#acc\": 320.0,\n",
+       "  \"correct#acc\": 163.0\n",
+       "}\n",
+       "</pre>\n"
+      ],
+      "text/plain": [
+       "\u001b[1m{\u001b[0m\n",
+       "  \u001b[1;34m\"acc#acc\"\u001b[0m: \u001b[1;36m0.509375\u001b[0m,\n",
+       "  \u001b[1;34m\"total#acc\"\u001b[0m: \u001b[1;36m320.0\u001b[0m,\n",
+       "  \u001b[1;34m\"correct#acc\"\u001b[0m: \u001b[1;36m163.0\u001b[0m\n",
+       "\u001b[1m}\u001b[0m\n"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    },
+    {
+     "data": {
+      "text/html": [
+       "<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\">\n",
+       "</pre>\n"
+      ],
+      "text/plain": [
+       "\n"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    },
+    {
+     "data": {
+      "text/html": [
+       "<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\">----------------------------- Eval. results on Epoch:4, Batch:0 -----------------------------\n",
+       "</pre>\n"
+      ],
+      "text/plain": [
+       "----------------------------- Eval. results on Epoch:\u001b[1;36m4\u001b[0m, Batch:\u001b[1;36m0\u001b[0m -----------------------------\n"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    },
+    {
+     "data": {
+      "text/html": [
+       "<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\">{\n",
+       "  \"acc#acc\": 0.634375,\n",
+       "  \"total#acc\": 320.0,\n",
+       "  \"correct#acc\": 203.0\n",
+       "}\n",
+       "</pre>\n"
+      ],
+      "text/plain": [
+       "\u001b[1m{\u001b[0m\n",
+       "  \u001b[1;34m\"acc#acc\"\u001b[0m: \u001b[1;36m0.634375\u001b[0m,\n",
+       "  \u001b[1;34m\"total#acc\"\u001b[0m: \u001b[1;36m320.0\u001b[0m,\n",
+       "  \u001b[1;34m\"correct#acc\"\u001b[0m: \u001b[1;36m203.0\u001b[0m\n",
+       "\u001b[1m}\u001b[0m\n"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    },
+    {
+     "data": {
+      "text/html": [
+       "<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\">\n",
+       "</pre>\n"
+      ],
+      "text/plain": [
+       "\n"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    },
+    {
+     "data": {
+      "text/html": [
+       "<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\">----------------------------- Eval. results on Epoch:5, Batch:0 -----------------------------\n",
+       "</pre>\n"
+      ],
+      "text/plain": [
+       "----------------------------- Eval. results on Epoch:\u001b[1;36m5\u001b[0m, Batch:\u001b[1;36m0\u001b[0m -----------------------------\n"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    },
+    {
+     "data": {
+      "text/html": [
+       "<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\">{\n",
+       "  \"acc#acc\": 0.6125,\n",
+       "  \"total#acc\": 320.0,\n",
+       "  \"correct#acc\": 196.0\n",
+       "}\n",
+       "</pre>\n"
+      ],
+      "text/plain": [
+       "\u001b[1m{\u001b[0m\n",
+       "  \u001b[1;34m\"acc#acc\"\u001b[0m: \u001b[1;36m0.6125\u001b[0m,\n",
+       "  \u001b[1;34m\"total#acc\"\u001b[0m: \u001b[1;36m320.0\u001b[0m,\n",
+       "  \u001b[1;34m\"correct#acc\"\u001b[0m: \u001b[1;36m196.0\u001b[0m\n",
+       "\u001b[1m}\u001b[0m\n"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    },
+    {
+     "data": {
+      "text/html": [
+       "<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\">\n",
+       "</pre>\n"
+      ],
+      "text/plain": [
+       "\n"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    },
+    {
+     "data": {
+      "text/html": [
+       "<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\">----------------------------- Eval. results on Epoch:6, Batch:0 -----------------------------\n",
+       "</pre>\n"
+      ],
+      "text/plain": [
+       "----------------------------- Eval. results on Epoch:\u001b[1;36m6\u001b[0m, Batch:\u001b[1;36m0\u001b[0m -----------------------------\n"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    },
+    {
+     "data": {
+      "text/html": [
+       "<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\">{\n",
+       "  \"acc#acc\": 0.675,\n",
+       "  \"total#acc\": 320.0,\n",
+       "  \"correct#acc\": 216.0\n",
+       "}\n",
+       "</pre>\n"
+      ],
+      "text/plain": [
+       "\u001b[1m{\u001b[0m\n",
+       "  \u001b[1;34m\"acc#acc\"\u001b[0m: \u001b[1;36m0.675\u001b[0m,\n",
+       "  \u001b[1;34m\"total#acc\"\u001b[0m: \u001b[1;36m320.0\u001b[0m,\n",
+       "  \u001b[1;34m\"correct#acc\"\u001b[0m: \u001b[1;36m216.0\u001b[0m\n",
+       "\u001b[1m}\u001b[0m\n"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    },
+    {
+     "data": {
+      "text/html": [
+       "<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\">\n",
+       "</pre>\n"
+      ],
+      "text/plain": [
+       "\n"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    },
+    {
+     "data": {
+      "text/html": [
+       "<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\">----------------------------- Eval. results on Epoch:7, Batch:0 -----------------------------\n",
+       "</pre>\n"
+      ],
+      "text/plain": [
+       "----------------------------- Eval. results on Epoch:\u001b[1;36m7\u001b[0m, Batch:\u001b[1;36m0\u001b[0m -----------------------------\n"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    },
+    {
+     "data": {
+      "text/html": [
+       "<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\">{\n",
+       "  \"acc#acc\": 0.64375,\n",
+       "  \"total#acc\": 320.0,\n",
+       "  \"correct#acc\": 206.0\n",
+       "}\n",
+       "</pre>\n"
+      ],
+      "text/plain": [
+       "\u001b[1m{\u001b[0m\n",
+       "  \u001b[1;34m\"acc#acc\"\u001b[0m: \u001b[1;36m0.64375\u001b[0m,\n",
+       "  \u001b[1;34m\"total#acc\"\u001b[0m: \u001b[1;36m320.0\u001b[0m,\n",
+       "  \u001b[1;34m\"correct#acc\"\u001b[0m: \u001b[1;36m206.0\u001b[0m\n",
+       "\u001b[1m}\u001b[0m\n"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    },
+    {
+     "data": {
+      "text/html": [
+       "<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\">\n",
+       "</pre>\n"
+      ],
+      "text/plain": [
+       "\n"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    },
+    {
+     "data": {
+      "text/html": [
+       "<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\">----------------------------- Eval. results on Epoch:8, Batch:0 -----------------------------\n",
+       "</pre>\n"
+      ],
+      "text/plain": [
+       "----------------------------- Eval. results on Epoch:\u001b[1;36m8\u001b[0m, Batch:\u001b[1;36m0\u001b[0m -----------------------------\n"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    },
+    {
+     "data": {
+      "text/html": [
+       "<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\">{\n",
+       "  \"acc#acc\": 0.665625,\n",
+       "  \"total#acc\": 320.0,\n",
+       "  \"correct#acc\": 213.0\n",
+       "}\n",
+       "</pre>\n"
+      ],
+      "text/plain": [
+       "\u001b[1m{\u001b[0m\n",
+       "  \u001b[1;34m\"acc#acc\"\u001b[0m: \u001b[1;36m0.665625\u001b[0m,\n",
+       "  \u001b[1;34m\"total#acc\"\u001b[0m: \u001b[1;36m320.0\u001b[0m,\n",
+       "  \u001b[1;34m\"correct#acc\"\u001b[0m: \u001b[1;36m213.0\u001b[0m\n",
+       "\u001b[1m}\u001b[0m\n"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    },
+    {
+     "data": {
+      "text/html": [
+       "<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\">\n",
+       "</pre>\n"
+      ],
+      "text/plain": [
+       "\n"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    },
+    {
+     "data": {
+      "text/html": [
+       "<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\">----------------------------- Eval. results on Epoch:9, Batch:0 -----------------------------\n",
+       "</pre>\n"
+      ],
+      "text/plain": [
+       "----------------------------- Eval. results on Epoch:\u001b[1;36m9\u001b[0m, Batch:\u001b[1;36m0\u001b[0m -----------------------------\n"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    },
+    {
+     "data": {
+      "text/html": [
+       "<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\">{\n",
+       "  \"acc#acc\": 0.659375,\n",
+       "  \"total#acc\": 320.0,\n",
+       "  \"correct#acc\": 211.0\n",
+       "}\n",
+       "</pre>\n"
+      ],
+      "text/plain": [
+       "\u001b[1m{\u001b[0m\n",
+       "  \u001b[1;34m\"acc#acc\"\u001b[0m: \u001b[1;36m0.659375\u001b[0m,\n",
+       "  \u001b[1;34m\"total#acc\"\u001b[0m: \u001b[1;36m320.0\u001b[0m,\n",
+       "  \u001b[1;34m\"correct#acc\"\u001b[0m: \u001b[1;36m211.0\u001b[0m\n",
+       "\u001b[1m}\u001b[0m\n"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    },
+    {
+     "data": {
+      "text/html": [
+       "<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\">\n",
+       "</pre>\n"
+      ],
+      "text/plain": [
+       "\n"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    },
+    {
+     "data": {
+      "text/html": [
+       "<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\">---------------------------- Eval. results on Epoch:10, Batch:0 -----------------------------\n",
+       "</pre>\n"
+      ],
+      "text/plain": [
+       "---------------------------- Eval. results on Epoch:\u001b[1;36m10\u001b[0m, Batch:\u001b[1;36m0\u001b[0m -----------------------------\n"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    },
+    {
+     "data": {
+      "text/html": [
+       "<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\">{\n",
+       "  \"acc#acc\": 0.696875,\n",
+       "  \"total#acc\": 320.0,\n",
+       "  \"correct#acc\": 223.0\n",
+       "}\n",
+       "</pre>\n"
+      ],
+      "text/plain": [
+       "\u001b[1m{\u001b[0m\n",
+       "  \u001b[1;34m\"acc#acc\"\u001b[0m: \u001b[1;36m0.696875\u001b[0m,\n",
+       "  \u001b[1;34m\"total#acc\"\u001b[0m: \u001b[1;36m320.0\u001b[0m,\n",
+       "  \u001b[1;34m\"correct#acc\"\u001b[0m: \u001b[1;36m223.0\u001b[0m\n",
+       "\u001b[1m}\u001b[0m\n"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    },
+    {
+     "data": {
+      "text/html": [
+       "<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\"></pre>\n"
+      ],
+      "text/plain": []
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    },
+    {
+     "data": {
+      "text/html": [
+       "<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\">\n",
+       "</pre>\n"
+      ],
+      "text/plain": [
+       "\n"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    }
+   ],
    "source": [
     "trainer.run(num_eval_batch_per_dl=10)"
    ]
   },
@@ -421,14 +976,55 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    " "
+    "可以发现,其效果远远逊色于`fine-tuning`,这是因为`P-Tuning v2`虽然能够适应参数量\n",
+    "\n",
+    "  在`100M-1B`区间的模型,但是,**`distilbert-base`的参数量仅为`66M`**,无法触及其下限\n",
+    "\n",
+    "另一方面,**`fastNLP v0.8`不支持`jupyter`多卡**,所以无法在笔者的电脑/服务器上,完成\n",
+    "\n",
+    "  合适规模模型的学习,例如`110M`的`bert-base`模型,以及`340M`的`bert-large`模型"
    ]
   },
   {
    "cell_type": "code",
-   "execution_count": null,
+   "execution_count": 10,
    "metadata": {},
-   "outputs": [],
+   "outputs": [
+    {
+     "data": {
+      "application/vnd.jupyter.widget-view+json": {
+       "model_id": "",
+       "version_major": 2,
+       "version_minor": 0
+      },
+      "text/plain": [
+       "Output()"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    },
+    {
+     "data": {
+      "text/html": [
+       "<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\"></pre>\n"
+      ],
+      "text/plain": []
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    },
+    {
+     "data": {
+      "text/plain": [
+       "{'acc#acc': 0.737385, 'total#acc': 872.0, 'correct#acc': 643.0}"
+      ]
+     },
+     "execution_count": 10,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
    "source": [
     "trainer.evaluator.run()"
    ]
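
Note: the e2 hunks above describe the prefix-prompt model only in markdown; the cell that defines the class is unchanged context, so its code does not appear in the diff. The following is a minimal sketch of the approach those cells describe (a frozen backbone, a `pre_seq_len`-long trainable prefix from an `nn.Embedding` `prefix_encoder`, obtained via `get_prompt` and concatenated into `inputs_embeds` with a widened `attention_mask`), assuming `distilbert-base-uncased` loaded through `transformers`. The class name `SeqClsModel`, the choice to leave the classification head trainable, and the exact `train_step`/`evaluate_step` signatures are illustrative assumptions, not code taken from the notebook:

```python
import torch
import torch.nn as nn
from transformers import AutoModelForSequenceClassification


class SeqClsModel(nn.Module):
    """Frozen transformer backbone plus a trainable nn.Embedding prefix encoder."""

    def __init__(self, model_checkpoint='distilbert-base-uncased',
                 num_labels=2, pre_seq_len=20):
        super().__init__()
        self.backbone = AutoModelForSequenceClassification.from_pretrained(
            model_checkpoint, num_labels=num_labels)
        # Freeze the backbone (requires_grad=False) as the markdown describes;
        # keeping the freshly initialized classification head trainable is an
        # assumption of this sketch.
        for name, param in self.backbone.named_parameters():
            if 'classifier' not in name:
                param.requires_grad = False
        self.pre_seq_len = pre_seq_len
        self.prefix_tokens = torch.arange(pre_seq_len).long()
        # prefix_encoder: one continuous, trainable vector per prefix position
        self.prefix_encoder = nn.Embedding(pre_seq_len,
                                           self.backbone.config.hidden_size)

    def get_prompt(self, batch_size, device):
        # (batch_size, pre_seq_len, hidden_size) prompt embeddings
        prefix_tokens = self.prefix_tokens.unsqueeze(0).expand(batch_size, -1)
        return self.prefix_encoder(prefix_tokens.to(device))

    def forward(self, input_ids, attention_mask, labels=None):
        batch_size = input_ids.size(0)
        # Prepend the continuous prompt to the word embeddings ...
        prompt = self.get_prompt(batch_size, input_ids.device)
        word_embeds = self.backbone.get_input_embeddings()(input_ids)
        inputs_embeds = torch.cat([prompt, word_embeds], dim=1)
        # ... and extend the attention mask to cover the prefix positions.
        prefix_mask = torch.ones(batch_size, self.pre_seq_len,
                                 dtype=attention_mask.dtype,
                                 device=attention_mask.device)
        attention_mask = torch.cat([prefix_mask, attention_mask], dim=1)
        # The backbone returns an output object carrying loss and logits.
        return self.backbone(inputs_embeds=inputs_embeds,
                             attention_mask=attention_mask, labels=labels)

    # fastNLP v0.8 dispatches to train_step / evaluate_step when they exist
    def train_step(self, input_ids, attention_mask, labels):
        return {'loss': self(input_ids, attention_mask, labels).loss}

    def evaluate_step(self, input_ids, attention_mask, labels):
        pred = self(input_ids, attention_mask).logits.argmax(dim=-1)
        return {'pred': pred, 'target': labels}
```

Prepending the prompt shifts every real token, including `[CLS]`, right by `pre_seq_len` positions, which is why the attention mask has to be extended to match.
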
diff --git a/tutorials/figures/E2-fig-pet-model.png b/tutorials/figures/E2-fig-pet-model.png
new file mode 100644
index 00000000..c3c377c0
Binary files /dev/null and b/tutorials/figures/E2-fig-pet-model.png differ
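
For context on the `trainer.run(num_eval_batch_per_dl=10)` and `trainer.evaluator.run()` calls recorded in the e2 outputs above, here is a hypothetical wiring into fastNLP v0.8 under the same assumptions as the `SeqClsModel` sketch after the e2 diff. `train_dataloader` and `evaluate_dataloader` stand in for the SST-2 loaders prepared as in tutorial-E1, and the keyword names follow the v0.8 `Trainer` as used elsewhere in these tutorials:

```python
import torch
from fastNLP import Trainer, Accuracy

model = SeqClsModel(model_checkpoint='distilbert-base-uncased',
                    num_labels=2, pre_seq_len=20)
# Only the prefix embeddings (and classification head) carry gradients
optimizer = torch.optim.AdamW(
    [p for p in model.parameters() if p.requires_grad], lr=1e-2)

trainer = Trainer(
    model=model,
    driver='torch',              # pytorch backend
    device=0,                    # single card; v0.8 multi-card is unsupported in jupyter
    n_epochs=10,
    optimizers=optimizer,
    train_dataloader=train_dataloader,        # assumed prepared as in tutorial-E1
    evaluate_dataloaders=evaluate_dataloader,  # assumed prepared as in tutorial-E1
    metrics={'acc': Accuracy()},
)
trainer.run(num_eval_batch_per_dl=10)  # evaluate on 10 batches per epoch
trainer.evaluator.run()                # final pass over the full validation set
```
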