
enas_controller.py 8.2 kB

Dev0.4.0 (#149)

* 1. Add support for BMESO-style tags in CRF 2. Add comments to vocabulary
* Add an error check to BucketSampler
* 1. Fix a bug in ClipGradientCallback; remove the print in LRSchedulerCallback (a pbar should be passed in later for printing); 2. Add comments to MLP
* update MLP module
* Add comments to metric; fix a bug in the trainer save process
* Update README.md fix tutorial link
* Add ENAS (Efficient Neural Architecture Search)
* add ignore_type in DataSet.add_field
* AutoPadder will not pad when dtype is None; add ignore_type in DataSet.apply
* Fix a potential padder bug in fieldarray
* Fix a typo in CRF, and a spot that could cause numerical instability
* Fix a possible bug in CRF
* change two default init arguments of Trainer into None
* Changes to Callbacks: add several read-only properties to callback; set these properties via the manager; optimize the code to lighten the load on @transfer
* Move the ENAS-related code into the automl directory; fix a bug in fast_param_mapping
* Trainer now automatically creates the save directory
* Vocabulary printing now displays its contents
* Add an iteration method to vocabulary; fix a bug where CRF values were negative
* add SQuAD metric
* add sigmoid activation function in MLP
* add star transformer model; add ConllLoader for all kinds of conll-format files; add JsonLoader for json-format files; add SSTLoader for SST-2 & SST-5; change the Callback interface; fix batch multi-processing when killed; add a README listing models and their performance
* fix test
* fix callback & tests
* update README
* Fix several bugs; adjust callback
* Prepare the 0.4.0 release
* update readme
* support parallel loss
* Prevent multi-GPU setups from computing the loss incorrectly
* update advance_tutorial jupyter notebook
* 1. Add new loading functions load_with_vocab() and load_without_vocab() to embedding_loader; the main changes from the old functions are that (1) embed_dim no longer needs to be passed in and (2) whether the file is word2vec or glove is detected automatically. 2. Add from_dataset() and index_dataset() to Vocabulary, so indexing a dataset no longer takes multiple lines. 3. Add a cache_result() decorator to utils for caching a function's return value. 4. Add an update_every attribute to callback
* 1. DataSet.apply() now reports the offending index on error 2. Vocabulary.from_dataset() and index_dataset() report the vocab order on error 3. embedloader now skips irregular lines when reading embeddings
* update attention
* doc tools
* fix some doc errors
* Switch to Chinese comments; add a Viterbi decoding method
* Sample version
* add pad sequence for lstm; add csv, conll, json filereaders; update dataloader; remove useless dataloader; fix trainer loss print; fix tests
* fix test_tutorial
* Add comments
* Test docs
* Local stash
* Local stash
* Reorder the docs
* add document
* Local stash
* update pooling
* update bert
* update documents in MLP
* update documents in snli
* combine self attention module into attention.py
* update documents on losses.py
* Update the DataSet docs
* update documents on metrics
* 1. Remove print statements from LSTM; 2. Change use_cuda in Trainer and Tester to device; 3. Expand the Trainer docs
* Add comments to Trainer
* Improve the docs for trainer, callback, etc.; rename parts of the code so they are hidden from the docs
* update char level encoder
* update documents on embedding.py
* update doc
* Add comments and revise some code
* update doc; add get_embeddings
* Change the doc configuration options
* Change embedding initialization to init_embed
* 1. Add multi-GPU support to Trainer and Tester
* add test; fix jsonloader
* Remove the commented-out tutorial
* Add get_field_names to dataset
* Fix a bug
* add Const; fix bugs
* Revise some comments
* add a model runner to make testing models easier; add model tests
* Change the docs configuration and structure
* Revise a large part of the core docs. TODO: 1. polish the trainer and tester docs 2. investigate comment examples and tests
* Comment review of the core module is mostly complete
* Revise the comments in the io module
* Switch everything to relative-path imports
* Switch everything to relative-path imports
* small change
* 1. Remove the api/automl install from the setup files 2. Fix a seq_len bug in metric 3. Fix a naming error in sampler
* Fix a bug: compatibility with the CPU version of PyTorch. TODO: similar bugs may exist elsewhere
* Revise the references in the docs
* Replace tqdm.autonotebook with tqdm.auto
* fix batch & vocab
* Upload the *.rst doc files
* Upload doc files and several TODOs
* Discuss and consolidate several modules
* Tests for the core module and some small fixes
* Remove some redundant docs
* update init files
* update const files
* update const files
* Add CNN tests
* fix a little bug
* update attention; fix tests
* Improve the tests
* Finish the quick-start tutorial
* Update the docs for the sequence_modeling to sequence_labeling rename
* Re-run apidoc to clean up leftovers from the rename
* Fix the doc formatting
* Unify the scattered seq_len_to_mask implementations into core.utils.seq_len_to_mask
* Add a one-line hint
* Show dataset_loader in the docs
* Warn that Dataset.read_csv will be replaced by CSVLoader
* Finish the docs connecting Callback and Trainer
* Partially update the index
* Remove redundant prints
* Remove the word-segmentation metric, since it could cause errors
* Fix the Chinese names in the docs
* Finish the detailed introduction docs
* ipynb files for the tutorial
* Revise some introduction docs
* Revise the front-page introductions of models and modules
* Add the titlesonly setting
* Change the titles displayed in the module docs
* Revise the opening introductions of core and io
* Revise the opening introductions of modules and models
* Use .. todo:: to hide TODO comments that might be pulled into the docs
* Revise some comments
* delete an old metric in test
* Update the tutorials test files
* Move features not yet ready for release into the legacy folder
* Remove tests that cannot run
* Update the callback test file
* Remove outdated tutorials and test files
* Change the cache_results parameters
* Update the io test files; remove some outdated tests
* Fix a bug
* Fix the failing tests in test_utils.py
* Fix compatibility with pad_sequence in PyTorch 1.1; change Trainer's pbar
* 1. Fix a bug in metric; 2. Add metric tests
* add model summary
* Add aliases
* Remove nested layers in encoder
* Change the import order and __all__ exports in core
* Change the import order and __all__ exports in models
* Rename files
* Change __all__ and imports in the modules package
* fix var runn
* Add a clear method to vocab
* Some PEP8 tweaks
* Update the cache_results example
* 1. Warn about potentially-None indices in callback; 2. DataSet supports indexing with a List
* Fix a typo
* Update README.md
* update documents on bert
* update documents on encoder/bert
* Add a fitlog callback for recording experiments with fitlog
* typo
* update dataset_loader
* Add a link to the fitlog docs.
* Add docs for DataSet Loader
* add star-transformer reproduction
6 years ago
# Code Modified from https://github.com/carpedm20/ENAS-pytorch
"""A module with NAS controller-related code."""
import collections
import os

import torch
import torch.nn.functional as F

import fastNLP.automl.enas_utils as utils
from fastNLP.automl.enas_utils import Node

def _construct_dags(prev_nodes, activations, func_names, num_blocks):
    """Constructs a set of DAGs based on the actions, i.e., previous nodes and
    activation functions, sampled from the controller/policy pi.

    Args:
        prev_nodes: Previous node actions from the policy.
        activations: Activations sampled from the policy.
        func_names: Mapping from activation function names to functions.
        num_blocks: Number of blocks in the target RNN cell.

    Returns:
        A list of DAGs defined by the inputs.

        RNN cell DAGs are represented in the following way:

        1. Each element (node) in a DAG is a list of `Node`s.

        2. The `Node`s in the list dag[i] correspond to the subsequent nodes
           that take the output from node i as their own input.

        3. dag[-1] is the node that takes input from x^{(t)} and h^{(t - 1)}.
           dag[-1] always feeds dag[0].
           dag[-1] acts as if `w_xc`, `w_hc`, `w_xh` and `w_hh` are its
           weights.

        4. dag[N - 1] is the node that produces the hidden state passed to
           the next timestep. dag[N - 1] is also always a leaf node, and
           therefore is always averaged with the other leaf nodes and fed to
           the output decoder.
    """
    dags = []
    for nodes, func_ids in zip(prev_nodes, activations):
        dag = collections.defaultdict(list)

        # add first node
        dag[-1] = [Node(0, func_names[func_ids[0]])]
        dag[-2] = [Node(0, func_names[func_ids[0]])]

        # add following nodes
        for jdx, (idx, func_id) in enumerate(zip(nodes, func_ids[1:])):
            dag[utils.to_item(idx)].append(Node(jdx + 1, func_names[func_id]))

        leaf_nodes = set(range(num_blocks)) - dag.keys()

        # merge with avg
        for idx in leaf_nodes:
            dag[idx] = [Node(num_blocks, 'avg')]

        # This is actually y^{(t)}. h^{(t)} is node N - 1 in
        # the graph, where N is the number of nodes, i.e., h^{(t)} takes
        # only one other node as its input.

        # last h[t] node
        last_node = Node(num_blocks + 1, 'h[t]')
        dag[num_blocks] = [last_node]

        dags.append(dag)

    return dags
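
# Illustrative sketch (added; hypothetical values, not part of the original
# file): with num_blocks=4, func_names = ['tanh', 'ReLU', 'identity',
# 'sigmoid'] (the shared_rnn_activations defined in Controller below), and
# one sampled action pair prev_nodes=[[0, 0, 1]], activations=[[0, 1, 2, 3]],
# _construct_dags returns a single dag where
#   dag[-1] == dag[-2] == [Node(0, 'tanh')]   # fed by x^{(t)} and h^{(t-1)}
#   dag[0]  == [Node(1, 'ReLU'), Node(2, 'identity')]
#   dag[1]  == [Node(3, 'sigmoid')]
#   dag[2]  == [Node(4, 'avg')]   # leaf nodes are averaged into the output
#   dag[3]  == [Node(4, 'avg')]
#   dag[4]  == [Node(5, 'h[t]')]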


class Controller(torch.nn.Module):
    """Based on
    https://github.com/pytorch/examples/blob/master/word_language_model/model.py

    RL controllers do not necessarily have much to do with
    language models.

    Base the controller RNN on the GRU from:
    https://github.com/ikostrikov/pytorch-a2c-ppo-acktr/blob/master/model.py
    """
    def __init__(self, num_blocks=4, controller_hid=100, cuda=False):
        torch.nn.Module.__init__(self)

        # `num_tokens` here is just the activation function
        # for every even step,
        self.shared_rnn_activations = ['tanh', 'ReLU', 'identity', 'sigmoid']
        self.num_tokens = [len(self.shared_rnn_activations)]
        self.controller_hid = controller_hid
        self.use_cuda = cuda
        self.num_blocks = num_blocks
        for idx in range(num_blocks):
            self.num_tokens += [idx + 1, len(self.shared_rnn_activations)]
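        # For example, with num_blocks=4 this yields
        # num_tokens = [4, 1, 4, 2, 4, 3, 4, 4, 4]: one activation choice for
        # block 0, then a (previous-node choice, activation choice) pair per
        # block, where block i may connect to any of the i earlier nodes
        # (sample() below only consumes the first 2*(num_blocks-1)+1 of these).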
        self.func_names = self.shared_rnn_activations

        num_total_tokens = sum(self.num_tokens)

        self.encoder = torch.nn.Embedding(num_total_tokens,
                                          controller_hid)
        self.lstm = torch.nn.LSTMCell(controller_hid, controller_hid)

        # Perhaps these weights in the decoder should be
        # shared? At least for the activation functions, which all have the
        # same size.
        self.decoders = []
        for idx, size in enumerate(self.num_tokens):
            decoder = torch.nn.Linear(controller_hid, size)
            self.decoders.append(decoder)

        self._decoders = torch.nn.ModuleList(self.decoders)

        self.reset_parameters()
        self.static_init_hidden = utils.keydefaultdict(self.init_hidden)

        def _get_default_hidden(key):
            return utils.get_variable(
                torch.zeros(key, self.controller_hid),
                self.use_cuda,
                requires_grad=False)

        self.static_inputs = utils.keydefaultdict(_get_default_hidden)

    def reset_parameters(self):
        init_range = 0.1
        for param in self.parameters():
            param.data.uniform_(-init_range, init_range)
        for decoder in self.decoders:
            decoder.bias.data.fill_(0)
    def forward(self, # pylint:disable=arguments-differ
                inputs,
                hidden,
                block_idx,
                is_embed):
        if not is_embed:
            embed = self.encoder(inputs)
        else:
            embed = inputs

        hx, cx = self.lstm(embed, hidden)
        logits = self.decoders[block_idx](hx)
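        # Added note (an interpretation, not from the original file): dividing
        # the logits by a fixed constant acts as a softmax temperature,
        # flattening the distribution sampled from in sample() below.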
        logits /= 5.0

        # # exploration
        # if self.args.mode == 'train':
        #     logits = (2.5 * F.tanh(logits))

        return logits, (hx, cx)
    def sample(self, batch_size=1, with_details=False, save_dir=None):
        """Samples a set of `self.num_blocks` many computational nodes from the
        controller, where each node is made up of an activation function, and
        each node except the last also includes a previous node.
        """
        if batch_size < 1:
            raise ValueError(f'Wrong batch_size: {batch_size} < 1')

        # inputs and each hidden state are [batch_size, controller_hid]
        inputs = self.static_inputs[batch_size]
        hidden = self.static_init_hidden[batch_size]

        activations = []
        entropies = []
        log_probs = []
        prev_nodes = []

        # The RNN controller alternately outputs an activation,
        # followed by a previous node, for each block except the last one,
        # which only gets an activation function. The last node is the output
        # node, and its previous node is the average of all leaf nodes.
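        # With the default num_blocks=4, the loop below makes
        # 2 * (4 - 1) + 1 = 7 decisions in the order:
        # act_0, prev_1, act_1, prev_2, act_2, prev_3, act_3.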
        for block_idx in range(2 * (self.num_blocks - 1) + 1):
            logits, hidden = self.forward(inputs,
                                          hidden,
                                          block_idx,
                                          is_embed=(block_idx == 0))

            probs = F.softmax(logits, dim=-1)
            log_prob = F.log_softmax(logits, dim=-1)
            # .mean() for entropy?
            entropy = -(log_prob * probs).sum(1, keepdim=False)

            action = probs.multinomial(num_samples=1).data
            selected_log_prob = log_prob.gather(
                1, utils.get_variable(action, requires_grad=False))

            # why the [:, 0] here? Should it be .squeeze(), or
            # .view()? Same below with `action`.
            entropies.append(entropy)
            log_probs.append(selected_log_prob[:, 0])

            # 0: function, 1: previous node
            mode = block_idx % 2
            inputs = utils.get_variable(
                action[:, 0] + sum(self.num_tokens[:mode]),
                requires_grad=False)

            if mode == 0:
                activations.append(action[:, 0])
            elif mode == 1:
                prev_nodes.append(action[:, 0])

        prev_nodes = torch.stack(prev_nodes).transpose(0, 1)
        activations = torch.stack(activations).transpose(0, 1)

        dags = _construct_dags(prev_nodes,
                               activations,
                               self.func_names,
                               self.num_blocks)

        if save_dir is not None:
            for idx, dag in enumerate(dags):
                utils.draw_network(dag,
                                   os.path.join(save_dir, f'graph{idx}.png'))

        if with_details:
            return dags, torch.cat(log_probs), torch.cat(entropies)

        return dags

    def init_hidden(self, batch_size):
        zeros = torch.zeros(batch_size, self.controller_hid)
        return (utils.get_variable(zeros, self.use_cuda, requires_grad=False),
                utils.get_variable(zeros.clone(), self.use_cuda, requires_grad=False))
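

# A minimal usage sketch (added for illustration, not part of the original
# module). It assumes fastNLP's automl package is importable and that
# enas_utils provides get_variable/keydefaultdict as used above.
if __name__ == '__main__':
    controller = Controller(num_blocks=4, controller_hid=100, cuda=False)

    # Sample two candidate RNN-cell architectures, plus the log-probabilities
    # and entropies needed for a REINFORCE update of the controller.
    dags, log_probs, entropies = controller.sample(batch_size=2,
                                                   with_details=True)
    print(f'sampled {len(dags)} DAGs')
    print('log_probs:', log_probs.shape, 'entropies:', entropies.shape)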