fastNLP

Commit Graph

Author	SHA1	Message	Date
Yige Xu	5d1c2a7ac3	add test code and data for testing CHN NER and classification loader and pipe	6 years ago
Yige Xu	5a2820cd18	add test code and data for testing cws loader and pipe	6 years ago
Yige Xu	9509c5dd08	move dataset test data to test/data_for_tests/io dir	6 years ago
benbijituo	8b31ca74ab	add test data&follow Matching	6 years ago
ChenXin	d582bd3e15	Delete train.tsv	6 years ago
ChenXin	f2face3b40	Delete test.tsv	6 years ago
ChenXin	7b08c777bc	Delete dev.tsv	6 years ago
Yige Xu	64fc8bc1e5	1. update classification and matching loader and pipe; 2. add data and test codes for testing classification and matching loader and pipe.	6 years ago
Yige Xu	753327d214	fix code style in coreference task and related codes	6 years ago
xxliu	b015cc149c	undocumented	6 years ago
liuxiaoxiong	07bbb79f77	Merge pull request #4 from fastnlp/dev0.5.0 Dev0.5.0	6 years ago
xxliu	ea5fbc8881	add comments; add test files and test examples; rename some variables	6 years ago
Yige Xu	880e3ad969	1. add mini_elmo.pkl and test codes for testing ElmoEmbedding; 2. update bert testing codes	6 years ago
Yige Xu	b9aa05f6cf	add testing codes and data for loader and pipe.	6 years ago
Yige Xu	4440801dbf	1. update bert.py and fix a bug in bert_embedding to adapt torch 1.2.0; 2. update models/bert.py and add BertForSentenceMatching model, now a BertEmbedding param should be passed to these five models; 3. create a small bert version for testing and modify test/models/test_bert.py; 4. move small glove and word2vec files to data_for_tests/embedding/small_static_embedding dir and fix relevant test codes; 5. delete some __init__.py files in test dir.	6 years ago
xuyige	c2d687528e	fix bugs and add test codes for: 1. models.snli; 2. core.metrics.extractive_qa; 3. io.data_loader.mnli	6 years ago
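ChenXin	881ce01762	Dev0.4.0 (#149) * 1. CRF: add support for bmeso-type tags 2. add comments in vocabulary * add an error check to BucketSampler * 1. fix a bug in ClipGradientCallback; remove the print in LRSchedulerCallback (a pbar should be passed in for printing later); 2. add MLP comments * update MLP module * add metric comments; fix a bug in the trainer save process * Update README.md fix tutorial link * Add ENAS (Efficient Neural Architecture Search) * add ignore_type in DataSet.add_field * * AutoPadder will not pad when dtype is None * add ignore_type in DataSet.apply * fix a potential padder bug in fieldarray * fix a typo in crf, and spots that could cause numerical instability * fix a possible bug in CRF * change two default init arguments of Trainer into None * Changes to Callbacks: * add several given read-only properties to callback * set these properties through the manager * code optimization, reducing the burden on @transfer * * move ENAS-related code into the automl directory * fix a bug in fast_param_mapping * Trainer: automatically create the save directory * Vocabulary: show contents when printed * * add an iteration method to vocabulary * fix the bug of CRF being negative * add SQuAD metric * add sigmoid activate function in MLP * - add star transformer model - add ConllLoader, for all kinds of conll-format files - add JsonLoader, for json-format files - add SSTLoader, for SST-2 & SST-5 - change Callback interface - fix batch multi-process when killed - add README to list models and their performance * - fix test * - fix callback & tests * - update README * fix some bugs; adjust callback * prepare to release version 0.4.0 * update readme * support parallel loss * prevent multi-GPU setups from causing loss to be computed incorrectly * update advance_tutorial jupyter notebook * 1. add new loading functions load_with_vocab() and load_without_vocab to embedding_loader; the main changes from the previous functions are (1) embed_dim no longer needs to be passed in, (2) word2vec vs. glove is detected automatically. 2. add from_dataset() and index_dataset() to vocabulary, avoiding the need for multiple lines to index a dataset. 3. add a cache_result() decorator in utils for caching function return values. 4. add an update_every attribute to callback * 1. DataSet.apply() reports the failing index on error 2. Vocabulary.from_dataset() and index_dataset() report the vocab order on error 3. embedloader skips a line when it encounters irregular data while reading embeddings. * update attention * doc tools * fix some doc errors * switch to Chinese comments; add a viterbi decoding method * sample version * - add pad sequence for lstm - add csv, conll, json filereader - update dataloader - remove useless dataloader - fix trainer loss print - fix tests * - fix test_tutorial * add comments * test docs * local temporary save * local temporary save * change the order of the docs * - add document * local temporary save * update pooling * update bert * update documents in MLP * update documents in snli * combine self attention module to attention.py * update documents on losses.py * update the DataSet docs * update documents on metrics * 1. remove print output in LSTM; 2. change use_cuda to device in Trainer and Tester; 3. extend the Trainer docs * add comments for Trainer * improve docs for trainer, callback, etc.; rename some code so that it is hidden from the docs * update char level encoder * update documents on embedding.py * - update doc * add more comments and modify some code * - update doc - add get_embeddings * change docs configuration options * change embedding to be initialized via init_embed * 1. add multi-GPU support to Trainer and Tester; * - add test - fix jsonloader * remove the comment tutorial * add get_field_names to dataset * fix bug * - add Const - fix bugs * revise some comments * - add model runner for easier test models - add model tests * change the docs configuration and structure * revise a large part of the core docs. TODO: 1. complete the trainer and tester docs 2. look into comment examples and tests * comments in core are mostly checked * revise comments in io * switch all references to relative paths * switch all references to relative paths * small change * 1. remove installation of api/automl from the setup file 2. a seq_len bug existed in metric 3. a naming error existed in sampler; fixed * fix bug: be compatible with the CPU build of PyTorch. TODO: similar bugs may exist elsewhere * revise the references in the docs * replace tqdm.autonotebook with tqdm.auto * - fix batch & vocab * upload docs .rst files; upload docs files and several TODOs * discuss and consolidate several modules * tests for core and some small changes * remove some redundant docs * update init files * update const files * update const files * add tests for cnn * fix a little bug * - update attention - fix tests * improve tests * finish the quick-start tutorial * update docs for the rename of sequence_modeling to sequence_labeling * re-run apidoc to resolve leftover issues from the rename * change docs formatting * unify the seq_len_to_mask copies in different places into core.utils.seq_len_to_mask * add a one-line hint * show dataset_loader in the docs * note that Dataset.read_csv will be replaced by CSVLoader * finish the docs between Callback and Trainer * partially update index * remove redundant print * remove the metric used for word segmentation, since it may cause errors * revise the Chinese names in the docs * finish the detailed introduction docs * ipynb files for the tutorial * revise some introduction docs * revise the main-page introductions of models and modules * add the titlesonly setting * change the titles shown in the module docs * revise the opening introductions of core and io * revise the opening introductions of modules and models * use .. todo:: to hide TODO comments that might be extracted into the docs * revise some comments * delete an old metric in test * update the test files for tutorials * move features not yet to be released into the legacy folder * remove tests that cannot run * update the test files for callback * remove outdated tutorials and test files * change cache_results parameters * update the test files for io; remove some outdated tests * fix bug * fix failure to pass the tests in test_utils.py * fix a compatibility issue with padsequence in pytorch1.1; change Trainer's pbar * 1. fix a bug in metric; 2. add metric tests * add model summary * add aliases * remove the nested layers in encoder * change the import order in core and what __all__ exposes * change the import order in models and what __all__ exposes * rename files * change __all__ and imports in modules * fix var runn * add a clear method to vocab * minor tweaks for PEP8 compliance * update the cache_results example * 1. warn about indices potentially being None in callback; 2. DataSet supports indexing with a List * fix a typo * update README.md * update documents on bert * update documents on encoder/bert * add a fitlog callback for recording experiments with fitlog * typo * - update dataset_loader * add a link to the fitlog docs. * add docs for DataSet Loader * - add star-transformer reproduction	6 years ago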
FengZiYjun	0c5630bd16	Ready for V0.3.1 * 升级parser API和模型 * update docs: add new pages for tutorials * upgrade CWS api download source * add a new method for dataset field access * add introduction for bert * add more unit tests for api/processor * remove unused test data. Add new test data.	6 years ago
hazelnutsgz	5f4ab131ac	Add a loader for conll2003 dataset	6 years ago
FengZiYjun	720a264eb3	* rename DataSet.get_fields() into get_all_fields() * add DataSet.get_field(), to fetch a FieldArray based on its name * remove old tutorials & add new tutorials	7 years ago
FengZiYjun	cc440b5ed6	All tests pass. * update test code; all tests run, coverage 65% * refine code style and some comments * fix a bug in tester where self.use_cuda was used before assignment * add tutorial sample data: tutorial_sample_dataset.csv * [unsolved] embed_loader hits a segmentation fault when computing np.cov	7 years ago
FengZiYjun	3120cdd09a	Update embed_loader: * add fast_load_embedding method, to index pre-trained embeddings with the words in Vocab * if words in Vocab do not exist in the pre-trained embeddings, sample them from a normal distribution computed from the current embeddings	7 years ago
xuyige	b43d333738	clean some codes and fix some bugs	7 years ago
yunfan	2698094d8f	update embedding loader & vocab	7 years ago
FengZiYjun	5be4cb7bb5	Merge Preprocessor into DataSet. - DataSet's __init__ takes a function as argument, rather than class object - Preprocessor is about to be removed; don't use it anymore. - Remove cross_validate in trainer, because it is rarely used and weird - Loader.load is expected to be a static method - Delete sth. in other_modules.py - Add more tests - Delete extra sample data	7 years ago
FengZiYjun	28a0683853	1. add tests in test_fastNLP.py & test_sampler.py; increase test coverage to 81% 2. changes of names: aggregation ----> aggregator interaction ----> interactor action.py ----> sampler.py BasePreprocess ---> Preprocessor BaseTester ----> Tester BaseTrainer ----> Trainer 3. add more code comments 4. fix bugs in predictor's data_forward 5. in sampler.py, remove Bachifier, fix some codes. but not test 6. remove unused codes in other_modules.py & utils.py 7. update fastnlp.py with new config file names and code comments 8. add data examples in data_for_tests/	7 years ago
FengZiYjun	2df8eb740a	Updates to core, loader: - add Loss, Optimizer - change Trainer & Tester initialization interface: two styles of definition provided - handle Optimizer construction and loss function definition in a hard manner - add argparse in task-specific scripts. (seq_labeling.py & text_classify.py) - seq_labeling.py & text_classify.py work	7 years ago
FengZiYjun	fac830e1cd	fix bugs and clean up	7 years ago
FengZiYjun	77b3a0c67d	fastNLP high-level interface: - fastNLP interface for sequence labeling works - fastNLP interface for text classification works	7 years ago
FengZiYjun	80a127cb24	merge jianghao's code	7 years ago
FengZiYjun	c1d7c5d7da	changes to action, trainer and tester: - rename "POSTrainer" to "SeqLabelTrainer" - add text classification test data - update make_batch in Trainer and Tester	7 years ago
FengZiYjun	233e8328f7	changes to seq label model, - [model] optimize cuda support in seq labeling model - [test] add test data "pku" for chinese word seg - test_tester.py and test_cws.py is OK to run!	7 years ago
FengZiYjun	242e576a30	changes to trainer, tester, preprocessor, etc. - [tester][trainer] add cuda support - [preprocess] fix label2index for padding label seq - update README.md - [test] add test_tester.py - rename "action" to "core"	7 years ago
FengZiYjun	c83008add9	fastnlp.py works, see test/test_fastNLP.py for high-level API	7 years ago
FengZiYjun	fe17f611b6	changes to preprocessor, trainer, inference & seq modeling - [trainer]rename "batchify" to "make_batch" in trainer - [trainer]pack (batch_x_pad, seq_len) into batch_x in make_batch for seq labeling, because seq length before pad is needed to make masks - [trainer]unpack it in data_forward - [model]shorten model definition - [inference]build inference class. test_POS_pipeline.py is OK to infer - [preprocessor]handle pickles in a nicer manner - [FastNLP] add fastNLP.py as high-level API, not finished yet	7 years ago
FengZiYjun	3d234bf5b2	add model selection (best dev) in Trainer: save the best model during validation; add Inference.	7 years ago
FengZiYjun	301bbdcd1e	add accuracy in POS Tester; optimize evaluation output in Trainer; keep POS pipeline (loader + trainer + tester + saver) OK; add codes borrowed from FudanParser.	7 years ago
FengZiYjun	621b79ee19	update configLoader to load hyper-parameters from file	7 years ago
FengZiYjun	cca276b8c0	- optimize package calling from test files - add people.txt in data_for_tests - To do: incorrect CRF param in POS_pipeline	7 years ago
FengZiYjun	32652407df	restructure files & add "modules" directory & add CRF.py	7 years ago

40 Commits (cccc1bfd57e1ec05fbb997346cf54e906837b399)