Yige Xu
5d1c2a7ac3
add test code and data for testing CHN NER and classification loader and pipe
6 years ago
Yige Xu
5a2820cd18
add test code and data for testing cws loader and pipe
6 years ago
Yige Xu
9509c5dd08
move dataset test data to test/data_for_tests/io dir
6 years ago
benbijituo
8b31ca74ab
add test data&follow Matching
6 years ago
ChenXin
d582bd3e15
Delete train.tsv
6 years ago
ChenXin
f2face3b40
Delete test.tsv
6 years ago
ChenXin
7b08c777bc
Delete dev.tsv
6 years ago
Yige Xu
64fc8bc1e5
1. update classification and matching loader and pipe; 2. add data and test codes for testing classification and matching loader and pipe.
6 years ago
Yige Xu
753327d214
fix code style in coreference task and related codes
6 years ago
xxliu
b015cc149c
undocumented
6 years ago
liuxiaoxiong
07bbb79f77
Merge pull request #4 from fastnlp/dev0.5.0
Dev0.5.0
6 years ago
xxliu
ea5fbc8881
Add comments
Add test files and test examples
Rename some variables
6 years ago
Yige Xu
880e3ad969
1. add mini_elmo.pkl and test codes for testing ElmoEmbedding; 2. update bert testing codes
6 years ago
Yige Xu
b9aa05f6cf
add testing codes and data for loader and pipe.
6 years ago
Yige Xu
4440801dbf
1. update bert.py and fix a bug in bert_embedding to adapt to torch 1.2.0; 2. update models/bert.py and add BertForSentenceMatching model, now a BertEmbedding param should be passed to these five models; 3. create a small bert version for testing and modify test/models/test_bert.py; 4. move small glove and word2vec files to data_for_tests/embedding/small_static_embedding dir and fix relevant test codes; 5. delete some __init__.py files in test dir.
6 years ago
xuyige
c2d687528e
fix bugs and add test codes for: 1. models.snli; 2. core.metrics.extractive_qa; 3. io.data_loader.mnli
6 years ago
ChenXin
881ce01762
Dev0.4.0 ( #149 )
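* 1. Add support for bmeso-type tags to CRF 2. Add comments to vocabulary
* Add an error check to BucketSampler
* 1. Fix a bug in ClipGradientCallback; remove the print in LRSchedulerCallback (a pbar should be passed in for printing later); 2. Add comments to MLP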
* update MLP module
* Add comments to metric; fix a bug in the trainer save process
* Update README.md
fix tutorial link
* Add ENAS (Efficient Neural Architecture Search)
* add ignore_type in DataSet.add_field
* * AutoPadder will not pad when dtype is None
* add ignore_type in DataSet.apply
* Fix a potential padder bug in fieldarray
* Fix a typo in crf, as well as places that could cause numerical instability
* Fix a possible bug in CRF
* change two default init arguments of Trainer into None
* Changes to Callbacks:
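* Add several predefined read-only properties to callback
* Set these properties through the manager
* Optimize code to reduce the burden on @transfer
* * Move enas-related code into the automl directory
* Fix a bug in fast_param_mapping
* Trainer now creates the save directory automatically
* Printing a Vocabulary now shows its contents
* * Add an iteration method to vocabulary
* Fix the negative-value bug in CRF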
* add SQuAD metric
* add sigmoid activation function in MLP
* - add star transformer model
- add ConllLoader, for all kinds of conll-format files
- add JsonLoader, for json-format files
- add SSTLoader, for SST-2 & SST-5
- change Callback interface
- fix batch multi-process when killed
- add README to list models and their performance
* - fix test
* - fix callback & tests
* - update README
* Fix some bugs; adjust callback
* Prepare to release version 0.4.0
* update readme
* support parallel loss
* Prevent multi-GPU setups from causing the loss to be computed incorrectly
* update advance_tutorial jupyter notebook
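* 1. Add new loading functions load_with_vocab() and load_without_vocab to embedding_loader; the main changes from the previous functions are (1) embed_dim no longer needs to be passed in, and (2) the format (word2vec or glove) is detected automatically.
2. Add from_dataset() and index_dataset() functions to vocabulary, avoiding the need to write multiple lines to index a dataset.
3. Add a cache_result() decorator to utils for caching a function's return value.
4. Add an update_every attribute to callback
* 1. DataSet.apply() now reports the index of the failing item when it raises an error
2. Vocabulary.from_dataset() and index_dataset() now report the vocab order when they raise an error
3. embedloader now skips a line when it encounters malformed data while reading embeddings.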
* update attention
* doc tools
* fix some doc errors
* Change comments to Chinese; add a viterbi decoding method
* Sample version
* - add pad sequence for lstm
- add csv, conll, json filereader
- update dataloader
- remove useless dataloader
- fix trainer loss print
- fix tests
* - fix test_tutorial
* Add comments
* Test documentation
* Save local work in progress
* Save local work in progress
* Reorder the documentation
* - add document
* Save local work in progress
* update pooling
* update bert
* update documents in MLP
* update documents in snli
* combine self attention module to attention.py
* update documents on losses.py
* Update the DataSet documentation
* update documents on metrics
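* 1. Remove the print output in LSTM; 2. Change use_cuda in Trainer and Tester to device; 3. Expand the Trainer documentation
* Add comments to Trainer
* Improve the documentation for trainer, callback, etc.; rename some code so that it is hidden from the documentation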
* update char level encoder
* update documents on embedding.py
* - update doc
* Add more comments and modify some code
* - update doc
- add get_embeddings
* Modify the documentation configuration options
* Change embedding to be initialized via init_embed
* 1. Add multi-GPU support to Trainer and Tester;
* - add test
- fix jsonloader
* Remove the commenting tutorial
* Add get_field_names to dataset
* Fix bug
* - add Const
- fix bugs
* Revise some comments
* - add model runner for easier test models
- add model tests
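* Modify the configuration and structure of docs
* Revise a large part of the core documentation. TODO:
1. Improve the documentation for trainer and tester
2. Look into comment examples and tests
* Mostly finish checking the comments in core
* Revise the comments in io
* Change all references to relative paths
* Change all references to relative paths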
* small change
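* 1. Remove the installation of api/automl from the setup file
2. Fix a seq_len bug in metric
3. Fix a naming error in sampler
* Fix bug: compatibility with the CPU version of PyTorch
TODO: similar bugs may exist elsewhere
* Revise the references in the documentation
* Replace tqdm.autonotebook with tqdm.auto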
* - fix batch & vocab
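* Upload documentation files *.rst
* Upload documentation files and several TODOs
* Discuss and consolidate several modules
* Tests for core and some small changes
* Remove some redundant documentation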
* update init files
* update const files
* update const files
* Add tests for cnn
* fix a little bug
* - update attention
- fix tests
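* Improve tests
* Finish the quick-start tutorial
* Update the documentation for the renaming of sequence_modeling to sequence_labeling
* Rerun apidoc to resolve leftover issues from the renaming
* Revise the documentation format
* Unify the seq_len_to_mask implementations from different locations into core.utils.seq_len_to_mask
* Add a one-line hint
* Show dataset_loader in the documentation
* Note that Dataset.read_csv will be replaced by CSVLoader
* Finish the documentation on Callback and Trainer
* Partially update index
* Remove redundant print statements
* Remove the metric used for word segmentation because it may cause errors
* Revise the Chinese names in the documentation
* Finish the detailed introduction document
* ipynb files for the tutorial
* Revise some introduction documents
* Revise the home-page introductions for models and modules
* Add the titlesonly setting
* Revise the titles shown in the module documentation
* Revise the opening introductions for core and io
* Revise the opening introductions for modules and models
* Use .. todo:: to hide TODO comments that might be pulled into the documentation
* Revise some comments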
* delete an old metric in test
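* Revise the test files for tutorials
* Move features not yet being released to the legacy folder
* Remove tests that cannot run
* Revise the test files for callback
* Remove outdated tutorials and test files
* Change the cache_results parameters
* Revise the test files for io; remove some outdated tests
* Fix bug
* Fix the failure to pass the tests in test_utils.py
* Fix the compatibility issue with padsequence in pytorch1.1; modify Trainer's pbar
* 1. Fix a bug in metric; 2. Add metric tests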
* add model summary
* Add aliases
* Remove the nested layer in encoder
* Revise the import order in core and what __all__ exposes
* Revise the import order in models and what __all__ exposes
* Change file names
* Revise __all__ and imports in the modules module
* fix var runn
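* Add a clear method to vocab
* Some minor tweaks for PEP8 compliance
* Update the cache_results example
* 1. Add a warning for indices in callback potentially being None; 2. DataSet supports indexing with a List
* Fix a typo
* Revise README.md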
* update documents on bert
* update documents on encoder/bert
* Add a fitlog callback for recording experiments with fitlog
* typo
* - update dataset_loader
* Add a link to the fitlog documentation.
* Add documentation for DataSet Loader
* - add star-transformer reproduction
6 years ago
FengZiYjun
0c5630bd16
Ready for V0.3.1
* Upgrade the parser API and model
* update docs: add new pages for tutorials
* upgrade CWS api download source
* add a new method for dataset field access
* add introduction for bert
* add more unit tests for api/processor
* remove unused test data. Add new test data.
6 years ago
hazelnutsgz
5f4ab131ac
Add a loader for conll2003 dataset
6 years ago
FengZiYjun
720a264eb3
* rename DataSet.get_fields() into get_all_fields()
* add DataSet.get_field(), to fetch a FieldArray based on its name
* remove old tutorials & add new tutorials
7 years ago
FengZiYjun
cc440b5ed6
All tests pass.
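* Update test code; all tests pass, with 65% coverage
* Refine code style and some comments
* Fix the bug where tester's self.use_cuda is used before assignment
* Add tutorial sample data: tutorial_sample_dataset.csv
* [unsolved] embed_loader hits a segmentation fault when computing np.cov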
7 years ago
FengZiYjun
3120cdd09a
Update embed_loader:
* add fast_load_embedding method, to index pre-trained embeddings with the words in Vocab
* If words in Vocab do not exist in the pre-trained embeddings, sample them from a normal distribution computed from the current embeddings
7 years ago
xuyige
b43d333738
clean some codes and fix some bugs
7 years ago
yunfan
2698094d8f
update embedding loader & vocab
7 years ago
FengZiYjun
5be4cb7bb5
Merge Preprocessor into DataSet.
- DataSet's __init__ takes a function as argument, rather than class object
- Preprocessor is about to be removed. Don't use it anymore.
- Remove cross_validate in trainer, because it is rarely used and weird
- Loader.load is expected to be a static method
- Delete sth. in other_modules.py
- Add more tests
- Delete extra sample data
7 years ago
FengZiYjun
28a0683853
1. add tests in test_fastNLP.py & test_sampler.py; increase test coverage to 81%
2. changes of names:
aggregation ----> aggregator
interaction ----> interactor
action.py ----> sampler.py
BasePreprocess ---> Preprocessor
BaseTester ----> Tester
BaseTrainer ----> Trainer
3. add more code comments
4. fix bugs in predictor's data_forward
5. in sampler.py, remove Bachifier and fix some code (not tested)
6. remove unused codes in other_modules.py & utils.py
7. update fastnlp.py with new config file names and code comments
8. add data examples in data_for_tests/
7 years ago
FengZiYjun
2df8eb740a
Updates to core, loader:
- add Loss, Optimizer
- change Trainer & Tester initialization interface: two styles of definition provided
- handle Optimizer construction and loss function definition in a hard manner
- add argparse in task-specific scripts. (seq_labeling.py & text_classify.py)
- seq_labeling.py & text_classify.py work
7 years ago
FengZiYjun
fac830e1cd
fix bugs and clean up
7 years ago
FengZiYjun
77b3a0c67d
fastNLP high-level interface:
- fastNLP interface for sequence labeling works
- fastNLP interface for text classification works
7 years ago
FengZiYjun
80a127cb24
merge jianghao's code
7 years ago
FengZiYjun
c1d7c5d7da
changes to action, trainer and tester:
- rename "POSTrainer" to "SeqLabelTrainer"
- add text classification test data
- update make_batch in Trainer and Tester
7 years ago
FengZiYjun
233e8328f7
changes to seq label model,
- [model] optimize cuda support in seq labeling model
- [test] add test data "pku" for chinese word seg
- test_tester.py and test_cws.py are OK to run!
7 years ago
FengZiYjun
242e576a30
changes to trainer, tester, preprocessor, etc.
- [tester][trainer] add cuda support
- [preprocess] fix label2index for padding label seq
- update README.md
- [test] add test_tester.py
- rename "action" to "core"
7 years ago
FengZiYjun
c83008add9
fastnlp.py works, see test/test_fastNLP.py for high-level API
7 years ago
FengZiYjun
fe17f611b6
changes to preprocessor, trainer, inference & seq modeling
- [trainer]rename "batchify" to "make_batch" in trainer
- [trainer]pack (batch_x_pad, seq_len) into batch_x in make_batch for seq labeling, because seq length before pad is needed to make masks
- [trainer]unpack it in data_forward
- [model]shorten model definition
- [inference]build inference class. test_POS_pipeline.py is OK to infer
- [preprocessor]handle pickles in a nicer manner
- [FastNLP] add fastNLP.py as high-level API, not finished yet
7 years ago
FengZiYjun
3d234bf5b2
add model selection (best dev) in Trainer: save the best model during validation; add Inference.
7 years ago
FengZiYjun
301bbdcd1e
add accuracy in POS Tester;
optimize evaluation output in Trainer;
keep POS pipeline (loader + trainer + tester + saver) OK;
add codes borrowed from FudanParser.
7 years ago
FengZiYjun
621b79ee19
update configLoader to load hyper-parameters from file
7 years ago
FengZiYjun
cca276b8c0
- optimize package calling from test files
- add people.txt in data_for_tests
- To do: incorrect CRF param in POS_pipeline
7 years ago
FengZiYjun
32652407df
restructure files & add "modules" directory & add CRF.py
7 years ago