* 添加fast_load_embedding方法,用vocab的词索引pre-trained中的embedding
* 如果vocab有词没出现在pre-train中,从已有embedding中正态采样
Update embed_loader:
* add fast_load_embedding method, to index pre-trained embedding with words in Vocab
* If words in Vocab are not exist in pre-trained, sample them from normal distribution computed by current embeddings
* 增量添加单词到词典中
* lazy update: 当用到词典的时候才重新build
* 当新添加的词导致词典大小超出限制时,打印一个warning
Update Vocabulary:
* More words can be added after the building.
* Lazy update: rebuild automatically when vocab is used.
* print warning when max size is reached
* In init, detect content type to be Python int, float, or str.
* In append(), check type consistence.
* In init & append(), int will be cast into float if they occur together.
* Map Python type into numpy dtype
* Raise error if type detection fails.
* refine interface of set_target & set_input
* rename DataSet.Instance into DataSet.DataSetIter
* remove unused methods in DataSet.DataSetIter
* remove __setattr__ in DataSet; It is dangerous.
* comment adjustment
* rename preprocess.py into utils.py, because nothing about preprocess in it
* anything in loader/ and saver/ is moved directly into io/
* corresponding unit tests are moved to /test/io
* delete fastnlp.py, because we have new and better APIs
* rename Biaffine_parser/run_test.py to Biaffine_parser/main.py; Otherwise, test will fail.
* A looooooooooot of ancient codes to be refined...........
- DataSet's __init__ takes a function as argument, rather than class object
- Preprocessor is about to remove. Don't use anymore.
- Remove cross_validate in trainer, because it is rarely used and wired
- Loader.load is expected to be a static method
- Delete sth. in other_modules.py
- Add more tests
- Delete extra sample data
2. changes of names:
aggregation ----> aggregator
interaction ----> interactor
action.py ----> sampler.py
BasePreprocess ---> Preprocessor
BaseTester ----> Tester
BaseTrainer ----> Trainer
3. add more code comments
4. fix bugs in predictor's data_forward
5. in sampler.py, remove Bachifier, fix some codes. but not test
6. remove unused codes in other_modules.py & utils.py
7. update fastnlp.py with new config file names and code comments
8. add data examples in data_for_tests/
- apply DataSet in Predictor; remove sub-predictors; add "task" argument to specify which task to predict, as how Trainer/Tester did.
- remove Action class
- add helper function for DataSet, to create DataSet easily
- more code comments
- clean up unnecessary codes
- add unit tests for Batch, Predictor, Preprocessor, Trainer, Tester
- update LabelField's to_tensor method to support int & str single label
- update preprocessor's convert_to_dataset method to support single label inputs
- introduce "task" in Trainer/Tester's data_forward, Tester's evaluate and metrics methods
- in cnn_text_classification.py, change the name of the argument of forward
- in sequence_modeling.py, change the name of the argument of forward
- minor adjustments in test codes
- text_classify.py works
- add DataSet, Instance, Field to represent data in different levels
- encapsulate batching method in Batch class
- modify samplers in action.py to fit Batch
- preprocessor.run returns DataSet, instead of list
- Use Batch in Trainer/Tester
- add required_arg "task" in Trainer/Tester
- remove SeqLabelTrainer/SeqLabelTester dependencies successfully. They empty classes to deprecate.
- modify SeqLabeling model, add another argument in forward, in order to compute mask inside model
- test\model\seq_labeling.py works