Statistics for the four datasets. The original raw data can be downloaded from http://sighan.cs.uchicago.edu/bakeoff2005/.
pku | # of sents | # of tokens |
---|---|---|
train | 17173 | 1650222 |
dev | 1881 | 176226 |
test | 1944 | 172733 |
total | 20998 | 1999181 |

cityu | # of sents | # of tokens |
---|---|---|
train | 47696 | 2164907 |
dev | 5323 | 238447 |
test | 1492 | 67690 |
total | 54511 | 2471044 |

msra | # of sents | # of tokens |
---|---|---|
train | 78242 | 3644550 |
dev | 8676 | 405919 |
test | 3985 | 184355 |
total | 90903 | 4234824 |

as | # of sents | # of tokens |
---|---|---|
train | 638273 | 7536586 |
dev | 70680 | 831464 |
test | 14429 | 197681 |
total | 723382 | 8565731 |
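
The split sizes above can be sanity-checked with a few lines of Python. The sketch below copies the numbers from the tables into a dictionary and verifies that train + dev + test adds up to the reported total for each dataset. The names `STATS`, `check_totals`, and `dev_share` are hypothetical helpers introduced for illustration only; they are not part of this repository.

```python
# Split sizes copied from the tables above, stored as (sentences, tokens).
STATS = {
    "pku": {
        "train": (17173, 1650222), "dev": (1881, 176226),
        "test": (1944, 172733), "total": (20998, 1999181),
    },
    "cityu": {
        "train": (47696, 2164907), "dev": (5323, 238447),
        "test": (1492, 67690), "total": (54511, 2471044),
    },
    "msra": {
        "train": (78242, 3644550), "dev": (8676, 405919),
        "test": (3985, 184355), "total": (90903, 4234824),
    },
    "as": {
        "train": (638273, 7536586), "dev": (70680, 831464),
        "test": (14429, 197681), "total": (723382, 8565731),
    },
}

SPLITS = ("train", "dev", "test")


def check_totals(stats):
    """Return the datasets whose train+dev+test sums differ from 'total'."""
    mismatched = []
    for name, splits in stats.items():
        sents = sum(splits[s][0] for s in SPLITS)
        tokens = sum(splits[s][1] for s in SPLITS)
        if (sents, tokens) != splits["total"]:
            mismatched.append(name)
    return mismatched


def dev_share(stats, name):
    """Fraction of the train+dev sentences that belong to the dev split."""
    train = stats[name]["train"][0]
    dev = stats[name]["dev"][0]
    return dev / (train + dev)


if __name__ == "__main__":
    print("mismatched totals:", check_totals(STATS))
    for dataset in STATS:
        print(f"{dataset}: dev share of train+dev = {dev_share(STATS, dataset):.3f}")
```

Keeping the numbers in one dictionary makes it easy to rerun the check if the splits are regenerated from the raw data.
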
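A lightweight natural language processing (NLP) toolkit that aims to reduce the engineering code in user projects, such as data-processing loops, training loops, and multi-GPU execution.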