## Introduction

This is a PyTorch implementation of the paper [Hierarchical Attention Networks for Document Classification](https://www.cs.cmu.edu/~diyiy/docs/naacl16.pdf).

* The dataset is 600k documents extracted from [Yelp 2018](https://www.yelp.com/dataset) customer reviews
* Uses [NLTK](http://www.nltk.org/) and [Stanford CoreNLP](https://stanfordnlp.github.io/CoreNLP/) to split documents into sentences and sentences into words
* Both CPU & GPU are supported
* The best accuracy is 71%, matching the performance reported in the paper

## Requirements

* python 3.6
* pytorch 0.3.0
* numpy
* gensim
* nltk
* Stanford CoreNLP

## Parameters

Following the paper and my experiments, the model parameters are set as follows:

|word embedding dimension|GRU hidden size|GRU layers|word/sentence context vector dimension|
|---|---|---|---|
|200|50|1|100|
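
With these dimensions, the bidirectional GRU produces 100-dimensional word annotations (2 × 50), which matches the 100-dimensional attention context vector. A minimal sketch of the word-level encoder with attention is below; class and variable names are illustrative, not from this repository, and it targets a current PyTorch API rather than 0.3.0:

```python
import torch
import torch.nn as nn

class WordAttentionEncoder(nn.Module):
    """Sketch of HAN's word encoder: BiGRU + word-level attention."""

    def __init__(self, vocab_size, embed_dim=200, hidden_size=50, context_dim=100):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        # Bidirectional GRU: annotations have size 2 * hidden_size = 100
        self.gru = nn.GRU(embed_dim, hidden_size, num_layers=1,
                          bidirectional=True, batch_first=True)
        # Project annotations into the context-vector space
        self.attn_proj = nn.Linear(2 * hidden_size, context_dim)
        # Learned word-level context vector u_w
        self.context = nn.Parameter(torch.randn(context_dim))

    def forward(self, words):                            # (batch, seq_len)
        h, _ = self.gru(self.embedding(words))           # (batch, seq_len, 100)
        u = torch.tanh(self.attn_proj(h))                # (batch, seq_len, 100)
        weights = torch.softmax(u @ self.context, dim=1) # (batch, seq_len)
        # Attention-weighted sum of annotations -> sentence vector
        return (weights.unsqueeze(-1) * h).sum(dim=1)    # (batch, 100)
```

The sentence-level encoder in the paper is the same construction applied one level up, over sentence vectors instead of word embeddings.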

And the training parameters:

|Epochs|learning rate|momentum|batch size|
|---|---|---|---|
|3|0.01|0.9|64|
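
These training parameters map directly onto `torch.optim.SGD`. A minimal sketch of the loop, with a placeholder model and random batches standing in for the real HAN model and data loader (both assumptions):

```python
import torch
import torch.nn as nn

# Placeholder model and data: a linear classifier over 100-dim document
# vectors with 5 rating classes, and random batches of size 64.
model = nn.Linear(100, 5)
batches = [(torch.randn(64, 100), torch.randint(0, 5, (64,))) for _ in range(10)]

optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
criterion = nn.CrossEntropyLoss()

for epoch in range(3):                 # 3 epochs, as in the table above
    for docs, labels in batches:       # batch size 64
        optimizer.zero_grad()
        loss = criterion(model(docs), labels)
        loss.backward()
        optimizer.step()
```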

## Run

1. Prepare the dataset. Download the [dataset](https://www.yelp.com/dataset), unzip it, and extract the customer reviews into a single file. Then use preprocess.py to transform that file into the dataset for model input.

2. Train the model. Word embeddings of the training data are stored in 'yelp.word2vec'. The model will be trained and automatically saved to 'model.dict'.

```
python train.py
```

3. Test the model.

```
python evaluate.py
```