<div align=center>
<img src="https://forgeplus.trustie.net/repo/PKU-DAIR/Hetu/raw/branch/master/img/hetu.png?raw=true" width="300" />
</div>

# HETU

<!--- [![license](https://img.shields.io/github/license/apache/zookeeper?color=282661)](LICENSE) --->

[Documentation](https://hetu-doc.readthedocs.io) | [Examples](https://hetu-doc.readthedocs.io/en/latest/Overview/performance.html)
Hetu is a high-performance distributed deep learning system targeting trillion-parameter DL model training, developed by the <a href="http://net.pku.edu.cn/~cuibin/" target="_blank" rel="nofollow">DAIR Lab</a> at Peking University. It takes into account both high availability in industry and innovation in academia, and has a number of advanced characteristics:

- Applicability. DL model definition with a standard dataflow graph; many basic CPU and GPU operators; efficient implementations of a wide range of DL models and at least 10 popular ML algorithms.
- Efficiency. Achieves at least a 30% speedup compared to TensorFlow on DNN, CNN, and RNN benchmarks.
- Flexibility. Supports various parallel training protocols and distributed communication architectures, such as Data/Model/Pipeline parallelism and Parameter Server & AllReduce.
- Scalability. Deploys on more than 100 computation nodes; trains giant models with trillions of model parameters, e.g., on Criteo Kaggle and Open Graph Benchmark.
- Agility. Automated ML pipeline: feature engineering, model selection, hyperparameter search.

We welcome everyone interested in machine learning or graph computing to contribute code, create issues, or open pull requests. Please refer to the [Contribution Guide](https://forgeplus.trustie.net/projects/PKU-DAIR/Hetu/tree/master/CONTRIBUTING.md) for more details.
## Installation

1. Clone the repository.
2. Prepare the environment. We use Anaconda to manage packages. The following command creates the conda environment to be used: `conda env create -f environment.yml`. Please prepare the CUDA toolkit and cuDNN in advance.
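A minimal sketch of setting up and sanity-checking the environment is shown below. It assumes the environment defined in `environment.yml` is named `hetu` (check the `name:` field in that file) and that an NVIDIA GPU driver is already installed:

```bash
# create the conda environment from the provided file
conda env create -f environment.yml
# activate it; "hetu" is an assumed name -- use the name given in environment.yml
conda activate hetu
# check that the CUDA compiler and the GPU driver are visible
nvcc --version
nvidia-smi
```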
3. We use CMake to compile Hetu. Please copy the example configuration for compilation with `cp cmake/config.example.cmake cmake/config.cmake`. Users can modify the configuration file to enable/disable the compilation of each module. For advanced users (those not using the provided conda environment), the prerequisites for the different modules in Hetu are listed in the appendix.
```bash
# modify paths and configurations in cmake/config.cmake
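# (optional, hedged sketch) list the configurable paths in the copied config;
# variable names such as MPI_HOME, MKL_ROOT, MKL_BUILD and ZMQ_ROOT are the
# ones mentioned in the appendix -- adjust the pattern if your config differs
grep -nE 'MPI_HOME|MKL_ROOT|MKL_BUILD|ZMQ_ROOT' cmake/config.cmake || true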
# generate Makefile
mkdir build && cd build && cmake ..
# compile
# make all
make -j 8
# make hetu, version is specified in cmake/config.cmake
make hetu -j 8
# make allreduce module
make allreduce -j 8
# make ps module
make ps -j 8
# make geometric module
make geometric -j 8
# make hetu-cache module
make hetu_cache -j 8
```
4. Prepare the environment for running. Edit the hetu.exp file and set the environment path for Python and the path to the mpirun executable if necessary (for advanced users not using the provided conda environment). Then execute `source hetu.exp`.
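As a quick sanity check after sourcing hetu.exp, the sketch below tries to import the Python bindings; the module name `hetu` is an assumption here (use whatever name the build installs):

```bash
# load the Hetu environment variables
source hetu.exp
# try to import the Python package; "hetu" is an assumed module name
python -c "import hetu; print('Hetu imported from', hetu.__file__)"
```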
## Usage

Train logistic regression on a GPU:
```bash
bash examples/cnn/scripts/hetu_1gpu.sh logreg MNIST
```

Train a 3-layer MLP on a GPU:
```bash
bash examples/cnn/scripts/hetu_1gpu.sh mlp CIFAR10
```

Train a 3-layer CNN on a GPU:
```bash
bash examples/cnn/scripts/hetu_1gpu.sh cnn_3_layers MNIST
```

Train a 3-layer MLP with AllReduce on 8 GPUs (using mpirun):
```bash
bash examples/cnn/scripts/hetu_8gpu.sh mlp CIFAR10
```

Train a 3-layer MLP with PS on 1 server and 2 workers:
```bash
# the script launches the scheduler, the server, and two workers
bash examples/cnn/scripts/hetu_2gpu_ps.sh mlp CIFAR10
```
## More Examples

Please refer to the examples directory, which contains CNN, NLP, CTR, and GNN training scripts. For distributed training, please refer to the CTR and GNN tasks.
## Community

* Email: xupeng.miao@pku.edu.cn
* Slack: coming soon
* Hetu homepage: https://hetu-doc.readthedocs.io
* [Committers & Contributors](https://forgeplus.trustie.net/projects/PKU-DAIR/Hetu/tree/master/COMMITTERS.md)
* [Contributing to Hetu](https://forgeplus.trustie.net/projects/PKU-DAIR/Hetu/tree/master/CONTRIBUTING.md)
* [Development plan](https://hetu-doc.readthedocs.io/en/latest/plan.html)

## Enterprise Users
If you are an enterprise user and find Hetu useful in your work, please let us know, and we would be glad to add your company logo here.
<img src="https://forgeplus.trustie.net/repo/PKU-DAIR/Hetu/raw/branch/master/img/tencent.png?raw=true" width = "200"/>
<br><br>
<img src="https://forgeplus.trustie.net/repo/PKU-DAIR/Hetu/raw/branch/master/img/alibabacloud.png?raw=true" width = "200"/>
<br><br>
<img src="https://forgeplus.trustie.net/repo/PKU-DAIR/Hetu/raw/branch/master/img/kuaishou.png?raw=true" width = "200"/>

## License
The entire codebase is released under this [license](https://forgeplus.trustie.net/projects/PKU-DAIR/Hetu/tree/master/LICENSE).
## Papers

1. Xupeng Miao, Linxiao Ma, Zhi Yang, Yingxia Shao, Bin Cui, Lele Yu, Jiawei Jiang. [CuWide: Towards Efficient Flow-based Training for Sparse Wide Models on GPUs](https://ieeexplore.ieee.org/document/9261124). TKDE 2021, ICDE 2021.
2. Xupeng Miao, Xiaonan Nie, Yingxia Shao, Zhi Yang, Jiawei Jiang, Lingxiao Ma, Bin Cui. [Heterogeneity-Aware Distributed Machine Learning Training via Partial Reduce](https://doi.org/10.1145/3448016.3452773). SIGMOD 2021.
3. coming soon
## Acknowledgements

We learned and borrowed insights from a few open source projects including [TinyFlow](https://github.com/tqchen/tinyflow), [autodist](https://github.com/petuum/autodist), [tf.distribute](https://github.com/tensorflow/tensorflow/tree/master/tensorflow/python/distribute) and [Angel](https://github.com/Angel-ML/angel).
## Appendix

The prerequisites for the different modules in Hetu are listed as follows:
```
"*" means you should prepare it yourself, while the others support auto-download

Hetu: OpenMP(*), CMake(*)
Hetu (version mkl): MKL 1.6.1
Hetu (version gpu): CUDA 10.1(*), CUDNN 7.5(*)
Hetu (version all): both
Hetu-AllReduce: MPI 3.1, NCCL 2.8(*), this module needs the GPU version
Hetu-PS: Protobuf(*), ZeroMQ 4.3.2
Hetu-Geometric: Pybind11(*), Metis(*)
Hetu-Cache: Pybind11(*), this module needs the PS module

##################################################################
Tips for preparing the prerequisites

Preparing CUDA, CUDNN, NCCL (NCCL is already in the conda environment):
1. download from https://developer.nvidia.com
2. install
3. modify paths in cmake/config.cmake if necessary

Preparing OpenMP:
You just need to ensure your compiler supports OpenMP.

Preparing CMake, Protobuf, Pybind11, Metis:
Install with anaconda:
conda install cmake=3.18 libprotobuf pybind11=2.6.0 metis

Preparing OpenMPI (not necessary):
install with anaconda: `conda install -c conda-forge openmpi=4.0.3`
or
1. download from https://download.open-mpi.org/release/open-mpi/v4.0/openmpi-4.0.3.tar.gz
2. build openmpi with `./configure --prefix=/path/to/build && make -j8 && make install`
3. modify MPI_HOME to /path/to/build in cmake/config.cmake

Preparing MKL (not necessary):
install with anaconda: `conda install -c conda-forge onednn`
or
1. download from https://github.com/intel/mkl-dnn/archive/v1.6.1.tar.gz
2. build mkl with `mkdir /path/to/build && cd /path/to/build && cmake /path/to/root && make -j8`
3. modify MKL_ROOT to /path/to/root and MKL_BUILD to /path/to/build in cmake/config.cmake

Preparing ZeroMQ (not necessary):
install with anaconda: `conda install -c anaconda zeromq=4.3.2`
or
1. download from https://github.com/zeromq/libzmq/releases/download/v4.3.2/zeromq-4.3.2.zip
2. build zeromq with `mkdir /path/to/build && cd /path/to/build && cmake /path/to/root && make -j8`
3. modify ZMQ_ROOT to /path/to/build in cmake/config.cmake
```
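If you are not using the provided conda environment, a quick way to see which of these externally prepared prerequisites are already on your PATH is sketched below; the version flags are the standard ones for each tool, and any missing command is simply reported as not found:

```bash
# report the versions of the externally prepared prerequisites, if present
for tool in cmake nvcc mpirun protoc; do
    echo "== $tool =="
    command -v "$tool" >/dev/null && "$tool" --version | head -n 1 || echo "not found"
done
# pybind11 is a Python package, so check it through Python instead
python -c "import pybind11; print('pybind11', pybind11.__version__)"
```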