# Run distributed pretraining
## description
The number of Ascend accelerators is allocated automatically from the device_num set in the hccl config file, so you do not need to specify it yourself.
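As a rough illustration of how a device count could be derived from an hccl config file, the sketch below counts the devices in a rank table. The JSON layout (`server_list`, `device`, `rank_id`, `device_id`) and the helper `list_devices` are assumptions made for this example; this README does not show the real schema, so check a file generated by hccl_tools.py before relying on it.

```python
import json

# Hypothetical rank table. The field names are assumptions for illustration,
# not a schema confirmed by this README.
SAMPLE_HCCL = """
{
  "server_list": [
    {
      "server_id": "x.x.x.x",
      "device": [
        {"device_id": "5", "rank_id": "0"},
        {"device_id": "6", "rank_id": "1"}
      ]
    }
  ]
}
"""


def list_devices(hccl_text):
    """Return (rank_id, device_id) pairs sorted by rank."""
    table = json.loads(hccl_text)
    pairs = []
    for server in table["server_list"]:
        for device in server["device"]:
            pairs.append((int(device["rank_id"]), int(device["device_id"])))
    return sorted(pairs)


if __name__ == "__main__":
    devices = list_devices(SAMPLE_HCCL)
    print("rank_size:", len(devices))
    for rank_id, device_id in devices:
        print(f"rank {rank_id} -> device {device_id}")
```

With the two-device sample above, the rank size is simply the number of device entries, which is why it does not have to be passed separately.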
## how to use
For example, to generate the launch command for distributed training of the BERT model on Ascend accelerators, run the following command in the `/bert/` dir:
```shell
python ./scripts/ascend_distributed_launcher/get_distribute_pretrain_cmd.py \
    --run_script_dir ./run_pretrain.py \
    --hyper_parameter_config_dir ./scripts/ascend_distributed_launcher/hyper_parameter_config.ini \
    --data_dir /path/dataset/ \
    --hccl_config_dir model_zoo/utils/hccl_tools/hccl_2p_56_x.x.x.x.json
```
output:
```text
hccl_config_dir: model_zoo/utils/hccl_tools/hccl_2p_56_x.x.x.x.json
the number of logical core: 192
avg_core_per_rank: 96
rank_size: 2
start training for rank 0, device 5:
rank_id: 0
device_id: 5
core nums: 0-95
epoch_size: 8
data_dir: /data/small_512/
schema_dir:
log file dir: ./LOG5/log.txt
start training for rank 1, device 6:
rank_id: 1
device_id: 6
core nums: 96-191
epoch_size: 8
data_dir: /data/small_512/
schema_dir:
log file dir: ./LOG6/log.txt
```
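The core ranges in the output follow from simple arithmetic: 192 logical cores divided across 2 ranks gives 96 cores per rank, so rank 0 gets cores 0-95 and rank 1 gets cores 96-191. A minimal sketch of that split is shown below. The function `split_cores` is a hypothetical helper written for this example, and the launcher's actual implementation may differ.

```python
from typing import List, Tuple


def split_cores(total_cores: int, rank_size: int) -> List[Tuple[int, int]]:
    """Return an inclusive (first, last) core range for each rank."""
    # Integer division: any remainder cores are left unassigned in this sketch.
    avg_core_per_rank = total_cores // rank_size
    ranges = []
    for rank_id in range(rank_size):
        first = rank_id * avg_core_per_rank
        last = first + avg_core_per_rank - 1
        ranges.append((first, last))
    return ranges


if __name__ == "__main__":
    for rank_id, (first, last) in enumerate(split_cores(192, 2)):
        print(f"rank_id: {rank_id}, core nums: {first}-{last}")
```

Running it with the values from the output reproduces the two `core nums` ranges listed there.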
## Note
1. `hccl_2p_56_x.x.x.x.json` can be generated with [hccl_tools.py](https://gitee.com/mindspore/mindspore/tree/master/model_zoo/utils/hccl_tools).
2. For hyper parameters, customize the file `hyper_parameter_config.ini`. The following three hyper parameters must not be configured there:
    - device_id
    - device_num
    - data_dir
3. For other models, customize the `--run_script_dir` option and the corresponding `hyper_parameter_config.ini`.
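
Before launching, it can help to check that a customized `hyper_parameter_config.ini` does not set any of the three reserved parameters. The sketch below is one way to do that with the standard-library `configparser`. The `[config]` section name and the helper `find_reserved_keys` are assumptions made for this example, not part of the launcher.

```python
import configparser

# Keys that, per the note above, must not appear in hyper_parameter_config.ini.
RESERVED_KEYS = {"device_id", "device_num", "data_dir"}


def find_reserved_keys(ini_text):
    """Return the sorted reserved keys found in any section of the INI text."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    found = set()
    for section in parser.sections():
        found.update(RESERVED_KEYS.intersection(parser[section].keys()))
    return sorted(found)


# Hypothetical file contents; the section name is an assumption.
SAMPLE_INI = """
[config]
epoch_size = 8
device_id = 5
"""

if __name__ == "__main__":
    offending = find_reserved_keys(SAMPLE_INI)
    if offending:
        print("remove these keys:", ", ".join(offending))
```

To check a real file, read its contents and pass the text to `find_reserved_keys`; an empty list means none of the reserved keys are set.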