The number of Ascend accelerators is allocated automatically based on the device_num set in the HCCL config file; you do not need to specify it yourself.
For example, to generate the launch commands for distributed training of the BERT model on Ascend accelerators, run the following command in the /bert/ directory:
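As an illustration, the snippet below is a minimal sketch of how the accelerator count could be derived from such a file. It assumes the common Ascend rank-table layout with a server_list of device entries; the actual schema of your HCCL config, and the launcher's own parsing code, may differ.

```python
import json

def count_devices(hccl_config_path):
    """Count the Ascend devices listed in an HCCL rank-table JSON file.

    Assumes the usual layout: a top-level "server_list", where each server
    holds a "device" array with one entry per accelerator.
    """
    with open(hccl_config_path) as f:
        rank_table = json.load(f)
    return sum(len(server["device"]) for server in rank_table["server_list"])

# Example: a 2-device config such as hccl_2p_56_x.x.x.x.json would return 2.
```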
python ./scripts/ascend_distributed_launcher/get_distribute_pretrain_cmd.py --run_script_dir ./run_pretrain.py --hyper_parameter_config_dir ./scripts/ascend_distributed_launcher/hyper_parameter_config.ini --data_dir /path/dataset/ --hccl_config_dir model_zoo/utils/hccl_tools/hccl_2p_56_x.x.x.x.json
output:
hccl_config_dir: model_zoo/utils/hccl_tools/hccl_2p_56_x.x.x.x.json
the number of logical core: 192
avg_core_per_rank: 96
rank_size: 2
start training for rank 0, device 5:
rank_id: 0
device_id: 5
core nums: 0-95
epoch_size: 8
data_dir: /data/small_512/
schema_dir:
log file dir: ./LOG5/log.txt
start training for rank 1, device 6:
rank_id: 1
device_id: 6
core nums: 96-191
epoch_size: 8
data_dir: /data/small_512/
schema_dir:
log file dir: ./LOG6/log.txt
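The log above shows each rank being pinned to its own slice of the logical CPU cores and writing to its own LOG&lt;device_id&gt; directory. The following is a simplified, hypothetical sketch of that launch logic, not the actual get_distribute_pretrain_cmd.py; the environment variables, the run_pretrain.py flags, and the taskset-based core pinning are assumptions inferred from the output shown.

```python
import json
import multiprocessing
import os
import subprocess

def launch_ranks(hccl_config, run_script="./run_pretrain.py", data_dir="/path/dataset/"):
    """Start one training process per Ascend device listed in the HCCL config,
    pinning each rank to an equal share of the logical CPU cores."""
    with open(hccl_config) as f:
        devices = json.load(f)["server_list"][0]["device"]

    rank_size = len(devices)                        # e.g. 2
    logical_cores = multiprocessing.cpu_count()     # e.g. 192
    avg_core_per_rank = logical_cores // rank_size  # e.g. 96

    for rank_id, dev in enumerate(devices):
        device_id = dev["device_id"]
        first_core = rank_id * avg_core_per_rank
        core_range = f"{first_core}-{first_core + avg_core_per_rank - 1}"  # e.g. "0-95"

        # Environment variables commonly used for MindSpore distributed training
        # on Ascend (assumed here, not taken from the launcher's source).
        env = dict(os.environ,
                   RANK_SIZE=str(rank_size),
                   RANK_ID=str(rank_id),
                   DEVICE_ID=str(device_id),
                   RANK_TABLE_FILE=os.path.abspath(hccl_config))

        log_dir = f"./LOG{device_id}"
        os.makedirs(log_dir, exist_ok=True)

        # Pin the rank to its core range with taskset and send output to its log file.
        cmd = ["taskset", "-c", core_range, "python", run_script,
               "--device_id", str(device_id), "--data_dir", data_dir]
        with open(os.path.join(log_dir, "log.txt"), "w") as log_file:
            subprocess.Popen(cmd, env=env, stdout=log_file, stderr=subprocess.STDOUT)
```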
Note that hccl_2p_56_x.x.x.x.json can be generated with hccl_tools.py.
For hyperparameters, note that you should customize hyper_parameter_config.ini for your run. Also note that the following hyperparameters are not allowed to be configured there:
For other models, note that you should customize the run_script option and the corresponding hyper_parameter_config.ini.
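As an illustration of how such an .ini file of hyperparameters can be mapped onto a training script's command line, the sketch below uses Python's standard configparser. The section layout, key names, and the idea that each key becomes a --key option are assumptions for illustration, not the launcher's actual implementation.

```python
import configparser

def ini_to_cli_options(ini_path="./scripts/ascend_distributed_launcher/hyper_parameter_config.ini"):
    """Turn flat key=value pairs from an .ini file into command-line options.

    Assumes each key maps to a --key option of the training script,
    e.g. epoch_size=8 becomes ["--epoch_size", "8"].
    """
    cfg = configparser.ConfigParser()
    cfg.read(ini_path)
    options = []
    for section in cfg.sections():
        for key, value in cfg.items(section):
            options += [f"--{key}", value]
    return options
```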