Add GPT-3 tensor parallel finetuning, adjust some distributed codes to make tensor and data parallel compatible. Link: https://code.alibaba-inc.com/Ali-MaaS/MaaS-lib/codereview/10949507