ml.utils.distributed
Defines distributed training parameters.
These parameters apply to all distributed training jobs. For model-parallel
training, refer to ml.models.parallel.env.
RANK: The rank of the current process.
WORLD_SIZE: The total number of processes.
MASTER_ADDR: The address of the master process.
MASTER_PORT: The port of the master process.
INIT_METHOD: The method to initialize the process group.
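The following is a minimal sketch of how these parameters might be gathered and checked. It assumes they are supplied as environment variables with the same names, which is how PyTorch's `env://` rendezvous reads RANK, WORLD_SIZE, MASTER_ADDR and MASTER_PORT. The names `DistributedParams` and `load_distributed_params`, the rank/world-size validation, and the `tcp://MASTER_ADDR:MASTER_PORT` fallback used when INIT_METHOD is unset are hypothetical illustrations. They are not part of ml.utils.distributed.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class DistributedParams:
    """Hypothetical container for the distributed training parameters."""
    rank: int
    world_size: int
    master_addr: str
    master_port: int
    init_method: str


def load_distributed_params(env: Optional[Mapping[str, str]] = None) -> DistributedParams:
    """Read the parameters from environment variables (assumed source).

    Passing an explicit mapping makes the function easy to test; by default
    it reads os.environ.
    """
    env = os.environ if env is None else env

    rank = int(env["RANK"])
    world_size = int(env["WORLD_SIZE"])
    master_addr = env["MASTER_ADDR"]
    master_port = int(env["MASTER_PORT"])

    # Hypothetical default: rendezvous over TCP at the master process
    # when INIT_METHOD is not given explicitly.
    init_method = env.get("INIT_METHOD", f"tcp://{master_addr}:{master_port}")

    # Assumed invariants: at least one process, and ranks numbered
    # 0 .. WORLD_SIZE - 1.
    if world_size < 1:
        raise ValueError(f"WORLD_SIZE must be >= 1, got {world_size}")
    if not 0 <= rank < world_size:
        raise ValueError(f"RANK must be in [0, {world_size}), got {rank}")

    return DistributedParams(rank, world_size, master_addr, master_port, init_method)


if __name__ == "__main__":
    params = load_distributed_params(
        {"RANK": "1", "WORLD_SIZE": "4", "MASTER_ADDR": "10.0.0.1", "MASTER_PORT": "29500"}
    )
    print(params.init_method)  # tcp://10.0.0.1:29500
```

If the job uses PyTorch, the resulting values could be passed to `torch.distributed.init_process_group(backend=..., init_method=params.init_method, rank=params.rank, world_size=params.world_size)`. Validating the values up front tends to produce clearer errors than a rendezvous that hangs on a bad rank.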