ml.utils.distributed

Defines distributed training parameters.

These parameters apply to any distributed training job; a usage sketch follows the list below. For model-parallel training, refer to ml.models.parallel.env.

  • RANK: The rank of the current process.

  • WORLD_SIZE: The total number of processes.

  • MASTER_ADDR: The address of the master process.

  • MASTER_PORT: The port of the master process.

  • INIT_METHOD: The method to initialize the process group.
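The sketch below shows one way these parameters are typically populated, assuming they correspond to environment variables of the same names (as exported by launchers such as torchrun). The fallback values are illustrative defaults for a single-process run, not values the module itself guarantees, and the module may already perform this setup internally.

    import os

    from ml.utils.distributed import (
        set_init_method,
        set_master_addr,
        set_master_port,
        set_rank,
        set_world_size,
    )

    # Hypothetical setup: read each parameter from the environment variable of
    # the same name, falling back to single-process defaults (illustrative only).
    set_rank(int(os.environ.get("RANK", "0")))
    set_world_size(int(os.environ.get("WORLD_SIZE", "1")))
    set_master_addr(os.environ.get("MASTER_ADDR", "127.0.0.1"))
    set_master_port(int(os.environ.get("MASTER_PORT", "29500")))
    set_init_method(os.environ.get("INIT_METHOD", "env://"))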

ml.utils.distributed.set_rank(rank: int) → None
ml.utils.distributed.get_rank_optional() → int | None
ml.utils.distributed.get_rank() → int
ml.utils.distributed.set_local_rank(rank: int) → None
ml.utils.distributed.get_local_rank_optional() → int | None
ml.utils.distributed.get_local_rank() → int
ml.utils.distributed.set_world_size(world_size: int) → None
ml.utils.distributed.get_world_size_optional() → int | None
ml.utils.distributed.get_world_size() → int
ml.utils.distributed.set_local_world_size(local_world_size: int) → None
ml.utils.distributed.get_local_world_size_optional() → int | None
ml.utils.distributed.get_local_world_size() → int
ml.utils.distributed.set_master_addr(master_addr: str) → None
ml.utils.distributed.get_master_addr() → str
ml.utils.distributed.set_master_port(port: int) → None
ml.utils.distributed.get_master_port() → int
ml.utils.distributed.is_master() → bool
ml.utils.distributed.is_distributed() → bool
ml.utils.distributed.get_init_method() → str
ml.utils.distributed.set_init_method(init_method: str) → None
ml.utils.distributed.get_random_port(default: int = 1337) → int
ml.utils.distributed.set_dist(rank: int, local_rank: int, world_size: int, local_world_size: int, master_addr: str, master_port: int, init_method: str) → None
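A minimal usage sketch, assuming a typical launcher flow: record the process topology with set_dist(), then use the getters when initializing the process group. The single-node layout, backend, address, and port below are illustrative assumptions, not values the module mandates.

    import torch.distributed as dist

    from ml.utils.distributed import (
        get_init_method,
        get_rank,
        get_world_size,
        is_distributed,
        is_master,
        set_dist,
    )

    def setup(rank: int, world_size: int) -> None:
        # Record the process topology; on a single node, local and global
        # rank/world size coincide (assumption for this sketch).
        set_dist(
            rank=rank,
            local_rank=rank,
            world_size=world_size,
            local_world_size=world_size,
            master_addr="127.0.0.1",
            master_port=29500,
            init_method="tcp://127.0.0.1:29500",
        )

        # Initialize the PyTorch process group from the stored values.
        if is_distributed():
            dist.init_process_group(
                backend="nccl",
                init_method=get_init_method(),
                rank=get_rank(),
                world_size=get_world_size(),
            )

        # Rank 0 typically handles logging and checkpointing.
        if is_master():
            print("Running as the master process.")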