ml.utils.parallel

Utility functions for configuring distributed parallel training.

Distributed training is broken up into three types of parallelism:

Model Parallelism

Model parallelism partitions a single layer across multiple GPUs. During the forward pass, within a layer, different GPUs perform different parts of the computation, then communicate the results to each other.
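As an illustrative sketch (not part of this module), model parallelism amounts to splitting a layer's weights across devices. Here a linear layer `y = x @ W` is split column-wise between two simulated workers; each computes a partial output, and concatenating the partials (the communication step) recovers the full result:

```python
import numpy as np

# Hypothetical illustration: a linear layer y = x @ W split column-wise
# across two workers. Each worker holds half of W's columns, computes its
# partial output, and the results are concatenated (the "communication"
# step in real model parallelism).
rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8))   # batch of 4, feature dim 8
W = rng.standard_normal((8, 6))   # full weight matrix

W0, W1 = W[:, :3], W[:, 3:]       # each "GPU" holds 3 output columns
y0, y1 = x @ W0, x @ W1           # local computation on each worker
y = np.concatenate([y0, y1], axis=1)  # gather of partial results

assert np.allclose(y, x @ W)      # matches the single-device result
```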

Data Parallelism

Data parallelism splits the data across multiple GPUs. During the forward pass, each GPU performs the same computation on different data, then communicates the results to each other.
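As a sketch (again, not this module's implementation), data parallelism splits the batch: each worker computes a gradient on its shard, and averaging the per-shard gradients (the all-reduce step) matches the single-device gradient when the shards are equal sizes:

```python
import numpy as np

# Hypothetical illustration: data parallelism for a least-squares loss.
# The batch is split across two workers; each computes the gradient on
# its shard, and the gradients are averaged (the all-reduce step).
rng = np.random.default_rng(0)
x = rng.standard_normal((8, 3))
y = rng.standard_normal(8)
w = np.zeros(3)

def grad(xs, ys):
    # Gradient of the mean squared error 0.5 * mean((xs @ w - ys)**2)
    return xs.T @ (xs @ w - ys) / len(xs)

g0 = grad(x[:4], y[:4])            # worker 0's shard
g1 = grad(x[4:], y[4:])            # worker 1's shard
g = (g0 + g1) / 2                  # all-reduce (average)

assert np.allclose(g, grad(x, y))  # matches the single-device gradient
```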

Pipeline Parallelism

Pipeline parallelism splits the model across multiple GPUs. During the forward pass, the output of one layer is computed on one GPU, then passed to the next layer on another GPU.
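A minimal sketch of the same idea (hypothetical, two stages): stage 0 owns the first layer, stage 1 owns the second, and the activation is handed off between them:

```python
import numpy as np

# Hypothetical illustration: pipeline parallelism with two stages.
# Stage 0 computes the first layer and "sends" the activation to
# stage 1, which computes the second layer.
rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8))
W1 = rng.standard_normal((8, 8))  # held by stage 0
W2 = rng.standard_normal((8, 2))  # held by stage 1

h = np.maximum(x @ W1, 0.0)       # stage 0 forward pass (ReLU)
out = h @ W2                      # stage 1 forward pass, after receiving h

assert out.shape == (4, 2)
```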

Parallelism Example

Consider doing distributed training on a model with 8 total GPUs. The model is split length-wise (pipeline parallelism) into two stages, and each stage is split width-wise (model parallelism) across two GPUs, so each copy of the model spans 4 GPUs. The remaining factor of two is the data parallelism: two copies of the model each process half of every minibatch.

The model parallel groups are then [[0, 1], [2, 3], [4, 5], [6, 7]]. This means that when GPUs 0 and 1 are finished computing their part of some layer, they will communicate the results to each other. The same is true for the other pairs of GPUs.

The pipeline parallel groups are [[0, 2], [1, 3], [4, 6], [5, 7]]. This means that when GPU 0 is finished computing its part of some layer and syncing with GPU 1, it will communicate the output to GPU 2.

The data parallel groups are [[0, 4], [1, 5], [2, 6], [3, 7]]. This means that each minibatch will be split in half, with one half being sent to GPUs [0, 1, 2, 3] and the other half being sent to GPUs [4, 5, 6, 7].

So in summary, the resulting groups are:

  • Model parallel groups: [[0, 1], [2, 3], [4, 5], [6, 7]]

  • Data parallel groups: [[0, 4], [1, 5], [2, 6], [3, 7]]

  • Pipeline parallel groups: [[0, 2], [1, 3], [4, 6], [5, 7]]
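The layout above can be reproduced with a short sketch (a hypothetical helper, not the library's actual implementation). With model parallelism 2, pipeline parallelism 2, and data parallelism 2, a rank is laid out as rank = dp_index * mp * pp + pp_index * mp + mp_index, and each group is formed by varying one index while holding the other two fixed:

```python
# Hypothetical sketch of how the groups above can be derived from the
# rank layout: rank = d * mp * pp + p * mp + m, where d, p, m index the
# data, pipeline, and model parallel dimensions respectively.
def make_groups(mp: int, pp: int, dp: int):
    mp_groups = [
        [d * mp * pp + p * mp + m for m in range(mp)]  # vary model index
        for d in range(dp) for p in range(pp)
    ]
    pp_groups = [
        [d * mp * pp + p * mp + m for p in range(pp)]  # vary pipeline index
        for d in range(dp) for m in range(mp)
    ]
    dp_groups = [
        [d * mp * pp + p * mp + m for d in range(dp)]  # vary data index
        for p in range(pp) for m in range(mp)
    ]
    return mp_groups, pp_groups, dp_groups

mp_groups, pp_groups, dp_groups = make_groups(mp=2, pp=2, dp=2)
# mp_groups == [[0, 1], [2, 3], [4, 5], [6, 7]]
# pp_groups == [[0, 2], [1, 3], [4, 6], [5, 7]]
# dp_groups == [[0, 4], [1, 5], [2, 6], [3, 7]]
```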

ml.utils.parallel.parallel_group_info() → _GroupsInfos
ml.utils.parallel.default_group_info() → _GroupInfo | None
exception ml.utils.parallel.ParallismError

Bases: Exception

ml.utils.parallel.init_parallelism(model_parallelism: int = 1, pipeline_parallelism: int = 1, *, mp_backend: str | Backend | None = None, pp_backend: str | Backend | None = None, dp_backend: str | Backend | None = None) → None

Initializes parallelism groups and parameters.

Parameters:
  • model_parallelism – Number of model parallel GPUs. Each layer of computation will simultaneously run on this many GPUs.

  • pipeline_parallelism – Number of pipeline parallel layers. The total number of GPUs processing a single input will be the product of model_parallelism and pipeline_parallelism.

  • mp_backend – Backend to use for model parallelism.

  • pp_backend – Backend to use for pipeline parallelism.

  • dp_backend – Backend to use for data parallelism.

Raises:
  • ParallismError – If some settings are invalid.

ml.utils.parallel.parallelism_is_initialized() → bool
ml.utils.parallel.reset_parallelism() → None