ml.launchers.mp

Defines a launcher for multiprocess training.

This can be used with distributed data parallel (DDP) or fully sharded data parallel (FSDP) training. The launcher will spawn a process for each device and initialize the process group for DDP or FSDP training.

This launcher expects to run on a single machine with one or more GPUs.
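The spawn-one-process-per-device pattern described above can be sketched without any framework dependencies. The sketch below mimics what a multiprocess launcher does: it starts one worker process per device, hands each worker its rank and the world size, and waits for all workers to finish. In the real launcher each worker would additionally call `torch.distributed.init_process_group` to join the DDP/FSDP process group; `spawn_workers` and `worker` here are illustrative names, not part of this module's API.

```python
import multiprocessing as mp


def worker(rank: int, world_size: int, queue: mp.Queue) -> None:
    # In the real launcher, this is where each process would call
    # torch.distributed.init_process_group(...), using its rank and
    # the world size to join the DDP/FSDP process group.
    queue.put((rank, world_size))


def spawn_workers(world_size: int) -> list[int]:
    # "fork" keeps this sketch dependency-free on Linux; a CUDA-safe
    # launcher would use the "spawn" start method instead.
    ctx = mp.get_context("fork")
    queue = ctx.Queue()
    procs = [
        ctx.Process(target=worker, args=(rank, world_size, queue))
        for rank in range(world_size)
    ]
    for p in procs:
        p.start()
    results = [queue.get() for _ in procs]
    for p in procs:
        p.join()
    # Each "device" rank reported in; order of arrival is arbitrary.
    return sorted(rank for rank, _ in results)


if __name__ == "__main__":
    print(spawn_workers(2))  # prints [0, 1]
```

Collecting results through a queue before joining avoids deadlocks when workers produce more data than the pipe buffer holds; the real launcher instead synchronizes workers through the process group itself.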

ml.launchers.mp.process_main(cfg: MultiprocessConfig, raw_config: DictConfig) → None[source]
class ml.launchers.mp.MultiProcessLauncherConfig(name: str = '???', multiprocess: ml.utils.torch_distributed.MultiprocessConfig = <factory>)[source]

Bases: BaseLauncherConfig

multiprocess: MultiprocessConfig
classmethod resolve(config: MultiProcessLauncherConfig) → None[source]

Runs post-construction config resolution.

Parameters:

config – The config to resolve

class ml.launchers.mp.MultiProcessLauncher(config: BaseConfigT)[source]

Bases: BaseLauncher[MultiProcessLauncherConfig]

launch() → None[source]

Launches the training process on each device.
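As a hypothetical illustration of how this launcher might be selected in a training config (the field names follow the signatures above; the keys of the nested `MultiprocessConfig` are not documented here and are left elided):

```
# Hypothetical fragment: selects the multiprocess launcher by name.
# The nested block maps to ml.utils.torch_distributed.MultiprocessConfig.
launcher:
  name: mp
  multiprocess:
    ...
```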