ml.launchers.mp
Defines a launcher for multiprocess training.
This can be used with distributed data parallel (DDP) or fully sharded data parallel (FSDP) training. The launcher will spawn a process for each device and initialize the process group for DDP or FSDP training.
This launcher expects to run on a single machine with one or more GPUs.
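The spawn-one-process-per-device flow can be sketched with the standard library alone. This is a simplified analogue, not the launcher's actual implementation: the worker function and return values are hypothetical, and the real launcher would call torch.distributed.init_process_group inside each spawned process before training begins.

```python
import multiprocessing as mp


def worker(rank: int, world_size: int, results) -> None:
    # In the real launcher, each process would set its rank/world size and
    # initialize the process group here (e.g. for DDP or FSDP) before training.
    results[rank] = f"rank {rank}/{world_size} ready"


def launch(world_size: int) -> list:
    # Spawn one process per device, mirroring the launcher's behavior on a
    # single machine with `world_size` GPUs.
    with mp.Manager() as manager:
        results = manager.dict()
        procs = [
            mp.Process(target=worker, args=(rank, world_size, results))
            for rank in range(world_size)
        ]
        for p in procs:
            p.start()
        for p in procs:
            p.join()
        return [results[r] for r in range(world_size)]


print(launch(2))
```

Each worker receives its own rank, which is how the real launcher distinguishes devices when wiring up the process group.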
- ml.launchers.mp.process_main(cfg: MultiprocessConfig, raw_config: DictConfig) → None
- class ml.launchers.mp.MultiProcessLauncherConfig(name: str = '???', multiprocess: ml.utils.torch_distributed.MultiprocessConfig = &lt;factory&gt;)
Bases: BaseLauncherConfig
- multiprocess: MultiprocessConfig
- classmethod resolve(config: MultiProcessLauncherConfig) → None
Runs post-construction config resolution.
- Parameters:
config – The config to resolve
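The role of a post-construction resolve hook (filling in fields the user left unset) can be illustrated with a hypothetical analogue. The field names, defaults, and fallback logic below are assumptions for illustration, not the library's actual schema:

```python
from dataclasses import dataclass, field


@dataclass
class MultiprocessConfigSketch:
    # Hypothetical fields; -1 stands in for "not yet resolved".
    world_size: int = -1
    master_port: int = 29500


@dataclass
class MultiProcessLauncherConfigSketch:
    # '???' mirrors OmegaConf's MISSING sentinel for required fields.
    name: str = "???"
    multiprocess: MultiprocessConfigSketch = field(
        default_factory=MultiprocessConfigSketch
    )

    @classmethod
    def resolve(cls, config: "MultiProcessLauncherConfigSketch") -> None:
        # Post-construction resolution: fill unset fields with sensible
        # defaults (e.g. fall back to a single device).
        if config.multiprocess.world_size < 0:
            config.multiprocess.world_size = 1


cfg = MultiProcessLauncherConfigSketch(name="mp")
MultiProcessLauncherConfigSketch.resolve(cfg)
print(cfg.multiprocess.world_size)
```

Running resolve after construction lets defaults depend on runtime state (such as the number of visible GPUs) rather than being fixed at class-definition time.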