ml.launchers.torchrun
Defines a launcher which uses torchrun to launch a job.
This is a lightweight wrapper around PyTorch’s torch.distributed.launch script. It is used to launch a job on a single node with multiple processes, each with multiple devices.
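To make the wrapping concrete, here is a minimal sketch of how a single-node torchrun command line could be assembled from the config fields listed below. The function name `build_torchrun_command` and the `script` and `script_args` parameters are hypothetical and are not part of this module. The flags `--nnodes`, `--nproc_per_node`, `--master_addr` and `--master_port` are standard torchrun options, but whether this launcher passes exactly these flags is an assumption.

```python
from typing import List, Sequence


def build_torchrun_command(
    torchrun_path: str,
    nproc_per_node: int,
    master_addr: str,
    master_port: int,
    script: str,
    script_args: Sequence[str] = (),
) -> List[str]:
    """Builds an argv list for a single-node torchrun invocation.

    Hypothetical helper for illustration only; the real launcher may
    assemble its command differently.
    """
    if nproc_per_node < 1:
        raise ValueError("nproc_per_node must be at least 1")
    return [
        torchrun_path,
        "--nnodes=1",  # single node, as described above
        f"--nproc_per_node={nproc_per_node}",
        f"--master_addr={master_addr}",
        f"--master_port={master_port}",
        script,
        *script_args,
    ]


if __name__ == "__main__":
    # "train.py" and its arguments are placeholders.
    cmd = build_torchrun_command(
        "torchrun", 2, "127.0.0.1", 29500, "train.py", ["--epochs", "1"]
    )
    print(" ".join(cmd))
```

The resulting list can be handed to `subprocess.run(cmd, check=True)`, which avoids shell quoting issues compared with joining it into a single string.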
- class ml.launchers.torchrun.TorchRunLauncherConfig(name: str = '???', nproc_per_node: int = '???', master_addr: str = '127.0.0.1', master_port: int = '???', backend: str = 'nccl', start_method: str = 'spawn', torchrun_path: str = '???')[source]
Bases: BaseLauncherConfig
- nproc_per_node: int = '???'
- master_addr: str = '127.0.0.1'
- master_port: int = '???'
- backend: str = 'nccl'
- start_method: str = 'spawn'
- torchrun_path: str = '???'
- classmethod resolve(config: TorchRunLauncherConfig) → None [source]
Runs post-construction config resolution.
- Parameters:
config – The config to resolve
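The '???' defaults in the signature above appear to mark values that must be supplied or filled in later (it is the same string OmegaConf uses for missing values). The source does not say what resolve actually does, so the following is only a sketch of what post-construction resolution might look like. It assumes that missing fields are filled from the local machine. `SketchTorchRunConfig`, `find_free_port`, `count_devices` and this standalone `resolve` are all hypothetical stand-ins, not the library's code.

```python
import shutil
import socket
from dataclasses import dataclass
from typing import Union

MISSING = "???"  # same marker string that appears in the signature above


@dataclass
class SketchTorchRunConfig:
    """Hypothetical stand-in for TorchRunLauncherConfig (subset of fields)."""

    nproc_per_node: Union[int, str] = MISSING
    master_addr: str = "127.0.0.1"
    master_port: Union[int, str] = MISSING
    torchrun_path: str = MISSING


def find_free_port(host: str = "127.0.0.1") -> int:
    """Asks the OS for an unused TCP port by binding to port 0."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind((host, 0))
        return sock.getsockname()[1]


def count_devices() -> int:
    """Returns the visible CUDA device count, or 1 if unavailable."""
    try:
        import torch  # optional; only used to count devices
    except ImportError:
        return 1
    return max(1, torch.cuda.device_count())


def resolve(config: SketchTorchRunConfig) -> None:
    """Fills in any field still set to MISSING (assumed behavior)."""
    if config.nproc_per_node == MISSING:
        config.nproc_per_node = count_devices()
    if config.master_port == MISSING:
        config.master_port = find_free_port()
    if config.torchrun_path == MISSING:
        # Fall back to the bare name and let the OS search PATH at launch.
        config.torchrun_path = shutil.which("torchrun") or "torchrun"


if __name__ == "__main__":
    cfg = SketchTorchRunConfig()
    resolve(cfg)
    print(cfg)
```

Values that are set explicitly are left untouched, so a caller can still pin a port or process count when needed.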
- class ml.launchers.torchrun.TorchRunLauncher(config: BaseConfigT)[source]
Bases: BaseLauncher[TorchRunLauncherConfig]