ml.utils.mixed_precision

Defines functions used for mixed precision training.

ml.utils.mixed_precision.get_weight_norm(parameters: Iterable[Parameter], norm_type: float = 2.0, foreach: bool | None = None) → Tensor

Computes the norm of an iterable of parameters.

The norm is computed over all parameters together, as if they were concatenated into a single vector.

Parameters:
  • parameters – An iterable of the model parameters.

  • norm_type – The type of p-norm to use.

  • foreach – Whether to use the faster foreach-based implementation.

Returns:

The total norm of the parameters (viewed as a single vector).
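
A minimal usage sketch (the toy model is illustrative, not part of this module), computing the L2 norm of all weights, e.g. for logging during training:

    from torch import nn

    from ml.utils.mixed_precision import get_weight_norm

    # Toy model; any iterable of Parameters works.
    model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 4))

    # L2 norm over all parameters, as if concatenated into one vector.
    total_norm = get_weight_norm(model.parameters(), norm_type=2.0)
    print(f"weight norm: {total_norm.item():.4f}")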

ml.utils.mixed_precision.get_grad_norm(parameters: Iterable[Parameter], norm_type: float = 2.0, foreach: bool | None = None) → tuple[torch.Tensor, dict[tuple[torch.device, torch.dtype], tuple[list[list[torch.Tensor]], list[int]]]]
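
Computes the norm of the gradients of an iterable of parameters.

The norm is computed over all gradients together, as if they were concatenated into a single vector. Judging from the return annotation, the gradients are also returned grouped by (device, dtype).

A short sketch (the model and backward pass are illustrative; the interpretation of the second return value is an assumption based on the annotation above):

    import torch
    from torch import nn

    from ml.utils.mixed_precision import get_grad_norm

    model = nn.Linear(16, 4)
    model(torch.randn(8, 16)).sum().backward()

    # First element: total gradient norm. Second element (assumed from the
    # return annotation): gradients grouped by (device, dtype).
    total_norm, grouped_grads = get_grad_norm(model.parameters(), norm_type=2.0)
    print(f"grad norm: {total_norm.item():.4f}")
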
ml.utils.mixed_precision.clip_grad_norm_(parameters: Iterable[Parameter], max_norm: float, norm_type: float = 2.0, foreach: bool | None = None) → tuple[torch.Tensor, bool]

Clips gradient norm of an iterable of parameters.

The norm is computed over all gradients together, as if they were concatenated into a single vector. Gradients are modified in-place.

Parameters:
  • parameters – An iterable of the model parameters.

  • max_norm – The maximum norm of the gradients.

  • norm_type – The type of p-norm to use.

  • foreach – Use the faster foreach-based implementation. If None, use the foreach implementation for CUDA and CPU native tensors and silently fall back to the slow implementation for other device types. If True or False, use the foreach or non-foreach implementation, respectively, and raise an error if the chosen implementation is not available.

Returns:

The total norm of the gradients (viewed as a single vector), and whether the gradients were successfully clipped.
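
A sketch of how clip_grad_norm_ might slot into a standard torch.cuda.amp training step (the model, optimizer, data, and GradScaler wiring are illustrative, not part of this module). Gradients are unscaled before clipping so that max_norm is compared against their true magnitudes:

    import torch
    from torch import nn

    from ml.utils.mixed_precision import clip_grad_norm_

    # Illustrative setup; falls back to CPU when CUDA is unavailable.
    device = "cuda" if torch.cuda.is_available() else "cpu"
    model = nn.Linear(16, 4).to(device)
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
    scaler = torch.cuda.amp.GradScaler(enabled=device == "cuda")

    x = torch.randn(8, 16, device=device)
    y = torch.randn(8, 4, device=device)

    optimizer.zero_grad()
    with torch.autocast(device_type=device, enabled=device == "cuda"):
        loss = nn.functional.mse_loss(model(x), y)
    scaler.scale(loss).backward()

    # Unscale first so that max_norm applies to the true gradient values.
    scaler.unscale_(optimizer)
    total_norm, clipped = clip_grad_norm_(model.parameters(), max_norm=1.0)

    # scaler.step skips the parameter update if gradients overflowed.
    scaler.step(optimizer)
    scaler.update()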