ml.utils.mixed_precision
Defines functions used for mixed-precision training.
- ml.utils.mixed_precision.get_weight_norm(parameters: Iterable[Parameter], norm_type: float = 2.0, foreach: bool | None = None) → Tensor [source]
Computes the norm of an iterable of parameters.
The norm is computed over all parameters together, as if they were concatenated into a single vector.
- Parameters:
parameters – An iterable of the model parameters.
norm_type – The type of p-norm to use.
foreach – Use the faster foreach-based implementation.
- Returns:
The total norm of the parameters (viewed as a single vector).
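A minimal usage sketch (the import path follows the module name above; the reference computation mirrors the "concatenated into a single vector" behavior described here):

```python
import torch
from torch import nn

from ml.utils.mixed_precision import get_weight_norm

model = nn.Linear(16, 4)

# L2 norm over all parameters, as if concatenated into one vector.
total_norm = get_weight_norm(model.parameters(), norm_type=2.0)

# Equivalent reference computation, for a sanity check.
expected = torch.norm(
    torch.cat([p.detach().flatten() for p in model.parameters()]), p=2.0
)
assert torch.allclose(total_norm, expected)
```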
- ml.utils.mixed_precision.get_grad_norm(parameters: Iterable[Parameter], norm_type: float = 2.0, foreach: bool | None = None) → tuple[torch.Tensor, dict[tuple[torch.device, torch.dtype], tuple[list[list[torch.Tensor]], list[int]]]] [source]
Computes the norm of the gradients of an iterable of parameters, returning the total norm along with the gradients grouped by device and dtype.
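A short sketch of how the return value can be consumed; the structure of the grouped-gradients dictionary is read directly off the return annotation above:

```python
import torch
from torch import nn

from ml.utils.mixed_precision import get_grad_norm

model = nn.Linear(16, 4)
model(torch.randn(8, 16)).sum().backward()

total_norm, grouped_grads = get_grad_norm(model.parameters(), norm_type=2.0)
print(f"grad norm: {total_norm.item():.4f}")

# Per the return annotation, gradients come back keyed by (device, dtype),
# which is the bucketing the foreach kernels operate on.
for (device, dtype), (grads, indices) in grouped_grads.items():
    num_tensors = sum(len(group) for group in grads)
    print(device, dtype, num_tensors, "gradient tensors")
```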
- ml.utils.mixed_precision.clip_grad_norm_(parameters: Iterable[Parameter], max_norm: float, norm_type: float = 2.0, foreach: bool | None = None) → tuple[torch.Tensor, bool] [source]
Clips gradient norm of an iterable of parameters.
The norm is computed over all gradients together, as if they were concatenated into a single vector. Gradients are modified in-place.
- Parameters:
parameters – An iterable of the model parameters.
max_norm – The maximum norm of the gradients.
norm_type – The type of p-norm to use.
foreach – Use the faster foreach-based implementation. If None, use the foreach implementation for CUDA and CPU native tensors and silently fall back to the slow implementation for other device types. If True or False, use the foreach or non-foreach implementation, respectively, and raise an error if the chosen implementation is not available.
- Returns:
The total norm of the parameters (viewed as a single vector) and whether the parameters were successfully clipped.
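A sketch of using clip_grad_norm_ in a mixed-precision training step, assuming a CUDA device and the standard torch.cuda.amp.GradScaler workflow; the unscale-before-clip ordering is the usual AMP requirement, not something specific to this module:

```python
import torch
from torch import nn

from ml.utils.mixed_precision import clip_grad_norm_

model = nn.Linear(16, 4).cuda()
opt = torch.optim.SGD(model.parameters(), lr=1e-2)
scaler = torch.cuda.amp.GradScaler()

with torch.autocast("cuda"):
    loss = model(torch.randn(8, 16, device="cuda")).sum()

scaler.scale(loss).backward()

# Unscale gradients before clipping; otherwise max_norm would be compared
# against the loss-scaled gradients rather than the true gradients.
scaler.unscale_(opt)

# Per the docs above, the second return value reports whether the
# parameters were successfully clipped.
total_norm, was_clipped = clip_grad_norm_(model.parameters(), max_norm=1.0)

scaler.step(opt)
scaler.update()
```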