ml.optimizers.common

Common optimizer utilities.

ml.optimizers.common.separate_decayable_params(model: Module, default_decay: bool, weight_decay: float) → Iterable[dict[str, Any]]

Separates the model’s parameters into groups so that weight decay is not applied to biases.

This is mostly taken from nanoGPT.

Parameters:

model – The model to get the parameters for
default_decay – Whether to decay by default (for modules which aren’t explicitly specified)
weight_decay – The weight decay value to use for the parameters that are decayed

Returns:

The parameter groups (an iterable of dictionaries) to pass to the optimizer
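Example:

The following is a minimal sketch of the general idea, not this library's implementation. It uses the dimensionality heuristic from nanoGPT (tensors with two or more dimensions are decayed, everything else is not) and assumes PyTorch is installed. The helper name `split_params_by_decay` is hypothetical, and the sketch has no counterpart to `default_decay`, which the real function presumably uses to decide how to treat modules it does not explicitly recognize.

```python
from typing import Any

import torch
from torch import nn


def split_params_by_decay(model: nn.Module, weight_decay: float) -> list[dict[str, Any]]:
    """Hypothetical helper: build two optimizer parameter groups.

    Heuristic (as in nanoGPT): parameters with ndim >= 2 (weight matrices,
    embeddings) are decayed; parameters with ndim < 2 (biases, norm scales)
    are not.
    """
    decay: list[nn.Parameter] = []
    no_decay: list[nn.Parameter] = []
    for _, param in model.named_parameters():
        if not param.requires_grad:
            continue  # Frozen parameters are not handed to the optimizer.
        if param.ndim >= 2:
            decay.append(param)
        else:
            no_decay.append(param)
    return [
        {"params": decay, "weight_decay": weight_decay},
        {"params": no_decay, "weight_decay": 0.0},
    ]


# Two Linear layers contribute 2 weight matrices and 2 biases;
# the LayerNorm contributes a 1-D weight and a 1-D bias.
model = nn.Sequential(nn.Linear(4, 8), nn.LayerNorm(8), nn.Linear(8, 2))
groups = split_params_by_decay(model, weight_decay=0.1)

# PyTorch optimizers accept a list of per-group option dictionaries.
optimizer = torch.optim.AdamW(groups, lr=1e-3)
```

Given the signature documented above, the result of `separate_decayable_params(model, default_decay=True, weight_decay=0.1)` would presumably be passed to the optimizer the same way, in place of `groups`.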

ml.optimizers.common.can_use_fused(model: Module) → bool

ml.optimizers.common.can_use_foreach(model: Module) → bool