ml.optimizers.common

Common optimizer utilities.

ml.optimizers.common.separate_decayable_params(model: Module, default_decay: bool, weight_decay: float) → Iterable[dict[str, Any]]

Separates the model’s parameters into optimizer parameter groups so that biases are not weight-decayed.

This is mostly taken from nanoGPT.

Parameters:
  • model – The model to get the parameters for

  • default_decay – Whether to decay by default (for modules which aren’t explicitly specified)

  • weight_decay – The weight decay to use

Returns:

The parameter-group dictionaries to pass to the optimizer
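
As a rough illustration of the idea, the sketch below splits named parameters into a decayed group and a non-decayed group. It is not this module's implementation: the function name sketch_param_groups, the rule "decay only non-bias tensors with two or more dimensions" (the heuristic used in recent nanoGPT versions), and the omission of default_decay are all assumptions made for the example.

```python
from typing import Any, Iterable


def sketch_param_groups(
    named_params: Iterable[tuple[str, Any]],
    weight_decay: float,
) -> list[dict[str, Any]]:
    """Hypothetical helper: split parameters into decay / no-decay groups.

    Assumption: each parameter exposes an integer ``ndim`` attribute,
    as ``torch.nn.Parameter`` does.
    """
    decay: list[Any] = []
    no_decay: list[Any] = []
    for name, param in named_params:
        # Biases and other 1-D tensors (e.g. normalization gains) are not decayed.
        if name.endswith("bias") or param.ndim < 2:
            no_decay.append(param)
        else:
            decay.append(param)
    # Same shape as PyTorch's per-parameter-group options: one dict per group.
    return [
        {"params": decay, "weight_decay": weight_decay},
        {"params": no_decay, "weight_decay": 0.0},
    ]
```

With PyTorch, a list like this could be passed directly as the first argument of an optimizer, for example torch.optim.AdamW(sketch_param_groups(model.named_parameters(), 0.1), lr=3e-4), since PyTorch optimizers accept an iterable of dictionaries that each hold a "params" entry plus per-group options.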

ml.optimizers.common.can_use_fused(model: Module) → bool
ml.optimizers.common.can_use_foreach(model: Module) → bool