ml.optimizers.common
Common optimizer utilities.
- ml.optimizers.common.separate_decayable_params(model: Module, default_decay: bool, weight_decay: float) → Iterable[dict[str, Any]]
Separates a model’s parameters into optimizer parameter groups so that weight decay is not applied to biases.
This is mostly taken from nanoGPT.
- Parameters:
model – The model to get the parameters for
default_decay – Whether to decay by default (for modules which aren’t explicitly specified)
weight_decay – The weight decay to use
- Returns:
The parameter group dictionaries to pass to the optimizer
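As a rough illustration of the nanoGPT-style idea this function is based on, the sketch below splits parameters by tensor dimensionality: tensors with two or more dimensions (such as linear and embedding weights) are decayed, while 1-D tensors (biases and normalization gains) are not. It is not this module's actual implementation. The helper name `split_params_for_weight_decay` is hypothetical, and the sketch omits the `default_decay` switch and any per-module rules that `separate_decayable_params` may apply.

```python
from typing import Any, Iterable

import torch
from torch import nn


def split_params_for_weight_decay(
    model: nn.Module,
    weight_decay: float,
) -> Iterable[dict[str, Any]]:
    """Hypothetical helper: split parameters into decayed and non-decayed groups."""
    decay, no_decay = [], []
    for param in model.parameters():
        if not param.requires_grad:
            continue  # frozen parameters are not handed to the optimizer
        # Assumption (nanoGPT-style rule): tensors with >= 2 dimensions are
        # decayed; 1-D tensors such as biases and norm gains are not.
        (decay if param.dim() >= 2 else no_decay).append(param)
    return [
        {"params": decay, "weight_decay": weight_decay},
        {"params": no_decay, "weight_decay": 0.0},
    ]


# Usage: pass the groups to an optimizer in place of model.parameters().
model = nn.Sequential(nn.Linear(4, 8), nn.LayerNorm(8), nn.Linear(8, 2))
param_groups = split_params_for_weight_decay(model, weight_decay=0.1)
optimizer = torch.optim.AdamW(param_groups, lr=1e-3)
```

Each dictionary follows the per-parameter-group format that PyTorch optimizers accept, so the per-group `weight_decay` value overrides the optimizer's default for the parameters in that group.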