gpt-3-small

Fields

Field           Default       Description                                                       Type
--------------  ------------  ----------------------------------------------------------------  -------------------
name            ???           The referenced name of the object to construct                    str
lr              0.0006        Learning rate                                                     float
betas           (0.9, 0.95)   Beta coefficients                                                 tuple[float, float]
eps             1e-05         Epsilon term to add to the denominator for stability              float
weight_decay    0.1           Weight decay regularization to use                                float
amsgrad         False         Whether to use the AMSGrad variant of the algorithm               bool
default_decay   True          Whether to decay module params which aren’t explicitly specified  bool
foreach         None          Whether to use the foreach variant of the optimizer               bool | None
capturable      False         Whether to use capturable AdamW pathway                           bool
differentiable  False         Whether to use differentiable AdamW                               bool
fused           None          Whether to use the fused optimizer                                bool | None
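
As a rough illustration of how the numeric fields above enter an AdamW update, the sketch below mirrors the documented defaults in a small dataclass and applies one update step to a single scalar parameter. The names `AdamWConfig` and `adamw_step` are hypothetical and are not part of this library. Reading the `???` default as "required, no default provided" is an assumption, as is passing "gpt-3-small" as the `name` value. The step shown is the standard decoupled-weight-decay Adam update without the AMSGrad variant; the execution-related flags (`foreach`, `capturable`, `differentiable`, `fused`) and `default_decay` are carried in the config but do not appear in the arithmetic.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass
class AdamWConfig:
    """Hypothetical container mirroring the documented fields and defaults."""

    name: str  # assumed required: the documented default is "???"
    lr: float = 0.0006
    betas: tuple[float, float] = (0.9, 0.95)
    eps: float = 1e-05
    weight_decay: float = 0.1
    amsgrad: bool = False
    default_decay: bool = True
    foreach: bool | None = None
    capturable: bool = False
    differentiable: bool = False
    fused: bool | None = None


def adamw_step(param, grad, exp_avg, exp_avg_sq, step, cfg):
    """Apply one AdamW update to a scalar parameter (AMSGrad not modelled).

    `step` is the 1-based update count used for bias correction.
    Returns the new parameter and the updated moment estimates.
    """
    beta1, beta2 = cfg.betas

    # Decoupled weight decay: shrink the parameter directly, scaled by lr.
    param = param * (1.0 - cfg.lr * cfg.weight_decay)

    # Exponential moving averages of the gradient and the squared gradient.
    exp_avg = beta1 * exp_avg + (1.0 - beta1) * grad
    exp_avg_sq = beta2 * exp_avg_sq + (1.0 - beta2) * grad * grad

    # Bias correction for the zero-initialised moment estimates.
    m_hat = exp_avg / (1.0 - beta1 ** step)
    v_hat = exp_avg_sq / (1.0 - beta2 ** step)

    # eps is added to the denominator for numerical stability.
    param = param - cfg.lr * m_hat / (math.sqrt(v_hat) + cfg.eps)
    return param, exp_avg, exp_avg_sq


# Usage: first update of a parameter starting at 1.0 with gradient 0.5.
cfg = AdamWConfig(name="gpt-3-small")
new_param, m, v = adamw_step(1.0, 0.5, 0.0, 0.0, step=1, cfg=cfg)
```

On the first step the bias-corrected moments reduce to the gradient and its square, so the parameter moves by roughly `lr` in the direction opposite the gradient's sign, plus the small shrinkage from `weight_decay`.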