roberta-base
Field | Default | Description | Type |
---|---|---|---|
name | ??? | The referenced name of the object to construct | str |
lr | 0.0006 | Learning rate | float |
betas | (0.9, 0.98) | Beta coefficients | tuple[float, float] |
eps | 1e-05 | Epsilon term added to the denominator for numerical stability | float |
weight_decay | 0.01 | Weight decay regularization to use | float |
amsgrad | False | Whether to use the AMSGrad variant of the algorithm | bool |
default_decay | True | Whether to decay module params which aren’t explicitly specified | bool |
foreach | None | Whether to use the foreach variant of the optimizer | bool \| None |
capturable | False | Whether to use the capturable AdamW pathway | bool |
differentiable | False | Whether to use differentiable AdamW | bool |
fused | None | Whether to use the fused optimizer | bool \| None |
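
As a rough illustration of how the fields above might be consumed, the sketch below mirrors the table in a plain dataclass and builds the keyword arguments for `torch.optim.AdamW`. The names `AdamWConfig` and `to_torch_kwargs` are hypothetical and not part of the documented library. It also assumes that `name` and `default_decay` are handled by the surrounding framework rather than passed to PyTorch, since `torch.optim.AdamW` has no such parameters.

```python
from dataclasses import asdict, dataclass
from typing import Optional, Tuple


@dataclass
class AdamWConfig:
    """Hypothetical container mirroring the fields and defaults in the table."""

    name: str  # required: the table lists its default as ??? (no default value)
    lr: float = 0.0006
    betas: Tuple[float, float] = (0.9, 0.98)
    eps: float = 1e-05
    weight_decay: float = 0.01
    amsgrad: bool = False
    default_decay: bool = True
    foreach: Optional[bool] = None
    capturable: bool = False
    differentiable: bool = False
    fused: Optional[bool] = None


def to_torch_kwargs(cfg: AdamWConfig) -> dict:
    """Return only the fields that torch.optim.AdamW accepts as keyword arguments.

    Assumption: `name` and `default_decay` are framework-level options
    (object lookup and parameter grouping), so they are dropped here.
    """
    kwargs = asdict(cfg)
    kwargs.pop("name")
    kwargs.pop("default_decay")
    return kwargs


# Override one default and keep the rest as listed in the table.
cfg = AdamWConfig(name="adamw", lr=3e-4)
kwargs = to_torch_kwargs(cfg)
```

With PyTorch installed, the resulting dictionary could then be passed as `torch.optim.AdamW(model.parameters(), **kwargs)`, where `model` is any `torch.nn.Module`.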