roberta-large
=============

.. list-table:: Fields
   :widths: 20 20 40 20
   :header-rows: 1

   * - Field
     - Default
     - Description
     - Type
   * - name
     - ???
     - The referenced name of the object to construct
     - str
   * - lr
     - 0.0004
     - Learning rate
     - float
   * - betas
     - (0.9, 0.98)
     - Beta coefficients
     - tuple[float, float]
   * - eps
     - 1e-05
     - Epsilon term to add to the denominator for stability
     - float
   * - weight_decay
     - 0.01
     - Weight decay regularization to use
     - float
   * - amsgrad
     - False
     - Whether to use the AMSGrad variant of the algorithm
     - bool
   * - default_decay
     - True
     - Whether to decay module params which aren't explicitly specified
     - bool
   * - foreach
     - None
     - Whether to use the foreach variant of the optimizer
     - bool | None
   * - capturable
     - False
     - Whether to use capturable AdamW pathway
     - bool
   * - differentiable
     - False
     - Whether to use differentiable AdamW
     - bool
   * - fused
     - None
     - Whether to use the fused optimizer
     - bool | None
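
The fields above can be sketched as a plain dataclass to show how the defaults fit together. This is only an illustration: the class ``RobertaLargeAdamWConfig``, the helper ``adamw_kwargs``, and the example value ``"adamw"`` for ``name`` are hypothetical and not taken from the project. The sketch also assumes that ``name`` and ``default_decay`` are consumed by the surrounding framework, while the remaining fields share their names with keyword arguments of ``torch.optim.AdamW``.

.. code-block:: python

   from dataclasses import asdict, dataclass
   from typing import Optional, Tuple


   @dataclass
   class RobertaLargeAdamWConfig:
       """Hypothetical mirror of the fields table, not the project's own class."""

       name: str  # required: the table lists no default ("???")
       lr: float = 0.0004
       betas: Tuple[float, float] = (0.9, 0.98)
       eps: float = 1e-05
       weight_decay: float = 0.01
       amsgrad: bool = False
       default_decay: bool = True
       foreach: Optional[bool] = None
       capturable: bool = False
       differentiable: bool = False
       fused: Optional[bool] = None


   def adamw_kwargs(cfg: RobertaLargeAdamWConfig) -> dict:
       """Return the fields whose names match torch.optim.AdamW keyword arguments."""
       kwargs = asdict(cfg)
       # Assumption: these two fields are handled by the framework, not by AdamW.
       kwargs.pop("name")
       kwargs.pop("default_decay")
       return kwargs


   # "adamw" is a placeholder; the real registered name is not given in the table.
   cfg = RobertaLargeAdamWConfig(name="adamw")
   kwargs = adamw_kwargs(cfg)

With PyTorch installed, the resulting dictionary could then be passed along the lines of ``torch.optim.AdamW(model.parameters(), **kwargs)``, provided the installed version accepts all of these keywords (``fused``, for example, is only available in newer releases).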