ml.models.lora
Helper utilities for using LoRA layers.
LoRA layers are drop-in replacements for certain modules that can be used to fine-tune pre-trained models. The technique is described in the paper LoRA: Low-Rank Adaptation of Large Language Models.
```python
from torch import nn
from ml.models.lora import lora

# The pre-trained model weights can be loaded into the LoRA model.
# (r is required; 4 is one of the values typically used in the paper.)
model = nn.Sequential(nn.Linear(5, 7), nn.Linear(7, 5))
lora_model = nn.Sequential(lora(nn.Linear(5, 7), r=4), lora(nn.Linear(7, 5), r=4))
lora_model.load_state_dict(model.state_dict())  # No errors
```
```python
from ml.models.lora import LoraLinear

# Alternatively, you can just substitute the module class.
model = nn.Sequential(LoraLinear(5, 7, r=4), LoraLinear(7, 5, r=4))
```
The modules which can be wrapped with LoRA modules are:
- nn.Embedding
- nn.Linear
- nn.Conv1d
- nn.ConvTranspose1d
- nn.Conv2d
- nn.ConvTranspose2d
- nn.LSTM
- nn.GRU
- nn.LSTMCell
- nn.GRUCell
- ColumnParallelLinear
- RowParallelLinear
- ParallelEmbedding
In the paper, the authors typically use values of 1, 2, 4, or 8 for the `r` parameter. The `lora_alpha` parameter is typically set to 1.0, but can be tuned to improve performance.
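For example, a pre-trained linear layer can be wrapped like this (a minimal sketch; the layer sizes, rank, and alpha below are illustrative choices, not values prescribed by the library):

```python
import torch
from torch import nn

from ml.models.lora import lora

# Wrap a pre-trained layer with a small rank and the default alpha, then
# load the pre-trained weights into the LoRA-wrapped layer.
pretrained = nn.Linear(512, 512)
lora_layer = lora(nn.Linear(512, 512), r=8, alpha=1.0)
lora_layer.load_state_dict(pretrained.state_dict())

# The wrapped layer accepts the same inputs and produces the same shapes.
x = torch.randn(2, 512)
print(lora_layer(x).shape)  # torch.Size([2, 512])
```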
- class ml.models.lora.LoraEmbedding(num_embeddings: int, embedding_dim: int, r: int, lora_alpha: float = 1.0, lora_dropout: float = 0.0, merge: bool = False, padding_idx: int | None = None, max_norm: float | None = None, norm_type: float = 2.0, scale_grad_by_freq: bool = False, sparse: bool = False)[source]
Bases: Embedding, _Lora
Initializes internal Module state, shared by both nn.Module and ScriptModule.
- train(mode: bool = True) → LoraEmbedding [source]
Sets the module in training mode.
This has any effect only on certain modules. See documentations of particular modules for details of their behaviors in training/evaluation mode, if they are affected, e.g. Dropout, BatchNorm, etc.
- Parameters:
mode (bool) – whether to set training mode (True) or evaluation mode (False). Default: True.
- Returns:
self
- Return type:
Module
- forward(x: Tensor) → Tensor [source]
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note: Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
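As a usage sketch (the vocabulary size, embedding dimension, and rank below are illustrative), LoraEmbedding behaves as a drop-in replacement for nn.Embedding:

```python
import torch
from torch import nn

from ml.models.lora import LoraEmbedding

# The LoRA-wrapped embedding has the same input/output shapes as
# nn.Embedding and can load its state dict.
base = nn.Embedding(1000, 64)
emb = LoraEmbedding(num_embeddings=1000, embedding_dim=64, r=4)
emb.load_state_dict(base.state_dict())

tokens = torch.randint(0, 1000, (2, 16))
print(emb(tokens).shape)  # torch.Size([2, 16, 64])
```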
- class ml.models.lora.LoraLinear(in_features: int, out_features: int, r: int, lora_alpha: float = 1.0, lora_dropout: float = 0.0, fan_in_fan_out: bool = False, merge: bool = False, bias: bool = True)[source]
Bases: Linear, _Lora
Initializes internal Module state, shared by both nn.Module and ScriptModule.
- train(mode: bool = True) → LoraLinear [source]
Sets the module in training mode.
This has any effect only on certain modules. See documentations of particular modules for details of their behaviors in training/evaluation mode, if they are affected, e.g. Dropout, BatchNorm, etc.
- Parameters:
mode (bool) – whether to set training mode (True) or evaluation mode (False). Default: True.
- Returns:
self
- Return type:
Module
- forward(x: Tensor) → Tensor [source]
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note: Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
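A corresponding sketch for LoraLinear (sizes, rank, and dropout are illustrative):

```python
import torch
from torch import nn

from ml.models.lora import LoraLinear

# LoraLinear takes the usual nn.Linear arguments plus the LoRA-specific ones.
base = nn.Linear(128, 64)
layer = LoraLinear(128, 64, r=8, lora_alpha=1.0, lora_dropout=0.1)
layer.load_state_dict(base.state_dict())

x = torch.randn(4, 128)
print(layer(x).shape)  # torch.Size([4, 64])
```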
- class ml.models.lora.LoraConv1d(in_channels: int, out_channels: int, kernel_size: int | tuple[int], r: int, lora_alpha: float = 1.0, lora_dropout: float = 0.0, merge: bool = False, stride: int | tuple[int] = 1, padding: str | int | tuple[int] = 0, dilation: int | tuple[int] = 1, groups: int = 1, bias: bool = True)[source]
Bases: Conv1d, _Lora
Initializes internal Module state, shared by both nn.Module and ScriptModule.
- train(mode: bool = True) → LoraConv1d [source]
Sets the module in training mode.
This has any effect only on certain modules. See documentations of particular modules for details of their behaviors in training/evaluation mode, if they are affected, e.g. Dropout, BatchNorm, etc.
- Parameters:
mode (bool) – whether to set training mode (True) or evaluation mode (False). Default: True.
- Returns:
self
- Return type:
Module
- forward(x: Tensor) → Tensor [source]
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note: Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
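A sketch for LoraConv1d (shapes and rank are illustrative); the other convolution wrappers follow the same pattern:

```python
import torch
from torch import nn

from ml.models.lora import LoraConv1d

# Same constructor arguments as nn.Conv1d, with r / lora_alpha /
# lora_dropout / merge inserted after kernel_size.
base = nn.Conv1d(16, 32, kernel_size=3, padding=1)
conv = LoraConv1d(16, 32, kernel_size=3, r=4, padding=1)
conv.load_state_dict(base.state_dict())

x = torch.randn(2, 16, 100)
print(conv(x).shape)  # torch.Size([2, 32, 100])
```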
- class ml.models.lora.LoraConvTranspose1d(in_channels: int, out_channels: int, kernel_size: int | tuple[int], r: int, lora_alpha: float = 1.0, lora_dropout: float = 0.0, merge: bool = False, stride: int | tuple[int] = 1, padding: int | tuple[int] = 0, output_padding: int | tuple[int] = 0, dilation: int | tuple[int] = 1, groups: int = 1, bias: bool = True)[source]
Bases: ConvTranspose1d, _Lora
Initializes internal Module state, shared by both nn.Module and ScriptModule.
- train(mode: bool = True) → LoraConvTranspose1d [source]
Sets the module in training mode.
This has any effect only on certain modules. See documentations of particular modules for details of their behaviors in training/evaluation mode, if they are affected, e.g. Dropout, BatchNorm, etc.
- Parameters:
mode (bool) – whether to set training mode (True) or evaluation mode (False). Default: True.
- Returns:
self
- Return type:
Module
- forward(x: Tensor, output_size: list[int] | None = None) → Tensor [source]
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note: Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- class ml.models.lora.LoraConv2d(in_channels: int, out_channels: int, kernel_size: int | tuple[int, int], r: int, lora_alpha: float = 1.0, lora_dropout: float = 0.0, merge: bool = False, stride: int | tuple[int, int] = (1, 1), padding: str | int | tuple[int, int] = (0, 0), dilation: int | tuple[int, int] = (1, 1), groups: int = 1, bias: bool = True)[source]
Bases: Conv2d, _Lora
Initializes internal Module state, shared by both nn.Module and ScriptModule.
- train(mode: bool = True) → LoraConv2d [source]
Sets the module in training mode.
This has any effect only on certain modules. See documentations of particular modules for details of their behaviors in training/evaluation mode, if they are affected, e.g. Dropout, BatchNorm, etc.
- Parameters:
mode (bool) – whether to set training mode (True) or evaluation mode (False). Default: True.
- Returns:
self
- Return type:
Module
- forward(x: Tensor) → Tensor [source]
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note: Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
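A sketch for LoraConv2d (shapes and rank are illustrative):

```python
import torch
from torch import nn

from ml.models.lora import LoraConv2d

# Mirrors nn.Conv2d, with the LoRA arguments inserted after kernel_size.
base = nn.Conv2d(3, 16, kernel_size=3, padding=1)
conv = LoraConv2d(3, 16, kernel_size=3, r=4, padding=1)
conv.load_state_dict(base.state_dict())

x = torch.randn(2, 3, 32, 32)
print(conv(x).shape)  # torch.Size([2, 16, 32, 32])
```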
- class ml.models.lora.LoraConvTranspose2d(in_channels: int, out_channels: int, kernel_size: int | tuple[int, int], r: int, lora_alpha: float = 1.0, lora_dropout: float = 0.0, merge: bool = False, stride: int | tuple[int, int] = (1, 1), padding: int | tuple[int, int] = (0, 0), output_padding: int | tuple[int, int] = (0, 0), dilation: int | tuple[int, int] = (1, 1), groups: int = 1, bias: bool = True)[source]
Bases: ConvTranspose2d, _Lora
Initializes internal Module state, shared by both nn.Module and ScriptModule.
- train(mode: bool = True) → LoraConvTranspose2d [source]
Sets the module in training mode.
This has any effect only on certain modules. See documentations of particular modules for details of their behaviors in training/evaluation mode, if they are affected, e.g. Dropout, BatchNorm, etc.
- Parameters:
mode (bool) – whether to set training mode (True) or evaluation mode (False). Default: True.
- Returns:
self
- Return type:
Module
- forward(x: Tensor, output_size: list[int] | None = None) → Tensor [source]
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note: Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
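A sketch for LoraConvTranspose2d showing the optional output_size argument (shapes and rank are illustrative):

```python
import torch

from ml.models.lora import LoraConvTranspose2d

# Like nn.ConvTranspose2d, forward also accepts an optional output_size.
deconv = LoraConvTranspose2d(16, 8, kernel_size=2, r=4, stride=2)

x = torch.randn(2, 16, 14, 14)
y = deconv(x, output_size=[28, 28])
print(y.shape)  # torch.Size([2, 8, 28, 28])
```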
- class ml.models.lora.LoraLSTM(input_size: int, hidden_size: int, r: int, lora_alpha: float = 1.0, num_layers: int = 1, bias: bool = True, batch_first: bool = False, dropout: float = 0.0, bidirectional: bool = False, proj_size: int = 0)[source]
Bases: LSTM, _LoraRNN
Initializes internal Module state, shared by both nn.Module and ScriptModule.
- class ml.models.lora.LoraGRU(input_size: int, hidden_size: int, r: int, lora_alpha: float = 1.0, num_layers: int = 1, bias: bool = True, batch_first: bool = False, dropout: float = 0.0, bidirectional: bool = False, proj_size: int = 0)[source]
Bases: GRU, _LoraRNN
Initializes internal Module state, shared by both nn.Module and ScriptModule.
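A sketch for the RNN wrappers (sizes and rank are illustrative); they accept the usual nn.LSTM / nn.GRU arguments plus r and lora_alpha, and lora_dropout is not supported for RNNs (see lora() below):

```python
import torch

from ml.models.lora import LoraLSTM, LoraGRU

# Drop-in replacements for nn.LSTM / nn.GRU with a LoRA rank added.
lstm = LoraLSTM(input_size=32, hidden_size=64, r=4, batch_first=True)
gru = LoraGRU(input_size=32, hidden_size=64, r=4, batch_first=True)

x = torch.randn(2, 10, 32)  # (batch, time, features)
lstm_out, (h_n, c_n) = lstm(x)
gru_out, h_n = gru(x)
print(lstm_out.shape, gru_out.shape)  # torch.Size([2, 10, 64]) twice
```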
- class ml.models.lora.LoraLSTMCell(input_size: int, hidden_size: int, r: int, bias: bool = True, lora_alpha: float = 1.0)[source]
Bases: LSTMCell, _LoraRNNCellBase
Initializes internal Module state, shared by both nn.Module and ScriptModule.
- forward(input: Tensor, hx: tuple[torch.Tensor, torch.Tensor] | None = None) → tuple[torch.Tensor, torch.Tensor] [source]
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note: Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- class ml.models.lora.LoraGRUCell(input_size: int, hidden_size: int, r: int, bias: bool = True, lora_alpha: float = 1.0)[source]
Bases: GRUCell, _LoraRNNCellBase
Initializes internal Module state, shared by both nn.Module and ScriptModule.
- forward(input: Tensor, hx: Tensor | None = None) → Tensor [source]
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note: Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
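A sketch for the cell variants (sizes and rank are illustrative); they mirror nn.LSTMCell and nn.GRUCell, processing one timestep at a time:

```python
import torch

from ml.models.lora import LoraLSTMCell, LoraGRUCell

lstm_cell = LoraLSTMCell(input_size=32, hidden_size=64, r=4)
gru_cell = LoraGRUCell(input_size=32, hidden_size=64, r=4)

x = torch.randn(8, 32)
h, c = lstm_cell(x)  # hx defaults to zeros when omitted
h2 = gru_cell(x)
print(h.shape, c.shape, h2.shape)  # torch.Size([8, 64]) three times
```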
- class ml.models.lora.LoraParallelEmbedding(num_embeddings: int, embedding_dim: int, r: int, lora_alpha: float = 1.0, lora_dropout: float = 0.0, merge: bool = False, padding_idx: int | None = None, max_norm: float | None = None, norm_type: float = 2.0, scale_grad_by_freq: bool = False, sparse: bool = False, init_type: Literal['orthogonal', 'normal', 'biased_normal', 'uniform', 'kaiming_uniform', 'kaiming_normal', 'xavier_uniform', 'xavier_normal', 'trunc_normal', 'dirac', 'constant', 'zeros', 'ones'] = 'xavier_normal')[source]
Bases: ParallelEmbedding, _Lora
Model-parallel embeddings.
Embeddings are partitioned along the embedding_dim dimension.
- Parameters:
num_embeddings – Number of embeddings (vocabulary size).
embedding_dim – Embedding dimension; must be divisible by the model-parallel size.
padding_idx – See nn.Embedding.
max_norm – See nn.Embedding.
norm_type – See nn.Embedding.
scale_grad_by_freq – See nn.Embedding.
sparse – See nn.Embedding.
init_type – Initialization type.
- train(mode: bool = True) → LoraParallelEmbedding [source]
Sets the module in training mode.
This has any effect only on certain modules. See documentations of particular modules for details of their behaviors in training/evaluation mode, if they are affected, e.g. Dropout, BatchNorm, etc.
- Parameters:
mode (bool) – whether to set training mode (True) or evaluation mode (False). Default: True.
- Returns:
self
- Return type:
Module
- forward(x: Tensor) → Tensor [source]
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note: Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- class ml.models.lora.LoraColumnParallelLinear(in_features: int, out_features: int, r: int, lora_alpha: float = 1.0, lora_dropout: float = 0.0, fan_in_fan_out: bool = False, merge: bool = False, bias: bool = True, gather_output: bool = True, init_type: Literal['orthogonal', 'normal', 'biased_normal', 'uniform', 'kaiming_uniform', 'kaiming_normal', 'xavier_uniform', 'xavier_normal', 'trunc_normal', 'dirac', 'constant', 'zeros', 'ones'] = 'xavier_normal', stride: int = 1)[source]
Bases: ColumnParallelLinear, _Lora
A column parallel linear layer.
This layer splits the weight matrix along the output feature dimension, so each rank is responsible for only out_features // world_size of the output features.
- Parameters:
in_features – Number of input features.
out_features – Number of output features.
bias – Whether to include a bias term.
gather_output – Whether to gather the output from all the model parallel GPUs.
init_type – Initialization type.
stride – Stride for the initialization.
lora_rank – The LoRA rank to use, if any.
- train(mode: bool = True) → LoraColumnParallelLinear [source]
Sets the module in training mode.
This has any effect only on certain modules. See documentations of particular modules for details of their behaviors in training/evaluation mode, if they are affected, e.g. Dropout, BatchNorm, etc.
- Parameters:
mode (bool) – whether to set training mode (True) or evaluation mode (False). Default: True.
- Returns:
self
- Return type:
Module
- class ml.models.lora.LoraRowParallelLinear(in_features: int, out_features: int, r: int, lora_alpha: float = 1.0, lora_dropout: float = 0.0, fan_in_fan_out: bool = False, merge: bool = False, bias: bool = True, input_is_parallel: bool = False, init_type: Literal['orthogonal', 'normal', 'biased_normal', 'uniform', 'kaiming_uniform', 'kaiming_normal', 'xavier_uniform', 'xavier_normal', 'trunc_normal', 'dirac', 'constant', 'zeros', 'ones'] = 'xavier_normal', stride: int = 1)[source]
Bases: RowParallelLinear, _Lora
A row parallel linear layer.
This layer splits the weight matrix along the input feature dimension, so each rank is responsible for only in_features // world_size of the input features. It can be paired with a column parallel layer to create a model-parallel two-stage linear layer.
- Parameters:
in_features – Number of input features.
out_features – Number of output features.
bias – Whether to include a bias term.
input_is_parallel – Whether the input tensor is already split along the feature dimension.
init_type – Initialization type.
stride – Stride for the initialization.
- train(mode: bool = True) → LoraRowParallelLinear [source]
Sets the module in training mode.
This has any effect only on certain modules. See documentations of particular modules for details of their behaviors in training/evaluation mode, if they are affected, e.g. Dropout, BatchNorm, etc.
- Parameters:
mode (bool) – whether to set training mode (True) or evaluation mode (False). Default: True.
- Returns:
self
- Return type:
Module
- ml.models.lora.lora(module: Embedding, r: int, alpha: float = 1.0, dropout: float = 0.0, merge: bool = False) → LoraEmbedding [source]
- ml.models.lora.lora(module: Linear, r: int, alpha: float = 1.0, dropout: float = 0.0, merge: bool = False) → LoraLinear
- ml.models.lora.lora(module: Conv1d, r: int, alpha: float = 1.0, dropout: float = 0.0, merge: bool = False) → LoraConv1d
- ml.models.lora.lora(module: ConvTranspose1d, r: int, alpha: float = 1.0, dropout: float = 0.0, merge: bool = False) → LoraConvTranspose1d
- ml.models.lora.lora(module: Conv2d, r: int, alpha: float = 1.0, dropout: float = 0.0, merge: bool = False) → LoraConv2d
- ml.models.lora.lora(module: ConvTranspose2d, r: int, alpha: float = 1.0, dropout: float = 0.0, merge: bool = False) → LoraConvTranspose2d
- ml.models.lora.lora(module: LSTM, r: int, alpha: float = 1.0, dropout: float = 0.0, merge: bool = False) → LoraLSTM
- ml.models.lora.lora(module: GRU, r: int, alpha: float = 1.0, dropout: float = 0.0, merge: bool = False) → LoraGRU
- ml.models.lora.lora(module: LSTMCell, r: int, alpha: float = 1.0, dropout: float = 0.0, merge: bool = False) → LoraLSTMCell
- ml.models.lora.lora(module: GRUCell, r: int, alpha: float = 1.0, dropout: float = 0.0, merge: bool = False) → LoraGRUCell
- ml.models.lora.lora(module: ParallelEmbedding, r: int, alpha: float = 1.0, dropout: float = 0.0, merge: bool = False) → LoraParallelEmbedding
- ml.models.lora.lora(module: ColumnParallelLinear, r: int, alpha: float = 1.0, dropout: float = 0.0, merge: bool = False) → LoraColumnParallelLinear
- ml.models.lora.lora(module: RowParallelLinear, r: int, alpha: float = 1.0, dropout: float = 0.0, merge: bool = False) → LoraRowParallelLinear
- ml.models.lora.lora(module: Embedding | Linear | Conv1d | ConvTranspose1d | Conv2d | ConvTranspose2d | LSTM | GRU | LSTMCell | GRUCell | ColumnParallelLinear | RowParallelLinear | ParallelEmbedding, r: int, alpha: float = 1.0, dropout: float = 0.0, merge: bool = False) → Module
Wraps a module with LoRA.
This function takes a base module and returns the LoRA version of that module. The new module is effectively a drop-in replacement for the original module; for example, it can load the same state dict, and it has the same input and output shapes.
- Parameters:
module – The module to wrap.
r – The number of LoRA components to use. If 0, then LoRA is not used.
alpha – The scaling factor for the LoRA components. A higher value means that more weight is given to the LoRA components.
dropout – The dropout probability applied to the input value before computing the LoRA components. This parameter is not supported for RNNs (because it would require modifying the underlying kernel).
merge – Whether to merge the LoRA components into the original weights. If True, then the LoRA components are merged into the weights during training, and the original weights are used during evaluation. If False, then the LoRA components are used during both training and evaluation.
- Returns:
The LoRA version of the module.
- Raises:
ValueError – If the module is not supported.
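A usage sketch (the module types and hyperparameters below are illustrative):

```python
from torch import nn

from ml.models.lora import lora

# lora() dispatches on the module type and returns the corresponding LoRA
# wrapper; unsupported module types raise ValueError.
linear = lora(nn.Linear(64, 64), r=4)
conv = lora(nn.Conv2d(3, 8, kernel_size=3), r=4, dropout=0.1)
emb = lora(nn.Embedding(1000, 64), r=4)

try:
    lora(nn.LayerNorm(64), r=4)  # nn.LayerNorm is not in the supported list
except ValueError as err:
    print(err)
```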
- ml.models.lora.maybe_lora(module: T_module, r: int | None, alpha: float = 1.0, dropout: float = 0.0, merge: bool = False, freeze: bool = True) → T_module [source]
Apply LoRA to a supported module, if a LoRA rank is provided.
- Parameters:
module – A supported module.
r – The LoRA rank.
alpha – The LoRA alpha parameter.
dropout – The LoRA dropout rate.
merge – Whether to merge the LoRA components into the original weights (see lora() above).
freeze – Whether to freeze the module’s parameters if a LoRA rank is not provided. This argument has no effect if a LoRA rank is provided, since downstream users can always freeze just the module themselves. Typically, when trying out LoRA fine-tuning, downstream users will want to freeze most of the module parameters and apply LoRA only to a subset of the module’s layers, so this is the default behavior.
- Returns:
The module with LoRA applied, if a LoRA rank is provided.
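A usage sketch (the sizes and rank are illustrative):

```python
from torch import nn

from ml.models.lora import maybe_lora

# With r=None the module is returned without LoRA applied and, by default,
# with its parameters frozen; with an integer rank it is wrapped with LoRA.
frozen = maybe_lora(nn.Linear(64, 64), r=None)
print(any(p.requires_grad for p in frozen.parameters()))  # False

adapted = maybe_lora(nn.Linear(64, 64), r=4, alpha=1.0)
print(type(adapted).__name__)  # LoraLinear
```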
- ml.models.lora.maybe_lora_weight_norm(module: T_module, r: int | None, alpha: float = 1.0, dropout: float = 0.0, merge: bool = False, freeze: bool = True) → T_module [source]
- ml.models.lora.reset_lora_weights_(module: Module) → None [source]
Resets any LoRA weights in the module.
All of the LoRA modules have a reset_lora_parameters method that resets the LoRA weights in-place. This function looks for any modules with this method and calls it.
- Parameters:
module – The module to reset, in-place.
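A usage sketch (the model and rank are illustrative):

```python
from torch import nn

from ml.models.lora import lora, reset_lora_weights_

# Reset the LoRA weights of every LoRA submodule in-place, e.g. before
# starting a new fine-tuning run. Non-LoRA modules are left untouched.
model = nn.Sequential(
    lora(nn.Linear(64, 64), r=4),
    nn.ReLU(),
    nn.Linear(64, 10),
)
reset_lora_weights_(model)
```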