LoRA vs. Full Fine-Tuning: An Illusion of Equivalence

[Submitted on 28 Oct 2024]

By Reece Shuttleworth and 3 other authors

Abstract: Fine-tuning is a crucial paradigm for adapting pre-trained large language models to downstream tasks. Recently, methods like Low-Rank Adaptation (LoRA) have been shown to match the performance of fully fine-tuned models on various tasks with an extreme reduction in the number of trainable parameters. Even in settings where both methods learn similarly accurate models, are their learned solutions really equivalent? We study how different fine-tuning methods change pre-trained models by analyzing the model's weight matrices through the lens of their spectral properties. We find that full fine-tuning and LoRA yield weight matrices whose singular value decompositions exhibit very different structure; moreover, the fine-tuned models themselves show distinct generalization behaviors when tested outside the adaptation task's distribution. More specifically, we first show that the weight matrices trained with LoRA have new, high-ranking singular vectors, which we call intruder dimensions. Intruder dimensions do not appear during full fine-tuning. Second, we show that LoRA models with intruder dimensions, despite achieving similar performance to full fine-tuning on the target task, become worse models of the pre-training distribution and adapt less robustly to multiple tasks sequentially. Higher-rank, rank-stabilized LoRA models closely mirror full fine-tuning, even when performing on par with lower-rank LoRA models on the same tasks. These results suggest that models updated with LoRA and full fine-tuning access different parts of parameter space, even when they perform equally on the fine-tuned distribution. We conclude by examining why intruder dimensions appear in LoRA fine-tuned models, why they are undesirable, and how their effects can be minimized.
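The abstract's notion of intruder dimensions lends itself to a simple spectral check: compare the singular vectors of a fine-tuned weight matrix against those of the pre-trained matrix and flag high-ranking vectors that have no close match. The sketch below is a minimal NumPy illustration of that idea, not the paper's actual procedure; the function name find_intruder_dimensions, the similarity threshold of 0.5, the restriction to left singular vectors, and the toy rank-1 "LoRA-style" update are all illustrative assumptions.

```python
import numpy as np

def find_intruder_dimensions(W_pre, W_ft, k=10, sim_threshold=0.5):
    """Flag top-k left singular vectors of the fine-tuned matrix whose best
    cosine similarity to any pre-trained left singular vector is low.

    W_pre, W_ft: weight matrices before / after fine-tuning (same shape).
    k: number of top singular vectors of W_ft to inspect.
    sim_threshold: a vector is flagged as an "intruder" (illustrative
        criterion) if its best match falls below this value.
    """
    U_pre, _, _ = np.linalg.svd(W_pre, full_matrices=False)
    U_ft, _, _ = np.linalg.svd(W_ft, full_matrices=False)

    intruders = []
    for i in range(k):
        # Cosine similarity of the i-th fine-tuned singular vector
        # against every pre-trained singular vector (columns of U_pre).
        sims = np.abs(U_pre.T @ U_ft[:, i])
        if sims.max() < sim_threshold:
            intruders.append(i)
    return intruders

# Toy usage: a random "pre-trained" matrix plus a large rank-1 update,
# loosely mimicking a LoRA-style additive update W_ft = W_pre + B @ A.
rng = np.random.default_rng(0)
W_pre = rng.normal(size=(64, 64))
B, A = rng.normal(size=(64, 1)), rng.normal(size=(1, 64))
W_ft = W_pre + 5.0 * (B @ A)
print(find_intruder_dimensions(W_pre, W_ft))
```

In this toy setup, the direction introduced by the rank-1 update typically dominates the fine-tuned spectrum while aligning poorly with every pre-trained singular vector, so it tends to be flagged; a small dense update, by contrast, mostly perturbs existing directions and leaves few or no flagged vectors. The threshold and rank are arbitrary choices for illustration only.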

Submission history
From: Reece Shuttleworth
[v1] Mon, 28 Oct 2024 17:14:01 UTC (9,438 KB)

arXiv: https://arxiv.org/abs/2410.21228
