Why ReLoRA Struggles with Small Language Models
Parameter-efficient training methods have transformed how we work with large language models. LoRA, which trains only a small set of low-rank update matrices while keeping the rest of the model frozen, has become a staple of the modern ML toolkit. Its extension, **ReLoRA**, pushed this idea one step further: instead of just fine-tuning with low-rank updates, why not *pretrain* with them? Turns out it's not that simple.
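To make the distinction concrete, here is a minimal PyTorch-style sketch, not the implementation from the LoRA or ReLoRA papers: the class name `LoRALinear`, the `rank` and `alpha` defaults, and the `merge_and_reset` helper are all illustrative. It shows a frozen linear layer plus a trainable low-rank update, with a ReLoRA-style step that folds the update into the frozen weight and restarts the low-rank factors.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """A frozen linear layer with a trainable low-rank update: y = W x + scale * (B A) x."""

    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # the pretrained weight stays frozen
        # Low-rank factors: A (rank x in_features), B (out_features x rank)
        self.A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, rank))  # zero init: update starts at 0
        self.scale = alpha / rank

    def forward(self, x):
        # Frozen path plus the scaled low-rank correction
        return self.base(x) + (x @ self.A.T @ self.B.T) * self.scale

    @torch.no_grad()
    def merge_and_reset(self):
        """ReLoRA-style restart: merge the low-rank update into the frozen weight,
        then reinitialise A and B so a fresh low-rank update can be learned."""
        self.base.weight += self.scale * (self.B @ self.A)
        self.A.normal_(std=0.01)
        self.B.zero_()
```

Plain LoRA fine-tuning only ever trains one low-rank update on top of a frozen pretrained weight; the ReLoRA idea is to call something like `merge_and_reset` periodically during pretraining, so a sequence of rank-`r` updates can accumulate into a higher-rank change of the base weight.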
Yuval Weiss