EMLoC: Emulator-based Memory-efficient Fine-tuning with LoRA Correction

1 National Taiwan University, 2 NVIDIA
arXiv Preprint 2025

The dilemma caused by additional memory overhead during fine-tuning. (a) Users opt for a smaller 8B model, sacrificing emergent capabilities and underutilizing the available hardware. (b) Users choose a larger 26B model, whose fine-tuning memory exceeds the hardware limit. (c) Our EMLoC fine-tunes through a smaller emulator, allowing the same memory budget for both training and inference.


Abstract

Open-source foundation models have seen rapid adoption and development, enabling powerful general-purpose capabilities across diverse domains. However, fine-tuning large foundation models for domain-specific or personalized tasks remains prohibitively expensive for most users due to the significant memory overhead beyond that of inference. We introduce EMLoC, an Emulator-based Memory-efficient fine-tuning framework with LoRA Correction, which enables model fine-tuning within the same memory budget required for inference. EMLoC constructs a task-specific lightweight emulator using activation-aware singular value decomposition (SVD) on a small downstream calibration set. Fine-tuning is then performed on this lightweight emulator via LoRA. To tackle the misalignment between the original model and the compressed emulator, we propose a novel compensation algorithm to correct the fine-tuned LoRA module, which can thus be merged into the original model for inference. EMLoC supports flexible compression ratios and standard training pipelines, making it adaptable to a wide range of applications. Extensive experiments demonstrate that EMLoC outperforms other baselines across multiple datasets and modalities. Moreover, without quantization, EMLoC enables fine-tuning of a 38B model on a single 24GB consumer GPU, bringing efficient and practical model adaptation to individual users.
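
Below is a minimal, hypothetical sketch of the activation-aware SVD step described in the abstract, assuming a simple per-channel activation scaling heuristic. The exact emulator construction, the calibration procedure, and in particular the LoRA correction algorithm are specified in the paper; all shapes, names, and the scaling scheme here are illustrative only.

import torch


def activation_aware_svd(W, act_scale, rank):
    # W: (out_features, in_features) weight of a linear layer.
    # act_scale: (in_features,) per-channel importance, e.g. mean |activation|
    #            measured on a small downstream calibration set (assumed heuristic).
    # Scale each input channel before the SVD so channels carrying larger
    # activations are preserved more faithfully, then undo the scaling.
    U, sigma, Vh = torch.linalg.svd(W * act_scale, full_matrices=False)
    B = U[:, :rank] * sigma[:rank]   # (out_features, rank)
    A = Vh[:rank, :] / act_scale     # (rank, in_features)
    return B, A                      # emulator weight ~= B @ A


if __name__ == "__main__":
    torch.manual_seed(0)
    W = torch.randn(4096, 11008)             # toy weight matrix
    X = torch.randn(256, 11008)              # toy calibration activations
    act_scale = X.abs().mean(dim=0) + 1e-6   # avoid division by zero
    B, A = activation_aware_svd(W, act_scale, rank=512)
    compression = (B.numel() + A.numel()) / W.numel()
    print(f"emulator keeps {compression:.1%} of the original parameters")

LoRA modules are then trained against the low-rank emulator weights; the correction that aligns the learned LoRA with the original full-rank weights before merging is EMLoC's contribution and is not reproduced in this sketch.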

Method

Quantitative Results

Qualitative Results


EMLoC enables personalization of FLUX.1-dev on a single 24GB GPU. DreamBooth with LoRA is used to personalize the 12B FLUX.1-dev diffusion model, illustrating that EMLoC can be effectively extended to generative tasks beyond text.
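
For context, a minimal sketch (inference only, not the EMLoC training pipeline) of how a LoRA adapter produced by such personalization could be applied to FLUX.1-dev with Hugging Face diffusers; the adapter path and prompt are placeholders.

import torch
from diffusers import FluxPipeline

# Load the 12B FLUX.1-dev base model and attach a fine-tuned LoRA adapter.
pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
)
pipe.load_lora_weights("path/to/personalized_lora")  # placeholder path
pipe.enable_model_cpu_offload()                      # helps fit a 24GB consumer GPU

image = pipe(
    "a photo of sks dog on the beach",               # placeholder subject prompt
    num_inference_steps=28,
    guidance_scale=3.5,
).images[0]
image.save("personalized.png")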

BibTeX

@article{lin2025emloc,
  title={EMLoC: Emulator-based Memory-efficient Fine-tuning with LoRA Correction},
  author={Hsi-Che Lin and Yu-Chu Yu and Kai-Po Chang and Yu-Chiang Frank Wang},
  journal={arXiv preprint arXiv:2506.12015},
  year={2025}
}