
I think the main use case remains behavior changes: instruction finetuning, finetuning for classification, etc. Adding knowledge to the weights is best done via pretraining. Or, if you have an external database or documentation that you want to query during generation, RAG, as you mention, is the better fit.

PS: All winners of the NeurIPS 2023 LLM Efficiency Challenge (finetuning the "best" LLM in 24h on 1 GPU) used LoRA or QLoRA (quantized LoRA).
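For reference, here is a minimal sketch of what LoRA finetuning looks like with Hugging Face PEFT. The model name and hyperparameters are placeholders for illustration, not what the challenge winners used:

  import torch
  from transformers import AutoModelForCausalLM, AutoTokenizer
  from peft import LoraConfig, get_peft_model

  # Placeholder base model; swap in whatever checkpoint you are adapting.
  model_name = "meta-llama/Llama-2-7b-hf"
  tokenizer = AutoTokenizer.from_pretrained(model_name)
  model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16)

  # LoRA: freeze the base weights and learn low-rank updates on selected layers.
  lora_config = LoraConfig(
      r=8,                # rank of the update matrices (assumed value)
      lora_alpha=16,      # scaling factor for the updates
      lora_dropout=0.05,
      target_modules=["q_proj", "v_proj"],  # which projections get adapters (model-dependent)
      task_type="CAUSAL_LM",
  )
  model = get_peft_model(model, lora_config)
  model.print_trainable_parameters()  # typically well under 1% of the full parameter count

  # From here you train as usual (e.g. with transformers.Trainer) on your
  # instruction or classification data; only the adapter weights are updated.

QLoRA is the same idea, except the frozen base model is loaded in 4-bit (via bitsandbytes), which is what makes single-GPU finetuning of larger models feasible.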


