It probably depends on your problem space. In creative writing, I wonder whether it's even perceptible when the LLM is creating content at the boundaries of its knowledge base. But for programming and other falsifiable (and rapidly changing) disciplines, it is both noticeable and a problem.
Maybe some evaluation of the sample size would be helpful? If the LLM has fewer than X samples of an input word or phrase, it could include a cautionary note in its output, or even respond with some variant of "I don't know".
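Models don't expose training-sample counts directly, but token log-probabilities are a rough post-hoc proxy for the same idea: lots of low-probability tokens suggests the model is out near the edge of its training data. A minimal sketch of that gating logic (the function name, thresholds, and note text are all made up for illustration):

```python
def hedge_response(text, token_logprobs, logprob_floor=-2.5, max_uncertain_frac=0.2):
    """Append a cautionary note when too many output tokens were low-confidence.

    token_logprobs: per-token log probabilities reported by the model.
    logprob_floor: a token below this counts as "uncertain" (hypothetical cutoff).
    max_uncertain_frac: fraction of uncertain tokens that triggers the note.
    """
    if not token_logprobs:
        return text
    uncertain = sum(1 for lp in token_logprobs if lp < logprob_floor)
    if uncertain / len(token_logprobs) > max_uncertain_frac:
        # Stand-in for the "cautionary note" / "I don't know" variant.
        return text + "\n\n[Note: low confidence; this may be thinly represented in training data.]"
    return text


# Mostly high-probability tokens: passes through unchanged.
print(hedge_response("The answer is 42.", [-0.1, -0.3, -0.2]))

# Many low-probability tokens: gets the caution appended.
print(hedge_response("The answer is 42.", [-3.1, -4.0, -0.2]))
```

The thresholds would need tuning per model, and logprobs conflate "rare in training data" with "many valid phrasings", so this is at best a heuristic, not the sample-count check the comment actually asks for.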