Exactly this. The way to do it is to use the LLM as the statistical model inside an arithmetic coder: at each step, the model's next-token probabilities determine how the next token is encoded. The compression inefficiency, in expected extra bits per token, is the KL divergence between the empirical next-token distribution of the corpus you actually compress and the LLM's predicted distribution.
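
To make that concrete, here's a minimal sketch in Python. Everything here is an illustrative assumption, not anyone's actual implementation: `model()` is a toy stand-in for the LLM's next-token distribution (a real version would run the LLM on the context and read the probabilities off its softmax), and the coder uses exact `Fraction` arithmetic for readability, where a production arithmetic coder would use fixed-precision integers with renormalization.

```python
from fractions import Fraction

ALPHABET = ["a", "b", "c"]

def model(context):
    """Toy stand-in for the LLM: next-token probabilities given the
    tokens seen so far. A real implementation would run the LLM on
    `context` and read the distribution off its softmax output."""
    if not context:
        return {t: Fraction(1, 3) for t in ALPHABET}
    probs = {t: Fraction(1, 6) for t in ALPHABET}
    probs[context[-1]] = Fraction(4, 6)  # favor repeating the last token
    return probs  # probabilities sum to exactly 1

def intervals(probs):
    """Carve [0, 1) into one subinterval per token, in a fixed order."""
    out, c = {}, Fraction(0)
    for t in ALPHABET:
        out[t] = (c, c + probs[t])
        c += probs[t]
    return out

def encode(tokens):
    # Narrow [low, high) by each token's subinterval under the model.
    low, high = Fraction(0), Fraction(1)
    for i, t in enumerate(tokens):
        lo, hi = intervals(model(tokens[:i]))[t]
        low, high = low + (high - low) * lo, low + (high - low) * hi
    # Emit the bits of a dyadic number inside [low, high); the bit
    # count lands within ~2 of -log2(high - low), the ideal length.
    bits, a, b = [], Fraction(0), Fraction(1)
    target = (low + high) / 2
    while a < low or b > high:
        mid = (a + b) / 2
        if target < mid:
            bits.append(0)
            b = mid
        else:
            bits.append(1)
            a = mid
    return bits

def decode(bits, n_tokens):
    # Rebuild the encoded value, then replay the model's narrowing steps.
    # (A real codec would use an end-of-sequence token, not a length.)
    v = sum(Fraction(bit, 2 ** (i + 1)) for i, bit in enumerate(bits))
    tokens = []
    low, high = Fraction(0), Fraction(1)
    for _ in range(n_tokens):
        for t, (lo, hi) in intervals(model(tokens)).items():
            tlow = low + (high - low) * lo
            thigh = low + (high - low) * hi
            if tlow <= v < thigh:
                tokens.append(t)
                low, high = tlow, thigh
                break
    return tokens

msg = list("abbbba")
code = encode(msg)
assert decode(code, len(msg)) == msg
print(f"{len(msg)} tokens -> {len(code)} bits")
```

The emitted code length is within a couple of bits of -log2 P(sequence) under the model, so the gap between the per-token average and the corpus's empirical entropy is exactly the KL term above: a better predictor means narrower intervals and fewer bits.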