Exactly this. The way to do it is to use the LLM as the statistical model inside an arithmetic coder: at each step, the model's next-token probabilities determine how the next token is encoded. The compression inefficiency, in expected extra bits per token, is the KL divergence between the empirical next-token distribution of the corpus you actually compress and the LLM's predicted distribution.
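
To make that concrete, here's a minimal sketch in Python. Everything here is an illustrative assumption, not anyone's actual implementation: `model()` is a toy stand-in for the LLM's next-token distribution (a real version would run the LLM on the context and read the probabilities off its softmax), and the coder uses exact `Fraction` arithmetic for readability, where a production arithmetic coder would use fixed-precision integers with renormalization.

```python
from fractions import Fraction

ALPHABET = ["a", "b", "c"]

def model(context):
    """Toy stand-in for the LLM: next-token probabilities given the
    tokens seen so far. A real implementation would run the LLM on
    `context` and read the distribution off its softmax output."""
    if not context:
        return {t: Fraction(1, 3) for t in ALPHABET}
    probs = {t: Fraction(1, 6) for t in ALPHABET}
    probs[context[-1]] = Fraction(4, 6)  # favor repeating the last token
    return probs  # probabilities sum to exactly 1

def intervals(probs):
    """Carve [0, 1) into one subinterval per token, in a fixed order."""
    out, c = {}, Fraction(0)
    for t in ALPHABET:
        out[t] = (c, c + probs[t])
        c += probs[t]
    return out

def encode(tokens):
    # Narrow [low, high) by each token's subinterval under the model.
    low, high = Fraction(0), Fraction(1)
    for i, t in enumerate(tokens):
        lo, hi = intervals(model(tokens[:i]))[t]
        low, high = low + (high - low) * lo, low + (high - low) * hi
    # Emit the bits of a dyadic number inside [low, high); the bit
    # count lands within ~2 of -log2(high - low), the ideal length.
    bits, a, b = [], Fraction(0), Fraction(1)
    target = (low + high) / 2
    while a < low or b > high:
        mid = (a + b) / 2
        if target < mid:
            bits.append(0)
            b = mid
        else:
            bits.append(1)
            a = mid
    return bits

def decode(bits, n_tokens):
    # Rebuild the encoded value, then replay the model's narrowing steps.
    # (A real codec would use an end-of-sequence token, not a length.)
    v = sum(Fraction(bit, 2 ** (i + 1)) for i, bit in enumerate(bits))
    tokens = []
    low, high = Fraction(0), Fraction(1)
    for _ in range(n_tokens):
        for t, (lo, hi) in intervals(model(tokens)).items():
            tlow = low + (high - low) * lo
            thigh = low + (high - low) * hi
            if tlow <= v < thigh:
                tokens.append(t)
                low, high = tlow, thigh
                break
    return tokens

msg = list("abbbba")
code = encode(msg)
assert decode(code, len(msg)) == msg
print(f"{len(msg)} tokens -> {len(code)} bits")
```

The emitted code length is within a couple of bits of -log2 P(sequence) under the model, so the gap between the per-token average and the corpus's empirical entropy is exactly the KL term above: a better predictor means narrower intervals and fewer bits.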