
I have the same question: what is the difference between an LLM and a dictionary in the context of compression? Can I not "train" a dictionary?


AIUI, a dictionary is built during compression to capture the statistics of a particular dataset, and it belongs to that specific dataset only. For example, it could be a ranking of the 10 most frequent symbols in the compressed file. That will be different for every input file.
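
A toy illustration of that kind of per-file ranking in Python (the input string is made up; Counter.most_common does the ranking):

    from collections import Counter

    # Rank the 10 most frequent byte values in one particular input.
    # A "dictionary" like this is tied to that input; a different file
    # would produce a different ranking.
    data = b"this is just an example input; real data has its own statistics"
    ranking = [bytes([b]) for b, _ in Counter(data).most_common(10)]
    print(ranking)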


> That will be different for every input file

That could be different for every input file, but it doesn't have to be. It could also be a fixed dictionary. For example, ZLIB allows for a user-defined dictionary [1].
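
For illustration, roughly what that looks like with Python's zlib bindings; the preset dictionary below is invented, and both sides must supply the identical zdict for decompression to succeed:

    import zlib

    # Hypothetical preset dictionary: byte strings we expect to recur in the input.
    preset = b'"name":"value","type":"record",'
    data = b'{"name":"example","type":"record"}'

    comp = zlib.compressobj(zdict=preset)
    compressed = comp.compress(data) + comp.flush()

    # The receiver has to be handed the same preset dictionary.
    decomp = zlib.decompressobj(zdict=preset)
    assert decomp.decompress(compressed) == data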

In this case, I'd consider the LLM to be a fixed dictionary of sorts. A very large, fixed dictionary with probabilistic return values.

[1] https://www.rfc-editor.org/rfc/rfc1950#page-9


Ah, I see. I’d never thought of the possibility of using a dictionary not created specifically from the given input dataset, heh


Admittedly, I don’t think it is common, but I think there was a project a few years ago (Google?) that tried to compress HTML using at least a partially fixed dictionary.

Nowadays, though, it’s apparently still something being tried: Chrome now supports shared dictionaries for Zstandard and Brotli. One idea is that a site would benefit from a single shared dictionary used to decompress multiple artifacts. You may not want everything compressed together in one file, so this way you get the compression benefit while still splitting those artifacts into separate files.

https://developer.chrome.com/blog/shared-dictionary-compress...
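
As for the "train a dictionary" part of the original question: zstd can build a shared dictionary from sample files. A rough sketch using the third-party python-zstandard package (the sample data and dictionary size are placeholders; real training wants many varied samples):

    import zstandard

    # Placeholder samples; in practice these would be many similar small
    # artifacts, e.g. JSON responses or HTML fragments from one site.
    roles = ["admin", "viewer", "editor"]
    samples = [('{"user":"user%d","role":"%s"}' % (i, roles[i % 3])).encode()
               for i in range(5000)]

    # Train a dictionary on the samples, then compress new data with it.
    dict_data = zstandard.train_dictionary(4096, samples)
    cctx = zstandard.ZstdCompressor(dict_data=dict_data)
    compressed = cctx.compress(b'{"user":"dave","role":"viewer"}')

    # The decompressor needs the same trained dictionary.
    dctx = zstandard.ZstdDecompressor(dict_data=dict_data)
    assert dctx.decompress(compressed) == b'{"user":"dave","role":"viewer"}'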



