It's a question of entropy.
Data is rarely truly random, and the larger the data, the higher the chance that this "unrandomness" shows up as patterns a compressor can exploit.
If your data consists of 4 kilobytes of just 00_01, then you gain a lot by just remembering:
"write 00_01 2000 times".
Conversely, if a small amount of data is just 00_01_00_01_00_01, then the same format would yield:
"write 00_01 3 times"
As you can see, this saves far less space relative to the original data, so the format is less worthwhile here.
The specifics depend heavily on the compression algorithm used, so take the example with a grain of salt, but I hope it gets across the basic idea of why compression can be more effective on larger data.
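
To make the overhead argument concrete, here is a minimal sketch of run-length encoding over 2-byte patterns. It is a toy illustration, not any real codec, and the name rle_pairs and the 4-byte record layout are just assumptions for this example:

```python
# Toy run-length encoder: each run of a repeated 2-byte pattern is stored
# as a 4-byte record (2-byte big-endian count + the 2-byte pattern).
def rle_pairs(data: bytes) -> bytes:
    out = bytearray()
    i = 0
    while i < len(data):
        pattern = data[i:i + 2]
        count = 1
        # Extend the run while the next 2 bytes match and the count fits in 2 bytes.
        while data[i + 2 * count:i + 2 * count + 2] == pattern and count < 0xFFFF:
            count += 1
        out += count.to_bytes(2, "big") + pattern
        i += 2 * count
    return bytes(out)

big = bytes([0x00, 0x01]) * 2000   # 4,000 bytes of repeated 00 01
small = bytes([0x00, 0x01]) * 3    # 6 bytes of the same pattern

print(len(big), "->", len(rle_pairs(big)))      # 4000 -> 4: a huge saving
print(len(small), "->", len(rle_pairs(small)))  # 6 -> 4: barely any saving
```

The fixed per-run overhead (the 4-byte record) is negligible next to 4,000 bytes of input but eats most of the gain on a 6-byte input, which is exactly the effect described above.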