
Author of HashBackup here.

To use modern block-based backup programs for large databases and VM images (a similar situation), you must use a very small block size for dedup to work well. For VM images, that's 4K. For databases, it's the page size, which is 4K for SQLite and 16K for InnoDB by default.

With very small block sizes, most block-based backup programs tend to fall over: they download lots of data on each backup for the block index, use a lot of RAM for the index, or both. So it's important to test programs with small block sizes if you expect high dedup across backups. Some backup programs allow you to set the block size on a per-file basis (HashBackup does), while others set it at the backup repo level.
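
To make the index cost concrete, here's a rough sketch in plain Python (an illustration, not HashBackup's code) that estimates dedup for one file at a fixed block size; the in-memory set of hashes is exactly the kind of index that balloons as the block size shrinks:

    # Estimate dedup for a file at a fixed block size (illustration only).
    import hashlib, sys

    def unique_blocks(path, block_size=4096):
        seen = set()                 # this hash index is what eats RAM
        total = 0
        with open(path, "rb") as f:
            while True:
                block = f.read(block_size)
                if not block:
                    break
                total += 1
                seen.add(hashlib.sha256(block).digest())
        return total, len(seen)

    total, unique = unique_blocks(sys.argv[1], 4096)
    print(f"{total} blocks, {unique} unique ({total - unique} deduped)")

Halving the block size doubles the number of blocks to track, which is where the RAM and download costs above come from.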

To back up a database, there are generally a few options:

1. Create an SQL text dump of the database and back that up. For this to dedup well, variable-sized blocks must be used, and the smaller the block size, the higher the dedup ratio (see the first sketch after this list).

2. Back up the database while it is running, with a fixed block size equal to the db page size. You could lock the database and do the backup, but it's better to do two backup runs: the first with no locking, the second with a read lock. The first backup cannot be restored, because it will be inconsistent if any changes occur to the database during the backup, but it doesn't lock out any database users. The second backup will be much faster because the bulk of the database blocks have already been saved and only the changed blocks have to be re-saved. Since it runs with a read lock held, the second backup is a consistent snapshot of the database (see the second sketch after this list).

3. Get the database's write logs involved, which is more complex.
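
For option 1 with SQLite, the dump step can be as simple as the stdlib's iterdump(); "app.db" and "app.sql" are just example names:

    # Option 1 sketch: write an SQL text dump, then back up the dump file.
    import sqlite3

    conn = sqlite3.connect("app.db")      # example database name
    with open("app.sql", "w") as out:
        for stmt in conn.iterdump():      # stdlib equivalent of the shell's .dump
            out.write(stmt + "\n")
    conn.close()

Because an insert or delete shifts the byte offsets of every statement after it in the dump, fixed blocks line up badly; variable-sized (content-defined) blocks let the unchanged statements still match earlier backups.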
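For option 2, here's one way the two-pass approach could look for a SQLite database in rollback-journal mode (WAL mode behaves differently); run_backup() and "your-backup-tool" are placeholders for whatever backup command you actually use, not HashBackup syntax:

    # Option 2 sketch: two backup passes, the second under a read lock.
    import sqlite3, subprocess

    def run_backup():
        # placeholder command; substitute your real backup tool invocation
        subprocess.run(["your-backup-tool", "app.db"], check=True)

    # Pass 1: no lock held. The copy may be inconsistent, but it seeds the
    # block store so pass 2 has only the changed pages left to save.
    run_backup()

    # Pass 2: hold a read (SHARED) lock so writers can't commit to the file
    # while the consistent copy is made.
    conn = sqlite3.connect("app.db", isolation_level=None)
    conn.execute("BEGIN")
    conn.execute("SELECT 1 FROM sqlite_master").fetchone()   # acquires the read lock
    run_backup()                                             # consistent snapshot
    conn.execute("COMMIT")                                   # release the lock
    conn.close()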


