Obnam repo size with .sql dump files seem too big

S. B. sb56637 at gmail.com
Wed Jan 2 04:43:55 GMT 2013


>>The best idea, untested, I have is to keep the first SQL dump,
>>in the live data, and then do a new dump before each backup, diff
>>the two dumps, delete the new dump, and then run the backup. This
>>way, each successive Obnam backup generation will have two files
>>(the original SQL dump, and the diff), and you'll need to apply
>>the diff to get the real dump you need to restore your database.
>>Does that make sense to anyone?

Thanks a lot Lars, that's what I was looking for. Here's what I ended up
doing, based on this suggestion:

I decided to use rdiff, since it computes deltas against a small signature
file, so the huge original SQL dump doesn't need to be present to diff against.
First of all, I did a full (uncompressed) DB dump using some options that are
supposed to make it more diff-friendly. I got them from here:
http://news.ycombinator.com/item?id=3620062
[mysqldump]
complete-insert
hex-blob
skip-add-drop-table
order-by-primary
skip-dump-date
no-create-info
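
For a one-off run, the same options can also be passed on the command line
instead of through the [mysqldump] group of an option file. This is only a
sketch: the helper name dump_db and its two arguments are made up for
illustration, and it assumes credentials are supplied some other way (for
example from an option file).

```shell
# Hypothetical helper (not from the original script): dump one database
# with the diff-friendly options listed above.
dump_db() {
    db_name="$1"     # database to dump
    out_file="$2"    # where to write the uncompressed SQL dump
    mysqldump \
        --complete-insert \
        --hex-blob \
        --skip-add-drop-table \
        --order-by-primary \
        --skip-dump-date \
        --no-create-info \
        "$db_name" > "$out_file"
}
```

Note that no-create-info leaves the CREATE TABLE statements out of the dump,
so the schema has to be kept somewhere else if you need it for a restore.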

After creating the rdiff .sig file from the full uncompressed dump file, I
moved the dump into an ./original subdirectory and then bzip'ed it. I will
hopefully not have to touch the original dump or read it again unless I
need to restore a backup, in which case I would have to decompress it
manually before applying the rdiff delta. Everything else works off
the .sig file, which is only a few MB at most. That's all for the initial
preparation. The rest can be automated.
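
The preparation and the matching restore step could look roughly like the
sketch below. The function names and file names (mydb.sql and so on) are
assumptions for illustration and are not taken from my actual setup. The
only rdiff calls used are "rdiff signature" and "rdiff patch".

```shell
# One-time preparation (hypothetical helper): create the signature, then
# move the full dump into ./original next to it and compress it.
prepare_baseline() {
    dump="$1"                          # full uncompressed dump, e.g. mydb.sql
    dir=$(dirname "$dump")
    rdiff signature "$dump" "$dump.sig" || return 1
    mkdir -p "$dir/original" || return 1
    mv "$dump" "$dir/original/" || return 1
    bzip2 "$dir/original/$(basename "$dump")"
}

# Restore (hypothetical helper): rebuild the dump that a given delta
# describes. rdiff patch needs the uncompressed baseline as a real file,
# so it is decompressed to a temporary file first.
restore_dump() {
    baseline_bz2="$1"                  # e.g. original/mydb.sql.bz2
    delta="$2"                         # delta file restored from Obnam
    out="$3"                           # reconstructed SQL dump
    basis=$(mktemp) || return 1
    bzip2 -dc "$baseline_bz2" > "$basis" || { rm -f "$basis"; return 1; }
    rdiff patch "$basis" "$delta" "$out"
    status=$?
    rm -f "$basis"
    return $status
}
```

Usage would be something like "prepare_baseline mydb.sql" once, and later
"restore_dump original/mydb.sql.bz2 mydb.delta restored.sql".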

I wrote a simple bash script that will be run by cron. Here it is:
http://pastebin.com/kmuy2dWW
Essentially, it does the following:
1. Creates a new uncompressed database dump.
2. Creates a .delta file by comparing the new full dump file to the
signature of the original.
3. Deletes the new full dump file, since it is no longer needed and we don't
want it in the Obnam repo.
4. Runs the Obnam backup, which picks up only the original static bzip'ed dump
file and the static signature file (Obnam will only store them once because
they don't change) plus the current delta file from this run.
5. Deletes the current delta file, since it's now in the Obnam repo.
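
For readers who can't reach the pastebin link, the five steps can be
sketched as below. This is not the pastebin script itself, just a minimal
illustration: the function name, the file naming scheme and the arguments are
all hypothetical, and mysqldump is assumed to pick up its options and
credentials from an option file.

```shell
# Hypothetical sketch of one cron run for a single database.
backup_run() {
    db_name="$1"     # database to dump
    work_dir="$2"    # holds original/, the .sig file and the delta
    repo="$3"        # Obnam repository location

    new_dump="$work_dir/$db_name.new.sql"
    sig="$work_dir/$db_name.sql.sig"
    delta="$work_dir/$db_name.delta"

    # 1. Create a new uncompressed dump.
    mysqldump "$db_name" > "$new_dump" || return 1

    # 2. Create the delta from the original's signature and the new dump.
    rdiff delta "$sig" "$new_dump" "$delta" || { rm -f "$new_dump"; return 1; }

    # 3. The big new dump is no longer needed; keep it out of the repo.
    rm -f "$new_dump"

    # 4. Back up the directory: compressed original, signature and delta.
    obnam backup --repository="$repo" "$work_dir" || return 1

    # 5. The delta is in the repo now, so remove the local copy.
    rm -f "$delta"
}
```

A cron job would then call something like
"backup_run mydb /var/backups/db /srv/obnam-repo", with paths adjusted to
the local setup.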

It seems to work pretty well. The first backup was 554M in the Obnam repo.
I then immediately ran the script again, and the next one was 568M, so only
about 14M more. And that was actually with two databases and their respective
deltas plus my webserver flat filesystem (I reduced the above script to just
1 DB for simplicity).

I know this is all child's play for the experts, but hopefully this will
help somebody newer (like myself). ;) It'd be nice to have the option to
use a slower but more insightful de-duplicating / diff'ing method for
Obnam, but in the meantime this is a pretty good workaround.

Thanks for the help with this and thanks to Lars for creating Obnam!