Incremental rsync – R-fx Networks

Current Release:
http://www.rfxn.com/downloads/irsync-current.tar.gz
http://www.rfxn.com/appdocs/README.irsync
http://www.rfxn.com/appdocs/CHANGELOG.irsync

Description:
The irsync tool is an incremental wrapper for the rsync utility. Although incremental operation is natively supported by rsync, irsync adds a number of convenience features on top of it. The design goal behind irsync was a tool that would allow me to create point-in-time incremental backups using as little space as possible on the storage media, along with a complete and effective MySQL backup routine. Although the project was initially limited in scope, mainly covering some personal hardware, it quickly snowballed into a fully featured tool that I decided should be packaged as a project of its own.

I currently have irsync running on 28 servers, managing 448 snapshots that make up 21TB of data. Usage ranges from backups of DNS servers to dedicated SQL servers and critical web servers. The only usage note that stands out is that if you elect to use ‘--mysql-dump-gz’ or ‘mysql_dump_gz=1’ in conf.irsync, incremental support for MySQL dump backups is broken and a full copy of the dumps is retained within each snapshot. This may be desirable for some users, but on a large MySQL installation space usage can quickly get out of hand across the default retention of 14 days of snapshot data.

Features:
– traffic control (tc) shaping of outbound traffic for rate limiting
– preservation of full backup with incremental snapshots
– each incremental snapshot can be restored as a full point-in-time backup
– hard link based snapshots to reduce disk usage
– compatible with unmanaged storage space; all operations are client side
– optional local mode for performing serverless backups (i.e., to a backup disk)
– auto-deletion of snapshots based on configurable age values
– auto-generation of SSH public/private key pairs during irsync installation
– MySQL backups through mysqldump with non-locking fast dumps & gzip compression
– MySQL backups through mysqlhotcopy of the raw MySQL database files (var/lib/mysql/db/*)
– MySQL backups flush all open tables to disk for consistent backups
– MySQL backups stored as both full and point-in-time backups of hotcopy images

Storage:
The irsync storage logic uses hard links to create point-in-time snapshots of a full backup. On execution, rsync creates a full backup of the defined paths, and then the ‘cp’ tool is used to create a hard-linked copy of that data. On the next rsync run against the full backup path, any data that has been created, deleted or modified is updated in the full image. Because rsync by default writes an updated file to a temporary file and renames it into place rather than rewriting it in place, each change breaks the hard link for that file and leaves the snapshot holding a point-in-time copy of the data as it was before the change.
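A minimal sketch of that cycle, assuming GNU coreutils (for `cp -al` and `stat -c`). It simulates rsync's default replace-by-rename and a propagated deletion with plain `mv` and `rm` instead of calling rsync itself; the host name is the example freedom.lan and the `run1` snapshot name is hypothetical. It illustrates the hard-link technique, not irsync's actual code.

```shell
#!/bin/sh
# Illustration of hard-link snapshots; not irsync's actual code.
# Assumes GNU coreutils (cp -al, stat -c). rsync is simulated with mv/rm.
set -eu

STORAGE_PATH=$(mktemp -d)
trap 'rm -rf "$STORAGE_PATH"' EXIT
FULL="$STORAGE_PATH/freedom.lan.full"
SNAP1="$STORAGE_PATH/freedom.lan.snaps/run1"   # hypothetical snapshot name
mkdir -p "$FULL" "$STORAGE_PATH/freedom.lan.snaps"

# Run 1: the rsync step populates the full image.
echo "v1" > "$FULL/config.txt"
echo "static" > "$FULL/unchanged.txt"
echo "old" > "$FULL/removed.txt"

# Snapshot: recreate the directory tree, hard-linking every file.
cp -al "$FULL" "$SNAP1"

# Run 2: a modified file is written to a temporary name and renamed over the
# old one (rsync's default), so the full image gets a brand-new inode...
echo "v2" > "$FULL/.config.txt.tmp"
mv -f "$FULL/.config.txt.tmp" "$FULL/config.txt"
# ...and a file deleted on the source is unlinked from the full image.
rm "$FULL/removed.txt"

echo "full:     $(cat "$FULL/config.txt")"                      # v2
echo "snapshot: $(cat "$SNAP1/config.txt")"                     # v1
echo "removed.txt in snapshot: $(cat "$SNAP1/removed.txt")"     # old
echo "unchanged.txt links: $(stat -c %h "$FULL/unchanged.txt")" # 2
```

The rename step is what makes the layout safe: if the full image were instead updated with rsync's `--inplace` option, the shared inode would be rewritten and every snapshot linked to it would change as well.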

The path structure is as follows:
STORAGE_PATH/HOSTNAME.FULL
STORAGE_PATH/HOSTNAME.FULL/MYSQLHOTCOPY
STORAGE_PATH/HOSTNAME.FULL/MYSQLDUMP
STORAGE_PATH/HOSTNAME.SNAPS/DATESTAMP
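As a rough illustration, the layout can be created by hand for one host. This sketch assumes lowercase suffixes (as in the freedom.lan example listing) and a DATESTAMP of the form YYYY-MM-DD.HHMMSS, which is inferred from the example snapshot names rather than taken from irsync itself; STORAGE_PATH is just a temporary directory here.

```shell
#!/bin/sh
# Illustrative only: builds the storage layout for one example host.
set -eu

STORAGE_PATH=$(mktemp -d)
trap 'rm -rf "$STORAGE_PATH"' EXIT
HOST=freedom.lan                       # example host name from the listing
DATESTAMP=$(date +%Y-%m-%d.%H%M%S)     # format inferred from e.g. 2010-02-19.202026

mkdir -p "$STORAGE_PATH/$HOST.full/mysqlhotcopy" \
         "$STORAGE_PATH/$HOST.full/mysqldump" \
         "$STORAGE_PATH/$HOST.snaps/$DATESTAMP"

# Print the resulting tree relative to STORAGE_PATH.
(cd "$STORAGE_PATH" && find . -mindepth 1 | sort)
```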

The point-in-time backups, each of which is restorable as a full backup, are stored in the .SNAPS directory. These are rotated off for deletion based on the max age value in conf.irsync, using find’s -mtime option piped to rm.
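A minimal sketch of age-based rotation with find's `-mtime` test, assuming GNU find, xargs and touch. MAX_AGE is a hypothetical stand-in for the max age value in conf.irsync (14 matches the default retention), the snapshot names are placeholders, and the sketch assumes each snapshot directory's mtime reflects when it was taken.

```shell
#!/bin/sh
# Sketch of snapshot rotation by age (GNU find/xargs/touch assumed).
set -eu

MAX_AGE=14                    # hypothetical stand-in for the conf.irsync max age
SNAPS=$(mktemp -d)            # stands in for STORAGE_PATH/HOSTNAME.SNAPS
trap 'rm -rf "$SNAPS"' EXIT

# Two placeholder snapshots: one 20 days old, one 3 days old.
mkdir "$SNAPS/old-snap" "$SNAPS/new-snap"
touch -d "20 days ago" "$SNAPS/old-snap"
touch -d "3 days ago" "$SNAPS/new-snap"

# Remove top-level snapshot directories older than MAX_AGE days. Only the
# snapshot's own links go away; data still linked from the full image or
# from other snapshots is left intact.
find "$SNAPS" -mindepth 1 -maxdepth 1 -type d -mtime +"$MAX_AGE" -print0 \
  | xargs -0 -r rm -rf

ls "$SNAPS"    # new-snap
```

Note that find ignores the fractional part of a file's age, so `-mtime +14` only matches snapshots that are at least 15 full days old.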

A common misconception is that deleting a hard link will delete the source data, but this is not the case. When rm is run on a hard link, it removes only that name and decrements the file's link count; the data itself is only freed once the link count reaches 0 (and no process still has the file open).
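The link-count behaviour is easy to observe with `ln` and GNU `stat` (whose `-c %h` format prints the link count). This is a generic illustration on a Linux filesystem; the file names are arbitrary.

```shell
#!/bin/sh
# Demonstrates that rm on a hard link only decrements the link count.
set -eu

WORK=$(mktemp -d)
trap 'rm -rf "$WORK"' EXIT

echo "payload" > "$WORK/full.dat"
ln "$WORK/full.dat" "$WORK/snap.dat"            # second name, same inode
links_before=$(stat -c %h "$WORK/snap.dat")     # 2

rm "$WORK/full.dat"                             # drops one name only
links_after=$(stat -c %h "$WORK/snap.dat")      # 1
content=$(cat "$WORK/snap.dat")                 # payload

echo "links before rm: $links_before"
echo "links after rm:  $links_after"
echo "data via remaining link: $content"
```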

To demonstrate how the backups work on the storage server, we can look at the storage layout below and see how the snapshots and the full image get populated.

The synced data in the full image, with its size and number of files:
# ls freedom.lan.full/
etc home local root var mysqldump mysqlhotcopy

# du -sh freedom.lan.full/
1.9G freedom.lan.full

# find freedom.lan.full | wc -l
17911

Now let's assume we have run three iterations of irsync to date; the snapshots path would look something like this:

# ls freedom.lan.snaps/
2010-02-19.202026 2010-02-20.202718 2010-02-21.191503

# ls 2010-02-21.191503/
etc home local root var mysqldump mysqlhotcopy

# du -shc *
12M 2010-02-19.202026
133M 2010-02-20.202718
275M 2010-02-21.191503

# for i in *; do echo "$(find "$i" | wc -l) $i"; done
17819 2010-02-19.202026
18416 2010-02-20.202718
18227 2010-02-21.191503

So what does this all translate into? As we can see, our full backup is 1.9G in size with roughly 17.9k files, and the snapshots only take up space for changed data: the 2010-02-19.202026 snapshot holds 12M of changed data and 92 fewer files than the full image (17,819 vs. 17,911). Although the 02-19 snapshot captures that changed data, it also retains all of the original data, as the file counts show, without the space overhead of duplicating it.
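The file-count offsets can be checked with plain shell arithmetic against the counts from the listing; a negative value means the snapshot holds fewer files than the current full image. The `snapshot_offsets` function name is just for illustration.

```shell
#!/bin/sh
# Compute each snapshot's file-count offset from the full image.
set -eu

# Reads "name count" lines on stdin; $1 is the full image's file count.
snapshot_offsets() {
  full_count=$1
  while read -r snap count; do
    printf '%s %d\n' "$snap" "$((count - full_count))"
  done
}

snapshot_offsets 17911 <<'EOF'
2010-02-19.202026 17819
2010-02-20.202718 18416
2010-02-21.191503 18227
EOF
# 2010-02-19.202026 -92
# 2010-02-20.202718 505
# 2010-02-21.191503 316
```

These are net differences only: files created and files deleted since a snapshot was taken can partly cancel each other out.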

This is done by hard linking to the full image for any unchanged data. On subsequent irsync runs, when changed data is synced in, the hard links between the full image and the snapshots are broken, leaving each snapshot with a copy of the original data in its previous state. This method of point-in-time incremental backups allows easy retention of changed data with minimal space usage, while keeping a logical backup layout that is fully restorable from each individual snapshot and compatible with any utility, since hard-linked files appear to every tool as ordinary files in ordinary directories.
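One practical consequence of hard links looking like ordinary files is how disk-usage tools report them. GNU du counts each hard-linked inode only once per invocation, so a snapshot measured on its own reports the full size of its data, while the same snapshot measured after the full image reports only the space unique to it. A minimal sketch, assuming GNU coreutils; the directory names are placeholders.

```shell
#!/bin/sh
# Shows that GNU du counts a hard-linked inode once per invocation.
set -eu

WORK=$(mktemp -d)
trap 'rm -rf "$WORK"' EXIT

mkdir "$WORK/full"
head -c 1048576 /dev/urandom > "$WORK/full/data.bin"   # 1 MiB, incompressible
cp -al "$WORK/full" "$WORK/snap"                        # snapshot = hard links only

# Measured alone, the snapshot looks like a complete 1 MiB backup...
snap_alone=$(du -sk "$WORK/snap" | cut -f1)
# ...but listed after the full image, its shared data is not counted again.
snap_shared=$(du -sk "$WORK/full" "$WORK/snap" | tail -n 1 | cut -f1)

echo "snapshot alone:      ${snap_alone}K"
echo "snapshot after full: ${snap_shared}K"
```

Because of this, the order of arguments to du decides which directory gets charged for shared data when sizing a snapshots path.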

Funding:

Funding for the continued development and research into this and other projects is solely dependent on public contributions and donations. If this is your first time using this software, we ask that you evaluate it and consider a small donation; for frequent and continuing users of this and other projects, we also ask that you make an occasional donation to help ensure the future of our public projects.