Updated: January 21, 2013
Well, you have probably read a million guides on how to back up your personal data using rsync, a highly useful and versatile data copying tool. Here's another one. I would like to show you some basic tips and tricks for smart and safe rsync usage, how to make a flexible and useful setup, and how to automate your backup procedure, as a part of a comprehensive backup strategy, which you must have.
In home setups, rsync might be somewhat of an overkill, and many users might actually prefer to run the tool with some kind of a frontend, like grsync. However, if you want to fully master and control your data sync and transfer, then, at some point, you will examine the usage from the command line. This guide should get you underway.
You can all read man pages, I am sure. In this particular case, rsync is very well documented, and you should be able to get going with just that, in theory. However, before you begin and potentially cause irreversible damage to your data, you should take several necessary precautions.
Normally, we should begin with basic usage, but that comes next. It is so important to emphasize the below section that I am going to skip the actual syntax for now. Not the serial flow you would expect, but it's a must. So here it goes.
You should never run rsync without using the --dry-run option first. This will give a detailed list of what would have happened had you run for real. You can combine it with the --itemize-changes (-i) option to get the list of all changes. Finally, use the --log-file=FILE option to write all changes to a report.
You should start testing with a dummy source and destination directory. You should make sure that you do not overwrite existing data or that you can afford to lose the pieces of your information if the commands go wrong.
Only after you have completed several safe runs and verified that no undesired files are copied, no desired files are deleted, nothing is missing, and nothing has been modified without your consent, only then should you try copying files in earnest.
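The precautions above can be sketched as a small sandbox test. This is only an illustration: the directory and file names are made up for the example, and everything lives in a throwaway temporary directory so no real data is involved.

```shell
# Hypothetical sandbox: a throwaway source and destination, nothing real is touched.
sandbox=$(mktemp -d)
mkdir -p "$sandbox/src" "$sandbox/dest"
echo "hello" > "$sandbox/src/keep.txt"
echo "stale" > "$sandbox/dest/old.txt"

# Simulation only: --dry-run reports what would happen, -i itemizes each change,
# and --log-file writes the same report to a file for later review.
rsync -avs --delete --dry-run -i \
    --log-file="$sandbox/rsync.log" \
    "$sandbox/src/" "$sandbox/dest/"

# The destination should look exactly as it did before the dry run.
ls "$sandbox/dest"
```

Once the itemized list shows only the changes you expect, remove --dry-run and repeat the run, still inside the sandbox.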
Now we can use rsync. The commands are as follows:
rsync FLAGS/OPTIONS SRC DEST
It's as simple as that. The common recommended options you want are:
-avs - Archive mode (recursive copy that preserves symlinks, permissions, timestamps, owner and group), verbose output, do not allow remote shell to interpret characters; in other words, file names with spaces and special characters will not be translated, which is what you want most likely, especially if you have Windows files, too.
--delete will delete files at the target (destination), if they do not exist in the source. This means you will always keep an up to date list of files and the source and destination will match, plus the destination will not slowly grow in size with older, perhaps irrelevant content.
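Here is a minimal sketch of these recommended options in action. The paths are hypothetical and created in a temporary directory; the stale file in the destination exists only to show what --delete does.

```shell
# Hypothetical sandbox directories for a real (non-simulated) run.
sandbox=$(mktemp -d)
mkdir -p "$sandbox/src" "$sandbox/dest"
echo "current" > "$sandbox/src/keep.txt"
echo "stale" > "$sandbox/dest/old.txt"

# The trailing slash on the source means "copy the contents of src",
# not the src directory itself.
rsync -avs --delete "$sandbox/src/" "$sandbox/dest/"

# keep.txt has been copied; old.txt is gone because it does not exist in the source.
ls "$sandbox/dest"
```

This is exactly why the dry run matters: --delete removes anything at the destination that the source does not have.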
There are a great many more options. So here's a sampling of good things:
-l (lowercase L), when symlinks are encountered, recreate the symlink on the destination. This is already included in -a.
--exclude=PATTERN, exclude files matching PATTERN
--exclude-from=FILE, read exclude patterns from FILE
--include=PATTERN, don't exclude files matching PATTERN
--include-from=FILE, read include patterns from FILE
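As an illustration of the filter options, the sketch below reads two exclude patterns from a file. The file names and patterns are invented for the example.

```shell
# Hypothetical sandbox with one file to keep and two things to skip.
sandbox=$(mktemp -d)
mkdir -p "$sandbox/src/cache" "$sandbox/dest"
echo "notes" > "$sandbox/src/notes.txt"
echo "scratch" > "$sandbox/src/notes.tmp"
echo "junk" > "$sandbox/src/cache/blob.bin"

# One pattern per line: skip all .tmp files and the whole cache directory.
printf '%s\n' '*.tmp' 'cache/' > "$sandbox/exclude.txt"

rsync -avs --exclude-from="$sandbox/exclude.txt" "$sandbox/src/" "$sandbox/dest/"

# Only notes.txt should have been copied.
ls "$sandbox/dest"
```

For a one-off run, the same patterns could be given directly on the command line with --exclude='*.tmp' --exclude='cache/'.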
The related option --files-from=FILE allows you to specify a detailed list of files and directories you wish to include in your backup. Please note that if you write down directory paths without a trailing slash, they will be created empty at the destination, and if you do add the trailing slash, their content will also be copied.
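The trailing-slash behavior is easier to see in a small test. This is a sketch with made-up directory names; the paths in the list file are relative to the source directory given on the command line.

```shell
# Hypothetical sandbox with two source directories.
sandbox=$(mktemp -d)
mkdir -p "$sandbox/src/docs" "$sandbox/src/photos" "$sandbox/dest"
echo "text" > "$sandbox/src/docs/a.txt"
echo "image" > "$sandbox/src/photos/b.jpg"

# docs/ has a trailing slash, photos does not.
printf '%s\n' 'docs/' 'photos' > "$sandbox/list.txt"

rsync -avs --files-from="$sandbox/list.txt" "$sandbox/src/" "$sandbox/dest/"

# List what arrived at the destination.
find "$sandbox/dest" | sort
```

Note that -a does not turn on recursion when --files-from is used, so add -r explicitly if you want listed directories copied with all of their subdirectories.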
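And we mentioned the log file earlier. With --log-file=FILE, rsync writes a timestamped record of what it did to FILE, which you can review after each run.
Another useful option is -h, which prints the rsync copy summary in a human-readable format. You don't care about raw byte counts, you care about MB and suchlike.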
Quoting from the man pages, when comparing two timestamps, rsync treats the timestamps as being equal if they differ by no more than the modify-window value. This is normally 0 for an exact match, but you may find it useful to set this to a larger value in some situations. In particular, when transferring to or from a Microsoft Windows FAT filesystem, which represents times with a two-second resolution, --modify-window=1 is useful.
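To see the effect of the modify window, here is a small sketch that compares two identical files whose timestamps are one second apart. The file name and dates are arbitrary; both runs are simulations, so nothing is copied.

```shell
# Hypothetical sandbox: the same file on both sides.
sandbox=$(mktemp -d)
mkdir -p "$sandbox/src" "$sandbox/dest"
echo "same content" > "$sandbox/src/report.txt"
cp "$sandbox/src/report.txt" "$sandbox/dest/report.txt"

# Set the modification times one second apart (format: CCYYMMDDhhmm.SS).
touch -t 201301211200.00 "$sandbox/src/report.txt"
touch -t 201301211200.01 "$sandbox/dest/report.txt"

# Dry runs with itemized output, first with exact matching, then with a 1-second window.
strict=$(rsync -ai --dry-run "$sandbox/src/" "$sandbox/dest/")
relaxed=$(rsync -ai --dry-run --modify-window=1 "$sandbox/src/" "$sandbox/dest/")

printf 'without --modify-window:\n%s\n' "$strict"
printf 'with --modify-window=1:\n%s\n' "$relaxed"
```

Without the window, the file is flagged for transfer because of the one-second difference. With --modify-window=1, it should be treated as up to date.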
And then you ought to run and verify it all works dandily.
Here's another sample text output of an rsync run, no simulation this time, it's happening - the output shows a human-readable summary, we use the itemized list, we delete files at the destination that do not match the source, and you can see three files being deleted. In fact, I have only renamed two files, replacing the word fun with guide in their names, but you can see the effect being two deletions, two copies, plus one file being removed altogether.
rsync -avs --delete -i -h
sending incremental file list
sent 180.65K bytes received 67 bytes 361.43K bytes/sec
total size is 579.49M speedup is 3206.62
And we check the destination too. You should use the combination of directory and file count and total usage, with commands like du, wc and similar to make sure that you have the exact same information on your target filesystem as the source.
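A destination check along these lines could look like the sketch below. The sandbox paths are hypothetical, and diff -r is an extra content comparison on top of the du and wc checks mentioned above.

```shell
# Hypothetical sandbox: copy a small tree, then compare both sides.
sandbox=$(mktemp -d)
mkdir -p "$sandbox/src/docs" "$sandbox/dest"
echo "one" > "$sandbox/src/a.txt"
echo "two" > "$sandbox/src/docs/b.txt"
rsync -as "$sandbox/src/" "$sandbox/dest/"

# File counts on both sides.
src_count=$(find "$sandbox/src" -type f | wc -l)
dest_count=$(find "$sandbox/dest" -type f | wc -l)
echo "files: source=$src_count destination=$dest_count"

# Total usage; figures can differ slightly between different filesystems.
du -sh "$sandbox/src" "$sandbox/dest"

# Recursive content comparison.
if diff -r "$sandbox/src" "$sandbox/dest" >/dev/null; then
    verdict="match"
else
    verdict="differ"
fi
echo "content: $verdict"
```

On large backups, diff -r reads every file on both sides, so the count and usage checks are the quicker first pass.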
If you are satisfied with the result, you can now script and schedule the command. The first step is to create a simple shell script that contains the earlier rsync command. Then, you should chmod it to be executable and run it once or twice to verify there are no weird bugs or errors. Your typical script might look something like:
echo some useful information perhaps
your rsync command here preferably with good logging
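Filled in, the skeleton might look like the sketch below. It is one possible layout, not the only one: the function name run_backup, the three-argument convention and the example paths in the comment are all assumptions made for illustration.

```shell
#!/bin/bash
# Hypothetical backup wrapper: run_backup SRC DEST LOG

run_backup() {
    local src="$1" dest="$2" log="$3"

    # Some useful information, written to the same log rsync uses.
    echo "Backup of $src started at $(date)" >> "$log"

    # The rsync command, with logging.
    rsync -avs --delete -h --log-file="$log" "$src" "$dest"
    local status=$?

    echo "Backup finished with exit code $status at $(date)" >> "$log"
    return "$status"
}

# Run only when all three arguments are given, for example:
#   ./rsync-backup.sh /home/roger/data/ /mnt/backup/data/ /home/roger/rsync-backup.log
if [ "$#" -eq 3 ]; then
    run_backup "$1" "$2" "$3"
fi
```

Save it, run chmod +x on it, and try it against a dummy source and destination first, as with everything else here.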
Next, you need to cron your task. But that's a topic for another tutorial. If you need instructions for that, there'll be a followup to this guide. Still, it might look something like the line below - this cron will run every hour, on the hour:
0 * * * * /home/roger/rsync-backup.sh
If you must have a GUI, then maybe Grsync is what you want.
There you go, a nice, quick and useful guide. Hopefully, it will help you get past your fear of using the command line and utilizing the awesome little tool called rsync to create backups of your data, which is what we strive for.
The tutorial covers the necessary precautions, like dry runs, itemized lists and detailed logs, checking everything carefully before firing potentially destructive commands, basic and advanced usage, input and output formats that should help you manage your backup data, some Windows tips, a word or two on scheduling, as well as a frontend alternative, if you still fear the command line. All in all, there's plenty going on here. I hope you like it.
Well, that would be all. The warning sign image is in public domain.