Home directory backup - The quick 'n' dirty guide

Updated: March 25, 2019

Say you're a Linux user. No, seriously, say it. OK. Now, you are a Linux person, and you would like to back up your personal data, because it's very important. Indeed, having a solid backup plan is a smart thing. If you're about to make a big system upgrade, try new software, or do anything else that might bork up your files, a copy of the data will help prevent undesired tears and panic.

There are two ways you can go about this. One, you can use a tool of some sort, like Grsync or Deja Dup, and that's perfectly fine. Two, you can write your own little script. Normally, there's no reason to bother with the latter method, but it's good to know what to do come the need. Today, I'd like to show you how you can make flexible and encrypted (yes, encrypted) archives of your data, with exclusions and whatnot, and then be able to save them anywhere you like, another disk, another system, cloud, whatever. After me.

Tar-nish your data

We will use the venerable tar utility. It's available on pretty much any UNIX/Linux machine, it requires no UI to work, and it's different from the sexy and neat rsync in that it does not stream (sync) your data to a backup location. It can do that, but its main purpose is to create data archives.

So we will create a local archive with tar, using multiple exclude flags so we don't back up stuff that needs no retention, like temporary files, debug files, cache, Trash, and whatnot. Then we will encrypt the archive using another tool - gpg - so that anyone with access to your data, for whatever reason, can't peek inside, naughty naughty they.

Let's take a look at the list below. Please note that it is NOT comprehensive, and that there could be many other exclusions in your particular setup. But it is indicative and conservative, i.e. some possibly unnecessary data will be backed up, even though you could safely not add it.

tar -cpzf backup-name.tar.gz \
--exclude=backup-name.tar.gz \
--exclude=.cache \
--exclude=.debug \
--exclude=.dbus \
--exclude=.gvfs \
--exclude=.local/share/gvfs-metadata \
--exclude=.local/share/Trash \
--exclude=.recently-used \
--exclude=.thumbnails \
--exclude=.xsession-errors \
--exclude=.Trash \
--exclude=.steam \
--exclude=Downloads \
--exclude=GitHub \
--exclude=Public \
--exclude=Steam \
--exclude=Templates \
--exclude="VirtualBox VMs" \
--warning=no-file-changed .

What do we have here? First, the name of the archive. It can reside on the same volume as the rest of the data, but then you need the second line, which tells tar not to include the archive within itself (inception). Then, we have a series of real exclusions. Some of these are generic and some user-specific. The generic exclusions cover data that is mostly session-related or temporary: application caches, D-Bus and GVFS session files, thumbnail previews, Trash folders, and X session error logs. None of these needs to survive a restore.

All the exclude lines after .Trash are personal overrides. For example, I did not see a need to save downloaded GitHub project repo data (you may not even have this or care), Steam stuff (same again) or VirtualBox virtual machines. Your list can include any number of these overrides, as you see fit, of course. I added mine to give you a sense of how you can go about this.

There are many other possible exclusions. For example, this GitHub list contains a few hundred lines of exclusions (intended for rsync, but for the sake of argument, no different here), which you could also consider for your tar command. I did not omit any program-specific directories in the generic part of the list, and only gave you a few personal overrides. The rest is up to you, and you do need a wee bit of homework before you can proceed.
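For that bit of homework, a quick way is to measure what actually lives in your home directory before deciding on exclusions. A small sketch; the function name is made up, and the ten-line cutoff is arbitrary:

```shell
# List the ten largest entries (files and directories, one level deep)
# under a given directory, in human-readable sizes.
biggest_entries() {
    du -sh "$1"/* "$1"/.[!.]* 2>/dev/null | sort -h | tail -n 10
}

# Example: see what might be worth excluding from a home backup
# biggest_entries "$HOME"
```

Anything big and reproducible (caches, downloads, game libraries) that shows up here is a candidate for the exclude list.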

Lastly, the --warning=no-file-changed flag silences the warning tar would otherwise print when a file changes while being read, and we save the archive in the current directory (denoted by the dot). You can also save to remote locations, other disks, whatever seems appropriate. And since this is a shell command, you can also script it - and then schedule it.
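As a sketch of the scripting idea, here is a small shell function you could drop into a script of your own. The function name, the date-stamped archive name, and the trimmed-down exclusion set are just examples; extend them to match the full list above:

```shell
# Create a date-stamped, compressed archive of one directory in another.
# home_backup is a made-up name; adjust the excludes to taste.
home_backup() {
    src=$1                           # directory to back up
    dest=$2                          # where the archive should land
    stamp=$(date +%Y-%m-%d)
    archive="$dest/home-$stamp.tar.gz"

    mkdir -p "$dest"
    ( cd "$src" || exit 1
      tar -cpzf "$archive" \
          --exclude=.cache \
          --exclude=.local/share/Trash \
          --warning=no-file-changed . )
    echo "Wrote $archive"
}

# Example: home_backup "$HOME" /mnt/backup
```

To schedule it, a crontab entry along the lines of 0 3 * * * /home/you/bin/backup.sh (path assumed, adjust to your setup) would run the script every night at three in the morning.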

Encryption

Once the archive has been created, we can encrypt it with gpg:

gpg -o backup-name.tar.gz.gpg --symmetric backup-name.tar.gz

You will be asked to provide a password for your archive. The source will not be deleted, so you can (should) test opening the archive, to make sure that it works and that you remember the password. On systems running a desktop environment, you will get a window prompt:
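One way to test without ever writing the decrypted archive back to disk is to pipe the gpg output straight into tar and just list the contents. Below is a self-contained round-trip sketch; the --batch, --pinentry-mode and --passphrase flags are only there so the demo runs unattended, with a throwaway passphrase hardcoded, which you would never do with real data - interactively, you simply type the passphrase at the prompt:

```shell
# Round trip: archive, encrypt, then decrypt to stdout and list the
# contents through a pipe. No plaintext copy is written during the check.
demo=$(mktemp -d)
cd "$demo" || exit 1
echo "hello" > file.txt
tar -czf backup-name.tar.gz file.txt

# Encrypt (interactively: gpg -o backup-name.tar.gz.gpg --symmetric ...)
gpg --batch --pinentry-mode loopback --passphrase demo \
    -o backup-name.tar.gz.gpg --symmetric backup-name.tar.gz

# Verify: decrypt to stdout, list the archive, touch nothing on disk
gpg --batch --pinentry-mode loopback --passphrase demo \
    -d backup-name.tar.gz.gpg | tar -tzf - && echo "Archive OK"
```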

GPG passphrase

Errors

You might hit a few errors while working with tar. I didn't use the -v flag above on purpose, so that only errors show, and they are then easier to spot and resolve. Some of the common errors you will see are:

tar: .: file changed as we read it
tar: Exiting with failure status due to previous errors

In this case, a file changed while the archive was being created (say, a text file you had open). This isn't an error per se - the file is still archived, though its contents may be caught mid-write - but tar will exit with a non-zero status. You can ignore this or rerun the archive creation. You can also use the --ignore-failed-read flag, if you like, but be aware that files tar could not read will NOT be backed up.
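Since GNU tar distinguishes between "some files differed" (exit status 1) and outright failure (exit status 2), a script can tell the harmless case apart from a real problem. A sketch, using a throwaway archive so it is self-contained:

```shell
# Run tar, then branch on its exit status: 0 is clean, 1 means some
# files changed while being read (often safe to rerun), anything else
# is a genuine failure.
workdir=$(mktemp -d)
echo "demo" > "$workdir/file.txt"
tar -czf "$workdir/backup.tar.gz" -C "$workdir" file.txt
status=$?
case $status in
    0) echo "Backup clean" ;;
    1) echo "Some files changed while reading; consider rerunning" ;;
    *) echo "Backup failed with status $status" ;;
esac
```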

tar: ./.config/.../leveldb/LOG: Cannot open: Permission denied
tar: ./.config/.../leveldb/000010.ldb: Cannot open: Permission denied

You may also hit the Permission denied error. It means some files under your home directory are not accessible or do not belong to you (you're not the owner), so tar cannot read them into the archive. You can ignore these errors or fix the ownership. Note that a glob like ~/* would skip hidden files such as .config, which are the usual offenders, so target the home directory itself:

sudo chown -R "your user":"your group" ~
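Before reaching for chown, it may be worth seeing which files are actually foreign. A sketch (the function name is made up):

```shell
# Print every path under a directory that the current user does not own.
list_foreign_files() {
    find "$1" ! -user "$(id -un)" 2>/dev/null
}

# Example: list_foreign_files "$HOME"
```

If the list is short, you may prefer to fix or exclude just those paths rather than re-owning the entire home directory.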

Data restore

There's a lot more you can do. The Ubuntu wiki TAR page has a lot of useful information, including how to combine over-the-network backups and restore, which can be quite handy. But the focus here is on quick 'n' dirty, so we're not overcomplicating it, on purpose. First, we need to decrypt the file:

gpg -o backup-name.tar.gz -d backup-name.tar.gz.gpg

And then, we can extract it:

tar -xpzf backup-name.tar.gz

This is the most basic tar archive extraction command. You can combine it with a target directory (the -C flag) and other options, but in most cases, you won't want to blindly copy the data from the archive over your existing system unless you're building a new box.
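A safer pattern is to extract into a scratch directory with -C and cherry-pick files from there. A self-contained sketch, using a throwaway archive and an example "restore" directory name:

```shell
# Build a tiny demo archive, then extract it into a separate restore
# directory instead of over the live files.
demo=$(mktemp -d)
echo "hello" > "$demo/file.txt"
tar -czf "$demo/backup-name.tar.gz" -C "$demo" file.txt

mkdir -p "$demo/restore"
tar -xpzf "$demo/backup-name.tar.gz" -C "$demo/restore"
ls "$demo/restore"
```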

More reading

Do not forget to backup your system too!

The new and definitive CloneZilla tutorial

Timeshift review - Let's do the time warp again

Conclusion

And that's about it. I am a strong believer in data backups. In fact, this is the most important thing you should practice on your computers. It's always useful to have multiple backups and multiple methods to create them, as this can help with various usage scenarios that come your way. Sometimes, a fully fledged UI tool will do the work. Other times, it will be the rsync workhorse. And now you can use another old, reliable program, and that's tar, with a bit of a gpg twist on the side.

In this guide, we talked about exclusions, both generic and personal, we covered encryption, touched on some common errors you may (and most likely will) hit when using tar, and finally also did the data restore, which is as important as all the other steps. Hopefully, you will find the information presented here handy. Take care of them bytes, me hearties. See you around.

Cheers.
