{"id":1172,"date":"2013-08-01T01:00:00","date_gmt":"2013-07-31T23:00:00","guid":{"rendered":"https:\/\/www.fussylogic.co.uk\/blog\/?p=1172"},"modified":"2013-08-02T07:40:45","modified_gmt":"2013-08-02T06:40:45","slug":"rdiff-backup-2","status":"publish","type":"post","link":"https:\/\/www.fussylogic.co.uk\/blog\/?p=1172","title":{"rendered":"rdiff-backup"},"content":{"rendered":"<p><a href=\"http:\/\/rdiff-backup.nongnu.org\/\">rdiff-backup<\/a> is about the only backup tool I\u2019ve found acceptable. The key things for me:<\/p>\n<ul>\n<li>\n<p>The (most recent) backup is readable on disk, as is. That is to say that there is no binary format. If the worst came to the worst, I could restore with <code>cp<\/code>.<\/p>\n<\/li>\n<li>\n<p>It keeps a rolling backup automatically. Each new backup automatically pushes all the previous backups down one. They are all available.<\/p>\n<\/li>\n<li>\n<p>Its metadata is not scattered about throughout the backup; it\u2019s kept in an obvious and easily isolated directory, <code>rdiff-backup-data<\/code>.<\/p>\n<\/li>\n<li>\n<p>It supports <code>ssh<\/code> as the transport mechanism.<\/p>\n<\/li>\n<li>\n<p>It supports pull-backups. This is particularly important because I want to be able to back up a system that\u2019s live on the internet. To maintain security, you should never keep any private keys on an internet-facing host. That includes ssh private keys. Therefore you should always do your backups with a secure (non-internet-facing) host logging into the internet-facing host using ssh public-key authentication. That means the only key stored on the internet-facing system is a public key \u2013 which gives access to nothing. Even if that system is compromised, it doesn\u2019t give the attacker access to any more of your systems.<\/p>\n<\/li>\n<li>\n<p>The backup is incremental, with increments stored as \u201creverse\u201d diffs. 
That means that the latest backup is an exact mirror, and the increments are stored as incremental diffs backwards from that latest snapshot. It makes your increments mostly free in terms of disk space. You can afford then to keep many more of them.<\/p>\n<\/li>\n<\/ul>\n<p>Enough advertising. I\u2019d like to talk about using <code>rdiff-backup<\/code> to back up your, say, small server system. I\u2019m assuming a webserver that serves multiple sites, has databases, and maybe even some source code repositories.<\/p>\n<p>Install with<\/p>\n<pre><code>apt-get install rdiff-backup python-pylibacl python-pyxattr<\/code><\/pre>\n<p>I suggest you put your <code>rdiff-backup<\/code> command in a script file. Even though it\u2019s likely going to be only running one command, that command can be a long and complicated one, and it\u2019s much easier to debug and understand if it\u2019s in a file.<\/p>\n<p>Let\u2019s begin by making a null-backup.<\/p>\n<pre><code>#!\/bin\/sh\nrdiff-backup \\\n    --exclude \/ \\\n    --print-statistics \\\n    root@remote::\/ \\\n    \/local\/backup\/directory<\/code><\/pre>\n<p>Note the \u201chost::path\u201d notation makes <code>rdiff-backup<\/code> use <code>ssh<\/code> to connect; you\u2019ll need <code>rdiff-backup<\/code> installed on the remote machine as well, so that the local <code>rdiff-backup<\/code> has something to run and communicate with over that <code>ssh<\/code> connection.<\/p>\n<pre><code>--------------[ Session statistics ]--------------\nStartTime 1375345801.00 (Thu Aug  1 09:30:01 2013)\nEndTime 1375345804.04 (Thu Aug  1 09:30:04 2013)\nElapsedTime 3.04 (3.04 seconds)\nSourceFiles 1\nSourceFileSize 0 (0 bytes)\nMirrorFiles 1\nMirrorFileSize 0 (0 bytes)\nNewFiles 0\nNewFileSize 0 (0 bytes)\nDeletedFiles 0\nDeletedFileSize 0 (0 bytes)\nChangedFiles 1\nChangedSourceSize 0 (0 bytes)\nChangedMirrorSize 0 (0 bytes)\nIncrementFiles 
0\nIncrementFileSize 0 (0 bytes)\nTotalDestinationSizeChange 0 (0 bytes)\nErrors 0\n--------------------------------------------------<\/code><\/pre>\n<p>I think you should always be explicit about what you\u2019re backing up when dealing with root-based backups. I find it better to do it this way around, rather than fill your command line with exclusions for all the non-filesystem directories (<code>\/sys<\/code>, <code>\/dev<\/code>, <code>\/tmp<\/code>, <code>\/proc<\/code>, etc). Running the above command gets us a directory tree like this in our backup directory:<\/p>\n<pre><code>$ tree -F\n.\n`-- rdiff-backup-data\/\n    |-- access_control_lists.2013-08-01T09:30:01+01:00.snapshot\n    |-- backup.log\n    |-- chars_to_quote\n    |-- current_mirror.2013-08-01T09:30:01+01:00.data\n    |-- error_log.2013-08-01T09:30:01+01:00.data\n    |-- extended_attributes.2013-08-01T09:30:01+01:00.snapshot\n    |-- file_statistics.2013-08-01T09:30:01+01:00.data.gz\n    |-- increments\/\n    |-- mirror_metadata.2013-08-01T09:30:01+01:00.snapshot.gz\n    `-- session_statistics.2013-08-01T09:30:01+01:00.data\n\n2 directories, 9 files<\/code><\/pre>\n<p>There is nothing but <code>rdiff-backup-data<\/code>; pretty obviously because we\u2019ve excluded the entire remote directory structure, so we\u2019re backing up nothing. However that lets us see what <code>rdiff-backup<\/code> is keeping aside from our data. 
I won\u2019t describe each of them; they\u2019re fairly self-explanatory, and it\u2019s unlikely you\u2019ll ever need to look at anything other than the logs (and even them, rarely).<\/p>\n<p>We can get <code>rdiff-backup<\/code> to tell us about the backups so far.<\/p>\n<pre><code>$ rdiff-backup --list-increment-sizes \/local\/backup\/directory\n        Time                       Size        Cumulative size\n-----------------------------------------------------------------------------\nThu Aug  1 09:30:01 2013         4.00 KB           4.00 KB   (current mirror)<\/code><\/pre>\n<p>Run the same null-backup line again, then look at the increment list again.<\/p>\n<pre><code>$ rdiff-backup --list-increment-sizes \/local\/backup\/directory\n        Time                       Size        Cumulative size\n-----------------------------------------------------------------------------\nThu Aug  1 09:35:04 2013         4.00 KB           4.00 KB   (current mirror)<\/code><\/pre>\n<p>Note that <code>rdiff-backup --list-increment-sizes<\/code> is clever enough to notice that nothing has changed, so it simply shows one increment. 
In truth, it has recorded both backups:<\/p>\n<pre><code>$ tree -F\n.\n`-- rdiff-backup-data\/\n    |-- access_control_lists.2013-08-01T09:30:01+01:00.snapshot\n    |-- access_control_lists.2013-08-01T09:35:04+01:00.snapshot\n    |-- backup.log\n    |-- chars_to_quote\n    |-- current_mirror.2013-08-01T09:35:04+01:00.data\n    |-- error_log.2013-08-01T09:30:01+01:00.data\n    |-- error_log.2013-08-01T09:35:04+01:00.data\n    |-- extended_attributes.2013-08-01T09:30:01+01:00.snapshot\n    |-- extended_attributes.2013-08-01T09:35:04+01:00.snapshot\n    |-- file_statistics.2013-08-01T09:30:01+01:00.data.gz\n    |-- file_statistics.2013-08-01T09:35:04+01:00.data.gz\n    |-- increments\/\n    |-- mirror_metadata.2013-08-01T09:30:01+01:00.diff\n    |-- mirror_metadata.2013-08-01T09:35:04+01:00.snapshot.gz\n    |-- session_statistics.2013-08-01T09:30:01+01:00.data\n    `-- session_statistics.2013-08-01T09:35:04+01:00.data\n\n2 directories, 15 files<\/code><\/pre>\n<p>Note that there are two sets of <code>rdiff-backup<\/code> files; we will get a new set every time we run <code>rdiff-backup<\/code>. Run it a few times to observe if you wish.<\/p>\n<p>Let\u2019s now create something to back up. Change your backup script to this:<\/p>\n<pre><code>#!\/bin\/sh\nrdiff-backup \\\n    --include \/test-file \\\n    --exclude \/ \\\n    --print-statistics \\\n    root@remote::\/ \\\n    \/local\/backup\/directory<\/code><\/pre>\n<p>You\u2019ll need to then put some content in <code>remote::\/test-file<\/code>.<\/p>\n<pre><code>$ ssh root@remote &quot;date &gt; \/test-file&quot;<\/code><\/pre>\n<p>Rerun your backup script. 
This time you\u2019ll see this output:<\/p>\n<pre><code>--------------[ Session statistics ]--------------\nStartTime 1375346602.00 (Thu Aug  1 09:43:22 2013)\nEndTime 1375346605.25 (Thu Aug  1 09:43:25 2013)\nElapsedTime 3.25 (3.25 seconds)\nSourceFiles 2\nSourceFileSize 29 (29 bytes)\nMirrorFiles 1\nMirrorFileSize 0 (0 bytes)\nNewFiles 1\nNewFileSize 29 (29 bytes)\nDeletedFiles 0\nDeletedFileSize 0 (0 bytes)\nChangedFiles 1\nChangedSourceSize 0 (0 bytes)\nChangedMirrorSize 0 (0 bytes)\nIncrementFiles 2\nIncrementFileSize 0 (0 bytes)\nTotalDestinationSizeChange 29 (29 bytes)\nErrors 0\n--------------------------------------------------<\/code><\/pre>\n<p>More than zero bytes have been transferred. Our backup now looks something like this (yours will differ in the details):<\/p>\n<pre><code>.\n|-- rdiff-backup-data\/\n|   |-- access_control_lists.2013-08-01T09:30:01+01:00.snapshot\n|   |-- access_control_lists.2013-08-01T09:35:04+01:00.snapshot\n|   |-- access_control_lists.2013-08-01T09:41:54+01:00.snapshot\n|   |-- access_control_lists.2013-08-01T09:43:22+01:00.snapshot\n|   |-- backup.log\n|   |-- chars_to_quote\n|   |-- current_mirror.2013-08-01T09:43:22+01:00.data\n|   |-- error_log.2013-08-01T09:30:01+01:00.data\n|   |-- error_log.2013-08-01T09:35:04+01:00.data\n|   |-- error_log.2013-08-01T09:41:54+01:00.data\n|   |-- error_log.2013-08-01T09:43:22+01:00.data\n|   |-- extended_attributes.2013-08-01T09:30:01+01:00.snapshot\n|   |-- extended_attributes.2013-08-01T09:35:04+01:00.snapshot\n|   |-- extended_attributes.2013-08-01T09:41:54+01:00.snapshot\n|   |-- extended_attributes.2013-08-01T09:43:22+01:00.snapshot\n|   |-- file_statistics.2013-08-01T09:30:01+01:00.data.gz\n|   |-- file_statistics.2013-08-01T09:35:04+01:00.data.gz\n|   |-- file_statistics.2013-08-01T09:41:54+01:00.data.gz\n|   |-- file_statistics.2013-08-01T09:43:22+01:00.data.gz\n|   |-- increments\/\n|   |   `-- test-file.2013-08-01T09:41:54+01:00.missing\n|   |-- increments.2013-08-01T09:35:04+01:00.dir*\n|   
|-- increments.2013-08-01T09:41:54+01:00.dir*\n|   |-- mirror_metadata.2013-08-01T09:30:01+01:00.diff\n|   |-- mirror_metadata.2013-08-01T09:35:04+01:00.diff.gz\n|   |-- mirror_metadata.2013-08-01T09:41:54+01:00.diff.gz\n|   |-- mirror_metadata.2013-08-01T09:43:22+01:00.snapshot.gz\n|   |-- session_statistics.2013-08-01T09:30:01+01:00.data\n|   |-- session_statistics.2013-08-01T09:35:04+01:00.data\n|   |-- session_statistics.2013-08-01T09:41:54+01:00.data\n|   `-- session_statistics.2013-08-01T09:43:22+01:00.data\n`-- test-file<\/code><\/pre>\n<p>More metadata files as we\u2019ve come to expect; but more importantly now:<\/p>\n<pre><code>.\n|-- rdiff-backup-data\/\n|   |-- increments\/\n|   |   `-- test-file.2013-08-01T09:41:54+01:00.missing\n`-- test-file<\/code><\/pre>\n<p>We have our <code>test-file<\/code> backed up as a standard copy; and an increment describing the change (we needn\u2019t worry about how <code>rdiff-backup<\/code> keeps its increments).<\/p>\n<pre><code>$ rdiff-backup --list-increment-sizes \/local\/backup\/directory\n        Time                       Size        Cumulative size\n-----------------------------------------------------------------------------\nThu Aug  1 09:43:22 2013         4.03 KB           4.03 KB   (current mirror)\nThu Aug  1 09:41:54 2013         0 bytes           4.03 KB\nThu Aug  1 09:35:04 2013         0 bytes           4.03 KB<\/code><\/pre>\n<p>Excellent. <code>rdiff-backup<\/code> is telling us all the snapshots it can take us back to, and we can see that the latest is the only one with any data in it. Obviously in a real backup, you\u2019ll have much bigger numbers.<\/p>\n<p>I\u2019d suggest deleting this backup directory now; and we\u2019ll alter our script to be more realistic. 
It doesn\u2019t matter if you don\u2019t, but your next backup will make it look like your system went from blank to full, which isn\u2019t really accurate.<\/p>\n<p>I prefer not to back up anything that the distribution installs; I\u2019m only interested in unrecreatable or unobtainable files. Here then is a reasonable backup script:<\/p>\n<pre><code>#!\/bin\/sh\nrdiff-backup \\\n    --exclude '**\/.git' \\\n    --include \/etc \\\n    --include \/home \\\n    --include \/srv \\\n    --include \/root \\\n    --include \/usr\/local \\\n    --include \/var\/backups \\\n    --include \/var\/lib\/ldap \\\n    --include \/var\/lib\/mysql \\\n    --include \/var\/lib\/postgresql \\\n    --include \/var\/log \\\n    --include \/var\/mail \\\n    --include \/var\/www \\\n    --exclude \/ \\\n    --print-statistics \\\n    root@remote::\/ \\\n    \/local\/backup\/directory<\/code><\/pre>\n<p>Note particularly the inclusion of <code>\/var\/log<\/code> \u2013 one of the first things an attacker will do is delete all your logs. If you\u2019ve got them backed up, at least you\u2019ll have some indication of when the attack happened.<\/p>\n<p>I\u2019m slightly ambivalent about including <code>\/var\/lib\/postgresql<\/code> and other database directories. The problem is that the databases are live when the backup happens, and it\u2019s unlikely that these directories are in a self-consistent state. However, better to have them than want them \u2013 one day it might save your bacon. I\u2019ll discuss real database backups another day.<\/p>\n<p>I\u2019ve also excluded <code>.git\/<\/code> directories. <code>.git<\/code> is the working directory storage for a git repository; it\u2019s so fluid that you\u2019ll get a lot of noise in your backups if you include it. 
Instead, your working directories <em>are<\/em> being backed up, and you should be pushing your repositories to a central location (as with databases, I\u2019ll come to an automated method for this another day), which you <em>should<\/em> include in a backup. This is entirely a judgement call; if you aren\u2019t limited by disk space, include them.<\/p>\n<p>Once you have your backups, <code>rdiff-backup<\/code> can help you interrogate them. Rather than repeat information, have a look at <code>rdiff-backup<\/code>\u2019s own <a href=\"http:\/\/www.nongnu.org\/rdiff-backup\/examples.html\">examples page<\/a>.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>rdiff-backup is about the only backup tool I\u2019ve found acceptable. The key things for me: The (most recent) backup is readable on disk, as is. That is to say that there is no binary format. If the worst came to the worst, I could restore with cp. It keeps a rolling backup automatically. 
Each new\u2026 <span class=\"read-more\"><a href=\"https:\/\/www.fussylogic.co.uk\/blog\/?p=1172\">Read More &raquo;<\/a><\/span><\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":[],"categories":[1],"tags":[97,98,6],"_links":{"self":[{"href":"https:\/\/www.fussylogic.co.uk\/blog\/index.php?rest_route=\/wp\/v2\/posts\/1172"}],"collection":[{"href":"https:\/\/www.fussylogic.co.uk\/blog\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.fussylogic.co.uk\/blog\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.fussylogic.co.uk\/blog\/index.php?rest_route=\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/www.fussylogic.co.uk\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=1172"}],"version-history":[{"count":1,"href":"https:\/\/www.fussylogic.co.uk\/blog\/index.php?rest_route=\/wp\/v2\/posts\/1172\/revisions"}],"predecessor-version":[{"id":1173,"href":"https:\/\/www.fussylogic.co.uk\/blog\/index.php?rest_route=\/wp\/v2\/posts\/1172\/revisions\/1173"}],"wp:attachment":[{"href":"https:\/\/www.fussylogic.co.uk\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=1172"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.fussylogic.co.uk\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=1172"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.fussylogic.co.uk\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=1172"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}