Monday, July 1, 2013

Why isn't rsync deleting old files?

For many years now, I've been using the old Unix utility rsync to back up my computer to an external hard drive.  As outlined in two earlier posts (here and here), it has always seemed to work quite well for me... until recently.  After replacing a failed hard disk and restoring my /home filesystem, I noticed that one of my two backup drives contained many more files than the other one.  On inspection, most of the extra files should have been deleted many months ago.  I've always given the "--delete" option to rsync, and it has appeared to work, so why did these files not get removed?

It turns out that the default behavior for current versions of rsync is to skip deletions on the destination if the sending side encounters any sort of I/O error.  This is probably wise, as it prevents erasing your entire backup if something causes the source directory to disappear during the run.  Here's what the rsync(1) man page on Ubuntu Linux has to say:

If the sending side detects any I/O errors, then the deletion of any files at the destination will be automatically disabled. This is to prevent temporary filesystem failures (such as NFS errors) on the sending side from causing a massive deletion of files on the destination. You can override this with the --ignore-errors option.

In my case, I was probably trying to back up a block device somewhere that couldn't be copied, or maybe a web browser cache file changed out from under rsync, or some other single-file error occurred while rsync was running.  None of these errors would have caused mass deletions on their own, so they are nothing to worry about.
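One way to notice this situation, rather than discovering it months later, is to look at rsync's exit status after each run. The sketch below is only an illustration: the function name describe_rsync_status is made up for this example, and the meanings of codes 23 and 24 come from the EXIT VALUES section of rsync(1). Whether deletions were actually skipped on a given run is an assumption you would still want to confirm from rsync's own output.

```shell
#!/bin/sh
# Hypothetical helper (not part of rsync): turn an rsync exit status
# into a short human-readable note.
describe_rsync_status() {
    case "$1" in
        0)  echo "success" ;;
        # 23 = "Partial transfer due to error" per rsync(1)
        23) echo "partial transfer due to error; deletions may have been skipped" ;;
        # 24 = "Partial transfer due to vanished source files" per rsync(1)
        24) echo "partial transfer due to vanished source files" ;;
        *)  echo "other rsync error (status $1)" ;;
    esac
}

# Usage sketch (the paths are placeholders):
#   rsync -a --delete /src/ /dst/
#   describe_rsync_status $?
```

Logging that one line from a nightly cron job would at least leave a trail showing which runs finished with errors.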

I added the "--ignore-errors" option to my rsync script and re-ran it.  It deleted 421,000 files from the backup drive, which amounted to less than 1% of the total disk space.  Yes, I was more than a little scared when I saw that number.  I glanced at the list, and didn't see anything deleted that shouldn't have been, so I think we're on the right track now.
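If a number like that makes you nervous, rsync's dry-run mode (-n, or --dry-run) can preview the deletions before anything is touched. With -v, each file that would be removed is listed on a line beginning with "deleting". The helper below is a small sketch for counting those lines; the name count_deletions is invented for this example.

```shell
#!/bin/sh
# Hypothetical helper: count the "deleting ..." lines in rsync's verbose
# output, read from standard input.
count_deletions() {
    # grep -c prints 0 but exits non-zero when nothing matches,
    # so "|| true" keeps the function from reporting a failure.
    grep -c '^deleting ' || true
}

# Usage sketch (add the same --exclude options as the real run):
#   rsync -van --delete --delete-excluded --ignore-errors / "$USBDRIVE" \
#       | count_deletions
```

Replacing count_deletions with "grep '^deleting '" in that pipeline shows the actual list instead of just the count.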

In case you're wondering, here's the rsync command line that I use to back up my desktop machine to a USB hard drive:

rsync -va --delete --delete-excluded --ignore-errors --stats --exclude "/tmp/*" --exclude "/mnt/*" --exclude "/media/*" --exclude "/dev/*" --exclude "/proc/*" --exclude "/sys/*" --exclude "/cdrom/*" --exclude ".gvfs" --exclude "cache/" --exclude "Cache/" --exclude "Cache.Trash/" --exclude ".thumbnails/" / "$USBDRIVE"

The entire backup script is a bit more intelligent, allowing me to pass a single, short, command-line argument to the script that determines whether to delete files or just copy new ones.  I had to add that option once upon a time after I almost accidentally wiped my backup drive right after my desktop drive failed.  Near-death experiences are good at forcing you to change your ways.  Now, cron runs the script nightly without deleting.  If all is well in the world, I run the script manually with the "--delete" option right after I plug in the drive.  Here's the entire script, which I call "syncusb":


#!/bin/sh
# Sync the important stuff from the live discs to the external USB drives.
# Written by Ben "Obi-Wan" Hollingsworth, obiwan@jedi.com
#
# After the near-death experience in Feb 2008, this was changed to not
# automatically delete files on the other side.  Now, it will merely
# add new files to the backup.  We can manually run this with the
# --delete flag if we're sure we want that to happen.
#

DEL=
if [ "X$1" = "X--delete" ]; then
        DEL="--delete --delete-excluded --ignore-errors"
fi

# Find the mount point for the USB drive.
# Labels for the two backup drives are NexStar-1 and NexStar-2.
DF="/bin/df -mP"
if [ `$DF | /bin/grep -c NexStar` = 0 ]; then
        echo "ERROR:  Unable to find backup drive mount point.  Exiting."
        echo
        $DF
        # Exit non-zero so cron (or a calling script) can tell the backup did not run.
        exit 1
fi
USBDRIVE=`$DF | /usr/bin/awk '/NexStar/ { print $6 }'`

/bin/date
echo ========================================================================
echo    Syncing / to $USBDRIVE
echo
/bin/date > /etc/00-last-usb-backup
/bin/sync
time /usr/bin/rsync -va $DEL --stats --exclude "/tmp/*" --exclude "/mnt/*" --exclude "/media/*" --exclude "/dev/*" --exclude "/proc/*" --exclude "/sys/*" --exclude "/cdrom/*" --exclude ".gvfs" --exclude "cache/" --exclude "Cache/" --exclude "Cache.Trash/" --exclude ".thumbnails/" / "$USBDRIVE"
/bin/date >> /etc/00-last-usb-backup
/bin/sync
echo   
echo Current usage on $USBDRIVE
echo
/bin/df -m / $USBDRIVE
/bin/date
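One thing to be aware of: the awk line in the script prints a mount point for every df line that mentions NexStar, so it quietly assumes that only one of the two backup drives is plugged in at a time. If both might ever be mounted together, a variation like the one below picks just the first match. This is a sketch, not part of the script above; the function name first_mount_for_label is invented here, and like the original it assumes the mount point contains no spaces.

```shell
#!/bin/sh
# Hypothetical helper: read "df -mP" output on standard input and print
# the mount point (6th column) of the first line that mentions the label.
first_mount_for_label() {
    awk -v label="$1" 'index($0, label) { print $6; exit }'
}

# Usage sketch:
#   USBDRIVE=`/bin/df -mP | first_mount_for_label NexStar`
```

Whether "first match wins" is the right policy is a judgment call; refusing to run when two drives are found would be the more cautious alternative.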

