How not to write backup software

I bought a Seagate 160GB external USB2 hard drive yesterday for the purpose of backing up my laptop's 60GB hard drive. Most of the stuff on my hard drive is junk not worth saving (often in very large files called test.avi), but I have a lot of source code that's worth keeping and that, up to this point, hasn't been backed up in any reliable form except for a few dozen archives on SourceForge. Recently I thought I should be a little less cavalier about my data and actually try a backup solution.

The drive came with a backup program called BounceBack Express. I ordinarily take a rather dim view of software that comes with hardware; the drivers are usually out of date and the supplemental software is usually a crippled version. This was no different, in that the BounceBack Express software is a reduced-functionality version of the BounceBack backup program. I decided to try it out anyway, though, just to see how good (or bad) it was. The backup drive was more than twice as large as the source disk, so backing up junk wasn't going to be a big deal. So I decided to just let it back up the whole drive on default settings.

With the verification option on.

Verification is a simple feature to implement. The idea is that once you've backed up data onto the drive, you read back the written data and compare it against the source to make sure the backup is good. After all, the worst that can happen is that the primary drive fails, you try the backup, and only then realize that the backup wasn't made properly and is useless. The BounceBack Express software re-reads each backup file after writing it, comparing it against the original.
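
As a rough illustration of how simple the compare itself is, here is a minimal sketch in Python. The function name, the chunk size, and the length check are my own choices for the example; this is not BounceBack's code, just the general shape of a read-back comparison.

```python
import os

CHUNK_SIZE = 64 * 1024  # arbitrary block size chosen for this sketch


def files_match(source_path, backup_path, chunk_size=CHUNK_SIZE):
    """Return True if the two files have the same length and contents."""
    # Cheap early exit: files of different length cannot match.
    if os.path.getsize(source_path) != os.path.getsize(backup_path):
        return False

    # Ordinary buffered reads: where the bytes physically come from
    # (disk or memory) is entirely up to the operating system.
    with open(source_path, "rb") as src, open(backup_path, "rb") as dst:
        while True:
            a = src.read(chunk_size)
            b = dst.read(chunk_size)
            if a != b:
                return False
            if not a:  # both reads hit end of file at the same point
                return True
```

Comparing lengths first avoids reading anything when a copy was obviously truncated; the loop then only has to deal with files of equal size.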

Except that when it finished copying a 300MB file and began its verification pass, the backup drive's activity light went off.

I have a lot of memory in my laptop, 1.2GB, most of which is not used directly by running programs, which leaves it available to Windows for disk caching. As soon as I saw that activity light go off I immediately got the sinking feeling that the makers of the backup software had made one of the most classic errors in verifying a file copy on Windows. The next time I saw a large file get transferred, I launched Sysinternals FileMon and dumped a trace of the verify pass (edited for brevity):

11877   FASTIO_READ   C:...BigFile  Offset: 1245184 Length: 65536
11878   FASTIO_READ   F:...BigFile  Offset: 1245184 Length: 65536
11879   FASTIO_READ   C:...BigFile  Offset: 1310720 Length: 65536
11880   FASTIO_READ   F:...BigFile  Offset: 1310720 Length: 65536
11881   IRP_MJ_READ*  C:...BigFile  Offset: 1376256 Length: 65536
11882   IRP_MJ_READ*  C:...BigFile  Offset: 1441792 Length: 65536
11883   FASTIO_READ   C:...BigFile  Offset: 1376256 Length: 65536
11884   FASTIO_READ   F:...BigFile  Offset: 1376256 Length: 65536
11885   FASTIO_READ   C:...BigFile  Offset: 1441792 Length: 65536
11886   FASTIO_READ   F:...BigFile  Offset: 1441792 Length: 65536
11887   IRP_MJ_READ*  C:...BigFile  Offset: 1507328 Length: 65536
11888   IRP_MJ_READ*  C:...BigFile  Offset: 1572864 Length: 65536
11889   FASTIO_READ   C:...BigFile  Offset: 1507328 Length: 65536
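
One way to read a trace like this is to tally request types per drive letter. The short Python sketch below does that for the excerpt above; the column layout (sequence number, request type, path) is assumed from the excerpt itself rather than from any documented FileMon export format, and the trailing asterisk is simply stripped so the names group together.

```python
from collections import Counter

TRACE = """\
11877   FASTIO_READ   C:...BigFile  Offset: 1245184 Length: 65536
11878   FASTIO_READ   F:...BigFile  Offset: 1245184 Length: 65536
11879   FASTIO_READ   C:...BigFile  Offset: 1310720 Length: 65536
11880   FASTIO_READ   F:...BigFile  Offset: 1310720 Length: 65536
11881   IRP_MJ_READ*  C:...BigFile  Offset: 1376256 Length: 65536
11882   IRP_MJ_READ*  C:...BigFile  Offset: 1441792 Length: 65536
11883   FASTIO_READ   C:...BigFile  Offset: 1376256 Length: 65536
11884   FASTIO_READ   F:...BigFile  Offset: 1376256 Length: 65536
11885   FASTIO_READ   C:...BigFile  Offset: 1441792 Length: 65536
11886   FASTIO_READ   F:...BigFile  Offset: 1441792 Length: 65536
11887   IRP_MJ_READ*  C:...BigFile  Offset: 1507328 Length: 65536
11888   IRP_MJ_READ*  C:...BigFile  Offset: 1572864 Length: 65536
11889   FASTIO_READ   C:...BigFile  Offset: 1507328 Length: 65536
"""


def tally_requests(trace_text):
    """Count (drive, request type) pairs in a trace excerpt like the one above."""
    counts = Counter()
    for line in trace_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        request = fields[1].rstrip("*")  # drop the marker so names group together
        drive = fields[2][:2]            # "C:" or "F:"
        counts[(drive, request)] += 1
    return counts


for (drive, request), n in sorted(tally_requests(TRACE).items()):
    print(drive, request, n)
# C: FASTIO_READ 5
# C: IRP_MJ_READ 4
# F: FASTIO_READ 4
```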

Notice that the reads from the target F: drive only have FASTIO_READ requests associated with them, not IRP_MJ_READs like the C: reads. Reading a Windows IT Pro article by Mark Russinovich confirmed my suspicions: a FASTIO_READ request can only directly satisfy a read request if the data is in the disk cache, and an IRP is required otherwise. Which means...

...the backup software wasn't actually verifying anything because it used buffered I/O and was re-reading the entire file from the Windows disk cache.
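
For contrast, here is a hedged Python sketch of a copy-then-verify routine that at least tries to keep the read-back from being served out of the cache. It flushes the written data to the device and then, where the platform offers `os.posix_fadvise`, asks the kernel to drop its cached pages for the backup file before re-reading it. That call is only an advisory hint and is not available on Windows; there, the usual approach is to open the file through the Win32 `CreateFile` API with `FILE_FLAG_NO_BUFFERING`, which in turn requires sector-aligned reads. The function name and chunk size are made up for the example.

```python
import os


def copy_then_verify(source_path, backup_path, chunk_size=64 * 1024):
    """Copy a file, then re-read the copy and compare it with the source.

    Returns True if the copy matches. This is a sketch: dropping cached
    pages is requested via an advisory hint, not guaranteed.
    """
    # Copy pass.
    with open(source_path, "rb") as src, open(backup_path, "wb") as dst:
        while True:
            block = src.read(chunk_size)
            if not block:
                break
            dst.write(block)
        dst.flush()
        os.fsync(dst.fileno())  # push the written data out to the device

    # Verify pass.
    with open(source_path, "rb") as src, open(backup_path, "rb") as dst:
        if hasattr(os, "posix_fadvise"):
            # Hint that cached pages for the whole backup file (length 0
            # means "to end of file") are no longer needed, so that the
            # reads below are more likely to go to the drive itself.
            os.posix_fadvise(dst.fileno(), 0, 0, os.POSIX_FADV_DONTNEED)
        while True:
            a = src.read(chunk_size)
            b = dst.read(chunk_size)
            if a != b:
                return False
            if not a:
                return True
```

The flush before the hint matters: pages that have not yet been written out cannot simply be discarded, so without it the hint would likely have little effect.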

I don't think I'll be purchasing the full-version upgrade of this software.
