Updated: February 1, 2016
Here's a nice little story for you. Several days ago, while copying data from one of my Windows computers onto an external, USB-powered hard disk, the system suddenly threw an exception, complaining about an I/O error on the external device. The copy procedure failed, and when I plugged the disk back in, it showed as formatted.
Does this sound familiar? Well, it's usually how each and every story with a failed disk begins and often ends. The data is forever lost, and the disk goes to its special resting place in the digital heaven. But sometimes, you might be lucky. Let me tell you how I went about reviving this FAT32-formatted disk, using some neat Linux tools. Follow me.
We discussed data recovery in the past. The best recovery is NO recovery. You simply have so many backups, and everything works dandy, that you never really need to recover. A loss of a computing device should not interrupt your production setup.
But let's say your setup isn't as foolproof as it ought to be. In that case, you might be forced to invoke the power of the command line for some neat disk healing. Or as Marvin Gaye would say: Babe, when I get feeling, I need Linux distro healing. Somethin' like that.
My external disk happens to be a 1TB Western Digital Elements device, past its fifth birthday, purchased in August 2010, used rigorously and almost daily. It has a single FAT32 partition, which makes it compatible with most of the devices and use cases it serves. Sure, it isn't the fastest, and it takes a while to initialize. But then, it does its job reasonably well.
At least it used to, until its failure. To be fair, for months now, any time I'd plug it into a Windows box, it'd complain and demand a disk check, as though it had been forcefully ejected the last time around. I ignored the error until the disk stopped being recognized by my Windows machines. That was a sure sign I should look into the problem.
As always, whenever I'm facing a non-trivial computing challenge, I will use Linux. It simply offers a much wider range of dedicated tools for repair, recovery and forensics, even for Windows. Moreover, you get more logging and better control of what you're doing, which can help you narrow down the problem and fix it.
So if you're facing an issue with a disk, the first thing you should do is try connecting it to several other machines, including those running Linux. In my case, the disk was properly detected and mounted in Linux Mint, which means the issue is most likely less serious. If you can get your disk to show its data, the damage probably isn't physical. There could be bad sectors lurking somewhere, but since the data is accessible, we will be able to work around the problem.
There are many tools for checking and repairing the integrity of MS-DOS filesystems. One such tool is called dosfsck, and it should already be installed in your distribution. If not, it ought to be available in the repos. The tool can inspect and fix FAT32 partitions, as simple as that. A typical invocation is as follows:
sudo dosfsck -w -r -l -a -v -t <device>
What do we have here then? -w will write changes to the disk immediately. -r will repair interactively, asking you which method to use whenever there is more than one way to fix an inconsistency. -l will list the names of files being checked, which can be useful to determine if there are problems with any particular object. -a will repair the filesystem automatically, without prompting. -v is the verbose mode, which should be helpful. -t will test for unreadable clusters and mark them as bad, so they aren't used. Note that -r and -a ask for opposite behaviors; judging by the output below, the automatic mode was the one in effect here. Lastly, you need to specify the right partition.
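Since the disk had been mounted automatically, it is worth making sure the partition is no longer mounted before letting a repair tool write to it. Below is a minimal sketch of such a check. The helper name is_mounted and its optional second argument are my own illustration, not part of dosfstools, and /dev/sdb1 is simply the partition used in this article; yours may differ (lsblk -f can help you find it).

```shell
#!/bin/sh
# Sketch: report whether a device appears as a mount source in a mounts table.
# is_mounted is a hypothetical helper; the table defaults to /proc/mounts.
is_mounted() {
    device=$1
    table=${2:-/proc/mounts}
    # The first field of each line in the table is the mounted device.
    awk -v dev="$device" '$1 == dev { found = 1 } END { exit !found }' "$table"
}

# Example usage (commented out, since it needs a real disk and root):
# if is_mounted /dev/sdb1; then
#     sudo umount /dev/sdb1
# fi
# sudo dosfsck -n -v /dev/sdb1   # -n: read-only check, nothing is written
```

Running a read-only pass with -n first is a cautious choice: it shows what the tool would complain about before you commit to any changes with -a or -r.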
sudo dosfsck -w -r -l -a -v -t /dev/sdb1
fsck.fat 3.0.26 (2014-03-07)
fsck.fat 3.0.26 (2014-03-07)
Checking we can access the last sector of the filesystem
0x41: Dirty bit is set. Fs was not properly unmounted and some data may be corrupt.
Automatically removing dirty bit.
Boot sector contents:
System ID "mkdosfs"
Media byte 0xf8 (hard disk)
512 bytes per logical sector
16384 bytes per cluster
32 reserved sectors
First FAT starts at byte 16384 (sector 32)
2 FATs, 32 bit entries
244072448 bytes per FAT (= 476704 sectors)
Root directory start at cluster 2 (arbitrary size)
Data area starts at byte 488161280 (sector 953440)
61017565 data clusters (999711784960 bytes)
63 sectors/track, 255 heads
0 hidden sectors
1953515520 sectors total
Checking file /
Checking file /ESSENTIA.LS
With my disk, the inspection ran for about 9 hours. There were only about 1,700 files and folders on the disk, but it still took quite a while. However, the disk did not stall or shout or sputter during the check.
Checking for bad clusters.
Reclaiming unconnected clusters.
Checking free cluster summary.
/dev/sdb1: 1697 files, 45939806/61017565 clusters
After completing the fix, I did not get a summary of bad clusters, but I would presume there had been some, otherwise Windows would not have complained. On the other hand, it could have been a simple matter of running a health check, simply because the dirty bit was set, and Windows was all fussy about it.
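If you want a quick sanity check of what the tool reported, the numbers in the output above can be cross-checked with plain shell arithmetic. This is only a sketch: the values are copied from my output, the variable names are made up for illustration, and the throughput figure rests on the assumption that the roughly 9-hour run read the whole data area once, which I have not verified.

```shell
#!/bin/sh
# Values copied from the dosfsck output shown in this article.
sector_size=512
cluster_size=16384
reserved_sectors=32
fat_count=2
sectors_per_fat=476704
data_clusters=61017565
used_clusters=45939806

# The data area begins after the reserved sectors and both FAT copies.
data_start_sector=$(( reserved_sectors + fat_count * sectors_per_fat ))
data_start_byte=$(( data_start_sector * sector_size ))

# Size of the data area, and how much of it is in use.
data_bytes=$(( data_clusters * cluster_size ))
used_bytes=$(( used_clusters * cluster_size ))
used_percent=$(( used_clusters * 100 / data_clusters ))

# Rough average read rate, ASSUMING the 9-hour check read the data area once.
check_seconds=$(( 9 * 3600 ))
approx_mb_per_s=$(( data_bytes / check_seconds / 1000000 ))

echo "Data area starts at sector $data_start_sector (byte $data_start_byte)"
echo "Data area size: $data_bytes bytes"
echo "In use: $used_bytes bytes (about $used_percent%)"
echo "Approximate read rate: $approx_mb_per_s MB/s"
```

The computed data area start and size match the figures dosfsck printed, and the disk comes out at about three quarters full. A read rate in the region of 30 MB/s would be plausible for an old USB disk, which may explain why so few files took so long to check.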
Two weeks after the check, the disk continues working normally. All subsequent mount/umount operations worked fine, there have been no hiccups, and I have not encountered any new problems copying data on and off the disk. I will, of course, be replacing the disk with a new device and relegating this one to miscellaneous tasks. However, the issue seems to be largely resolved, and it will be interesting to follow up and see when and if it might really die. This is somewhat similar to the problem I had with an internal disk on one of my previous desktops. Statistically, it was supposed to fail, yet it marched on long after the incident, and it was a second, completely quiet, innocent and error-free disk that suddenly chose to die.
I will also see if converting the disk from FAT32 to NTFS will make any performance difference. This is an unrelated topic, and one that can only be completed in Windows, but it should give me an even better indication of the disk's health. Moreover, you can also use fsck.vfat for FAT32 filesystems. Typical usage is as follows: fsck.vfat -r <device>.
This little exercise has taught us three things: 1) Always have backups, so you can ignore disk failures. 2) If you encounter a disk-related problem, test it in Linux, too. 3) You should consider running filesystem repairs using Linux. We won't go into the ideology battle here, but you have a better chance of finding the right tool with it, and the right help. And when things are looking tricky on your end, you don't want to waste any time searching online for dubiously reputable magical recovery software.
Having hard disks fail is a certainty. You should always prepare for it. In fact, too much time is wasted on hyping the unnecessary malware drama, whereas most people should really be worried about keeping their data backed up, multiple times, in multiple locations. Alas, too many folks overlook this most critical aspect of digital life, and then they lose it when the inevitable disaster finally happens. Don't be that person. And if you happen to have a failing FAT32 drive, this article might save your bacon, and maybe even serve you some poached egg, with chives and black pepper. Enjoy.