Updated: February 19, 2020
As you already know, I like to do long-term tests and reviews of hardware and software that I use. Over the years, I've given you my take on how different operating systems progress and change and how different laptops cope with the passage of time, and now I want to embark on my most ambitious long-term project yet: a reliability study of hard disks. I've waited fifteen years to publish it.
Because I needed time to gather data that has value to the readers. Unlike Google and Backblaze, I don't have thousands of disks buzzing in a data center, so I couldn't just provide any sort of results quickly. But I think you will find this study valuable, as it took place in my production setup, under real-life conditions most home users could or would encounter.
Hard disk usage conditions
To make this relevant, it is important for you to understand how my setup is wired:
- The table below does not include all the devices I have used in the past 15 years; for example, I've not listed a few ThinkPads I had (like the T61 or T400), because they were used for purposes beyond the main scope of this study, so I excluded them in order not to taint the results. However, if anything, their inclusion would only make the results even better. Among the different laptops I used, only one had a disk failure (an old T42p, when it was already long in the tooth), and all others finished their useful service without any issues. So this table errs on the conservative side.
- All permanently fixed/used disks are attached to computers that are powered by Uninterruptible Power Supply (UPS). This also includes external hard disks that come with their own power supply - typically the large external enclosures (e.g. WD My Book).
- Laptop hard disks are used with the battery present in the laptop chassis.
- Most of the disks in my setup show a temperature of 35-45 degrees Celsius.
- I am partial to Western Digital hard disks, which is why they are the majority of listed devices.
- I have listed mechanical disks only; no SSDs. I don't have enough information on SSDs yet.
- Only hard disks used for more than six months are listed.
Legend
The table below summarizes my findings. Now, here are some explanations before we delve deeper.
- Disks are sorted by year (third column - From) - when they were first introduced into my setup.
- To - indicates the current date and state. Now means the disk is still in use; a year is the year the disk was decommissioned.
- Type - We have (D)esktop, (L)aptop and (E)xternal hard disks (the external ones are all connected over USB).
- Usage - Denotes how the disk was/is used. 24/7 indicates a device that is constantly on. X/M denotes a disk that is used periodically, with X being the number of days the device is used in a month, on average. For example, 1/M is a disk that is used for one day per month, or 12 days a year. This could be 12 days used in a row, or powered on 20-30 times for several hours. 30/M means daily usage but not 24/7. I cannot be ultra-precise when it comes to external hard disks.
- Result - OK means the disk has been decommissioned in a healthy state or is currently in a healthy state. OK(b) means "OK, but": the disk is working but has reported errors. F means the disk has failed. DOA means Dead On Arrival (purchased faulty).
- Notes - if a multiplier is used (Identical xNumber), it means there are that many identical disks in the setup, with the same results (makes the table shorter and easier to read).
- I did not list the exact models for my hardware, because there are tons of tiny variations between different models (like the year they were produced, the fab, the batch, the exact firmware version, and so on). For laptops, I used whatever information was available to determine the hardware.
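If you want to compare the Usage column across rows, a minimal sketch like the one below converts each cell into approximate powered-on days per year. It assumes, as the legend describes, that 24/7 means every day of the year and X/M means X days per month on average; the function name `days_per_year` is just an illustration, and it says nothing about how many hours each session lasts.

```python
def days_per_year(usage: str) -> float:
    """Approximate powered-on days per year for a Usage cell.

    "24/7" -> always on (365 days); "X/M" -> X days per month on average.
    """
    usage = usage.strip()
    if usage == "24/7":
        return 365.0
    if usage.endswith("/M"):
        days_per_month = float(usage[:-2])  # the number before "/M"
        return days_per_month * 12
    raise ValueError(f"unrecognized usage notation: {usage!r}")


for cell in ("24/7", "1/M", "30/M"):
    print(cell, days_per_year(cell))
```

So 1/M works out to 12 days a year, matching the example in the legend, while 30/M comes close to 24/7 in days but not in hours.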
Results
And this is what we have:
Disk | Size (GB) | From | To | Type | Usage | Result | Notes |
---|---|---|---|---|---|---|---|
WD Black | 200 | 2005 | 2011 | D | 24/7 | OK(b) | Had a likely imminent fail SMART error but continued working for 1+ year without issues |
WD Black | 160 | 2005 | 2009 | D | 24/7 | F | Failed without prior warning |
WD Black | 250 | 2006 | 2012 | D | 24/7 | OK | Identical x2 |
Hitachi | 160 | 2008 | Now | E | 1/M | OK | Used in custom enclosure |
WD My Passport | 250 | 2008 | Now | E | 1/M | OK | |
WD My Book | 500 | 2008 | 2018 | E | 24/7 | F | Became inaccessible without prior warning |
Laptop disk | 320 | 2009 | Now | L | 3/M | OK | |
Toshiba | 320 | 2009 | Now | L | 5/M | OK | |
WD Black | 250 | 2009 | 2011 | D | 24/7 | OK | |
WD My Book | 500 | 2009 | Now | E | 24/7 | OK | |
WD My Book | 1000 | 2009 | 2017 | E | 24/7 | F | Would click on spin-up since day 0; exhibited heating and no spin-down two years before failure; became read-only |
Laptop disk | 500 | 2010 | Now | L | 5/M | OK | |
WD My Passport | 640 | 2010 | Now | E | 1/M | OK | |
WD Black | 500 | 2011 | 2020 | D | 24/7 | OK | |
WD Black | 2000 | 2011 | 2020 | D | 24/7 | OK | Identical x4 |
WD Essentials | 1000 | 2011 | 2015 | E | 30/M | F | Became read-only without prior warning |
WD Blue | 1000 | 2012 | Now | D | 24/7 | OK | Identical x2 |
WD Blue | 1000 | 2012 | 2017 | D | 24/7 | F | Became read-only without prior warning |
Laptop disk | 500 | 2013 | Now | L | 10/M | OK | |
Laptop disk | 1000 | 2014 | Now | L | 10/M | OK | |
Laptop disk | 1000 | 2015 | Now | L | 20/M | OK | |
WD Essentials | 1000 | 2015 | Now | E | 1/M | OK | |
WD Essentials | 1000 | 2015 | Now | E | 30/M | OK | |
WD Elements | 1000 | 2015 | Now | E | 1/M | OK | Identical x2 |
WD Elements | 2000 | 2015 | Now | E | 1/M | OK | |
WD Black | 2000 | 2017 | Now | D | 24/7 | OK | |
WD Elements | 2000 | 2017 | Now | E | 1/M | OK | |
WD Elements | 2000 | 2019 | Now | E | 1/M | OK | |
Reliability calculations
From this table, we can see that I experienced:
- Desktop disks - 2/14 failures (14%) over a typical usage period of 5 years. It is also interesting to note that no desktop disk failed after its fifth year of use, even though the desktop disks stayed in service for anywhere between 2 and 9 years.
- Laptop disks - 0/6 failures (0%) over a typical usage period of 5 years.
- External disks - 1/14 failures (7%) over a usage period of 5 years, and 3/14 failures (21%) over a usage period of up to 10 years. In contrast to the desktop disks, only 1 failure occurred in the first 5 years of any disk's life, and the remaining 2 occurred late in the disks' service.
- Out of 5 failures, only 1/5 (20%) exhibited early signs of pre-failure.
- Out of 5 failures, 3/5 (60%) resulted in read-only devices from which data could be read and partially salvaged.
- Only 1/34 disks had a SMART error, and it did not fail (it can be considered a false positive).
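As a quick consistency check, here is the table transcribed by hand into (type, multiplier, result) tuples, with the Identical xN multiplier expanded, followed by a tally of failures per disk type. The transcription and the names `ROWS` and `tally` are illustrative; the counts themselves come straight from the table.

```python
from collections import defaultdict

# (type, count, result) per table row, in table order; count is the
# "Identical xN" multiplier from the Notes column (1 when absent).
ROWS = [
    ("D", 1, "OK(b)"), ("D", 1, "F"), ("D", 2, "OK"),
    ("E", 1, "OK"), ("E", 1, "OK"), ("E", 1, "F"),
    ("L", 1, "OK"), ("L", 1, "OK"), ("D", 1, "OK"),
    ("E", 1, "OK"), ("E", 1, "F"), ("L", 1, "OK"),
    ("E", 1, "OK"), ("D", 1, "OK"), ("D", 4, "OK"),
    ("E", 1, "F"), ("D", 2, "OK"), ("D", 1, "F"),
    ("L", 1, "OK"), ("L", 1, "OK"), ("L", 1, "OK"),
    ("E", 1, "OK"), ("E", 1, "OK"), ("E", 2, "OK"),
    ("E", 1, "OK"), ("D", 1, "OK"), ("E", 1, "OK"),
    ("E", 1, "OK"),
]


def tally(rows):
    """Return (total disks, failed disks) per type; only "F" counts as a failure."""
    totals = defaultdict(int)
    failures = defaultdict(int)
    for disk_type, count, result in rows:
        totals[disk_type] += count
        if result == "F":
            failures[disk_type] += count
    return dict(totals), dict(failures)


totals, failures = tally(ROWS)
for disk_type in ("D", "L", "E"):
    failed = failures.get(disk_type, 0)
    share = failed / totals[disk_type]
    print(f"{disk_type}: {failed}/{totals[disk_type]} ({share:.0%})")
print("all:", sum(failures.values()), "/", sum(totals.values()))
```

Note that OK(b) counts as a survivor here, in line with treating the SMART warning as a false positive, and that the three types add up to 34 disks in total.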
Based on the data, the estimated MTTF is roughly 61,000 hours, with 5/34 failures over an average of ~7 years of use (taking into account total usage time and disk age), which means I could expect roughly 1 in 7 disks to fail after being used constantly for about 7 years, with the actual cumulative breakdown: 1 failure after 4 years, 2 failures after 5 years, 3 failures after 6 years, and 5 failures after 10 years.
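The ~61,000-hour figure lines up with the cumulative breakdown if MTTF is read as the mean age at failure of the failed disks: expanding the breakdown gives ages of 4, 5, 6, 10 and 10 years, which average 7 years, or about 61,320 hours. That reading is an assumption, sketched below, not necessarily the exact method behind the number; it also ignores the surviving disks, which a fuller MTTF estimate would take into account. The helper name `ages_at_failure` is hypothetical.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760 hours; ignores leap days

# Cumulative number of failures reached at each age (in years),
# taken from the breakdown in the text.
cumulative = {4: 1, 5: 2, 6: 3, 10: 5}


def ages_at_failure(cumulative: dict[int, int]) -> list[int]:
    """Expand a cumulative failure count into one age per failed disk."""
    ages = []
    previous = 0
    for age in sorted(cumulative):
        new_failures = cumulative[age] - previous
        ages.extend([age] * new_failures)
        previous = cumulative[age]
    return ages


ages = ages_at_failure(cumulative)
mean_years = sum(ages) / len(ages)
print(ages)                          # [4, 5, 6, 10, 10]
print(mean_years)                    # 7.0
print(mean_years * HOURS_PER_YEAR)   # 61320.0
```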
Year 5 is the riskiest, with a 2% normalized annual failure rate (assuming no data redundancy).
In other words, practically, if I keep two copies of any given data, the likelihood of data loss is 2.5% over a decade, or roughly 0.4% with three disks (assuming the disks fail independently). So this kind of confirms my backup strategy from a while back, and also shows that it is important for you to keep multiple copies of important files, if you want them to outlast your hardware.
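The redundancy argument can be sketched with a simple model: if each copy sits on a separate disk that fails over the period with probability p, and the failures are independent, then all n copies are lost with probability p^n. The sketch below backs p out of the 2.5% two-copy figure (p is about 0.158) and applies it to one, two and three copies. Independence is an assumption: disks from the same batch, or sharing a power supply or a room, can fail together, so treat the results as optimistic. The function name `loss_probability` is illustrative.

```python
import math


def loss_probability(p_single: float, copies: int) -> float:
    """Probability that every copy is lost, assuming independent disk failures."""
    return p_single ** copies


# Per-disk failure probability implied by "2.5% with two copies".
p = math.sqrt(0.025)

for n in (1, 2, 3):
    print(n, f"{loss_probability(p, n):.2%}")
```

Each extra copy multiplies the risk by p again, which is why the third disk buys so much more safety than its cost suggests.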
Conclusion
There you go. I hope you find this 15-year-long study valuable. Of course, any techie like me could do it. All techies hoard hardware like mad, and I'm sure most Dedoimedo readers have a bunch of computers and tons of hard disks strewn about, so it's just a matter of compiling the right data. And I'm sure every such compilation would be compelling. A compelling compiling, hi hi.
If you have any comments or suggestions about my findings, I'd love to hear them. Again, I don't have a massive data center, so I can't do an accurate comparative study between vendors, disk sizes and the like, so do take my results with a pinch of cardamom. But I believe my numbers are quite indicative for home usage scenarios, so if you're mulling how to handle your data down the long trouser leg of time, you have some indication of where to start, and how to hedge your odds. Take care.
Cheers.