Updated: March 23, 2016
Where has all the good space gone, and where are all the bytes? Where's the street-wise nerd to free up my disks? Anyhow, I recently came across a rather bizarre disk usage problem on one of my Windows 7 boxes. The 120GB SSD was suddenly running short of space, even though it was supposed to be only about 50% utilized. I set about exploring.
The obvious choice for this kind of task is Windirstat, which we saw in action about 18 months ago. It's a simple, no-nonsense tool that displays a detailed map and listing of all the files and folders. Except in my case, it wasn't really helpful. The presented data only amounted to about 60GB. But there was a big chunk, labeled unknown, occupying another 55GB. I had no idea what this was, but it looked like something that should not be there. So I set about exploring some more.
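If you want to double-check that kind of gap yourself, the arithmetic is easy to reproduce. What follows is a minimal Python sketch, not a description of how Windirstat works internally: it adds up the sizes of all the files a scan can reach, counts whatever it could not read, and subtracts the total from the used space figure. The names `scan` and `unknown_bytes` are made up for this illustration, and the numbers will only be approximate, because logical file sizes are not the same thing as on-disk allocation.

```python
import os


def scan(root):
    """Return (bytes_seen, entries_skipped) for everything reachable under root."""
    skipped = []
    seen = 0
    # os.walk hands folders it cannot list to the onerror callback
    for dirpath, _dirs, files in os.walk(root, onerror=skipped.append):
        for name in files:
            try:
                # lstat measures the entry itself and does not follow links
                seen += os.lstat(os.path.join(dirpath, name)).st_size
            except OSError as err:
                skipped.append(err)
    return seen, len(skipped)


def unknown_bytes(used, seen):
    """Space in use that the scan could not attribute to any file."""
    return max(used - seen, 0)
```

The used figure can come from the standard library, for instance `shutil.disk_usage("C:\\").used`. A large result from `unknown_bytes`, or a high skipped count, tells you the scan is missing something. It does not tell you what, which is where a second tool comes in.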
The mystery unfolds
There are approximately a million different answers and solutions to this potential problem. If you search online for "Windows windirstat unknown file", as you may have actually done just before landing on this page, you will see people mentioning all sorts of wizardry related to how Windows works and counts its disk space. This makes it a little more difficult to find the right solution.
Which is why this article is also about what NOT to do. Deleting random files is a big no. Fiddling with system files is also off the table. I also most warmly recommend you back up any important personal data before embarking on a data freedom journey. Only then should you slowly and carefully start implementing fixes, beginning with the non-intrusive, non-destructive ones.
One of the recommendations is running chkdsk. Not bad. But then, it does not really help us understand why or how we may have ended up with 55GB of space eaten by some mysterious data somewhere. Before we can fix the problem, we need to understand if it is one in the first place. And since Windirstat does not help us properly identify the unknown files, we need a different tool.
TreeSize to the rescue
I'm a lumberjack, and I'm okay, I disk all night, and I check all day. Anyhow, this utility offers similar functionality to Windirstat, but it does it in a slightly different manner, with fewer graphics. However, it might help us get a better view of the disk usage.
Indeed, TreeSize reveals a different result. The Windows folder takes up 90GB rather than the 35GB that Windirstat reported, and it turns out most of it is located inside System32\spool\drivers. This is a rather curious result, because there's no reason why you ought to have 50GB worth of printer drivers. Just as curious is the fact that Windirstat did not reflect this information in its own report. Perhaps these are corrupt files?
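The drill-down itself, going from a fat parent folder to the one child that hogs the space, can be sketched in a few lines of Python. This is only an illustration of the idea under my own assumptions, not how TreeSize is implemented, and the helper names `tree_size` and `largest_children` are hypothetical. It is read-only and touches nothing on the disk.

```python
import os


def tree_size(path):
    """Total size in bytes of all readable files under path."""
    total = 0
    for dirpath, _dirs, files in os.walk(path):
        for name in files:
            try:
                total += os.lstat(os.path.join(dirpath, name)).st_size
            except OSError:
                pass  # unreadable entry, leave it out of the total
    return total


def largest_children(parent, top=10):
    """Return (size, name) pairs for the biggest immediate subfolders of parent."""
    sizes = []
    with os.scandir(parent) as entries:
        for entry in entries:
            if entry.is_dir(follow_symlinks=False):
                sizes.append((tree_size(entry.path), entry.name))
    # Largest first, so the space hog ends up at the top of the list
    return sorted(sizes, reverse=True)[:top]
```

Run `largest_children` on a folder, then again on whichever child tops the list, and repeat until the culprit shows up. It needs a prompt with enough rights to read the folders in question, or the totals will come out too low.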
Browsing through the x64 directory, I soon learned there were roughly 12,500 subfolders, each containing the exact same set of DLLs. For some weird reason, I had thousands of copies of the drivers for a particular network printer, absolutely identical to one another. At this point, armed with enough knowledge and intuition, I removed all of them apart from a single folder. I did not touch any other directories or files unrelated to the network device. Finally, I verified that the printer functionality was unaffected by my delete action.
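I did the comparison by eye, but with thousands of folders, you may want proof that they really are identical before you delete anything. Here is a minimal, read-only Python sketch of one way to do that, assuming it is enough to compare file names and file contents. The function names `folder_fingerprint` and `identical_groups` are invented for this example. It hashes every file in each subfolder and groups the subfolders whose hashes match. It deletes nothing.

```python
import hashlib
import os
from collections import defaultdict


def folder_fingerprint(folder):
    """Hash the relative paths and contents of every file under folder."""
    digest = hashlib.sha256()
    for dirpath, dirnames, filenames in os.walk(folder):
        dirnames.sort()  # fix the traversal order so equal trees hash equally
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            digest.update(os.fsencode(os.path.relpath(path, folder)))
            digest.update(b"\0")
            with open(path, "rb") as handle:
                # Read in 1 MiB chunks so large DLLs do not fill up memory
                for chunk in iter(lambda: handle.read(1 << 20), b""):
                    digest.update(chunk)
            digest.update(b"\0")
    return digest.hexdigest()


def identical_groups(parent):
    """Group immediate subfolders of parent by fingerprint; keep groups of 2+."""
    groups = defaultdict(list)
    with os.scandir(parent) as entries:
        for entry in entries:
            if entry.is_dir(follow_symlinks=False):
                groups[folder_fingerprint(entry.path)].append(entry.name)
    return [sorted(names) for names in groups.values() if len(names) > 1]
```

Each list that comes back is one set of subfolders with matching fingerprints. Hashing tens of gigabytes takes a while, so be patient. Whatever you decide to remove afterwards, keep one copy from each group and check that the device still works.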
A subsequent scan showed matching values in both Windirstat and TreeSize, as you would expect, so the discrepancy was now resolved. The former had no more complaints about unknown files, and the disk usage was down to about 60GB. And there you go, problem fixed!
Conclusion
I do not know what triggered the weird creation of 12,500 folders containing the exact same driver set. It must have been a crazy startup script or a similar error. Apparently, the files were useless, or maybe even created in a non-standard fashion, which was why Windirstat struggled with the report. Either way, we solved it.
This article might not necessarily apply to your use case, but the principle is 100% valid. If you encounter unknown, inexplicable data, make sure you run at least one more check with a different tool. No matter how weird, the data should still be consistent, and if reporting tools struggle to give you a unified view, that's your first problem. Hopefully, this guide teaches you how to approach similar issues in the future. Correlation, consistency, comparison between good and bad systems: these are all necessary parts of healthy problem solving. Take care, and may your disks be free.
Cheers.