Does Linux need an occasional cleanup?

Updated: March 23, 2022

If you were old enough to use computers in the mid-2000s (roughly the right decade), and you happened to be running both Windows and Linux, and, being a nerd, you also took part in discussions about the benefits of one operating system over the other, then you must have come across the following statement: you don't need to do any system maintenance on Linux; it's smart enough to handle it all by itself.

Indeed, on the Windows side, there was often talk of systems getting slower over time, a dire need for defragmentation (on NTFS, as opposed to Ext3), cleanup of temporary files, and such. Linux was often touted as maintenance-free. Now, the question is, how true is this statement really? I actually had a chance to test it for myself, purely by chance and the passage of time.

What happened?

I was doing some package installation and removal in the Kubuntu 18.04 instance on my Slimbook, and all of a sudden, the operation failed. Apt, the (command-line) package manager, told me that it could not complete an operation, as there was no space left on device [sic]. What.

gzip: stdout: No space left on device
E: mkinitramfs failure find 141 cpio 141 gzip 1
update-initramfs: failed for /boot/initrd.img-4.15.0-163-generic with 1.
dpkg: error processing package initramfs-tools (--configure):
installed initramfs-tools package post-installation script subprocess returned error exit status 1
Errors were encountered while processing:
linux-image-4.15.0-166-generic
initramfs-tools

The interesting thing here is: I was working normally, with no disruptions. There WAS space on my disk, albeit relatively little (less than I thought, but still some ~20GB left on the 500GB SSD), and more importantly, there was NO message or warning from the system that there was a problem.

But then, I examined the contents of the /boot partition, and lo and behold:

df -lh /boot
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda2       705M  705M  0M  100% /boot

What. For some odd reason, my not insignificant /boot partition had been filled to the brim, which meant any kernel or initramfs operation from now on would fail. This is not a good thing, especially when I think about, hey, what if there was a kernel update?
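
If you want to see what is hogging /boot, a quick look at the kernel files helps. The snippet below is only a sketch: it assumes the Ubuntu-style layout where each installed kernel leaves a vmlinuz-<version> file in /boot, and list_kernel_versions is a name I made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print the version suffix of every vmlinuz-* file
# found in a directory (defaults to /boot). Assumes Ubuntu-style naming.
list_kernel_versions() {
    dir="${1:-/boot}"
    for f in "$dir"/vmlinuz-*; do
        # Skip the unexpanded pattern when nothing matches.
        [ -e "$f" ] || continue
        # Strip everything up to and including "vmlinuz-".
        printf '%s\n' "${f##*/vmlinuz-}"
    done
}

# Example: list_kernel_versions /boot
```

Comparing that list with the output of uname -r shows which entry belongs to the running kernel and which ones are older leftovers.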

I then decided to run the "autoremove" command for apt, which cleans unused packages. After several minutes of rigorous purging, the situation was back to normal:

df -lh /boot
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda2       705M  155M  500M  24% /boot
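
For reference, the cleanup can be previewed before anything is actually removed. This is a rough sketch rather than the exact commands I typed: it relies on the simulation mode of apt-get (the -s flag), which prints one "Remv" line per package it would remove, and count_removable is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: count the packages a simulated autoremove would
# remove. Reads the output of `apt-get -s autoremove` on stdin.
count_removable() {
    # grep -c prints 0 but exits non-zero when nothing matches,
    # so the exit status is deliberately ignored.
    grep -c '^Remv ' || true
}

# Example (dry run only, nothing is changed):
#   apt-get -s autoremove | count_removable
# The real thing, which also purges leftover configuration files:
#   sudo apt-get autoremove --purge
```

If the count is large and /boot is nearly full, old kernels are a likely suspect.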

The lesson here

It turns out that with the full disk encryption setup, the installer set the sizes of the different partitions automatically some three-odd years back, and I had no control over them. With the typical Ubuntu family cadence of kernel updates, each of which leaves another kernel image and initramfs in /boot until it is removed, I ended up running out of /boot space after roughly 3.5 years of use.

Now, of course, the remedy is obvious - purge unused packages occasionally. To be fair, apt ALWAYS tells you this when you run any maintenance command. It tells you that there's a lot of stuff that can be removed, which ought to trim down the disk usage. On the command line.

But. But. No such message is ever shown when running GUI package managers like, say, Discover. This means that ordinary people using Linux in the classic desktop fashion, i.e. with NO command line wizardry, will eventually end up with not-so-maintenance-free problems on their Linux machines, if they keep using them long enough.
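
To illustrate how little it would take to warn the user, here is a minimal sketch of such a check. It is not something Discover or any other tool actually does; the function names and the 90% default threshold are my own assumptions, and it expects the filesystem name reported by df to contain no spaces.

```shell
#!/bin/sh
# Print the usage percentage (0-100) of the filesystem holding a path.
usage_percent() {
    # -P forces the POSIX one-line-per-filesystem format,
    # so the fifth field of the second line is the capacity.
    df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Print a warning and return 1 when usage reaches the threshold
# (default 90, an arbitrary value chosen for this sketch).
warn_if_full() {
    path="$1"
    threshold="${2:-90}"
    pct=$(usage_percent "$path")
    # Bail out if df gave us nothing usable.
    [ -n "$pct" ] || return 2
    if [ "$pct" -ge "$threshold" ]; then
        echo "WARNING: $path is ${pct}% full (threshold ${threshold}%)"
        return 1
    fi
    echo "OK: $path is ${pct}% full"
}

# Example: warn_if_full /boot 90
```

Something along these lines, run periodically and hooked up to a desktop notification, would have flagged my /boot partition long before apt fell over.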

As part of my cleanup, I also ran ncdu, and found all manner of cruft on the disk, including lots of leftovers under /var, such as containers and virtual machines long unused and never cleaned up by their respective services. Again, something that can be remedied by forethought, which would include: a) better separation of core system directories from user files (no keeping virtual machines under /var), b) better auto-cleanup, and c) actual warnings and notifications to the end user.
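
If ncdu is not installed, a rough non-interactive approximation is possible with plain find and du. This is a sketch of the idea, not a replacement for ncdu; largest_entries is a hypothetical name, and sizes are reported in kilobytes.

```shell
#!/bin/sh
# Hypothetical helper: list the largest immediate children of a directory
# (default /var), biggest first, with sizes in kilobytes.
largest_entries() {
    dir="${1:-/var}"
    count="${2:-10}"
    # du -s: one total per entry, -k: kilobytes, -x: stay on one filesystem.
    # Permission errors are silenced, so run as root for complete numbers.
    find "$dir" -mindepth 1 -maxdepth 1 -exec du -xsk {} + 2>/dev/null |
        sort -rn |
        head -n "$count"
}

# Example: largest_entries /var 5
```

Repeating the call on whichever entry tops the list is a crude way of drilling down to the actual cruft.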

I highlighted this overall problem of limited cross-service awareness in Linux a long time ago. There isn't really any true cohesion. If your system is stuck in a boot loop, nothing will break this loop unless you manually intervene. If there is a service that slows the system down, a buggy piece of code, something that doesn't integrate well with the system, there is nothing to tell you that. Everything lives in its own space, and worse, without any self-check or self-healing mechanisms.

Conclusion

This is my little anecdotal story. For the first time ever, I ran into a problem. Previously, I'd use a single root partition for my systems, sans disk encryption, which meant that /boot was as expandable as the system itself, thus space after updates was never an issue. But here, I stumbled across something that could and will bite people who use their systems in non-nerdy fashion (via GUI) and don't do any manual maintenance labor.

The true solution is to make Linux systems more resilient and have much, much better error management, for each service or program on its own, and then across the entire system. That would also help resolve tons of other problems and bugs that often go undetected or unreported, as people simply have no tools for figuring out what went wrong, or how different components interact. Well, there you go. Perchance you will find this interesting. See you around.

Cheers.