Updated: February 3, 2020
Three years ago, I wrote an article explaining how to recover from a failed boot following a major version upgrade in Fedora. At that time, I was working with Fedora 25, and suddenly, I was no longer able to get to the desktop. The issue turned out to be a buggy initramfs, a problem I had encountered only once before, in Ubuntu back in 2009. Since then, it's been quiet.
Well, the wheel of time has dumped us back at the beginning. The same issue happened again. I had (somewhat) recently upgraded an instance of Fedora 29 to Fedora 30, and lo and behold, I found myself facing the same problem. Almost. I had a black screen, and a message that said: "Cannot open access to console, the root account is locked." At this point, nothing I tried yielded any results. I could only reboot. I did try another kernel, and this helped - I got to my desktop. While the issue seemed similar, I had to go about fixing it in a slightly different way.
Problem symptoms in more detail
I found myself quite stumped by the lack of accessible diagnostic tools or a usable environment in which I could troubleshoot the issue. Having to reboot means potentially losing valuable information. At the very least, I could boot into other kernels, which meant I had something to work with.
In the previous kernel (latest minus one), I first tried to follow my own advice from the earlier article, but the systemctl status boot-efi.mount command did not show anything of use. I am always stumped by how fragile, complex and not really human-friendly the modern boot framework is. The previous issue prompted me to write my Progress through complexity article, and its conclusions still hold, years later.
The simplest obstacle is the availability of information under /var/log. Back in the good ole init days, you would typically find the following there: syslog/messages, boot, old boot and kernel logs. These were text files you could easily and instantly inspect with cat, less, whatever.
In Fedora, you do get some, but not all, of these files. For instance, boot.log is there, but then:
sudo less boot.log
"boot.log" may be a binary file. See it anyway?
Funnily, it IS a text file (with some weird characters), and you can actually display it with cat. But this file did not yield any useful information that would help me pinpoint the issue.
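If the weird characters are ANSI colour escape sequences left over from the coloured boot status output (an assumption, not something the log excerpt here confirms), a small filter along these lines should make the file readable in less. The name strip_ansi is made up for this sketch:

```shell
# Strip ANSI colour sequences (ESC [ ... m) from text read on stdin.
# strip_ansi is a hypothetical helper name, not a standard tool.
strip_ansi() {
    esc=$(printf '\033')              # literal ESC character, works with any sed
    sed "s/${esc}\[[0-9;]*m//g"
}

# Usage (commented out so the definition can be sourced safely):
# sudo cat /var/log/boot.log | strip_ansi | less
```

Other control characters, if any, would pass through untouched; this only handles the colour codes.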
I then spent a bit more time reading up on the different journalctl options, and it does let you see the logs from previous boots. You do this by passing a negative integer to the -b option, for example -b -1 for the boot before the current one. This is not very intuitive, but at least it gave me what I needed, although I resent the binary log format idea in principle. You've seen this recently when I debugged the borked laptop issue. Similar theme.
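As a rough sketch of the negative-offset idea, the wrapper below asks journalctl for error-level messages from an earlier boot. The function name boot_errors is hypothetical; the -b, -p and --no-pager options are standard journalctl flags. It assumes the journal is kept persistently, otherwise there are no older boots to look at:

```shell
# Show error-level (and worse) journal messages from an earlier boot.
# boot_errors is a hypothetical wrapper name; the journalctl options are standard.
boot_errors() {
    offset="${1:-1}"                  # 1 = previous boot, 2 = the boot before that
    journalctl -b "-${offset}" -p err --no-pager
}

# Usage (commented out; usually needs root or membership in the systemd-journal group):
# journalctl --list-boots            # shows which boot offsets exist
# boot_errors 1                      # errors from the previous boot
```

Filtering by priority is a convenience only. If the relevant message was logged at a lower priority, drop -p err and read the full log.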
Here, I went through the lines, looking for errors. While the systemctl thingie didn't help earlier, with this command, I did eventually come across the critical boot problem:
Jul 11 13:55:07 tester systemd: Mounting /boot/efi...
Jul 11 13:55:07 tester mount: mount: /boot/efi: unknown filesystem type 'vfat'.
Jul 11 13:55:07 tester systemd: boot-efi.mount: Mount process exited, code=exited, status=32/n/a
Jul 11 13:55:07 tester systemd: boot-efi.mount: Failed with result 'exit-code'.
Jul 11 13:55:07 tester systemd: Failed to mount /boot/efi.
Jul 11 13:55:07 tester systemd: Dependency failed for Local File Systems.
Jul 11 13:55:07 tester systemd: local-fs.target: Job local-fs.target/start failed with result 'dependency'.
Jul 11 13:55:07 tester systemd: local-fs.target: Triggering OnFailure= dependencies.
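Rather than scrolling through the whole log, a filter like the following could narrow the output down to lines of the kind shown above. The function name mount_failures and the choice of patterns are assumptions based on this one excerpt, so other failures may well be worded differently:

```shell
# Keep only journal lines that point at a failed mount.
# mount_failures is a hypothetical helper; the patterns come from the excerpt above.
mount_failures() {
    grep -E 'Failed to mount|unknown filesystem type|Dependency failed'
}

# Usage (commented out; reads the previous boot's log):
# journalctl -b -1 --no-pager | mount_failures
```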
Very similar to what happened three years ago. So again, we seem to have a badly assembled initramfs, which seems to be happening way too often for my taste. Plus, it's correlated with the version upgrade. I do wonder what can be so tricky about the FAT32 (vfat) module, but then, that's a question for someone else. As far as initramfs files are concerned, I had the following under /boot:
ls -ltr initramfs*
-rw-------. 1 root root 73443963 May 19 2018 initramfs-0-rescue-efe3eec4bb6646fe864735812f4d094b.img
-rw-------. 1 root root 22953495 Apr 2 15:54 initramfs-5.0.4-200.fc29.x86_64.img
-rw-------. 1 root root 22961687 May 20 13:11 initramfs-5.0.16-200.fc29.x86_64.img
-rw-------. 1 root root 23015208 May 20 21:17 initramfs-5.0.16-300.fc30.x86_64.img
The last one was the culprit, while the one just above it worked fine. Whatever happened during the upgrade left the new initramfs broken, without the vfat module needed to mount the filesystem. Out of curiosity, I decided to extract the images to see the differences - which confirmed my suspicion. Again, this wasn't the most trivial of exercises, because you cannot use zcat and cpio to extract the initramfs files like in the past. You need a more complex combo:
/usr/lib/dracut/skipcpio initramfs-"version".img | zcat | cpio -id
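If all you want to know is whether a module made it into an image, listing the contents may be enough without extracting anything. The sketch below uses lsinitrd, which ships with dracut; the function name initrd_has_module is hypothetical, and it assumes the module shows up in the listing as a .ko file (possibly compressed):

```shell
# Return success if the given initramfs image lists the named kernel module.
# initrd_has_module is a hypothetical helper; lsinitrd ships with dracut.
initrd_has_module() {
    image="$1"                        # path to an initramfs image
    module="$2"                       # module name without extension, e.g. vfat
    lsinitrd "$image" | grep -q "/${module}\.ko"
}

# Usage (commented out; the images are readable by root only):
# for img in /boot/initramfs-5.0.16-*.img; do
#     if initrd_has_module "$img" vfat; then echo "$img: vfat present"; else echo "$img: vfat MISSING"; fi
# done
```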
So, how do you fix this? Well, you have several options here. One, if you have a second, working copy of Fedora on the same box, you can copy its initramfs file over and use it, like I did in Ubuntu back in the day. Not everyone will have this option, but if you do, you're lucky!
If you don't, then older kernels should help - like they did in my scenario. Then, you can run a system update or manually recreate initramfs files. You can read my Ubuntu slow boot article for details on how to do this. If you can't boot into any other kernel, and you have no other Linux instances on the host, then your last option is to use the live session and then perform the recovery there - or reinstall.
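On Fedora, recreating an initramfs by hand is typically done with dracut. What follows is a minimal sketch rather than the exact steps used here: the function name rebuild_initramfs is hypothetical, the kernel version must be the broken one and not the kernel you are currently booted into, and forcing the vfat driver with --add-drivers is an assumption, since a plain rebuild may already be enough:

```shell
# Rebuild the initramfs for one specific kernel version, keeping the old image as a backup.
# rebuild_initramfs is a hypothetical helper; run it as root from a kernel that boots.
rebuild_initramfs() {
    kver="$1"                                   # e.g. 5.0.16-300.fc30.x86_64
    image="/boot/initramfs-${kver}.img"
    cp -a "$image" "${image}.bak" || return 1   # keep the broken image for later comparison
    # --force overwrites the existing image; --add-drivers asks dracut to include vfat explicitly
    dracut --force --add-drivers vfat "$image" "$kver"
}

# Usage (commented out):
# rebuild_initramfs 5.0.16-300.fc30.x86_64
```

Afterwards, it is worth checking that the new image actually contains the module before rebooting into that kernel.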
I am surprised and somewhat dismayed by this situation - all of it. The error itself, the inability to debug live, the fact this happened to me after a Fedora upgrade (again), the fact I ended up with an unbootable distro after ordinary system activities, the overall complexity of systemd. All of these leave me with a sense of unease.
In 2020, the world of technology isn't any more abstract, robust or resilient than it was ten years ago. On the contrary, errors keep on happening, and when they do come to bear, the surrounding ecosystem is far more difficult to use and work in than in the past. This makes troubleshooting and problem solving more frustrating. So yes, I fixed it, and perhaps that's what counts, but then no, that's not what counts. A seamless user experience is the end goal. Alas, day after day, the Linux desktop is slowly drifting farther away from this noble mission, becoming more and more irrelevant in the greater scheme of things. Anyway, on the technical side, I hope this article was useful. Take care.