Updated: June 19, 2017
Madonna sang, once upon a time: Fedora don't boot, I'm in trouble deep, Fedora don't boot, I've been losing sleep. But I've made up my GRUB o-oh, I'm keeping my distro, hm, I'm gonna keep my distro hm. As you may have guessed, one day, I fired up one of my two instances of Fedora 25 on the G50 laptop, and it stopped booting.
Out of the blue, just like that. Now, remember the recent successful upgrade? Well, now I had one less healthy instance and a whole lot of paranoia, and then I also remembered how systemd made another Fedora go wonk, and how I was unable to recover from the problem. Time to investigate and see what can be salvaged.
Now, let me stop being witty for a moment and give you a precise account of the problem at hand. Fedora is not booting, and it drops into the rescue shell, where you can or should examine the issue manually, if you have the technical savvy.
Before you can or SHOULD do that, you need to understand the problem details. This is not an easy task when you work with systemd, because it keeps its logs in a binary format, and working with its control utilities is a pain. But then, you just may be lucky and see the boot messages on the screen, or they may be contained in the boot log.
In my specific case, I had two identical Fedoras on the host, so I was able to actually compare the two systems directly, and see what differed between the two distros. While this one particular instance was misbehaving (/dev/sda14), the other one, running from /dev/sda15, was working beautifully. And so were all the other distros and the Windows 10 operating system on the machine.
But we need to work methodically. This is the art of problem solving! We have several possible culprits here, including a hard disk issue, partition corruption, filesystem corruption, systemd journal corruption, bootloader problems, and more. Let's analyze.
This is an interesting point. How do you work most efficiently, starting with the simplest tools? Deleting things before reading logs might be dangerous, so we will definitely start with some education.
In the rescue shell, I was able to actually open and read the /var/log/boot.log file. This information should also be displayed on the screen, but you may have refreshed or cleared the screen buffer. The boot.log was pointing to an issue with the boot-efi.mount systemd unit. I decided to examine it.
systemctl status boot-efi.mount
boot-efi.mount - /boot/efi
Loaded: loaded (/etc/fstab; generated; vendor preset: disabled)
Active: failed (Result: exit-code) since Thu 2017-01-26 19:32:26 GMT; 1min 11s ago
Process: 926 ExecMount=/usr/bin/mount /dev/sda2 /boot/efi -t vfat -o umask=0077,shortname=winnt (code=exited, status=32)
Jan 26 19:32:26 tester systemd: Mounting /boot/efi...
Jan 26 19:32:26 tester mount: mount: unknown filesystem type 'vfat'
Jan 26 19:32:26 tester systemd: boot-efi.mount: Mount process exited, code=exited status=32
Jan 26 19:32:26 tester systemd: Failed to mount /boot/efi.
Jan 26 19:32:26 tester systemd: boot-efi.mount: Unit entered failed state.
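If the log is long, the failed mount points can be filtered out of it rather than read by eye. Here is a minimal sketch, assuming the log contains "Failed to mount" lines in the same form as the output above. The function name failed_mounts is my own invention for illustration, not a systemd tool.

```shell
# Print the mount points that systemd reported as failed in a log file.
# failed_mounts is a hypothetical helper name, not part of systemd.
failed_mounts() {
    # $1 = log file, e.g. /var/log/boot.log or a saved copy of the status output
    # Keep only "Failed to mount <path>" fragments, then strip the prefix
    # and the trailing full stop, leaving the bare mount point.
    grep -o 'Failed to mount [^[:space:]]*' "$1" \
        | sed 's/^Failed to mount //; s/\.$//' \
        | sort -u
}
```

Running failed_mounts /var/log/boot.log from the rescue shell should list each mount point once, which tells you which unit to pass to systemctl status.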
From this log, we learn that for some reason, the system was unable to mount the EFI partition, which means the system cannot boot. This could be an issue with how the GRUB2 bootloader is configured. I compared the configuration on the two Fedora instances, and it is identical. Next please.
There might also be a misconfiguration in the /etc/fstab file, but no, the mount point and the filesystem for the EFI partition are all written down correctly. Next. Filesystem. Hm. Maybe we do have a problem with the filesystem? So I ran an fsck operation on the partition from within the other Fedora instance, and it came back completely clean.
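The fstab check can be done with a one-liner instead of scrolling through the file. This is only a sketch: fstab_entry is a made-up helper name, and it assumes the usual whitespace-separated fstab layout (device, mount point, type, options).

```shell
# Print "device fstype options" for a mount point listed in an fstab-style file.
# fstab_entry is a hypothetical helper name used for illustration only.
fstab_entry() {
    # $1 = mount point to look up, $2 = fstab file (defaults to /etc/fstab)
    # Skip comment lines, then match on the second column (the mount point).
    awk -v mp="$1" '$1 !~ /^#/ && $2 == mp { print $1, $3, $4 }' "${2:-/etc/fstab}"
}
```

Calling fstab_entry /boot/efi on both the healthy and the bad instance and comparing the two lines is a quick sanity check. For the filesystem itself, a read-only check along the lines of fsck.vfat -n /dev/sda2, run while the partition is not mounted, should report problems without writing anything to the disk.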
I thought at this point to see if systemd is not the baddie, as it had let me down once before, so I moved the journal away, and a fresh new set of logs was created on next boot, but this did not affect the problem at all.
I tried to manually mount the EFI partition, and realized the system could not do that, and then I had a sense of deja vu. Where have I seen this same issue before? Turns out we discussed this in my Ubuntu & initrd guide from 2009!
I compared the healthy and bad Fedora /boot contents, and indeed, they had initramfs.img files with different sizes! For the same kernel! Apparently, for the same reason it happened with Ubuntu a long time ago, the creation of the initramfs image may have failed, and the FAT32 drivers were missing, preventing the mounting of the EFI partition. Aha!
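The size comparison is easy to script if you have both /boot directories reachable from one running system. A rough sketch follows. The function name compare_initramfs and the mount paths in the usage note are hypothetical, and the glob assumes Fedora-style initramfs-<version>.img file names.

```shell
# Compare initramfs image sizes between two boot directories.
# Prints one line per image: name, size in first dir, size in second dir, verdict.
# compare_initramfs is a hypothetical helper name used for illustration only.
compare_initramfs() {
    good_dir=$1   # /boot of the healthy instance
    bad_dir=$2    # /boot of the misbehaving instance
    for good in "$good_dir"/initramfs-*.img; do
        [ -e "$good" ] || continue          # glob matched nothing
        name=${good##*/}                    # strip the directory part
        bad="$bad_dir/$name"
        if [ ! -e "$bad" ]; then
            echo "$name missing"
            continue
        fi
        # wc -c counts bytes; tr removes any padding some wc versions add.
        good_size=$(wc -c < "$good" | tr -d '[:space:]')
        bad_size=$(wc -c < "$bad" | tr -d '[:space:]')
        if [ "$good_size" -eq "$bad_size" ]; then
            verdict=same
        else
            verdict=DIFFERENT
        fi
        echo "$name $good_size $bad_size $verdict"
    done
}
```

With the two boot partitions mounted at, say, /mnt/good/boot and /mnt/bad/boot (example paths only), compare_initramfs /mnt/good/boot /mnt/bad/boot flags any image whose size differs. To look inside an image, the lsinitrd tool that ships with dracut lists its contents, so piping its output through grep -i fat should show whether any FAT modules made it in.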
And now, the solution was simply to copy the initramfs.img file for the relevant kernel from the healthy instance into the /boot partition of the bad Fedora. I could have also manually changed the GRUB bootloader of the distro in charge, MX-16, to boot a different kernel, or made a simple vmlinuz symbolic link hack, which would also do the trick. And indeed, my Fedora 25 was booting again, just like before.
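If you go the copy route, it is worth keeping the suspect image around in case you need to revert. A small sketch of that, under the assumption that both images are reachable as ordinary files. The function name replace_initramfs is hypothetical, and the paths you pass in are up to you.

```shell
# Replace a suspect initramfs image with a known-good one, keeping a backup.
# replace_initramfs is a hypothetical helper name used for illustration only.
replace_initramfs() {
    src=$1   # image from the healthy instance
    dst=$2   # image in the /boot of the bad instance
    [ -f "$src" ] || { echo "source image not found: $src" >&2; return 1; }
    # Save the existing image as <name>.bak so the change can be undone.
    if [ -f "$dst" ]; then
        cp -p -- "$dst" "$dst.bak" || return 1
    fi
    # -p preserves mode and timestamps of the healthy image.
    cp -p -- "$src" "$dst"
}
```

This only makes sense when both instances run the exact same kernel version, as mine did. An alternative I did not try here is to regenerate the image from inside the affected system with dracut --force, which should rebuild it for the running kernel.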
It is funny to encounter the same solution to a slightly different problem eight years apart, but also encouraging in a way. Linux problems can sometimes be easy to fix. Everything is a file, right? BIOS, UEFI, still the same thing.
This guide may not be the biggest pearl of wisdom, but it sure does have its merits, and it is enshrined in the problem solving methodology proper. Step by step, simple things first. Component search. Comparison to a healthy system slash baseline. All there. Well, if you encounter problems where Fedora, or any other distro, is having woes with EFI, please consult this howto, and it may save you some bacon. We are done.