Updated: July 10, 2020
Do you know how you can tell you have a funny Linux problem? When it takes you longer to come up with a suitable title for the article than to do the actual debugging. I recently encountered a rather bizarre network-related issue, and I spent a while trying to figure out what gives. I did solve it, and now I'm sharing the solution.
In essence, this is what happened. I found myself testing some new routers. In my KDE neon instance, I connected to the new wireless access point and tried to browse. Nothing. I tried a wired connection, and everything was fine. Then, I booted into a different Linux instance on this eight-boot machine, and wireless connectivity worked without any issues. Both systems were Ubuntu-based, both using the 18.04 baseline. Well, time to figure out why my wireless was not behaving in neon.
Problem in more detail
To get a better understanding of the issue, I tried some basic pinging. This gives you a good indication of whether you can actually reach remote hosts, plus a hint as to whether your DNS is set up correctly. Instantly, I noticed an issue:
From 192.168.2.107 (192.168.2.107) icmp_seq=1 Destination Host Unreachable
From 192.168.2.107 (192.168.2.107) icmp_seq=2 Destination Host Unreachable
From 192.168.2.107 (192.168.2.107) icmp_seq=3 Destination Host Unreachable
No matter which address I tried to ping (say, dedoimedo.com), the reply was the same Destination Host Unreachable error, reported from 192.168.2.107. What makes it worse is that the access point range was 192.168.8.X. So something was hard-coded to this different internal range and was messing up my Internetz. I was able to confirm this by checking the routing table:
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
default         192.168.2.1     0.0.0.0         UG    303    0        0 wlp2s0
link-local      0.0.0.0         255.255.0.0     U     1000   0        0 wlp2s0
192.168.2.0     0.0.0.0         255.255.255.0   U     303    0        0 wlp2s0
192.168.8.0     0.0.0.0         255.255.255.0   U     600    0        0 wlp2s0
For some ugly reason, whenever I connected to the 192.168.8.0 network, the 192.168.2.0 network would also be added, with 192.168.2.1 set as the default gateway. That gateway isn't reachable on the 192.168.8.0 network, which explains why there was no connectivity. But this only happened with the wireless adapter and not the wired one. Clue, right there. But first, let's eliminate a few other options.
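If you'd rather not eyeball the table, the check can be scripted. Below is a minimal POSIX shell sketch, assuming input in the same column layout as above (Destination, Gateway, Genmask, Flags, Metric, Ref, Use, Iface), which both route and route -n print. The function names default_routes, connected_subnets, ip_to_int and in_subnet are hypothetical helpers, not part of any package. The sketch lists the default routes, lists the directly connected subnets on an interface, and tests whether a gateway actually sits inside a given subnet.

```shell
#!/bin/sh
# Hypothetical helpers for reading `route` / `route -n` style output on stdin.
# Columns assumed: Destination Gateway Genmask Flags Metric Ref Use Iface

# Print "gateway iface metric" for every default route.
default_routes() {
  awk '$1 == "default" || $1 == "0.0.0.0" { print $2, $8, $5 }'
}

# Print "network mask" for every directly connected subnet on interface $1,
# skipping the 169.254.0.0/16 link-local route.
connected_subnets() {
  awk -v ifc="$1" '
    $8 == ifc && ($2 == "0.0.0.0" || $2 == "*") &&
    $1 != "link-local" && $1 !~ /^169\.254\./ { print $1, $3 }'
}

# Convert a dotted-quad IPv4 address to an integer (runs in a subshell,
# so the IFS change does not leak into the caller).
ip_to_int() (
  IFS=.
  set -- $1
  echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
)

# in_subnet ADDRESS NETWORK MASK: succeed if ADDRESS lies inside NETWORK/MASK.
in_subnet() (
  addr=$(ip_to_int "$1")
  net=$(ip_to_int "$2")
  mask=$(ip_to_int "$3")
  [ $(( $addr & $mask )) -eq $(( $net & $mask )) ]
)
```

For the table above, route | default_routes would print 192.168.2.1 wlp2s0 303, route | connected_subnets wlp2s0 would list both 192.168.2.0 and 192.168.8.0 on the same wireless interface, and in_subnet 192.168.2.1 192.168.8.0 255.255.255.0 fails. That is exactly the mismatch here: the default gateway lives outside the range the access point hands out.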
At this point, or rather, just before this point, I was inclined to blame systemd, or more specifically its DNS resolver, systemd-resolved. But it turns out there was nothing wrong with it. Now, it has had its share of bugs, which is why it comes up quite high if you search for connectivity problems in Linux. To make sure it was not the culprit, I disabled it:
sudo systemctl disable systemd-resolved
sudo systemctl stop systemd-resolved
I also removed the /etc/resolv.conf symlink and edited the NetworkManager configuration file (normally /etc/NetworkManager/NetworkManager.conf).
In the [main] section, I added:
No difference so far. We need to move on.
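For the DNS side of things, it's worth confirming what actually owns /etc/resolv.conf before and after changes like these. Here's a small sketch, assuming the Ubuntu 18.04 default, where the file is a symlink to /run/systemd/resolve/stub-resolv.conf and the systemd-resolved stub listens on 127.0.0.53. The function name resolv_conf_owner is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: report who appears to manage a resolv.conf file.
# Assumes the Ubuntu 18.04 layout, where systemd-resolved's stub file lives
# under /run/systemd/resolve/ and the stub resolver listens on 127.0.0.53.
resolv_conf_owner() (
  f=${1:-/etc/resolv.conf}
  if [ -L "$f" ]; then
    # A symlink (even a dangling one): classify it by where it points.
    target=$(readlink "$f")
    case $target in
      */systemd/resolve/*) echo "systemd-resolved ($target)" ;;
      */NetworkManager/*)  echo "NetworkManager ($target)" ;;
      *)                   echo "other symlink ($target)" ;;
    esac
  elif [ -f "$f" ]; then
    # A plain file: check whether it still points at the resolved stub.
    if grep -q '^nameserver[[:space:]]*127\.0\.0\.53' "$f"; then
      echo "regular file, still using the systemd-resolved stub"
    else
      echo "regular file"
    fi
  else
    echo "missing"
  fi
)
```

Called without arguments, it inspects /etc/resolv.conf. To confirm the service itself is really out of the picture, systemctl is-active systemd-resolved and systemctl is-enabled systemd-resolved should report inactive and disabled after the two commands above.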
Another possible villain is NetworkManager. To verify it wasn't doing something odd, I deleted all the saved wireless connections and its configuration file, rebooted, and started fresh. This did not make any difference either. Next culprit.
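The connection cleanup part can also be done from the command line with nmcli. A minimal sketch, assuming nmcli's terse output format; the helpers wifi_profiles and delete_wifi_profiles are hypothetical. It only covers the saved connections; removing the configuration file and rebooting remain separate steps.

```shell
#!/bin/sh
# Hypothetical helper: read "nmcli -t -f NAME,TYPE connection show" output on
# stdin and print the names of the Wi-Fi profiles. Older nmcli releases report
# the type as 802-11-wireless; newer ones may show the short alias "wifi".
# In terse (-t) mode nmcli escapes colons inside names as "\:", so such names
# are printed in their escaped form.
wifi_profiles() {
  awk -F: '$NF == "802-11-wireless" || $NF == "wifi" {
    sub(/:[^:]*$/, "")   # drop the trailing ":TYPE" field
    print
  }'
}

# Delete every saved Wi-Fi profile. Needs root; defined here but not run.
delete_wifi_profiles() {
  nmcli -t -f NAME,TYPE connection show | wifi_profiles |
    while IFS= read -r name; do
      nmcli connection delete id "$name"
    done
}
```

Deleting profiles also throws away their saved passwords, so it's a good idea to list them first with nmcli connection show before running sudo sh -c with the delete step, or simply deleting them one by one.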
So far, we can't blame DNS or NetworkManager. We can also see that the wrong gateway gets configured every time I connect to a network other than 192.168.2.0. This most likely points to the DHCP client configuration, which normally assigns addresses and routes automatically but can also be told to use static ones. Perhaps there's a static rule somewhere? Lo and behold, there was! In /etc/dhcpcd.conf, I had the following:
Something, and I'm using the word vaguely, had configured a static IP address and router, hence the problem. And then I remembered what this something was. I had tested Pi-hole on this box a while back. After I removed it, apparently, its configuration wasn't cleaned up: all the other bits and pieces were gone, but not the DHCP configuration. That's not a problem when you use a network that matches the static settings, but it's a big one when you try a different range. As soon as I deleted the two entries for the wireless interface, things were fine! Issue resolved.
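To hunt for leftovers like these on your own system, a small sketch can list every static directive in dhcpcd.conf along with the block it applies to. The helper name dhcpcd_statics is hypothetical, and it assumes dhcpcd's usual syntax of interface (or ssid/profile) blocks followed by lines such as static ip_address=... and static routers=....

```shell
#!/bin/sh
# Hypothetical helper: list the static directives in a dhcpcd.conf file,
# prefixed with the block they belong to (interface, ssid or profile).
# Only the first word of a block name is kept, so SSIDs with spaces are cut short.
dhcpcd_statics() {
  awk '
    /^[ \t]*#/ || NF == 0 { next }                     # skip comments and blank lines
    $1 == "interface" { scope = $2; next }             # start of an interface block
    $1 == "ssid" || $1 == "profile" { scope = $1 " " $2; next }
    $1 == "static" {
      line = $0
      sub(/^[ \t]*static[ \t]+/, "", line)             # keep just "key=value..."
      s = (scope == "" ? "global" : scope)
      print s ": " line
    }
  ' "${1:-/etc/dhcpcd.conf}"
}
```

On an affected box, you would expect output along the lines of wlp2s0: ip_address=... and wlp2s0: routers=..., pointing at the old range. Make a backup first (sudo cp /etc/dhcpcd.conf /etc/dhcpcd.conf.bak), remove the offending lines, and reconnect or reboot.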
It is quite possible you will never encounter a problem like this. But if you do, jump to conclusions you must not; work carefully and methodically. I must admit I suspected systemd right away, but then I slowly examined the different components in the network stack until I had the villain isolated. As it turns out, a piece of software I had since removed had left configuration changes on the system, causing the issue. Worse yet, the problem only surfaced many months later, so connecting the two is not always trivial.
Hopefully, you've gained some insight into how to tackle network issues. Start simple, check basic connectivity, try to correlate information, and then work your way to the root cause. Here, I had Internet per se, but due to misconfigured routes, I couldn't really get to the right websites or resolve addresses correctly. This can throw you off course, but I eliminated systemd from the equation, figured out that NetworkManager wasn't doing anything wrong, and finally zeroed in on the static IP/route assignment in the DHCP client configuration. Well, if this helps, buy me a virtual beverage of some kind somewhere sometime. We're done. ACK.