Here’s a tangled web to unpick at your leisure!
Back in December, I finally switched off my 7-year-old laptop (a sturdy ThinkPad X40) that had been doing practically nothing except running Windows XP, my ISP’s Internet connection software, and Microsoft’s built-in ICS (“Internet Connection Sharing”) to share that connection with everything else in the house. Effectively, my ThinkPad had been running as a router for years. In its place, I plugged the wireless Internet USB device into a Scientific Linux server and switched on IP forwarding. After a little re-configuration of my various clients (laptops, netbooks, desktops), all could still access the Internet, but now without a flaky laptop (or a flaky XP installation) in the path.
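For anyone wanting the gist of the IP forwarding step, here is a minimal sketch rather than a record of exactly what I typed. It assumes a Red Hat-style box and a root shell, and the two function names are made up purely for illustration.

```shell
# Hypothetical helper names; run them from a root shell on the would-be router.

# Show the current setting: 1 means the kernel forwards packets, 0 means it doesn't.
show_ip_forwarding() {
    cat /proc/sys/net/ipv4/ip_forward
}

# Switch forwarding on for the running kernel (this does not survive a reboot).
enable_ip_forwarding() {
    sysctl -w net.ipv4.ip_forward=1
}
```

To make the change stick across reboots, the usual approach is to set net.ipv4.ip_forward = 1 in /etc/sysctl.conf. Sharing a single ISP-supplied address among several clients generally also needs a NAT (masquerade) rule in iptables, which isn't shown here.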
So far so good.
Cut to today, when I’m building a CentOS 6.2 virtual machine, using my now-standard technique of “installation over the network” (as described here). Every time I started the installation, it bombed out with a warning that ‘Network Manager couldn’t configure the eth0 interface’. I tried Scientific Linux 6.1, which I’ve used a bazillion times in the past… same story.
So I fiddled. Since it was apparently a networking problem, I switched my virtual machine from “bridged” to “NAT” and tried Scientific Linux 6.1 again: and it worked! I didn’t pause to think why NAT worked when bridged didn’t, but simply ticked it off as ‘fixed’.
Back, therefore, to trying to build a CentOS 6.2 VM, this time using the now-proven “NAT” technique. Except that this time, it didn’t work. It started to, sure enough: at least I didn’t get the ‘Network Manager can’t configure’ error again. Indeed, the installation process correctly read my kickstart file and a couple of the installation files as stored on my web server. But then it complained that it couldn’t read the install.img file from the path it had been given.
Now that just made no sense to me: the path mentioned was absolutely, 100% guaranteed to be correct… and I checked it multiple times to be certain of that. What’s more, “install.img” is merely the third file read during the installation process (updates.img and product.img are read first). So how come the network was fine for two files but bombed out on the third?!
I spent the best part of a day checking and triple-checking my entire setup: physical host to virtual web server connectivity was fine; paths as specified in my kickstart file were fine; IP configuration as specified in the kickstart file was fine… all dead-ends.
And then, for no particular reason that I can now recall, I decided to do a quick yum install dhcp on my web server. A slightly less quick edit of /etc/dhcp/dhcpd.conf and I had myself a working DHCP server.
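For anyone following along, the sort of configuration involved can be sketched as below. This is not my actual file: the subnet, pool range, router and DNS addresses are all assumptions picked for illustration, and the scratch-file name is hypothetical. The sample is written to a temporary location first so it can be checked before it goes anywhere near /etc.

```shell
# Sketch only: every address below is an assumption, not a real network layout.
# Write the sample to a scratch file so it can be reviewed before use.
conf="${TMPDIR:-/tmp}/dhcpd.conf.sample"

cat > "$conf" <<'EOF'
# Minimal ISC dhcpd configuration for a single home subnet.
default-lease-time 600;
max-lease-time 7200;

subnet 192.168.1.0 netmask 255.255.255.0 {
    # Hand out a small pool kept clear of the statically assigned hosts.
    range 192.168.1.200 192.168.1.220;
    option routers 192.168.1.1;
    option domain-name-servers 192.168.1.1;
}
EOF

echo "Sample written to $conf"

# As root, once the values suit your own network:
#   cp "$conf" /etc/dhcp/dhcpd.conf
#   service dhcpd start && chkconfig dhcpd on
```

Keeping the pool small and well away from the fixed addresses means the DHCP server only ever matters to machines that are mid-install, and the static allocations carry on exactly as before.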
And guess what? The CentOS 6.2 install worked perfectly!
What’s more, I then went back and tried both CentOS 6.2 and Scientific Linux 6.1 installations with the “bridged” network option, and both of them worked fine too!
Apparently, therefore, DHCP is essential for network installs of Linux. Why hadn’t I ever noticed this before? Because of Microsoft, that’s why!! (They have to be responsible for all the bad things that happen, right?) The Internet Connection Sharing software my faithful ThinkPad had been using to act as a router automatically switches on both DNS and DHCP. That ancient laptop had, in fact, been this house’s DHCP server without me ever really worrying about it. When I retired it in December, therefore, I lost DHCP, and although I’d been sharing the Internet connection on the Linux box, I’d not installed DHCP anywhere to take its place. I knew all of that: but it didn’t concern me.
The reason I wasn’t concerned is that we simply don’t use DHCP in this house! (Not knowingly, anyway!!) All my servers, desktops, laptops and what-have-you are carefully assigned fixed IP addresses. I even have a spreadsheet showing which addresses have been allocated and which are free! So, since we don’t use DHCP, losing a DHCP server by retiring ICS really didn’t bother me.
But suddenly, today we have proof that DHCP was necessary after all.
Now, thinking about this with 20/20 hindsight, it’s all a bit obvious. How can a server you’re trying to build talk to a remote web server unless it has already acquired an IP address? Sure, the kickstart file it finds on the web server will eventually grant a proper, static IP address… but how does it get to read that file in the first place unless it already has some form of network communication with the remote server on which it’s stored?
That initial IP address is acquired easily in NAT mode: the virtualization software typically supplies one itself, from a small built-in DHCP service, and then passes the guest’s traffic out through the physical host’s own address. Hence both builds in NAT mode at least started off fine, and Scientific Linux’s was completely successful. When you run in bridged network mode, though, your VMs have to acquire that initial IP address the same way any physical machine would: from a DHCP server on the network. And since I didn’t have a DHCP server any more, not even a non-obvious one from Microsoft that’s surreptitiously switched on by enabling ICS, all my bridged builds started to fail.
The moral of the story: you need a DHCP server if you’re going to do networked installs of Linux, even if you don’t intend to use DHCP after the initial installation has completed.
Now, a trawl around Google tells me that this is not necessarily a cast-iron rule: apparently, there are non-DHCP ways around this. But I’ve not tried the workaround suggested, and DHCP is so trivially easy to get going anyway that it’s not something I’m desperate to avoid using.
Of course, one mystery remains: in NAT mode without a DHCP server, the CentOS installation started well enough and then bombed out; in exactly the same configuration, the Scientific Linux install sailed through the whole thing without incident. It would seem as if CentOS starts off with the address it gets in NAT mode but then tries to switch to acquiring a fresh one of its own (which, without a DHCP server on my network, it can’t do), whereas Scientific Linux appears to make do with its initial IP address just fine.
Why this difference in behaviour? As I say, it’s a mystery at this stage, which makes this sort of thing fun and infuriating in equal measure, I guess!