
The great crash of 2009

No, I'm not talking about the economy. That would be past anyways... And would be completely off-topic for a technology blog.

What I'm talking about is a crash of the way the Internet works. Several years ago, it was predicted that, unless we moved on from IPv4 by 2009 or 2010, we'd face serious problems. And, like many predictions, it may or may not come true. But this blog is about a symptom of this larger problem: IPv4 addresses are running out. Quoting from Wikipedia:

David Conrad, the general manager of IANA, acknowledges, "I suspect we are actually beyond a reasonable time frame where there won't be some disruption. Now it's more a question of how much."

Background

I've been passionate about IPv6 for several years now. In fact, that's what brought me to KDE in the first place in 2000, which indirectly contributed to my being where I am now, at Qt Software. I added IPv6 support to KDE as my first contribution, thus making Konqueror one of the first IPv6-capable browsers. For several years, I was able to see the dancing turtle at the KAME webpage. It only stopped when I moved to Europe in 2006 and had to use one of those router boxes on my ADSL line. Since then, I haven't had the opportunity to use IPv6.

And that's the truth for most people and companies. That means there's a proliferation of intranets using addresses reserved by RFC 1918 -- the famous 10.x.y.z, 192.168.x.y, etc. A couple of years ago, you'd still find companies that used public IP addresses on their workstations. Today, those are rare exceptions.

We're seeing problems with that setup now. Almost every router box you can get puts you on 192.168.0.x or 10.0.0.x. And since corporate networks use the same reserved addresses in their own networks, many people are unable to use VPN solutions because they lose access to their own local network.
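To make the clash concrete, here is a minimal sketch using Python's standard ipaddress module. The two networks are made-up examples, not anyone's real configuration: a home router LAN on 10.0.0.x and a corporate intranet that routes all of 10.0.0.0/8 over the VPN.

```python
import ipaddress

# Hypothetical example networks: a typical home router LAN and a
# corporate intranet that both draw from RFC 1918 space.
home_lan = ipaddress.ip_network("10.0.0.0/24")
corporate = ipaddress.ip_network("10.0.0.0/8")


def vpn_conflict(local, remote):
    """Return True if routing `remote` over the VPN would shadow `local`."""
    return local.overlaps(remote)


print(vpn_conflict(home_lan, corporate))  # True
print(vpn_conflict(ipaddress.ip_network("192.168.0.0/24"), corporate))  # False
```

When the check returns True, addresses on the local network and on the intranet are indistinguishable to the routing table, which is why the local printer or file server disappears once the VPN comes up.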

Case study

Trolltech used to be a small company. In the past, the workstations in the Trolltech office in Oslo were given their own public IPs, but were firewalled against intrusions from the outside. As the company grew, we exhausted our address pool and then decided to move to private addresses, leaving the public ones for servers.

A simple setup was devised: the 10.1.0.0/16 network was assigned to the Brisbane office, 10.3.0.0/16 was assigned to Oslo, 10.4.0.0/16 was assigned to the Berlin office, 10.5.0.0/16 was assigned to Munich, and 10.6.0.0/16 was assigned to the U.S. office (then in Palo Alto). The network administrators at Trolltech probably had the good foresight of not using 10.0.* because that would clash with routers.
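As a rough illustration of that plan, the sketch below writes the office blocks down as data and checks them against a home router network. The office blocks come from the paragraph above; the sample address and the 10.0.0.0/24 home network are assumptions chosen for the example.

```python
import ipaddress

# Office blocks as listed in the text above.
OFFICES = {
    "Brisbane": ipaddress.ip_network("10.1.0.0/16"),
    "Oslo": ipaddress.ip_network("10.3.0.0/16"),
    "Berlin": ipaddress.ip_network("10.4.0.0/16"),
    "Munich": ipaddress.ip_network("10.5.0.0/16"),
    "U.S.": ipaddress.ip_network("10.6.0.0/16"),
}


def office_for(address):
    """Return the office whose block contains `address`, or None."""
    ip = ipaddress.ip_address(address)
    for name, block in OFFICES.items():
        if ip in block:
            return name
    return None


# Assumption: a home router handing out 10.0.0.x addresses.
home_router_lan = ipaddress.ip_network("10.0.0.0/24")
clashes = [name for name, block in OFFICES.items()
           if block.overlaps(home_router_lan)]

print(office_for("10.3.5.19"))  # Oslo
print(clashes)                  # []
```

The empty list is the point of skipping 10.0.*: none of the office blocks collides with that particular home network. A home router on some other 10.x range could of course still clash.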

Now, you can expect many medium-sized companies to have similar setups, right? How about large companies? In those, the situation is very different. Instead of assigning large blocks per location, smaller blocks are allocated per network, just like on the Internet. That's because the range of IPs and blocks is limited. So you can expect a 50,000- or 100,000-employee company to use a much more complex scheme.

The problem

And then Nokia acquired Trolltech.

Obviously the Trolltech IP allocation was too liberal for the Nokia network. It wasn't enough to plug cables into our routers and set up the VPN links to an entry point to the larger Nokia intranet: we couldn't continue using the same IPs! Every single workstation and device that was/is plugged into the network needs to change addresses: not just our workstations, but also the printers, our embedded devices, our testfarm of exotic Unix flavours and jurassic Windows versions, etc.

In order to support a progressive migration, our network administrators set up NAT in the Nokia-Trolltech router devices. That is, every machine in the Trolltech network got an IP address in a temporary network in the Nokia intranet, so that the services can be routed. Then, once the machine/service is migrated, it gets its permanent IP. That means each machine has 3 different IP addresses reserved for it and, along with it, 3 different hostnames.

For example, my machine was called "lothlorien.troll.no" (IP address 10.3.5.19). To access it from the Nokia network, one could use "lothlorien.nokia.troll.no" (10.215.5.19). Now that I'm connected to the new intranet, it's known as "lothlorien.europe.nokia.com" (172.24.90.18).
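The first two addresses in that example differ only in the second octet, which suggests a one-to-one prefix translation. The sketch below works under that assumption: the 10.215.0.0/16 temporary network and the /16 prefix length are inferred from this single pair of addresses, and map_prefix is a hypothetical helper, not how the actual routers were configured.

```python
import ipaddress


def map_prefix(address, old_net, new_net):
    """Move `address` from old_net to new_net, keeping the host bits."""
    ip = ipaddress.ip_address(address)
    old = ipaddress.ip_network(old_net)
    new = ipaddress.ip_network(new_net)
    if ip not in old:
        raise ValueError(f"{ip} is not in {old}")
    if old.prefixlen != new.prefixlen:
        raise ValueError("prefix lengths differ")
    # Keep only the bits that identify the host inside the old network.
    host_bits = int(ip) & int(old.hostmask)
    return ipaddress.ip_address(int(new.network_address) | host_bits)


# Assumed mapping: Oslo block -> temporary block inside the Nokia intranet.
print(map_prefix("10.3.5.19", "10.3.0.0/16", "10.215.0.0/16"))  # 10.215.5.19
```

The permanent address (172.24.90.18) follows no such pattern: it is a fresh allocation, which is why every migrated machine really does change addresses.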

Compounding the problem

As if that weren't enough trouble, there are more issues. We have several services hardcoded in our internal tools, like our internal pastebin, our task tracker database, our wikis, etc. More than that, we have unit tests for the Qt network classes, for which the test server's hostname was hardcoded.

That means someone in the Nokia intranet cannot run the unit tests: the host will resolve to the wrong IP address.

Change the tests, you say? Well, the migration isn't complete. In particular, the machines doing the automated tests aren't migrated, so we can't. Even if we did, remember that we'd have to do it again...

So the trick is to add it to /etc/hosts? That solves some of the trouble, but not all. For one thing, the Qt3Support network classes use Q3Dns, which bypasses /etc/hosts. For another, the classes doing FTP tests will not work, since FTP servers don't work behind NAT. And finally, our proxy tests won't work, since we end up telling the proxy server to connect to the wrong IP address (problem solved in Qt 4.5 by using the hostname instead).
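The FTP part deserves a short illustration. In passive mode, the server answers the PASV command with a 227 reply that spells out the address and port the client should connect to for the data transfer. The sketch below parses such a reply. Both addresses are hypothetical (the real test server's address isn't given here), and it assumes the NAT device does not rewrite FTP control traffic.

```python
import re


def parse_pasv_reply(reply):
    """Extract (ip, port) from an FTP 227 'Entering Passive Mode' reply."""
    match = re.search(r"\((\d+),(\d+),(\d+),(\d+),(\d+),(\d+)\)", reply)
    if match is None:
        raise ValueError("not a PASV reply: " + reply)
    numbers = [int(n) for n in match.groups()]
    ip = ".".join(str(n) for n in numbers[:4])
    # The port is sent as two bytes, high byte first.
    port = numbers[4] * 256 + numbers[5]
    return ip, port


# Hypothetical: the client reached the server through its translated
# address, but the server reports the address it knows itself by.
control_connection_ip = "10.215.0.10"
reply = "227 Entering Passive Mode (10,3,0,10,195,80)"

data_ip, data_port = parse_pasv_reply(reply)
print(data_ip, data_port)                # 10.3.0.10 50000
print(data_ip == control_connection_ip)  # False
```

The client is told to open the data connection to an address that only makes sense on the other side of the NAT, so the transfer fails even though the control connection worked.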

So, in the end, those of us who are doing network development are left having to ignore FTP and some other errors because of this issue.

(The actual solution we're working on is to reinstall the network test server onto a VMware image so that we can run it from anywhere, as well as distribute it sometime in the future.)

What if we had had IPv6

Well, if we had had IPv6 to begin with, none of this trouble would have happened. First of all, we'd have been using public IP addresses from day 1, even if we were firewalled. Just like the old days in the Trolltech Oslo office.

The smallest allocation that is usually given is a network with a 48-bit prefix. That's enough for 65,536 networks with 64-bit prefixes (each good for 18446744073709551616 IPs, but usually you'd split the network long before that). The 64-bit number is not chosen by accident: it allows IPv6 to operate in the so-called "stateless address autoconfiguration" mode. Each network interface has its own 64-bit unique identifier known as EUI-64, which it appends to the 64-bit network prefix (the 48 bits of your allocation, plus the 16 bits of the network ID), forming the 128-bit IPv6 address. To do that, all it needs is a router advertising the prefix over ICMPv6. For Linux-based routers, you can install the radvd daemon to do exactly that.
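Here's a minimal sketch of that arithmetic and of how an autoconfigured address is put together. It assumes the classic MAC-derived identifier (the "modified EUI-64" form, where the 48-bit MAC is stretched to 64 bits and one bit is flipped). The prefix comes from the 2001:db8::/32 documentation range and the MAC address is made up.

```python
import ipaddress


def eui64_interface_id(mac):
    """Build the 64-bit interface identifier from a 48-bit MAC address."""
    octets = [int(part, 16) for part in mac.split(":")]
    octets[0] ^= 0x02  # flip the universal/local bit
    # Insert ff:fe in the middle to stretch 48 bits to 64.
    eui64 = octets[:3] + [0xFF, 0xFE] + octets[3:]
    return int.from_bytes(bytes(eui64), "big")


def slaac_address(prefix, mac):
    """Combine an advertised 64-bit prefix with the interface identifier."""
    net = ipaddress.ip_network(prefix)
    if net.prefixlen != 64:
        raise ValueError("autoconfiguration expects a 64-bit prefix")
    return ipaddress.ip_address(
        int(net.network_address) | eui64_interface_id(mac))


site = ipaddress.ip_network("2001:db8:1234::/48")  # documentation prefix
print(2 ** (64 - site.prefixlen))  # 65536 networks in a 48-bit allocation
print(2 ** 64)                     # 18446744073709551616 addresses in each

# Hypothetical MAC address on network number 5 of the site.
print(slaac_address("2001:db8:1234:5::/64", "00:11:22:33:44:55"))
# 2001:db8:1234:5:211:22ff:fe33:4455
```

The host needs nothing but the advertised prefix to compute this: no server has to hand out or remember the address.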

That means we wouldn't even need DHCP. That solves yet another problem of the transition: the policy of the Nokia network administrators is not to give static IP addresses to workstations. The rationale is that, by using DHCP, they are able to reuse addresses of machines not seen in a long time, as well as to find out when a network is reaching saturation. With IPv6, there's no such problem: there's no DHCP server to be maintained, and there's no need to reuse addresses, since the chances of exhaustion are remote. Network congestion will happen long before that. And we get to keep a static IP address, so that I can ssh into my workstation from home via VPN.

That's assuming that we wouldn't need a change in the network prefixes themselves. That could happen, though, since we got a brand-new 34 Mbit/s pipe, or because Nokia wanted everyone to be inside their main corporate prefix. Well, even if that were the case, there wouldn't have been any disruption:

  1. First, we install the new hardware (routers, level 3 switches, etc.) and the new Internet connection
  2. Once the prefix is routed properly to our network elements, the routers in our networks start advertising a new IPv6 prefix
  3. Automatically, all IPv6 nodes assign themselves a new IP address, without losing the old one. Yes, two IP addresses on the same interface (you probably have one more anyways)
  4. Then the network administrators do a search-and-replace on the DNS zone files, changing from the old prefix to the new one
  5. The routers then stop advertising the old prefix
  6. Since there would have been no NAT in place, we'd never have had the problems I presented for our FTP and proxy tests.
  7. As a final step, after the automatically-configured IP addresses on the old prefix have expired, the old Internet connection is disabled
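The steps above can be sketched as a toy simulation of a single host. This is only a model under stated assumptions: the Host class is hypothetical, time and lifetimes are in arbitrary units, both prefixes are taken from the 2001:db8::/32 documentation range, and the real rules for address lifetimes have more detail than this.

```python
import ipaddress


class Host:
    """Toy model of an autoconfigured host: one address per valid prefix."""

    def __init__(self, interface_id):
        self.interface_id = interface_id
        self.expiry = {}  # address -> time at which it stops being valid

    def receive_advertisement(self, prefix, now, valid_lifetime):
        """Create or refresh the address for an advertised prefix."""
        net = ipaddress.ip_network(prefix)
        address = ipaddress.ip_address(
            int(net.network_address) | self.interface_id)
        self.expiry[address] = now + valid_lifetime

    def addresses(self, now):
        """Return the addresses that are still valid at time `now`."""
        valid = sorted(a for a, t in self.expiry.items() if t > now)
        return [str(a) for a in valid]


OLD = "2001:db8:aaaa:5::/64"  # hypothetical old prefix
NEW = "2001:db8:bbbb:5::/64"  # hypothetical new prefix

host = Host(0x021122FFFE334455)
host.receive_advertisement(OLD, now=0, valid_lifetime=100)

# Steps 2 and 3: routers advertise both prefixes, the host holds two addresses.
host.receive_advertisement(OLD, now=50, valid_lifetime=100)
host.receive_advertisement(NEW, now=50, valid_lifetime=100)
print(len(host.addresses(now=60)))  # 2

# Step 5: only the new prefix is advertised from now on.
host.receive_advertisement(NEW, now=120, valid_lifetime=100)

# Step 7: the old address was last refreshed at 50, so it is gone after 150.
print(host.addresses(now=160))  # ['2001:db8:bbbb:5:211:22ff:fe33:4455']
```

At no point is the host without a working address, which is what makes the renumbering non-disruptive.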

As a side note, if the A6 DNS record had been put into production instead of AAAA, our administrators wouldn't even have needed to do a search-and-replace in step 4: it would be a single change, in one place.

Conclusion

This was just one case study, showing how a stubborn reluctance to move away from IPv4 can make our lives more difficult. The predictions I mentioned above are from several years ago. And we still have not seen the migration start. I guess it will really take hitting people where it hurts the most (their pockets) for the transition to start.

I wish circumstances were different.

