Tuesday, January 18, 2011

Can Large Scale NAT Save IPv4?

I've written previously that as we make the slow - and long overdue - transition from IPv4 to IPv6, we will soon be stuck with an awkward interim period in which the only new globally routable addresses we can get are IPv6, but most public content we want to reach is still IPv4. Large Scale NAT (LSN, also known as Carrier Grade NAT or CGN) is an essential tool for stretching a service provider's public IPv4 address space during this transitional period.

I've yet to work an IPv6 project involving LSN in which someone does not eventually, with great hope in his eyes, say, "If LSN extends the life of our IPv4 space, why are we going to the pain and expense of deploying IPv6? Can't we just deploy LSN and forget about IPv6 for now? Perhaps until I retire?"

At first look, LSN does indeed seem to promise an extended lifetime for IPv4. Could it even mean that the Internet never has to transition to IPv6?

This article looks beyond the mechanisms of LSN itself to examine the implications of LSN in a practical network, and why this useful technology should never be viewed as anything other than an interim solution.

A Quick Review of LSN Architectures

A traditional broadband service provider network conserves IPv4 addresses by assigning a single public IPv4 address to the outside interface of a NAT residing at the edge of each customer network. Behind the NAT, all devices are assigned a private IPv4 address. The NAT works by mapping each application flow - as identified by the combination of a private IPv4 address and a TCP or UDP port - to the public IPv4 address and one of its TCP or UDP ports. In other words, NAT multiplexes the addresses of many inside devices to a single outside address by mapping application flows.
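The mapping described above can be sketched in a few lines of Python. This is only an illustration of the idea, not how any particular NAT is implemented: the class name `SimpleNat`, the starting port of 1024, and the example addresses (taken from documentation ranges) are all assumptions made for the sketch, and real NATs also track timeouts, destination addresses, and connection state.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

# A flow is identified by (protocol, private address, private port).
FlowKey = Tuple[str, str, int]


@dataclass
class SimpleNat:
    """Toy port-translating NAT with one public address (illustrative only)."""
    public_ip: str
    first_port: int = 1024   # assumption: leave the well-known ports alone
    last_port: int = 65535
    # One table per direction so that replies can be translated back.
    outbound: Dict[FlowKey, int] = field(default_factory=dict)
    inbound: Dict[Tuple[str, int], FlowKey] = field(default_factory=dict)

    def translate_out(self, proto: str, private_ip: str, private_port: int) -> Tuple[str, int]:
        """Map an inside flow to (public address, public port), reusing any existing mapping."""
        key = (proto, private_ip, private_port)
        if key not in self.outbound:
            port = self._next_free_port(proto)
            self.outbound[key] = port
            self.inbound[(proto, port)] = key
        return self.public_ip, self.outbound[key]

    def translate_in(self, proto: str, public_port: int) -> Optional[FlowKey]:
        """Find the inside flow for a returning packet, or None if no mapping exists."""
        return self.inbound.get((proto, public_port))

    def _next_free_port(self, proto: str) -> int:
        # TCP and UDP have separate port spaces, so the protocol is part of the key.
        for port in range(self.first_port, self.last_port + 1):
            if (proto, port) not in self.inbound:
                return port
        raise RuntimeError("port pool exhausted")


nat = SimpleNat("203.0.113.7")
print(nat.translate_out("tcp", "192.168.1.10", 50000))  # first inside host
print(nat.translate_out("tcp", "192.168.1.11", 50000))  # second host, same inside port
print(nat.translate_in("tcp", 1025))                    # reply traffic finds the second host
```

Two inside hosts using the same source port end up on different ports of the one outside address, which is the multiplexing the paragraph describes.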

Ports are 16-bit numbers, so potentially 65,536 TCP flows and 65,536 UDP flows could be mapped to a single IPv4 address. The average household or small office does not generate nearly this many flows at one time, making address translation at the edge of such small networks an inefficient use of a public IPv4 address.

An LSN is a "centralized NAT" placed in the service provider's network. Whether this is in addition to the NAT at the customer edge, as with NAT444, or instead of the customer NAT, as with DS-Lite, the LSN concept is the same: The public IPv4 addresses are pulled away from the customer edge, where their multiplexing capacity is not efficiently exploited, to the outside of the centralized LSN where many customer networks can share a single public IPv4 address.

LSN architecture design, then, is mostly figuring out the strategic placement of each LSN to best use the capacity of each public IPv4 address without oversubscribing the address or overtaxing the LSN itself.

Although only a few studies of per-user port usage have been done, an LSN should be able to support 3000 - 5000 users per public IPv4 address.
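To put the 3000 - 5000 figure next to the 16-bit port space, here is a quick back-of-the-envelope calculation. It is only arithmetic on the numbers already given; the function name and the optional `reserved_ports` parameter are my own, and the even split it computes is an average, not something real traffic will respect.

```python
PORTS_PER_PROTOCOL = 2 ** 16  # a 16-bit port field gives 65,536 values


def ports_per_user(users_per_address: int, reserved_ports: int = 0) -> int:
    """Average number of ports (per protocol) left for each user sharing one address."""
    return (PORTS_PER_PROTOCOL - reserved_ports) // users_per_address


for users in (3000, 5000):
    print(users, "users per address ->", ports_per_user(users), "ports each on average")
# 3000 users per address -> 21 ports each on average
# 5000 users per address -> 13 ports each on average
```

An average of a dozen or two ports per user only works if most users are idle most of the time, so ratios like these depend on statistical sharing rather than on a fixed allotment per subscriber.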

These numbers, coupled with the tens of thousands of public IPv4 addresses broadband service providers currently hold for customer assignment, do appear to make LSN a practical alternative to near-term IPv6 deployment, adding years to the life of IPv4.

Before coming to such a conclusion, the implications and practical impact of LSN must be considered.

Who Are You?

A longstanding practice in the networking industry is to identify a user by IP address. This is especially the case when the user might not want to be identified or when the identification of the machine is more important than the identification of the individual using it.

By centralizing public IPv4 addresses, each address no longer represents a single machine, a single household, or a single small office. The address now represents thousands of machines, homes, and offices related only in that they are behind the same LSN. Identification by IP address becomes difficult or impossible.

Obfuscation of the network behind a NAT has long been considered (incorrectly, in my opinion) a security benefit. Obfuscation of large groups of networks, with nothing in common except the use of the same broadband provider, creates an unprecedented set of challenges.

One of those challenges is not administrative or technical, but an opening for an undesirable social behavior within certain Internet communities.

Making Mischief

I enjoy participating in a few political discussion groups on the Internet, for the learning experience and for the fun of debating political issues. As I was contemplating the ramifications of LSN one evening I realized that LSN could introduce a new and unwelcome phenomenon on such sites.

If you have ever participated in an open Internet discussion group, particularly one that deals with contentious issues, you are probably familiar with the concept of a "troll." A troll is someone who is not really interested in the discussion at hand, but instead enjoys making outrageous or inflammatory remarks just to upset the other participants. Trolls are a part of many websites where the general public is allowed to register and leave comments, and they are particularly attracted to political and religious websites. I remember the occasional troll even on the old Cisco Usenet newsgroup, comp.dcom.sys.cisco, in the mid 1990s.

Sometimes a troll will go too far, and the moderators of the discussion group will "ban" him by deleting his user account. And sometimes a banned participant will simply create a new Hotmail or Yahoo e-mail address, register back to the site under a different user name, and continue trolling until banned again.

To prevent this "repeat offender" behavior, some websites will ban a misbehaving user by IP address rather than user name. This is assumed to be more effective, by banning the user's machine rather than any account he might create from that machine. If the IP address is on the outside interface of a home or small office NAT, blacklisting it might restrict others in the home or office from accessing the website, but few "innocent bystanders" are affected overall.

What happens, though, if a website bans an IPv4 address on the outside of an LSN? In the effort to restrict a single user, thousands of people will be inadvertently restricted - generally all subscribers on a CMTS or a group of DSLAMs behind the LSN.

A malicious user with a grudge against a particular website might, if he knows his provider is using LSN, intentionally get himself blacklisted by IP address on the site in order to simultaneously get a few thousand of his neighbors banned - he will have performed a small-scale DoS attack by causing the site administrators themselves to unwittingly perform the denial of service.

Black and White

Remote sites are not the only ones occasionally needing to black-list a user based on an IP address. The local provider also needs black-listing capability. Some also use white-listing: the addition of some preferential treatment or pre-approval. Generally white-listing and black-listing are used in conjunction with spam and virus control, but black-listing can also be applied to enforce usage policies.

Black- or white-listing may need to be split in an LSN architecture. Policies applying to incoming sources must be implemented on the outside of the LSN; once the packets are translated, they cannot be easily identified by IPv4 address without some correlation with the LSN's mapping table. Policies applying to outgoing sources - that is, sources within the customer networks - must be implemented on the customer-facing side of the LSN for the same reason.

Lawful Intercept

Centralized address and port translation within the provider network presents serious challenges to compliance with lawful intercept requirements such as CALEA. DHCP assignments to traditional networks with NATs at the customer edge change infrequently, making interception easy. Lawful intercept might still be reasonably easy with NAT444 architectures, as long as the interception happens between the CPE NAT and the LSN. The dependency here is whether both the inside and outside addresses are of interest, or only the inside addresses.

Because DS-Lite tunnels IPv4 in IPv6, interception in DS-Lite architectures must be performed on the LSN itself. Timestamped logging of the address and port mappings at the LSN must be maintained, which in turn can add a heavy resource burden to the LSN devices. Logging to a storage device off the LSN may also contribute to network load.

Wiretapping of a single subject may mean statically mapping the user to a certain range of ports on a single address, to remove the need to follow dynamic port mappings. A single IPv4 address, or some range of ports for each address, might be set aside for wiretapping purposes to simplify such procedures. But any requirement that all users behind an LSN be logged is going to mean logging not only traffic but all changes to the mapping tables.
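One way to picture the static mapping idea is a fixed formula that ties each subscriber to one address and one contiguous block of ports, so the owner of a given address and port can be computed rather than looked up in a log. The sketch below is a hypothetical scheme of my own for illustration: the block size of 2048, the starting port of 1024, the subscriber numbering, and the function names are all assumptions, not a description of any vendor's feature.

```python
from typing import List, Optional, Tuple


def port_block(subscriber_index: int, public_ips: List[str],
               block_size: int = 2048, first_port: int = 1024) -> Tuple[str, int, int]:
    """Return (public address, first port, last port) statically assigned to a subscriber."""
    blocks_per_ip = (65536 - first_port) // block_size
    ip_index, block_index = divmod(subscriber_index, blocks_per_ip)
    if ip_index >= len(public_ips):
        raise ValueError("not enough public addresses for this subscriber index")
    start = first_port + block_index * block_size
    return public_ips[ip_index], start, start + block_size - 1


def subscriber_for(public_ip: str, port: int, public_ips: List[str],
                   block_size: int = 2048, first_port: int = 1024) -> Optional[int]:
    """Reverse the mapping: which subscriber index owns this address and port, if any?"""
    blocks_per_ip = (65536 - first_port) // block_size
    if public_ip not in public_ips or port < first_port:
        return None
    block_index = (port - first_port) // block_size
    if block_index >= blocks_per_ip:
        return None  # port falls in the unused remainder above the last full block
    return public_ips.index(public_ip) * blocks_per_ip + block_index


pool = ["203.0.113.1", "203.0.113.2"]      # documentation addresses, for illustration
print(port_block(0, pool))                  # ('203.0.113.1', 1024, 3071)
print(port_block(31, pool))                 # ('203.0.113.2', 1024, 3071)
print(subscriber_for("203.0.113.2", 2000, pool))  # 31
```

The trade-off is visible in the numbers: with blocks this large only 31 subscribers fit on each address, far fewer than dynamic sharing allows, which is why such a scheme might be reserved for a few subjects rather than applied to everyone.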

Traceback

The timestamped logging of address and port mappings is essential not only for lawful intercept but also for tracing back specific users when a problem is identified from the outside of the LSN. Such a problem is usually a misbehaving user - a spammer, a DoS source, or someone violating a usage policy - and identification of the user might result in black-listing, cancellation of service, or covert observation for legal action. Without time-specific logs of the address and port mappings, a misbehaving user stays well hidden behind the LSN.

But where lawful intercept might require logging of one or a few users, logging for traceback purposes could mean logging all users, at least at some sample rate, causing a massive consumption of device resources. A compromise step might be to begin traceback logging only when a problem is detected; while this uses far fewer resources, it assumes that the undesirable action will continue long enough for all or most of the traceback to be performed in real time.
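To make concrete why the timestamp matters in a traceback, here is a minimal sketch of a lookup against a mapping log. The record layout, the field names, and the subscriber labels are hypothetical; an actual LSN would have its own log format and far larger volumes than a linear scan could handle.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class MappingRecord:
    """One logged address/port mapping (hypothetical layout)."""
    start: float        # time the mapping was created, seconds since the epoch
    end: float          # time the mapping was torn down
    proto: str
    public_ip: str
    public_port: int
    subscriber: str     # whatever identifier the provider uses internally


def trace_back(log: List[MappingRecord], proto: str, public_ip: str,
               public_port: int, when: float) -> Optional[str]:
    """Return the subscriber that held this public address and port at time `when`."""
    for rec in log:
        same_flow = (rec.proto, rec.public_ip, rec.public_port) == (proto, public_ip, public_port)
        if same_flow and rec.start <= when < rec.end:
            return rec.subscriber
    return None


# The same public port is reused by two different subscribers at different times.
log = [
    MappingRecord(100.0, 200.0, "tcp", "203.0.113.7", 40000, "subscriber-A"),
    MappingRecord(250.0, 400.0, "tcp", "203.0.113.7", 40000, "subscriber-B"),
]
print(trace_back(log, "tcp", "203.0.113.7", 40000, 150.0))  # subscriber-A
print(trace_back(log, "tcp", "203.0.113.7", 40000, 300.0))  # subscriber-B
print(trace_back(log, "tcp", "203.0.113.7", 40000, 225.0))  # None
```

A complaint that names only the address and port cannot distinguish the two subscribers; it takes the time of the offending traffic, and a log that covers that time, to do so.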

Double Trouble

A longstanding complaint about NAT44 is that it breaks some applications that reference the IP address of their packets. In a perfect world - or at least the conceptual world of IP networking - applications would be agnostic to the network layer and thus immune to the address changes through a NAT. But the reality is that many applications do reference the IP address. For the ubiquitous user edge NAT, work-arounds have been created for some applications.

The double-NAT structure of NAT444 can be expected to break some applications that will work through a single NAT layer. A few MSOs are currently conducting trials to determine what will be affected by NAT444, and therefore what impact it might have on their customers.

DS-Lite avoids the double NAT problems of NAT444, and presently appears to be the preferred solution for most broadband providers. But some LSN vendors still have DS-Lite on their roadmaps rather than in their products, and CPE with DS-Lite support is rare. This solution is therefore not as immediately available as NAT444.

An Imperfect Necessity

There are a host of other concerns around LSN: single points of failure, potential address pool depletion attacks, performance and scalability, effects on fragmented packets, effects on asymmetric traffic flows, required modifications to provisioning systems, and required modifications to internal accounting systems.

Because we have waited far too long to begin implementing IPv6, Large Scale NAT has become an unavoidable necessity for supporting dual-stacked broadband customers in the face of a depleted IPv4 address supply. But the problems and complexities LSN introduces to a network mean that it should never be viewed as anything but a transitional technology. It is no substitute for IPv6.
