Future Tech

Um, what ever did happen with network automation?

Tan KW
Publish date: Tue, 18 Jun 2024, 05:30 AM

Systems Approach: Looking back on the decade-plus of efforts to automate the configuration and operation of networks - of which intent-based networking may be the best-known and most ambitious example - are we actually any closer to the automation of networking than we were a decade ago?

I remember the moment when software-defined networking (SDN) started to make sense for me.

It was 2011 and I was in an auditorium at Stanford listening to Scott Shenker give a talk titled "The Future of Networking and the Past of Protocols." Up to that point, I had struggled to understand why the desire to make networks "software-defined" required a centralized controller that was separated from its data plane.

The crux of Scott's argument was this: Reasoning about the behavior of fully distributed algorithms is hard. And distributed algorithms are at the heart of networking. There is a reason why people who can configure BGP correctly in complex environments are highly sought after and widely viewed as wizards.

The solution proposed in Scott's talk, like so much of computer science, was to create a new layer of abstraction. The central SDN controller presents the abstraction of the network as one big switch, which is much easier to reason about than the distributed algorithms - notably, routing algorithms - that underpin traditional networking.

Of course, abstracting away the complexity of the distributed algorithms inside a centralized controller is easier said than done, and therein lies the challenge of building a working SDN system. But I, like many others, was sold on this new vision of how to build networks. At the same time, many of my colleagues remained unconvinced, having learned that (a) central control didn't scale, and (b) you could never sell a central controller into a real network because it was a single point of failure. Getting past those issues is indeed one of the key challenges of SDN.

Coincidentally, I ended up getting a chance to meet Martin Casado a few days after Scott's talk. Martin, Scott, and Nick McKeown had founded Nicira a number of years earlier, and all that I really knew about the company was that they were active in OpenFlow standardization and implementation. Over the next few weeks I learned a lot more about Nicira as my interest in it grew from curiosity to "maybe I should try to work there."

By January 2012 I was indeed working for Nicira - just in time for the startup to come out of stealth mode and launch the Network Virtualization Platform (NVP). While "network virtualization" was the use case of SDN that launched Nicira and led to its later acquisition by VMware, it's easy to lose sight of the importance of network automation in this story. In fact, all the Nicira customers that I can recall from before the acquisition were using NVP for some sort of network automation project.

A typical use case for NVP in these early days was to support a developer cloud. A developer wanting to run some code in a distributed environment with multiple virtual machines interconnected by some network topology would request the VMs and network resources via a self-service portal. Provisioning the VMs was a fairly well-solved problem by 2012, but the network provisioning was painfully manual.

The key insight at Nicira was that a central SDN controller could expose an API to be called (by the self-service portal) as part of the provisioning process. So while the motivation for central control in Shenker's talk was to make networks programmable, it also served to make them automatically configurable. The self-service portal calls the API to make requests such as "create a layer 2 network," "connect the following VMs to that network," or "insert a router with NAT between the L2 network and the internet."

The SDN controller provides a single place to receive all those API requests, so that the self-service portal need not have any understanding of which network devices serve particular VMs. And importantly, even if VMs move around (as they are liable to do in modern virtualized datacenters), the SDN controller ensures that the requested network capabilities follow the VMs around.
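To make that division of labor concrete, here is a minimal sketch in Python. It is not NVP's actual API: the ToyController class, its method names, and the VM and host names are all hypothetical, invented purely for illustration. The point it tries to show is that the portal deals only in logical objects (networks, VMs, a router), while the controller alone tracks where each VM physically runs.

```python
from dataclasses import dataclass, field


@dataclass
class ToyController:
    """Hypothetical in-memory stand-in for a central SDN controller."""
    networks: dict = field(default_factory=dict)     # network name -> set of attached VM names
    vm_location: dict = field(default_factory=dict)  # VM name -> hypervisor host
    routers: list = field(default_factory=list)      # (network, uplink, nat_enabled) tuples

    def create_l2_network(self, name):
        self.networks[name] = set()

    def connect_vms(self, network, vms):
        # vms maps each VM name to the host it currently runs on.
        for vm, host in vms.items():
            self.vm_location[vm] = host
            self.networks[network].add(vm)

    def insert_router(self, network, uplink, nat=True):
        self.routers.append((network, uplink, nat))

    def migrate_vm(self, vm, new_host):
        # Only the physical location changes; logical attachments are untouched.
        self.vm_location[vm] = new_host

    def hosts_serving(self, network):
        # Hosts that must carry this network, derived from current VM placement.
        return {self.vm_location[vm] for vm in self.networks[network]}


# The three requests a self-service portal might make.
ctl = ToyController()
ctl.create_l2_network("dev-net")
ctl.connect_vms("dev-net", {"vm-a": "host-1", "vm-b": "host-2"})
ctl.insert_router("dev-net", uplink="internet", nat=True)

# A VM moves; the portal is not involved and the logical network is unchanged.
ctl.migrate_vm("vm-b", "host-3")
print(sorted(ctl.hosts_serving("dev-net")))  # ['host-1', 'host-3']
```

A real controller would of course have to push forwarding state to the affected hosts on every migration, and doing that reliably at scale is where the hard engineering lies. The sketch only captures the shape of the interface.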

In the time-honored tradition of software startups, we had our own "dogfood" environment that was used by our own developers. Anyone working on the NVP product could log into our local self-service portal (provided via OpenStack) and instantiate a set of VMs with appropriate networking services to run their code in a proper distributed environment. Our portal could provision a standard developer environment with one button push.

We used to refer to this as Inception - as in the movie - because of the layers of virtualization going on. Many instances of NVP could be running in our developer cloud, each serving up virtual networks, with the networking that each instance required being provided by an underlying instance of NVP.

In a sense you could view this as the beginning of "intent-based networking." Rather than provisioning VLANs and NAT rules on switches and routers, a developer only had to specify their intent: I want a network topology to interconnect this set of VMs I'm spinning up. The term "intent-based networking" would come later, and the vision was more expansive - but the basic idea was there.
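One rough way to picture the difference between stating intent and issuing commands is the sketch below. Everything in it is an assumption made for illustration: the shape of the intent dictionary, the compile_intent function, and the step names are hypothetical and do not come from NVP or any intent-based networking product.

```python
def compile_intent(intent):
    """Expand a declarative topology request into ordered provisioning steps."""
    steps = [("create_l2_network", intent["network"])]
    for vm in intent["vms"]:
        steps.append(("connect_vm", intent["network"], vm))
    if intent.get("internet_access"):
        # Outbound access implies a router with NAT in front of the L2 network.
        steps.append(("insert_nat_router", intent["network"], "internet"))
    return steps


# The developer states what they want, not which devices to configure.
intent = {"network": "dev-net", "vms": ["vm-a", "vm-b"], "internet_access": True}
for step in compile_intent(intent):
    print(step)
```

The developer never mentions VLANs, NAT rules, or specific switches. Those details are the system's job to derive from the stated intent.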

Scaling out automation

But a funny thing happened as we tried to scale out the business of network virtualization. It turned out that automation was out of reach for most of our customers.

When we joined VMware in late 2012 and started to talk to customers about NVP, one of the first questions we would ask is: "What's your automation strategy?" And in most cases we would get a blank look - because there was no such strategy. The early customers of Nicira were, it turned out, unusually sophisticated. They had automated the provisioning of VMs to a level that was largely unheard-of among typical enterprise customers. And because they had lowered the friction needed for end-users to obtain virtual computing resources, the pain point of network provisioning had become extremely obvious.

Thus, the need for NVP was clear, but only after automation had been put in place for computing. And it didn't help that automation platforms were seriously immature at this point - the main options were OpenStack and the VMware platform later known as vRealize Automation (vRA), neither of which was for the faint of heart.

Fortunately for VMware and the network virtualization team, the introduction of distributed firewalling in 2013 opened up a whole other use case: microsegmentation. That turned out to be a huge success, as we've discussed previously. But that left network automation languishing on the sidelines again.

A good example of how a suitably sophisticated organization can automate its networking was reported by Jeff Mogul and his colleagues at Google at NSDI 2020. It's well worth reading the paper or watching the presentation. It illustrates both the complexity that needs to be managed before you can automate anything, and why smaller organizations might not go to such lengths to automate their networks. The pain is somewhat lower for a smaller network, and the resources needed to solve the problem cannot be found or justified.

I have the impression that the situation improved somewhat with the rise of microservices, and Kubernetes in particular. For all its shortcomings, Kubernetes has brought the automatic provisioning of distributed computing and networking resources to a much larger audience.

But the success stories that I can find for network automation are largely limited to situations where the network is virtual - whether it is to interconnect VMs as we did in 2012 or the more common case of container networking today. I am admittedly less plugged in to the world of physical networks than I once was, but my conversations with colleagues make me doubt that we are closing in on full automation for physical networks.

I'll be happy to be proven wrong here, but I'm not sure the industry structure and incentives are set up to make this happen - a topic that I probably need another column to cover. And just as self-driving cars have been "just a few years away" for more than a few years, I suspect that automating the management of physical networks is going to remain out of reach (for most of us) for a while longer. ®

 

https://www.theregister.com//2024/06/17/network_automation/
