Welcome back. We are continuing the module on programmable data planes, and we are also continuing our discussion of how to make programmable data planes more scalable. In this part of the lesson we'll focus on techniques that can be used to make hardware more programmable. Before we jump into the capabilities of current hardware and what we might do to make data-plane hardware more programmable, let's first explore what we really want from SDN and examine whether or not current hardware really provides the features that we would like from SDN. One of the main goals of SDN is to support protocol-independent processing. In other words, we should be able to process traffic traveling through the network independent of any particular control protocol. We should be able to control network behavior and repurpose our network devices in the field without redeploying hardware, and we'd like these functions to be implemented with fast, low-power chips. Unfortunately, the hardware that is deployed in today's networks still constrains what we're capable of doing. OpenFlow is protocol dependent because of the constraints of conventional switching chips: the OpenFlow protocol has had to map its functions onto the capabilities of existing chips. That mapping has enabled quick adoption, but at the same time, it has constrained what we might think about putting into a control protocol like OpenFlow. So it's worth asking: what would we do differently if we could completely redesign the data plane? We'll explore this question in the context of two different projects. Both of the projects that we'll look at are rooted in the following insight: there are relatively few data-plane primitives that a network device needs to perform. In other words, the set of functions that we want to perform on packets is actually pretty limited.
We might need to do some bit shifting, parsing, or rewriting of different header fields, various types of manipulations, traffic shaping, forwarding decisions, and so forth, but it's fairly easy to enumerate that list of functions. We might compose those functions in different ways, but ultimately the building blocks that we need are in fact pretty limited. This insight leads us to the conclusion that we can build a flexible data plane by developing a fixed set of modules and coming up with methods for integrating them. In other words, we design hardware that provides the building blocks and then allows us to plumb those building blocks together, and we get a fast, programmable data plane. We will look at that approach in the context of two different architectures. One is an OpenFlow chip that provides generalizable, programmable match-action primitives. The other is a programmable, modularizable FPGA-based data plane called SwitchBlade. Let's first take a look at the OpenFlow chip. The OpenFlow chip design is a recent design exercise to see whether a chip could parse existing and custom packet headers and perform a number of sequential stages of match-action to build a much more flexible hardware data plane. Let's take a quick look at the design of this chip. The chip is laid out with a RISC-like architecture, that is, a reduced instruction set that allows processing to effectively ride Moore's law. In other words, as chips get faster and faster, we can process packets at higher rates, yet we can still compose these instructions to perform fairly complex forwarding operations. The chip has as many as 32 stages of match and action. Let's take a look at what happens for matching and actions at each stage of the pipeline. Match tables need to be flexible. The match tables are laid out in two types of memory, TCAM and SRAM.
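To make the idea of sequential match-action stages concrete, here is a minimal software sketch of such a pipeline. The class and function names are illustrative assumptions, not the chip's real interface: each stage matches on selected header fields and applies a small action to the packet before handing it to the next stage.

```python
# Hypothetical sketch of a multi-stage match-action pipeline.
# Each Stage holds a match table keyed on selected header fields;
# a hit applies that entry's action to the packet headers.

class Stage:
    def __init__(self, match_fields, table, default_action=None):
        self.match_fields = match_fields      # e.g. ("ipv4_dst",)
        self.table = table                    # key tuple -> action function
        self.default_action = default_action  # applied on a table miss

    def process(self, headers):
        key = tuple(headers.get(f) for f in self.match_fields)
        action = self.table.get(key, self.default_action)
        if action:
            action(headers)
        return headers

def run_pipeline(stages, headers):
    # Packets flow through the stages in sequence, as in the chip.
    for stage in stages:
        headers = stage.process(headers)
    return headers

# Usage: a two-stage pipeline that picks an output port, then
# decrements the TTL of packets headed to that port.
def set_egress(port):
    return lambda h: h.__setitem__("egress_port", port)

def dec_ttl(h):
    h["ttl"] -= 1

stages = [
    Stage(("ipv4_dst",), {("10.0.0.2",): set_egress(3)}),
    Stage(("egress_port",), {(3,): dec_ttl}),
]
pkt = run_pipeline(stages, {"ipv4_dst": "10.0.0.2", "ttl": 64})
```

The point of the sketch is composition: complex forwarding behavior emerges from chaining simple, fixed match-action primitives, which is what lets the hardware stay simple and fast.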
The table structure requires some creative memory management because processing often does not require 32 stages of match and action, and yet there may be tables that need to be fairly big, often bigger than the memory that has been laid out for one particular match stage. Therefore, the memory management of the chip needs to be such that we can create logical tables that span multiple physical stages. Each action processor performs actions on one or more fields in the packet header using the VLIW instruction set provided by the chip. Because each action processor takes less than a square millimeter of area on the chip, the processing pipeline can afford many action processors for each stage, potentially resulting in hundreds of action processors across the chip's pipeline. This architecture permits a very flexible match-action-based programmable data plane for only about a 15% overhead in chip area. However, the data plane is still based on performing sequences of match and action. In practice, network operators may wish to perform increasingly complex and sophisticated operations on streams of packets, such as on-the-fly transcoding or encryption. To perform these more complex operations we need to place more sophisticated packet processing in the data plane. Of course, customized hardware can do this, but what if we wanted to support this with a programmable data plane? That's the idea behind SwitchBlade, which is a programmable, modularizable FPGA-based data plane. The main idea behind SwitchBlade is to identify modular hardware building blocks that can implement a variety of data-plane functions and then allow a developer to enable or disable these building blocks and connect them in a hardware pipeline from high-level software programs. Potentially, we may also want to allow custom data planes to operate in parallel on the same hardware.
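SwitchBlade's composition idea can be sketched in a few lines of software. This is a hedged illustration, not SwitchBlade's actual interface: the module names are invented, and real modules are pre-synthesized hardware blocks rather than Python functions, but the enable/disable-and-chain pattern is the same.

```python
# Hypothetical sketch: a library of pre-synthesized processing
# modules that a developer selects and chains from software,
# without synthesizing any new hardware. Module names are invented.

MODULE_LIBRARY = {
    "decrement_ttl": lambda pkt: {**pkt, "ttl": pkt["ttl"] - 1},
    "strip_vlan":    lambda pkt: {k: v for k, v in pkt.items() if k != "vlan"},
    "mark_counted":  lambda pkt: {**pkt, "counted": True},
}

def build_pipeline(enabled_modules):
    """Select modules from the library and fix their order."""
    return [MODULE_LIBRARY[name] for name in enabled_modules]

def process(pipeline, pkt):
    for module in pipeline:
        pkt = module(pkt)
    return pkt

# Enabling a different subset of modules yields a different custom
# data plane on the same underlying hardware.
ipv4_pipeline = build_pipeline(["decrement_ttl", "mark_counted"])
out = process(ipv4_pipeline, {"ttl": 64, "vlan": 10})
```

Because the modules are already synthesized, switching between custom data planes is a software-speed operation, which is also what makes it plausible to run several such pipelines in parallel.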
For example, if we had specific traffic flows that needed to be transcoded or encrypted, we might want to apply that processing only to a subset of the traffic, or to traffic coming in on specific virtual interfaces. So we'd like the hardware to support that type of virtualization as well. In other words, we'd like to get the advantages of both hardware and software with minimal overhead. SwitchBlade pushes custom forwarding planes into programmable hardware. You might have a programmable software router, like Click, running in one or more virtual environments on a hardware platform. Instead of having all of that forwarding take place in software, we'd like to push that forwarding function, some of which may be custom, down into the hardware. We'd also like to have multiple virtual data planes, each of which supports a different custom packet processing pipeline. The first stage in the SwitchBlade pipeline is virtual data plane selection. That is, when traffic arrives, we need to determine which virtual data plane, or packet processing pipeline, the traffic should be directed to. The idea is that SwitchBlade should support separate packet processing pipelines, lookup tables, and forwarding modules for each of these virtual data planes. A table in memory maps the source MAC address of the incoming packet to a virtual data plane identifier. Based on the virtual data plane that the packet is mapped to, SwitchBlade then attaches a 64-bit platform header that controls the functions that may be performed in later stages. The header can also be controlled from high-level software programs using a register interface. SwitchBlade does have a traffic shaping step, but we will not talk about that step in this lesson. Let's proceed to the step where SwitchBlade performs preprocessing on the traffic. SwitchBlade selects processing functions at this step from a library of reusable modules that have already been synthesized in the programmable hardware.
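The virtual data plane selection step can be sketched as follows. The field layout here is an assumption for illustration: the source does not specify how the real 64-bit platform header is laid out, so we simply pack a virtual data plane (VDP) identifier and a module-select bitmap to show the idea of a header that steers later stages.

```python
# Hypothetical sketch of SwitchBlade's VDP selection: look up the
# source MAC in a table, then prepend a 64-bit platform header whose
# fields (assumed layout) control which modules later stages apply.

import struct

VDP_TABLE = {                       # source MAC -> VDP identifier
    "aa:bb:cc:00:00:01": 1,         # e.g. an IPv6 pipeline
    "aa:bb:cc:00:00:02": 2,         # e.g. an OpenFlow pipeline
}

VDP_MODULES = {1: 0b0101, 2: 0b0010}  # modules each VDP enables (invented)

def attach_platform_header(src_mac, payload):
    vdp = VDP_TABLE[src_mac]
    modules = VDP_MODULES[vdp]
    # Assumed 64-bit layout: 8-bit VDP id, 32-bit module bitmap,
    # 24 reserved bits. ">" means big-endian, no padding surprises.
    header = struct.pack(">BIxxx", vdp, modules)
    return header + payload

frame = attach_platform_header("aa:bb:cc:00:00:01", b"\x00payload")
```

A register interface in the real system lets software rewrite entries like `VDP_TABLE` and `VDP_MODULES` at runtime, which is how the header "can also be controlled from high-level software programs."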
The preprocessor thus allows a programmer to quickly customize the packet processing pipeline without needing to re-synthesize or re-program functions using a hardware description language. Rather, the programmer can control everything from a high-level programming language. We've shown how this library of reusable modules can be used to implement a variety of custom data planes, including a multi-path routing protocol called Path Splicing, IPv6, and OpenFlow. The preprocessor hashes custom bits in the packet header and then inserts the value of that hash into the SwitchBlade platform header. The ability to select custom bits from the packet header to create that hash is what allows SwitchBlade to perform custom processing and forwarding decisions based on arbitrary bits in the packet header. One example of a protocol that can be implemented using SwitchBlade is OpenFlow: we built a limited implementation of OpenFlow with no matching on VLANs and no wildcards. The preprocessing steps are quite simple. The preprocessor essentially parses the packet and extracts the relevant tuples corresponding to the OpenFlow flowspace. It then passes those bits to the hashing module in the SwitchBlade preprocessor, which outputs a 32-bit hash value that controls both packet processing and forwarding decisions. Adding new modules to SwitchBlade, of course, requires Verilog programming, but it is possible. The idea is that synthesizing or adding new modules should not be that frequent, based on our intuition that many custom data-plane operations can be performed with relatively few hardware primitives in the data plane. Forwarding consists of three steps. There's an output port lookup process, which performs custom forwarding depending on the bits that have been set in the platform header. Wrapper modules allow matching to be performed on custom bit offsets. And custom postprocessors allow other functions to be enabled or disabled on the fly.
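The parse-then-hash step for the OpenFlow example can be sketched like this. The hash function (CRC-32) and the exact field list are assumptions for illustration; the hardware uses its own hashing module, and the point is only that an arbitrary tuple of header bits is reduced to a single 32-bit value that later stages key on.

```python
# Hypothetical sketch of the SwitchBlade preprocessor for an
# OpenFlow-like VDP: extract the flow tuple from parsed headers,
# then hash it to a 32-bit value for the platform header.

import zlib

# Fields approximating the OpenFlow flowspace; the exact list the
# hardware extracts is an assumption here.
OPENFLOW_FIELDS = ("in_port", "eth_src", "eth_dst", "eth_type",
                   "ip_src", "ip_dst", "ip_proto", "tp_src", "tp_dst")

def preprocess(headers):
    # Parse step: pull out the tuple that defines the flowspace.
    tup = tuple(str(headers.get(f, "")) for f in OPENFLOW_FIELDS)
    # Hash step: reduce the tuple to a 32-bit value. CRC-32 stands
    # in for whatever hash the hardware module implements.
    return zlib.crc32("|".join(tup).encode()) & 0xFFFFFFFF

h = preprocess({"in_port": 1, "ip_src": "10.0.0.1", "ip_dst": "10.0.0.2",
                "ip_proto": 6, "tp_src": 12345, "tp_dst": 80})
```

Because any bits can feed the hash, the same mechanism serves protocols with very different headers (IPv6, Path Splicing, OpenFlow) without changing the forwarding hardware downstream of the preprocessor.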
SwitchBlade also provides the capability to throw software exceptions. If a programmer wants a particular packet operation to be performed, but the hardware modules do not support it, the programmer can specify that some packets be redirected to the CPU. Those packets are passed to the CPU with the virtual data plane identifier and the SwitchBlade platform header, which allows software exceptions to be executed that are specific to that packet's virtual data plane. This combination of virtual data planes and custom postprocessing allows SwitchBlade to perform different packet processing operations depending on the type of packet that arrives. So an IPv6 packet might be subject to a number of operations, such as a TTL decrement, whereas a layer-two OpenFlow packet might be directed straight from the forwarding logic to the output queues. Other custom protocols, like Path Splicing, might also be passed through a custom set of postprocessing modules selected from the pre-synthesized modules. In summary, another way to make programmable data planes scale is to make hardware more programmable. We've built on the insight that if we optimize a few primitives and provide the ability to compose those primitives, then the hardware data plane can in fact be quite simple and yet extremely flexible. We've seen two examples of programmable hardware data planes that build on this insight: one is the OpenFlow chip, and the other is SwitchBlade.