Okay, today we're talking with Brighten Godfrey, who's a professor at the University of Illinois, Urbana-Champaign, in the Computer Science department. He also has a startup, Veriflow Systems, that's in stealth mode. And we're going to talk today with Brighten about network verification. He's done a lot of work in this area, dating back many years, and we want to learn a little bit about both his research and what's the latest and greatest in network verification. So thanks, Brighten, for chatting today. >> Sure. My pleasure. Thanks for the invitation to come chat. >> Cool. So I think there's been a lot of interest in formal verification of networks, in SDNs in particular. It seems, at least in networking research, it's one of the hot topics these days. Of course debugging networks is nothing new, and operators have their favorite tools for doing this already: OPNET, various simulators, and trial and error, I guess you might say. But in your research you've definitely thought about bringing formal techniques to bear on these kinds of problems. So I assume one of your goals is to eventually bring these formal verification techniques into network operations and practice. Can you talk a little bit about how you came to start working on this problem, the techniques that you've developed in your research, and if and how those kinds of things are making their way into practice? >> Yeah, sure. So, this research goes back in our group to around 2010. And of course, we're building on work in the research community in the area of debugging, checking, and verification of other parts of networks that dates back long before then. So I think the way we started looking at it was, how would you be able to provide assurance on what's actually happening in the network?
That is, how do you move from some of the existing tools that are more simulation oriented, like let me simulate what's going to happen to this packet or to that packet, to something that's really more like formal verification, that explores all possible behaviors in a network and gives you a confident answer about what's really happening? So that was the start of the philosophy: to see as close as possible to what's really happening in the network. And that led to this approach, the new aspect in our original work, which we now call data plane verification, where we're picking out information that's as close as possible to the hardware instructions, basically, that control packet processing throughout the network. So that's where this kind of verification started. I think what's new and exciting about it is that, as I said, it's verification. You can explore all possible behaviors. So there's an art to finding the right domain where you can actually do that, right? The idea of figuring out what software does goes back many decades in computer science, but general purpose software is a very complex system. So if you can find a domain like networking, like the data plane of the network, where it really is feasible to explore all that behavior and get formal verification, that's an exciting thing. So, in terms of how it becomes operationalized, that's something we've been pushing on from the beginning. And I should say that my comments here are in my role as a professor on the research side, since the startup is in stealth mode. On the research side, for example, we worked with an operational network and did verification on its backbone. And this is the backbone of a network that's supporting around 100 thousand machines.
And we didn't really know what we'd find when we first started that, because I didn't have very much operational experience of whether you actually find problems in networks using these formal techniques. And there were in fact problems that we could pick out. Loop detection is maybe an easy one that comes to mind, but then you move on to properties like consistency checks and so on, that can actually figure out real operational problems. So I think another good example of how this area of network verification moves towards operations is that, after some of the early research work on this technique of data plane verification, there's been some great work at Microsoft Research to operationalize it. And that includes work led by Nuno Lopes, published in NSDI just a few months ago, describing the data plane verification systems they've built and their use at Microsoft's Azure data centers. >> Interesting. Actually, one of the things that you talked about a little bit just now was scaling, and the difficulties that some of the other techniques, simulation and other per-packet approaches, have as far as scaling. Of course, some formal verification techniques also have scaling difficulty if they're exhaustive, right? But there's been a lot of work trying to get those techniques to scale. Are you finding that the tricks formal methods folks have discovered to get these techniques to scale are applying well in the networking setting? >> You know, I think some of them either apply or there are analogies, and then there are other areas where it's more network specific, right? So if you look at what we did in VeriFlow, you can squint at it and it's pretty similar to a kind of symbolic execution. Going back even further, to where we started with the original work, which is this project called Anteater.
Before we moved on to the VeriFlow research project, in the earlier research project, Anteater, we actually started with Boolean formulas. There's some work there, and there's some earlier work than ours, that takes networks and represents them in terms of Boolean formulas, right? That was work by Geoffrey Xie and Jennifer Rexford; they were both on that work. So in our Anteater work, we represented networks, including packet forwarding, transformations to packets, and how packets flow through the network, and thought, okay, how do you encode that into a SAT formula to check it properly, right? Where satisfiability is this lower level representation: here is a Boolean formula involving true and false values and constraints between them; can it evaluate to true? So there are definitely concepts that transfer, and then there are other concepts that are pretty network specific. For example, if you look at VeriFlow, the base computation in it is graph oriented, right? We have the topology of the network, and we're traversing this topology in a way that understands what the network does. >> Interesting. So, I guess with regard to that, are there certain types of properties that lend themselves pretty well to that kind of checking with SAT solvers? I think some of your research work has focused on reachability properties of certain kinds. Maybe there are others as well that you wanted to mention, but I'm curious, too, as far as operationalizing some of these things that you mentioned, trying these things on the operational network: are there certain things that you've found mapped very well to what operators wanted to check, as far as the formal techniques go? And the other side of that is, are there gaps? Are there things that the operators really wanted to check that just don't map quite as well to the formal techniques? How do you see it?
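As an aside for readers, the Anteater-style idea of phrasing reachability as a satisfiability question can be sketched in a few lines. This is a toy illustration only: the nodes, rules, and header width are invented, and a real tool would hand the resulting Boolean formula to a SAT solver rather than brute-forcing the header space as we do here.

```python
# Toy sketch: encode "can ANY packet reach node C from node A?" as a
# satisfiability question over 4-bit headers. Each rule is a predicate
# on the header plus a next hop. A system like Anteater would emit a
# Boolean formula and call a SAT solver; the header space here is tiny
# enough to enumerate instead.

RULES = {
    "A": [(lambda h: h & 0b1000 == 0b1000, "B")],   # A forwards 1xxx to B
    "B": [(lambda h: h & 0b1100 == 0b1100, "C")],   # B forwards 11xx to C
}

def reachable(src, dst, header_bits=4):
    """Return (True, witness_header) if some header flows src -> dst."""
    for h in range(2 ** header_bits):
        node = src
        for _ in range(8):              # bound path length to avoid loops
            if node == dst:
                return True, h          # satisfying assignment found
            hops = [nh for pred, nh in RULES.get(node, []) if pred(h)]
            if not hops:
                break
            node = hops[0]
    return False, None

ok, witness = reachable("A", "C")
print(ok, bin(witness))  # True 0b1100: a concrete witness packet
```

The point of the encoding is exactly what the discussion describes: a "yes" answer comes with a witness header, and a "no" answer covers all possible packets, not just the ones you thought to simulate.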
>> Yeah, that's a great question. It's potentially a big and interesting question. So, things that map well. The basic thing we're doing in data plane verification is modeling how data's going to flow through the network, all possible ways data can flow through the network. So reachability properties are of course one, and there are security properties. For example, taking some examples from the Microsoft work: you have a cloud data center. Can my customers' VMs access my SDN controllers? They shouldn't be able to reach the control network or the management network. Are my customer VMs actually able to reach the other customer VMs that they're supposed to reach, right? So some of these reachability properties would be the most basic ones. But then you can certainly go beyond that, even just within stock data plane verification, because you're modeling how data can flow through. So you can ask questions like: are there any unusually long paths in the network? Are there loops in the network? Are there inconsistencies in data flow? So, for example, take two routers, or two backup routers, two components of the system that should have similar properties. Do they actually have similar properties? If one of my paths from one of my core routers flows through a firewall and the other one doesn't, for a specific kind of traffic, that might be something suspicious that I want to look at, right? So those are a few examples of understanding overall how data flows through the network. And it, I think, goes beyond just reachability, like, do you terminate at the right destination, to what happens in between, right? And looking across multiple paths and seeing if there are consistency issues and so on. So another example of that might be: if there's traffic that's supposed to remain localized in one region, does it flow outside of that region? That could be a performance issue.
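The graph-oriented traversal Brighten describes for VeriFlow-style checking, and the firewall-waypoint example above, can be sketched as follows. All device names and the FIB format are made up for illustration; a real verifier works over equivalence classes of headers rather than named traffic classes.

```python
# Toy sketch of graph-oriented data plane checking: walk the forwarding
# graph for one traffic class, detect forwarding loops, and check a
# waypoint property ("does this path go through the firewall?").
# FIB entries map (node, traffic_class) -> next hop; names are invented.

FIB = {
    ("core1", "web"): "fw1",
    ("fw1",   "web"): "server",
    ("core2", "web"): "server",     # bypasses the firewall -- suspicious
}

def trace(src, tclass):
    """Follow the forwarding graph; return (path, looped?)."""
    path, seen = [src], {src}
    node = src
    while (node, tclass) in FIB:
        node = FIB[(node, tclass)]
        if node in seen:
            return path + [node], True   # forwarding loop detected
        path.append(node)
        seen.add(node)
    return path, False

for core in ("core1", "core2"):
    path, looped = trace(core, "web")
    print(core, path, "via firewall:", "fw1" in path)
```

Run on this FIB, the check flags exactly the inconsistency from the discussion: core1's web traffic traverses fw1 but core2's does not, even though both are "core routers" that should behave alike.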
It could be a security issue as well. Okay, now, what doesn't fit so well in there? There are some interesting areas, and some of them are becoming really fascinating areas of research now. One area that doesn't fit so well within the published work on network verification, but that's a real active need in industry, is performance problems. So, there's some performance loss in an application, experienced by a user. Why did that happen, and where? Is it the network's fault, and where in the network, right? And it can be something that's very dynamic, something that could be caused by many different system components; not just the data plane forwarding layer, other layers could be involved, right? So how that fits into verification is an interesting question. And then another direction I think is pretty fascinating is not so much on the side of what question you ask, but what you're asking it about. That is, what part of the network are you trying to model? The research work began with forwarding properties: routing and IP, layer two through layer four forwarding properties of the network. But now imagine that we're getting more and more towards network functions incorporated into the models. As you move in that direction, you're taking steps that look more and more like verification of general purpose software, right? Not necessarily all the way in that direction, but you're taking steps in that direction as you move towards this trend of network functions being viewed as part of the network. And there's some research that's starting to push in that direction now. One example would be Katerina Argyraki's work on verifying software router implementations, where you're actually looking at code implementing a software data plane. Another example is work a student is leading on verifying properties that are more stateful in the network and incorporate data plane functions.
So, those are two directions where I think this area of verification is more on the forefront of the research. >> I'm glad you brought up the network function stuff. I was going to ask, actually. I mean, I think you're right. It seems like, as you get towards placing these functions in the network, you get to almost general purpose software verification. In that work that you mentioned, are there limitations, like constraints that people are placing on what those functions can do, that make the verification problems more tractable? Or, I mean, what do you think the limitations are there, or what's achievable, actually? >> Yeah, I think that's really very much an open question, what is achievable, because you can imagine pushing incrementally more and more in that direction. I'm not going to talk too much in detail about those specific projects, just so I don't say anything that's incorrect, but I think there are some. For example, the paper that I had in mind on software data plane verification focuses on a single box at a time, right? Which is, I think, a very cool achievement even there, right? So as you do more and more stateful processing, more and more intelligent processing, and then move towards verifying the actual SDN control programs themselves, which is another direction being pushed on, that gets really fascinating. I don't want to say what we can't do, because then I'll be surprised when someone does it. [LAUGH] >> No, definitely, definitely. So, just a little bit more about your own work, too, on the research front. You've focused a lot on checking properties of the data plane, snapshots and things like that. And you mentioned, for example, performance being a gap that's yet to be filled. But I wonder also, you can check the data plane snapshot with the formal approach. And in the course, too, I talk about header space analysis, which checks data plane properties as well.
But I think one thing that's not immediately apparent there is, actually, a couple of things. One is that the control plane had to basically do something to put the data plane state there. And if your formal verification finds something wrong, you might actually want to know: what happened that caused it to be wrong? Like, what's wrong with my control program? >> Yup, yup. >> So what are your thoughts on that, actually? What are the limits of the analysis of data plane snapshots, and what's going on, if anything, in control plane verification? And it seems you can't do that in isolation either, because you mentioned Katerina's work on verifying software router implementations, which seems related to this a little bit. What are the limitations of data plane snapshot verification, and are people yet looking at how that ties to control plane [CROSSTALK] and things like that? >> Yeah, I'm glad you asked, because that kind of puts the approach in context. The name that this approach has taken, data plane verification, is complementary to control plane verification. And there was earlier work; for example, your work on verifying BGP configurations from, what, I think ten years ago now, can tell you: all right, is this configuration correct, in at least some aspects, before you push it out? And I think that's complementary to the approach that we took, which is pushing in the direction of, how do we see as close as possible to what's really happening in the network, and get the maximum confidence? And these are complementary, right? Because even if you're sure that a config is right, you hand it off to the control software, which then runs distributed protocols and is, itself, very complex. And maybe you're not modeling what it's doing correctly, or there's some bug even within that software, right?
Or maybe it's something simple: the config doesn't get pushed out to the device correctly. So there's this trade-off between checking the configs versus checking something closer to what's really happening. You'd like to get the best of both worlds, which would be ideal. >> I was going to say, is there a way to do both? Or, both in isolation, but I don't know if that gets you- >> Can you both check the configs before they run and also say something very confident about the live running behavior? So- >> Yeah, and I guess to complement that, could you look at your data plane snapshots and, if you see something that violates a property there, go back to your control plane and figure out what caused it? >> Right, yeah, those are really interesting questions. I think one way to do this is through simulating the control behavior, right? And this is kind of an intermediate point, which is interesting. So you can simulate control behavior, and then hand that off to a data plane verifier to get a very confident statement about what the result of that particular control plane behavior was. So maybe, for example, you simulate some link failure scenarios. What's going to happen if I have an outage? And then, if I have an outage, [COUGH] is my control software going to behave correctly and actually produce data plane behavior that has the correct forwarding properties, the correct multipaths, the correct reliability, and so on? So you can do a hybrid there. And this is an approach another recent paper from Microsoft Research takes; the name of that project is Batfish. They're saying, all right, we want, in fact, to understand the configs. So we're going to build a piece of software that can translate the configs, through what I would call a kind of simulation, down to data plane state, and then run a data plane verifier on that. Right, so it's this kind of two-step process.
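The two-step process just described can be sketched end to end. This is a deliberately toy version: the "config" grammar, router names, and loop check are all invented for illustration, and real tools like Batfish model actual vendor configuration languages and full routing protocol behavior.

```python
# Toy two-step pipeline in the spirit of the discussion:
#   (1) "simulate" a tiny invented config format down to data plane state,
#   (2) run a data plane check (loop detection) on the result.
# Nothing here reflects a real config language; it is illustration only.

CONFIGS = {
    "r1": ["route 10.0.0.0/8 r2"],
    "r2": ["route 10.0.0.0/8 r1"],   # oops: points back at r1
}

def compile_configs(configs):
    """Step 1: translate configs to per-router forwarding state."""
    fib = {}
    for router, lines in configs.items():
        for line in lines:
            _, prefix, nexthop = line.split()
            fib[(router, prefix)] = nexthop
    return fib

def find_loops(fib, prefix):
    """Step 2: data plane check -- does any forwarding loop exist?"""
    for start in {r for r, p in fib if p == prefix}:
        node, seen = start, set()
        while (node, prefix) in fib:
            if node in seen:
                return True
            seen.add(node)
            node = fib[(node, prefix)]
    return False

fib = compile_configs(CONFIGS)
print(find_loops(fib, "10.0.0.0/8"))  # True: r1 -> r2 -> r1
```

Because the bug lives in the configs but only manifests as forwarding state, catching it requires exactly this pipeline: compile first, then verify the compiled data plane.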
And then you can do the traceback of, okay, what was the cause of this behavior? And there are some other ways to do that without doing a full simulation. But that's one general approach to link the configuration root cause with the data plane, which is the ultimate forwarding behavior, what's happening to the live traffic that you care about. Now, this is interesting actually within the context of SDN, because in SDN you can do this more easily, right? You can- >> Right. >> There's a defined behavior for what, you know, the OpenFlow protocol, for example, is supposed to do. >> Yeah. Great point, yeah, otherwise you've got to infer what these messy control plane protocols are doing. >> Right, it's basically much easier to simulate your control software, because you don't have to simulate it, you can just run it directly. >> Right. >> And you may have talked about some of these advantages in your class already, and your students are using Mininet, right, where you can take the same live control software and configurations, the exact same code, only running in an emulated environment, and then you can imagine doing data plane verification within that environment. >> Right, exactly, and that's one of the points I raised. One of the interesting things, I think, from an operational perspective, is you can run something in an emulated environment and your control plane can be exactly the same code as what you're going to deploy on real switches. >> That's right. Yeah, you can do a kind of unit testing or integration testing of networks in this way. >> That seems like a good way to do some amount of checking in advance. I guess that's another question I had for you: what's your sense of what's possible to do ahead of time, in advance, as opposed to at runtime? It seems like the data plane snapshot stuff we were just talking about actually spans both.
You could do a lot of that data plane snapshot checking, or some of it, in advance, right? You could run something in Mininet, see what the data plane state is in your emulated network, and then run your data plane verification, your formal checks, on the data plane in that emulated environment. How much of that do you think applies to checking things in advance, versus, do you think the more appropriate approach is to focus on the runtime, if you will, and really pull the data plane state off of switches to do verification? >> I mean, I think if I were running a network, I would want to do both. >> Right. Actually, I haven't seen too many people doing both, or at least I'm not aware of it. Maybe you know. >> Well, no, I don't disagree with you. I think the ability to really almost deploy your control software, but in this emulated environment first, is one of the big advantages of SDN, right? It makes that easier. Another project I'd mention that lets you do that to some extent is Cisco's VIRL project, V-I-R-L, where you can run an emulated network effectively, so it's not necessarily tied to SDN. And you could say OPNET is trying to do a similar thing, although it's not quite emulation, it's simulation. But yeah, I want to get something that's very close to what's going to happen in reality, and test that up front. Just like when I'm building a big software project: if I check in code, it's got to be tested first, right? It's going to run through unit tests and integration tests before it gets pushed out. >> Yeah. >> You know, even before it gets pushed out to other developers- >> Yeah. >> let alone pushed out to deployment. >> That's a great analogy. How far do you think we are from a unit test for network configs, and what do you think that would even look like?
I mean, it sort of seems like we're talking about, maybe someone writes a piece of control software and then runs it through unit tests in emulation, something like that. Do you think we're close to something like that? >> Well, I think that's where we want to go, and I think there's some good progress. Probably the most challenging thing there is that we're actually very far from having such a clean SDN-deployed environment that you really could emulate it all. For the foreseeable future, most networks are not just SDN. And maybe they're not even one kind of SDN, right? Maybe you'll use different tools for different parts of the job, and it's going to be a pretty messy situation for the foreseeable future, for years certainly. Even though we can see the vision of where we're going: where the behavior is well defined, and the control software is separated from the simpler, lower level hardware functions. And with that separation there's at least a possibility that, if you take that approach and push it further towards deployment, towards more complete deployment, then you really could get to that point of unit testing and integration testing in advance. But I think that's probably the biggest challenge: we really don't have SDN everywhere. >> Right. Yeah, the networks are messy enough that they're just too tough to emulate as is. Actually, there was another question I wanted to raise, or a topic, which is the nitty-gritty of the real world. A lot of these things look nice and clean, particularly if you can assume SDN, but then you start looking at real networks and there's just a ton of legacy stuff, anywhere from a little bit of SDN on the side to no SDN. Or, oh yeah, we're looking at that, but we're kind of looking at our end-of-life cycles, and things like that.
So, I'm wondering, you mentioned the operational network that you worked on, and in your exposure in general to the industry, what's the spectrum here? How many people are just saying, yeah, here's my legacy network and I just want you to check it? In which case maybe something like the data plane snapshot stuff might be really apropos, because the data plane is there, you can snag it independently of what's going on above. Versus, how many people are saying, yeah, I'm half SDN? [LAUGH] >> Right. >> Or have you found people that are like, no, I really want to check my network and this stuff is just too hard, so I'm basically migrating 100% to SDN? I think I've heard that from at least one operator I've talked to, so I'm wondering what you've heard. >> Well, you know, I think you're right that really traditional networks are pervasive. Right, and even if you have a few non-SDN boxes in your network, you don't know what they're going to do if you're assuming everything is OpenFlow, right? It just wouldn't work. So it's interesting, actually, because when we started this verification work, the first phase of the project, which was doing data plane verification offline, like grab a snapshot of the network and just spend some time checking it, we worked with an operational network, and that was completely traditional, right? It was a mix of about five different vendors with traditional routing and switching gear. And when we started, we weren't even really thinking about SDN at that point. And I think that need has continued. The second phase of the research projects in our group, with myself and Matt Caesar and Ahmed Khurshid and others, then moved towards the SDN world, because there are exciting opportunities there, right? Because you have a much more well defined model of what the hardware is doing.
It makes it much easier to get data and model that data. And there's centralized control, right? So you can watch what the controller is doing, you can get incremental changes, and you can verify incrementally, with the right algorithms. And this is another example of something that's network specific, rather than off-the-shelf formal methods techniques. You get these control changes from the SDN controller, and you can check just those incremental changes and push the latency down to milliseconds. So that's a pretty cool capability. It's something you can imagine will even be part of SDN controllers in the future, so that the controller verifies before it pushes out any change to the network. So I think there are a lot of exciting capabilities there that you get from SDN, but at the same time, you can apply this concept to traditional networks. It ends up being more work, because there's a range of devices that you have to understand. >> Right. >> But there's a lot of need there, just because most of the world, and a lot of the complexity, is in traditional gear. >> Actually, you mentioned another thing, too, which I thought to ask about as well, which is the ability to do incremental testing. That seems incredibly important. I think a lot of the formal verification stuff that I've read, at least, maybe you know of some newer work too, but a lot of it is comprehensive, right? You basically take the existing state and run some exhaustive checks or tests on that state. But certainly in software testing there are these notions of regression tests, and behavioral regression, and things like that. How much are you seeing those ideas applied to verification? Because it seems like that could really help scaling as well.
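The incremental checking described above, verify each controller update before it's installed, rather than re-verifying the whole network, can be sketched roughly like this. The invariant (no forwarding to a dead port) and every name here are invented for illustration; real systems like VeriFlow compute which equivalence classes of packets an update affects and re-check only those.

```python
# Toy sketch of incremental verification: each rule update from the
# controller is checked against an invariant before being installed,
# so only the affected traffic class is re-examined, not the whole FIB.
# The "live ports" invariant and all names are invented.

LIVE_PORTS = {"p1", "p2"}

class IncrementalVerifier:
    def __init__(self):
        self.rules = {}          # traffic class -> output port
        self.violations = set()

    def apply_update(self, tclass, port):
        """Check just the affected class; install only if it passes."""
        if port not in LIVE_PORTS:
            self.violations.add(tclass)   # flag and refuse the update
            return False
        self.rules[tclass] = port
        self.violations.discard(tclass)
        return True

v = IncrementalVerifier()
print(v.apply_update("web", "p1"))   # True: rule accepted
print(v.apply_update("dns", "p9"))   # False: flagged; only "dns" checked
```

The design point is latency: because each update touches a small slice of state, the check can sit inline between the controller and the switches without slowing the control loop down.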
And also, I think another thing there is, operators may not have a lot of experience writing down formal properties, but they might say, yeah, things are basically working right now, and I want to make sure that the next thing I do isn't going to break everything. And I don't really know what that means, but I know that things are mostly okay right now. I mean, is there a notion of that? It's not really the way that formal methods folks typically think, but it seems to me like operationally that might be a useful thing. >> I agree. I think this is a little bit similar to the idea of consistency checking, which we used in one of our projects and which also appears in the Microsoft work. You can think about consistency checking across devices in similar roles, like: do my core routers all have the same behavior, in some sense? Or you can think about it across time, right? Okay, if I change something, is the overall network behavior consistent across time, or did it change in the way I expected? So, at a high level, I agree that that's useful. And then the second point you brought up is, I think, really fascinating: the act of writing down what the network should be doing might be non-trivial. >> Right, right. >> And this is a really interesting area. How do I express what the network should do? Right? There are a lot of ways to attack this problem. Higher level policies are one. Consistency is one where I don't have to expressly say everything that happens; I can just say, tell me if there's anything weird. >> [LAUGH] Exactly. Yeah. >> But I think it's actually bigger than that. This is a question for SDN controllers as a whole. In my opinion, the biggest area of SDN that still remains to be resolved collectively by the community is: great, now we can program networks. How do we want to program them? Right? >> Yeah, yeah. So basically-
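The "tell me if there's anything weird" style of consistency checking, diffing the behavior of devices that are supposed to play the same role, rather than writing the full policy down, can be sketched as follows. The router names and FIB shape are made up; a real check would compare behavior over header equivalence classes, not hand-labeled traffic classes.

```python
# Toy sketch of role-based consistency checking: devices in the same role
# (say, two core routers) should forward each traffic class the same way.
# Instead of specifying the policy, we just report where they disagree.

def consistency_diff(fibs):
    """fibs: {router: {traffic_class: next_hop}} for one role.
    Returns the traffic classes on which the routers disagree."""
    classes = set().union(*(f.keys() for f in fibs.values()))
    weird = {}
    for c in classes:
        outcomes = {r: f.get(c) for r, f in fibs.items()}
        if len(set(outcomes.values())) > 1:
            weird[c] = outcomes      # same role, different behavior
    return weird

core = {
    "core1": {"web": "fw1", "dns": "r5"},
    "core2": {"web": "server", "dns": "r5"},  # "web" bypasses fw1
}
print(consistency_diff(core))
# {'web': {'core1': 'fw1', 'core2': 'server'}}
```

This matches the operator's intuition in the discussion: no formal spec is written down, yet the check surfaces exactly the behavior that changed or diverged from its peers.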
>> We can express what we want- >> But that doesn't say how to do anything, right? >> Right, right. And we're talking about it in this discussion in terms of verifying: what do you want to check for, what do you want to verify? But it's really almost the same question as, how do I tell the network what I want it to do? How do I represent those higher level goals, or intent, that the operator has? How do I express that? And is there one controller that handles the whole space of everything I want to express, all the intents I want to express? How do I represent that conveniently? Or are there different kinds of controllers attacking different killer apps? The controller for the killer app of wide-area routing optimization might be different than the controller for the killer app of virtualized data centers, for example, right? So I think that whole space of how you express what the network should be doing is, I think it's fair to say, very unresolved in both the research community and in industry. >> Interesting, cool. Well, I think we touched on a ton of open areas. That's usually the question I ask to wrap up. You mentioned performance, we talked about control and data plane interaction, and we just talked about a whole other area, policy expression, and incremental verification. It seems like there's a ton left to do. I guess before we finish, are there any other topics you think deserve attention that we didn't touch on, that students, or whoever's watching, might want to pay attention to? >> Let's see. That's a good question. >> Yeah. Even in terms of unsolved problems, or maybe new technologies that are coming that might crack open some of the things that we talked about. >> I think that's a really interesting space overall that's evolving very quickly.
I think one interesting area we touched on is how you push verification towards richer and richer models, whether that's software routers or network functions. And what's the connection between this and other trends, like white box switching or new kinds of forwarding hardware? >> Right, right, right. >> And how these affect our conception of what SDN is. That's something we didn't really touch on at all, but maybe next year or so, when the picture's more clear, we can talk again about all this. >> The students have looked at programmable hardware and stuff like that. I think people are starting to think about verification there too, but what does that whole thing mean for data plane state verification and so on? >> That's a great question. You can imagine ways in which it becomes more challenging. >> More degrees of freedom. >> Exactly. You can do more with the hardware, so it again becomes one more way in which the network becomes like general purpose software. But you can also imagine that if we're moving towards a mindset where we really care about the network as critical infrastructure, and we want to have confidence in what it does, and we take that mindset as we push in this direction, then there are opportunities too, because SDN and white box hardware let us see what's going on much more easily than we could before. That's the first thing you need: to be able to get access to the information, and know what it means and what it's going to do. I think that opportunity is there, so it's pretty exciting. >> Cool, thanks for your time today. I hope people appreciate your perspective; I'm sure they will. It's been fun. [SOUND] >> Sure. Thanks so much for the invitation, I really enjoyed it. Enjoy the course; I had a great time chatting. >> Cool. Thanks, man. All right. Bye bye. >> Bye.