[MUSIC]. So, here in Lecture 12.2, we're going to start our exploration of the logic side of timing analysis. And so, what do we need to know? We need to know the basic assumptions about the timing universe, and the first assumption is going to be synchronous logic: things leave storage elements, like flip flops, go through a big block of logic, and return to storage elements like flip flops. And in this lecture, we're really just going to talk about the combinational side of things. Next, we have to ask, well, where's the delay? And the first thing we're going to have to talk about is what delay models for individual gates look like. And one of the things that's a little bit surprising is that they can be really complicated. So we're going to talk about how complicated they can be, but then we're going to restrict things to a simple, reasonably realistic universe, one that's commercially viable but simple enough that we can actually do some examples. And then we're going to talk about the fact that in the actual way we do timing for logic, we stop looking at the logic. We stop looking at anything that looks like Boolean algebra, and we just look at these things as big, complicated graphs. And so, we're going to talk about why, from a complexity point of view, we do something called topological timing analysis, and not logical timing analysis. So, let's go start looking at sequential things, the combinational parts thereof, the delays through the gates, and topological timing analysis. >> So, when we say that we're interested in doing timing analysis at the logic level, what are we actually talking about? Well, our goal is to verify the timing behavior of our logic design. So, here's the scenario. I give you a gate-level netlist, I give you some timing models of the gates and, maybe after the placement and the routing, I give you some timing models of the wires.
And you have tools in place that can answer the following questions: when do signals arrive at various points in the network? What is the longest delay through the gate-level network? Does the network satisfy its timing requirements? So, suppose I tell you that I want this chip to run at 1 gigahertz, which means there's one nanosecond between the edges of the clock that control the flip flops. Is it the case that all of the combinational logic is such that, if a signal enters a block of combinational logic, it arrives not longer than one nanosecond later? All right, that's the kind of question I want to answer. And if we do this analysis on our logic and it turns out that the answer is, oh yeah, the logic's too slow, I can't get all of the paths through the logic in 1 nanosecond, some of them are 1.05 nanoseconds, then where do I look? You know, a modern design has millions and millions of gates. It would be great if the analysis techniques came back and pinpointed exactly where my problems are. And so, I'm going to show you some techniques that can answer all of those questions, and in particular, and maybe in a surprising way, answer the question: exactly where is my problem? What should I go focus on to fix? Now, the thing that's unfortunate is that, given the way the electrical and the physical models work, a lot of this delay stuff is just complicated in the real world. So, we're going to talk about that for a little bit, and we're going to talk about how we simplify it for the purposes of this lecture. First, however, I just want to do a few acknowledgments. Very early versions of this lecture used some material from my friends Karem Sakallah, who is now at the University of Michigan, and Tom Szymanski, who at the time was at AT&T Bell Labs. And this version has benefited extensively from inputs from my friend Dave Hathaway at IBM.
Dave is actually the principal designer of EinsTimer, which is IBM's production static timing tool. So, every processor, every ASIC, every big chip that gets built by IBM runs through Dave's EinsTimer tool, which is doing very, very sophisticated static timing analysis. And the current version also benefited from versions of these lectures that were taught by John Cohn, my former Ph.D. student at IBM, and Dave, who were teaching this material to some folks at the University of Vermont, and also some folks at IBM. So, lots of thanks to everybody for giving me lots of useful feedback and lots of useful criticism; it very much helped the quality of this lecture. I just want to acknowledge all of them for the help. So, let's talk about analyzing the performance of a design. So the first thing I have to assume, and this is really important, is that the design is synchronous. And so, that means all of the storage is in explicit sequential elements, things like flip flops. So, the simplest way to draw this sort of thing is that there is a whole bunch of flip flops at, if you like, the start of the combinational logic, the input. And then, there's a whole bunch of flip flops that are at the outputs of the combinational logic. And there's a common clock connecting all of those things. And the clock edge comes along and, I'm going to write this carefully, it launches the data out of the flip flops into the combinational logic, and so it goes through the delays. Right? It takes however much time it takes to get through the logic. And then it arrives at some flip flops, where we hope it is captured. And so, I'm just writing capture. And you know, we often draw the clock in kind of a very special way. Right? So, there's just one cycle of the clock.
We often talk about the launch edge of the clock and the capture edge of the clock, assuming that we are actually talking about something like a positive edge-triggered D flip flop. And although this is a highly stylized kind of diagram, please just be aware that this is just a finite state machine. Right? Flip flops and some logic that goes between the flip flops. It's not necessarily the case that the flip flops on the left are different than the flip flops on the right. I mean, we really could just have a flip flop that I can draw over here, with a D input and a Q output, and a place where the clock goes in. The output comes from Q and goes to this cloud of logic here, and it goes back into the D input. Those are the kinds of circuits we're actually talking about. We're just not going to be talking about any of the subtle timing things that happen right at the inputs or the outputs of the flip flop. I'll mention that again when we get to the end. There are ways of incorporating all of those effects into the models that I'm showing you. We just really don't have time to talk about that stuff. So here's a question you're very possibly thinking, if you haven't encountered this kind of timing analysis in a real commercial ASIC design scenario: can we just simulate this stuff? You know, we have great simulation tools. We have [INAUDIBLE] simulation tools, and we have VHDL simulation tools, and we have SystemC and all these other great things. If I want to know how fast the logic goes, can't I just simulate it really, really, really hard? Run a really, really large number of inputs into it and see how slow it is?
And the problem is that what logic simulation does is determine how the system will behave. It simulates the logical functions, so it gives the most accurate answer when you have good simulation models. But it's practically impossible to get a complete answer, especially with respect to timing. In order to be really confident that I understand what the worst case delay of a big block of a few million gates is, I need some exponential number of inputs, because I don't want to just know that for all of the inputs I tried, the delay is such and such. I need to know that under any possible scenario of inputs, the delay will never be longer than some number. And you just can't get that sort of a guarantee from simulation; you need a different kind of technique. So there's no way I can examine all possible input vectors with all possible relative timings. And there's some nasty stuff that happens at the nanoscale with how manufacturing imperfections change the timing behavior of transistors that are a couple hundred atoms across. We just need a whole different solution. So simulation is great. We rely on it for functional correctness. We cannot rely on it for this kind of timing. We need a whole different technology. And it needs to be not only different, it needs to be fast, because we're going to do this a lot. So, first, the basic model for our timing is that we know something about the clock cycle, right? I need to know how fast this thing is supposed to run in order to understand if it's running fast enough or if it's got a problem and is slow. [COUGH] So, just a concrete example: let's say I assert that the clock is 1 gigahertz, which means there is one nanosecond between the clock edges. And so, I'm just going to draw the little picture over here.
So, here's my clock, and there's a positive up-going edge, and then it goes over, it goes down, it goes back up... And the difference between those clock edges is 1 nanosecond. And so, again, I've got my diagram of the kind of logic that I'm looking to analyze. There's a bunch of flip flops going in on the left of this logic and a bunch of flip flops on the output of this logic. The flip flops on the left are launching data into the logic. The flip flops on the right are capturing data from the logic. Like I said before, they might not be different flip flops, but this is just conceptually a nice way to think about this. So, what do I know? I know that for this logic to work successfully, the longest delay through this network of logic must be shorter than 1 nanosecond. And so, I'm just going to put a great big arrow over the top of the logic. I know that when things show up at the output of the flip flops, they had better be able to get through that big gray cloud of logic in less than 1 nanosecond, because 1 nanosecond later, the positive edge of the clock comes along again and grabs the output of that logic and captures it in the flip flops. So, I had better be able to get through that logic in less than a nanosecond. That's the kind of question that we're going to answer. You give me a million gates of logic. You ask, so how fast can it go? You tell me, actually, I'd like it to go at a gigahertz, please. I'll analyze it and I'll come back and I'll be able to tell you things like: yes, I can qualify this, all the paths are shorter than 1 nanosecond. Or: no, these 35,000 paths are longer than one nanosecond and, by the way, here are your problem points. These are the places in your logic you really ought to go look. If you fix these, maybe you can fix the whole thing. So, that's what we're about here. What do we need to do this?
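The check just described can be sketched in a few lines, assuming for the moment that someone has already handed us each register-to-register path's delay. The path names and delay numbers below are invented for illustration:

```python
# A minimal sketch of the basic timing check: every register-to-register
# path must fit inside one clock period. All names/numbers are invented.
CLOCK_PERIOD_PS = 1000  # 1 GHz clock -> 1 ns between launch and capture edges

path_delays_ps = {
    "ff_a -> adder -> ff_x": 870,
    "ff_b -> mux -> ff_x": 1050,     # too slow: longer than the clock period
    "ff_c -> decoder -> ff_y": 990,
}

# Collect every path whose delay exceeds the clock period.
violations = {p: d for p, d in path_delays_ps.items() if d > CLOCK_PERIOD_PS}
for path, delay in sorted(violations.items()):
    print(f"VIOLATION: {path} takes {delay} ps, budget is {CLOCK_PERIOD_PS} ps")
```

A real static timer computes those path delays itself rather than being handed them; building up that machinery is what the rest of this lecture is about.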
Well, the first thing we need are gate delay models, right? So I've got another picture of a cloud of logic here, and I've got a bunch of AND gates, kind of in a row, with the wires that are connecting the AND gates in sort of a ball. I've got a little fanout here: one AND gate feeds a couple of AND gates. And at the top there's a big question mark that says, so, what's the network delay? Well, in order to answer that sort of a question, the most straightforward thing you first have to be able to answer is: what's the delay of one gate? All right? So, I've got one gate here that's highlighted, and I'm just going to say, well, let's say that the delay through that gate is a number, and that number is delta. Right? That delta is probably measured in picoseconds, thousandths of a nanosecond, these days. And I'd like to be able to answer the question: how long does it take to get through one gate? If you give me a million gates, maybe I can figure it out, if I can figure out how long it takes to get through one gate. And you might think, okay, how hard can that be? And gosh, the answer is really just surprisingly hard, really surprisingly, amazingly complex. So I'm going to give you just a high level tour of what's going on here without a lot of details, just so that we can get into the interesting heart of the problem. Okay. So, what matters when we're talking about logic delay? Well, the gate type affects the delay. Not all gates are created equal. So, I've got a picture of an AND gate here and I've got a picture of an OR gate, and the little picture says that the AND gate's delta, its delay, is not equal to the OR gate's delta. That certainly makes some sense.
You know, different gates have different transistor-level electrical contents. You expect maybe an inverter is pretty quick. You expect maybe a great big exclusive-OR gate with a lot of transistor-level content is kind of slow. Maybe an And-Or-Invert or an Or-And-Invert gate is kind of slow. Yes, correct. All right. So, what kind of a gate it is, that affects the delay. You have a few thousand gates in your technology library; they've all got potentially different delays. Well, unfortunately, loading also affects the delay. And so, I've got a little picture here where I've got an AND gate driving an OR gate, and there's exactly one connection from the AND gate's output to the OR gate's input. And on the right side, I've got the same AND gate connecting to the inputs of two OR gates. And so, what's really going on here is there's some fanout. And one of the things that's just going to be true is that the AND gate driving not much electrical load is going to be faster, and the AND gate driving more electrical load is going to be slower. All right. One of the things that happens is that gates supply electrical current to kind of push the signal to the gates at the output, or sink current from the gates at the output. And if there's more stuff happening, you have more electrical load, and it's probably slower. So, that's actually both a logical effect, a logic-level effect if you will, from more gates, and also a wire effect, because the wire actually contributes to that electrical loading. So, you've got to know what you're driving. And to do this really well, you even need to know what the wire looks like. The waveform shape also affects the delay.
It would be terrific if all logic signals really looked like they do in textbooks, where they're all perfect little waveforms that go up and down from 0 to 1 in no time. But that's simply false. These things are actually electrical signals, and they have a rise time or a fall time. And sometimes the rise time is sharp, right, or rapid. So, I'm just going to write sharp on the left AND gate, which has a very quick, high-slope waveform. And I'm going to write slow on the slow, kind of lazy looking waveform that takes a long time to go from a zero to a one. And all I'm going to tell you is that the gate on the left is going to be faster and the gate on the right is going to be slower. Which is to say, the amount of time it takes the output to go from one logic value to the other will be lower just because of the speed at which the input waveform is making its transition. And so, real timing analyzers actually model stuff like this. That adds some electrical complexity we're not going to talk about, so for us, we're just going to ignore that stuff. And it just keeps going in terms of effects: the transition direction affects the delay, right? So, I've got again an AND gate on the left and an AND gate on the right. If the output is rising, as I'm showing on the left, the waveform went from zero to one, versus the waveform on the right, which is falling. The amount of time it takes to go from a zero to a one might be different than the amount of time it takes to go from a one to a zero. Why is that? Because we make logic gates out of Complementary MOS, CMOS, and p-type transistors aren't the same as n-type transistors in terms of their fundamental speed. So, there's just some asymmetry happening here, and it matters whether the output is rising or falling.
And this gets even worse depending on whether you have complex gate types. The way an inverter behaves is pretty simple; the way complex gates like exclusive-OR gates behave is even worse. All right. So, there's just yet more complexity there. There's even more stuff that I'm just trying to sensitize you to, even though we're not going to treat all of it: which pin you're talking about actually affects the delay. So, we've got two AND gates here, and on the AND gate on the left, it appears that a logic signal is going into the top pin of two pins, and I'm just going to draw a little kind of squiggly line to the output. And then on the AND gate on the right side, the input seems to be going into the bottom pin, and I'm going to draw a little squiggly line going to the output. And the delay if the top pin moves might be different than the delay if the bottom pin moves. All right? So, which pin is actually experiencing the logic transition event can affect how slow the gate actually appears. And the reason for that, I'm just drawing a little picture here, is that different transistor-level circuit paths exist to the output. So, I'm drawing a little picture of an AND gate here. There's a power rail going horizontally at the top and a power rail horizontally at the bottom, and we're just going to say 1 volt is a logic 1 at the top. So, there are two p-type transistors in parallel going from the 1 volt rail, and they connect at the bottom. And then, there are two n-type transistors in series at the bottom. The top two transistors have inputs A and B. The bottom two transistors have A on the top and B on the bottom: A is closer to the output, B is not so close to the output.
And so, you get an asymmetric change in the delay depending on whether A is switching or B is switching. And things get even worse when you have some slightly more complicated gates. So, which pin is actually seeing a logic change actually affects things. It's not really the case that a logic gate, as a kind of macroscopic whole, has a delay. The way these things are really modeled, and we're going to see this very shortly, is that it's really which pin the transition comes in on, going to the output, that determines the delay. And, just to further scare you, I'd say that at the nanoscale, delays are really statistical. They're not even really deterministic. So it's really not accurate to say that the delay is, you know, 10 picoseconds, or something. What you can really say is that the delay has a mean of 10 picoseconds, and it has a standard deviation, in the statistical sense, of so many picoseconds. And the problem is that you're manufacturing things that are measured in kind of atomic distances, and you're making millions and millions of those on the surface of a chip. And you're making millions and millions of chips. They don't all come out with the same value. I mean, it'd be great if they all came out with, you know, 102.6 picoseconds, but they don't. They come out, as I'm showing you here, with something like a probability distribution. So, there is a most probable delay. Right? There is a mean value. And I'm just going to draw a mean here, right at the middle of the Gaussian distribution. So, there is an average delay; there is a most likely delay. But to be honest, you're looking at successfully manufacturing millions and millions of logic gates, and then successfully fabricating millions and millions of chips.
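To turn such a distribution into a single usable number, timing people typically take a pessimistic quantile of it rather than the mean. Here is a small sketch with invented mean and sigma values; it also shows why the correlation between nearby gates matters when you sum delays along a path:

```python
import math

# Worst-case delay of one gate at a pessimistic (3-sigma) quantile.
# The mean/sigma numbers are invented for illustration.
mean_ps = 102.6   # most probable (mean) gate delay
sigma_ps = 4.0    # standard deviation from manufacturing variation
worst_one_gate = mean_ps + 3 * sigma_ps   # 114.6 ps

# For a path of n gates, correlation matters. If the gate delays are fully
# correlated, the per-gate 3-sigma values simply add. If they are fully
# independent, the sigmas add in quadrature, which is less pessimistic.
n = 10
fully_correlated = n * (mean_ps + 3 * sigma_ps)
fully_independent = n * mean_ps + 3 * math.sqrt(n) * sigma_ps
print(round(worst_one_gate, 1), round(fully_correlated, 1), round(fully_independent, 1))
```

Real chips sit somewhere between those two extremes, which is exactly why statistical timing analysis has to model the correlations explicitly.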
For things like longest path analysis, you don't actually care so much about the mean. You care about something like the 3-sigma point, you know? You want to know, of all those millions and millions of gates that you put on a chip, and of the millions and millions of chips that you potentially fabricate every month, what is the slowest thing that's likely to happen? Right? And that might be, say, the 3-sigma point on the distribution. And just to make things even more horrible, the delays of the logic gates are not independent. So if you remember back to basic probability: you roll one die, you roll another die, they're independent. The fact that you got a six on the first one doesn't mean you're going to get a six on the second one. Sorry, but you build some logic gates and they're actually really physically close to each other on the surface of the chip. They are probably dependent, which is to say they are statistically correlated, which means that if you know something about one gate's delay, you know a little bit about the other gate's delay. And you actually have to model that stuff in the statistics. So, there's a whole other kind of timing analysis, statistical timing analysis, that takes this stuff into account. Very interesting math in that stuff. Unfortunately, we don't have time. So, what are we going to do? We've got to make some forward progress. So, we're going to use the most simple, useful, practical model of delay, which is that the delays are fixed, they're constants, and they are modeled as pin-to-pin. So, you look at the input pin on the gate, you look at the output pin on the gate, and there is a delay from the input to the output. So, there are no slopes, there's no electricity, there are no statistical distributions. And if there are any loading effects, we just sort of shove them back inside the gate itself.
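The pin-to-pin constant-delay model can be sketched very directly: one number per (input pin, output pin) pair, and the gate's output settles at the latest input arrival plus that pin's delay. The pin names and delay values below are invented for illustration:

```python
# Pin-to-pin delay model: one constant per (input pin, output pin) pair.
# Pin names ("A", "B", "Z") and the delay numbers are invented examples.
and2_lightly_loaded = {("A", "Z"): 3.0, ("B", "Z"): 3.0}  # small fanout
and2_heavily_loaded = {("A", "Z"): 4.1, ("B", "Z"): 4.1}  # big fanout folded into the constant

def latest_output_arrival(input_arrivals, pin_to_pin):
    """Latest time the output can settle: the max over input pins of
    (arrival time at that pin + that pin's pin-to-pin delay)."""
    return max(input_arrivals[i] + d for (i, _o), d in pin_to_pin.items())

# If A settles at t=5 and B at t=2, the lightly loaded gate's output
# settles at max(5+3, 2+3) = 8.
print(latest_output_arrival({"A": 5.0, "B": 2.0}, and2_lightly_loaded))
```

That `max` over input pins is the single operation the whole static timing algorithm is built out of, repeated gate by gate across the netlist.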
And it turns out this is enough, this is enough to see almost all of the interesting stuff that happens in timing analysis. So, we've got a little picture here of an AND gate. It's got two inputs, and it's driving one additional gate, which is drawn as a dotted-line AND gate. And there are two blue lines, right? The blue lines go from each of the two inputs on the AND gate to the output, and they each say delta equals 3. And so, what we're going to say is, look: the delay from one of the input pins to the output is 3 units of whatever. And just to keep our lives simple in the examples I'm going to show, gates always have the same delay from all of their pins. But in the real world, they don't have to, and none of the techniques I'm showing you require this. This just keeps the examples simple. And similarly, if I had this same AND gate but I now connected it to more load, and so now I'm showing it with a big wire with a whole bunch of fanout, a bunch of dotted AND gates with a dot, dot, dot, saying, wow, it looks like there's a lot of load on this. Instead of saying that delta is 3 for the delay through this gate, I'm saying that delta is 4.1, in whatever the appropriate units are, again from the input pin to the output pin. But it's a constant for each pin-to-pin edge, and it's just one number for the gate, for simplicity. So, this is a pin-to-pin delay model. This is pretty realistic; the only thing I'm not doing here is putting the full complexity on the individual edges in this model. So, that's what our gate model looks like. We can work with this. I've got another question. Do I actually consider the logical function of the networks that I am analyzing? And there's a kind of an interesting question here, which is: does this matter? Well, let's go take a look. So, I'm showing you here, I guess, a logic network. But I've erased the gates.
And so, what that means is that instead of showing you ANDs and ORs, and NANDs and NORs, and XORs and XNORs, I'm showing you circles. And they're all gray. Okay? So, there are three primary inputs on the left and one primary output on the right. And two of the primary inputs on the left go into two circles. One says delta is 8, and one says delta is 1. And those two things both converge on a circle that says delta is 2. And that circle fans out to two more circles: one of them has delta 8, and one of them has delta 1. And those converge again on a gate that says delta is 2. Of those two gates with converging inputs that both say delta is 2, one of them is connected to the third primary input, and the other is connected to whatever happens to that primary input when it goes through a circle that says delta equals 1. So, it's a very simple network. And let's ask a question: how fast do we think this thing goes? This is not a really hard question to answer. All right, so how fast do we think this thing goes? Well, if I stare at this and ask what I think is the longest path, I think the longest path is from the top left primary input through the 8 delay, and then down to the 2 delay, and then up to the 8 delay, and then down to the 2 delay. So, I think this thing's longest delay is 8 plus 2 plus 8 plus 2, which is 20, and I just don't see any way that I could get another answer. Okay, this doesn't look very hard, so why am I focusing on this? Okay, let's stop erasing the logic gates. Let's put back some of the logic-level identity. And I'll tell you that three of those gray circles are actually known logic. So, again, two primary inputs going to two gates with delays 8 and 1 in parallel. And you don't need to know what they are for this example. And they converge just as they did before. But now the thing that they converge on is a two-to-one multiplexer. I know what it is.
Its delay is still 2, but the delay-8 gate goes into the 0 input on the mux and the delay-1 gate goes into the 1 input on the mux. And the select line, I'm just going to write an S here so we're clear, the select line comes from the third primary input. Now, the output of that multiplexer again fans out into a gate with a delay of 8 and a gate with a delay of 1. We don't know what they are, and we don't need to, but those things reconverge on the inputs to another two-to-one multiplexer. And the delay-8 gate again goes into the 0 input, and the delay-1 gate goes into the 1 input. And again, there's a select line, and the select line is now the complement of the primary input that connects to the other mux. So, in other words, whatever controls the leftmost mux, you invert that, and that controls the rightmost mux. Okay? Now I'm going to highlight what I thought was the critical path from the previous slide: it goes from the top primary input through the delta equals 8 gate, through the 0 input of the two-to-one mux, through the other delta equals 8 gate, through the 0 input on the second two-to-one mux, just as I said when I guessed the delay. And here we see a problem, because I have just asserted that the worst case delay goes through the zero input on one of those multiplexers. Okay, then the select line on that multiplexer has got to be a zero. And I have asserted that the same critical path goes through the zero input on the second multiplexer, which means the select line on that multiplexer has also got to be a zero. Which immediately creates a problem, because if we stare at this logic a little bit, what we see is that the select line on the right multiplexer is the complement of the select line on the left multiplexer. So, my critical path requires something that logically cannot happen in this logic. Uh-oh. That's got to be some kind of a problem. So, there's some language for this.
You cannot sensitize this path. You cannot make a logic change at the input that will propagate down this path and change the output. I know that you can, if you will, trace your finger through these nodes and add them up and say, the longest thing I can get is a 20: an 8 plus a 2 plus an 8 plus a 2. But the problem is you can't actually get a logical value to propagate down that path. So, there's a name for this. When you ignore the logic, it's called topological analysis, topological timing analysis. And topological, from topology, just means I am focusing only on the connectedness, the, if you will, geometric structure of this thing. I am not focusing on the Boolean algebra at all. So we only work with the graph and the delays. We do not consider the logic. And so, you can get wrong answers. And what we actually found is something famous. The thing that we found is called a False Path. You can trace it from the input to the output through wires and gates with your finger, but you can't actually make a logic value propagate down it. So, going forward, we're just going to ignore this. We're going to ignore the logic; it's just too tough to deal with. And we're going to assume that all the paths are sensitizable. There's actually a technical term here, which is that we're going to assume all the paths are statically sensitizable. And that just means we're going to assume that if you can find a path from the input to the output, there's some set of values you can apply to all the primary inputs that will make that path actually sensitized. And the one thing that I want to emphasize is, there's nothing new here. You know exactly what this is. This is exactly the Boolean difference stuff. Remember what we did with the Boolean difference? You take the output, you calculate the Boolean difference with respect to the input, and you get something that's not dependent on that input.
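The example network above can be captured in a few lines: a topological longest-path computation that finds the 20, followed by the observation that the 20 path is false. The node names are invented; the delays (8, 1, 2, 8, 1, 2, and 1 for the select inverter) come from the slide:

```python
# Topological longest path through the lecture's example network,
# ignoring the logic entirely. Node names are invented for illustration.
delays = {"g8a": 8, "g1a": 1, "mux1": 2, "g8b": 8, "g1b": 1, "mux2": 2, "inv": 1}
edges = {  # successor lists forming the DAG
    "g8a": ["mux1"], "g1a": ["mux1"],
    "mux1": ["g8b", "g1b"],
    "g8b": ["mux2"], "g1b": ["mux2"],
    "inv": ["mux2"],   # inverted select into the second mux
    "mux2": [],
}

# Latest arrival at each node, visiting nodes in topological order.
order = ["g8a", "g1a", "inv", "mux1", "g8b", "g1b", "mux2"]
arrival = {}
for n in order:
    preds = [p for p in order if n in edges[p]]
    arrival[n] = delays[n] + max((arrival[p] for p in preds), default=0)
print(arrival["mux2"])  # 20: the 8 + 2 + 8 + 2 path

# But that path goes through the "0" input of both muxes, which needs both
# select lines at 0, and mux2's select is the complement of mux1's select.
# No value of the select s satisfies (s == 0) and (not s == 0):
sensitizable = any(s == 0 and (1 - s) == 0 for s in (0, 1))
print(sensitizable)  # False: the 20 path is a false path
```

So the purely topological answer, 20, overestimates the true worst-case delay of this network; that is exactly the pessimism the lecture is about to discuss.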
You find a set of inputs that makes the Boolean difference one. Right? The vector of inputs that makes the Boolean difference 1, that statically sensitizes the path. So, if you change the input, you change the output. We're just going to assume that you can actually find that vector of inputs that would sensitize the output to that input, knowing that that's a false statement for some paths, I mean, knowing that that's just wrong. So, the answer is, we are going to do this analysis and it's going to be what is called pessimistic, which means that when it's wrong, we're going to tell you the logic's a little slower than it really is. If you're going to be wrong, that's an okay way to be wrong, but it's just the way it works. It's just not technically, computationally, from a complexity point of view, possible to figure out all the false path stuff. So, all of these ideas together: the notion that we are not going to look at the logic; the notion that we are only going to look at the topological stuff; the notion that we will have false paths, and we just ignore them; the notion that all paths are sensitizable, exactly the way the Boolean difference from the computational Boolean algebra stuff taught you. That has a very famous name. That's called Static Timing Analysis, STA. And so, what we're going to do in the rest of the logical side of this lecture is tell you how STA works. So let's go do that. [SOUND]
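As a tiny refresher of the Boolean difference idea invoked above, here is a brute-force sketch on a toy three-input function. The function itself is invented for illustration; it is not the mux network from the slides:

```python
from itertools import product

# Static sensitization via the Boolean difference, brute-forced on a
# small invented function: f(a, b, c) = (a AND b) OR c.
def f(a, b, c):
    return (a & b) | c

# Boolean difference of f with respect to a: f(a=0) XOR f(a=1).
# Wherever it evaluates to 1, changing 'a' changes the output,
# i.e. a path from input a to the output is sensitized.
diff = [(b, c) for b, c in product([0, 1], repeat=2) if f(0, b, c) ^ f(1, b, c)]
print(diff)  # [(1, 0)]: only b=1, c=0 sensitizes the output to 'a'
```

Static timing analysis simply assumes such a sensitizing vector exists for every path, rather than running this (exponentially expensive, in general) check.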