Module 1: Why Use Concurrency? Topic 1.1: Parallel Execution.

So, a big property of the Go language is that concurrency is built into the language. We're going to talk about this term concurrency, but right now I'm talking about parallel execution. Concurrency and parallelism are two closely related ideas, and we will talk about the difference between them, starting with parallelism. The important feature is that concurrency is built into the language. Compare this to other languages like C or Python: you can do concurrent programming in those languages, but it's not built into the language, which means you usually import some external library, and you use its functions, and it interacts with the operating system and so on. The Go designers, though, decided that concurrency is important enough to bake right in, so the constructs are part of the language. That's a good thing; built-in constructs are usually easier to use. Right now, I'm just describing parallel execution because I really want to motivate why you need concurrency. Now, parallel is not the same as concurrent. We'll get to that, but they are similar. Parallel execution is when two programs execute at exactly the same time. If at any particular instant in time there's an instruction from one executing and an instruction from the other executing at the same moment, then they are executing in parallel. Now, generally, processor cores are made to execute one instruction at a time. There are different architectures that do things differently, and there's a wide range of architectures, but this isn't an architecture class.
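To make "built into the language" concrete, here is a minimal sketch of Go's built-in concurrency constructs: the `go` keyword launches a goroutine, and the standard-library `sync.WaitGroup` waits for goroutines to finish, with no external concurrency library needed. The function and its work-splitting scheme are made up for illustration.

```go
package main

import (
	"fmt"
	"sync"
)

// sumParallel splits the work of summing 1..n across two goroutines,
// one per half of the range. Each goroutine writes to its own slot
// in results, so no locking is needed.
func sumParallel(n int) int {
	var wg sync.WaitGroup
	results := make([]int, 2)
	halves := [][2]int{{1, n / 2}, {n/2 + 1, n}}
	for i, h := range halves {
		wg.Add(1)
		go func(idx, lo, hi int) { // `go` is the language-level construct
			defer wg.Done()
			for v := lo; v <= hi; v++ {
				results[idx] += v
			}
		}(i, h[0], h[1])
	}
	wg.Wait() // block until both goroutines finish
	return results[0] + results[1]
}

func main() {
	fmt.Println(sumParallel(100)) // 1+2+...+100 = 5050
}
```

Whether the two goroutines actually run in parallel depends on the hardware, which is exactly the point of this topic: the language gives you the construct, and the cores determine the parallelism.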
But generally, one core runs one thing at a time, one instruction at a time. So, if you want actual parallel execution, two things running at the same time, you need two processors, or at least two processor cores. You need replicated hardware. For instance, two completely separate computers can clearly run two different programs at the same time; or, on the same computer, a multi-core processor with four cores can run four different instructions at the same time, one on each core. But understand that in order to get actual parallel execution, you need multiple copies of the hardware. Now, this is not the same as concurrency, but we'll get to the difference in a second. So, why should you do parallel execution? What's good about it? The best thing about it is that tasks can complete more quickly. By this I mean a particular task doesn't complete more quickly just because you're running it in parallel with another task; rather, you get better throughput, meaning that overall, all the tasks complete more quickly because you're doing two or more things at a time instead of one. So, parallel execution can speed things up and give you better throughput overall. My simple example: say you've got two piles of dishes to wash. When you wash a dish, you have to wash it and then dry it. If you have two dishwashers, they can cooperate and work faster. How they cooperate will depend on the hardware they have available to them. Some tasks have to be performed sequentially, though. When you're washing and drying a dish, you clearly have to wash the dish before you dry it. That's sequential. Those steps have to go in that order; they cannot happen at the same time.
So, parallel execution is not some catch-all that makes everything faster. Certain things can't be executed in parallel; they have to be executed sequentially, one at a time. You've got to wash the dish before you can dry it. But you can still build a system around that. Say I have one sink, one drying rack, and one dish rag for drying; that's the hardware available to me. I could have two people work where one person washes a dish and hands it to the other person, who dries it. That way, somebody can be washing at the same time somebody else is drying, passing dishes along. You're essentially getting a staggered, pipelined form of parallel execution: both people are working at the same time, so you get speedup and good throughput, even though the steps for any one dish have to happen sequentially. The key thing here is that some tasks are more parallelizable than others. In the dishwashing task, you can't just say, "Okay, you two both wash this dish at the same time and both dry it." If I only have one sink, only one person can wash at a time, and with only one drying rack, only one person can dry at a time. Because I don't have the hardware, I can't parallelize those steps. I would need two sinks to have two washes going at once, and two drying racks to have two dries going at once. But in addition, I can never parallelize the washing of a dish with the drying of the same dish. You've got to finish the washing before you start the drying. Even if I had a hundred washing sinks, I still couldn't wash and dry the same dish at the same time. So, some tasks cannot be parallelized even with excess hardware, and that's important to know, because people often think, "Well, I can just take anything and speed it up through parallelization."
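The wash-then-dry pipeline described above can be sketched in Go with a channel between two goroutines: one "washes" dishes and hands each one over the channel to a "dryer." Washing dish N can overlap with drying dish N-1, but no single dish is ever washed and dried at the same time. The dish names and function are made up for illustration.

```go
package main

import "fmt"

// washDry models the two-person pipeline. The washer goroutine sends
// each dish over the channel; the dryer (here, the calling goroutine)
// receives and dries it. The channel preserves order, so dishes come
// out in the order they went in.
func washDry(dishes []string) []string {
	washed := make(chan string)
	done := make([]string, 0, len(dishes))

	go func() { // the washer
		for _, d := range dishes {
			washed <- d // hand the washed dish to the dryer
		}
		close(washed) // no more dishes
	}()

	for d := range washed { // the dryer
		done = append(done, d+" (dried)")
	}
	return done
}

func main() {
	fmt.Println(washDry([]string{"plate", "cup", "bowl"}))
}
```

The design point matches the lecture: one "sink" (the washer goroutine) and one "drying rack" (the dryer loop), with the channel playing the role of handing the dish across.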
That is not true. Some things cannot be parallelized, and that's that. It's just the nature of the computation: certain things have to be done in a certain order and can't be done at the same time.

Module 1: Why Use Concurrency? Topic 1.2: Von Neumann Bottleneck.

So, we're talking about using concurrent programming to execute code in parallel. But why do that? Writing concurrent code is hard. I haven't brought that up yet, but as we get into it, we'll see it's hard for a bunch of reasons. Actually, if you look at most undergraduate curricula, students don't learn it; you learn sequential programming. Writing C or Python or Java or whatever language, you learn sequential programming. Maybe sometimes an undergrad will have a class on parallel programming, but it's often optional; maybe they take it, maybe they don't. The vast majority of programming classes at the undergrad level are not about concurrent programming. They're about regular old sequential code that runs one instruction at a time. So, programming concurrently is actually very hard. Why do it? Can we get a speedup without doing it? Clearly, if you do things in parallel, you can get a speedup. But do we need it? I would argue, and most people would argue, "Yes, we do, right now." In the past, maybe we didn't need it, but now we really do. One way to get a speedup without parallelism is to just speed up the processor: make a new processor that runs faster than the old one, and you get speedups without having to change the way you write your code. Now, that had been the way of things until recently, and when I say recently, I mean five, six, seven years ago, something like that. Until recent years, the majority of the way code sped up was because the processors were being built faster.
Now, I'm from the hardware design community, and I've always felt that programmers got away with murder, because their speed was all on our backs. We designed better, faster processors; they didn't have to do anything, and they got better, faster running code. But that has really stopped now, and there are several reasons why. When I was in undergrad, '86 to '90, a machine would come out and you'd buy it and think, "Oh, this is the fastest thing out there," and seriously, a few months later, maybe half a year later, they'd have, for the same price, something significantly faster, and it would just kill you. You'd think, "Wow, I just bought this machine and now they've got something faster." And when I say faster, I mean the clock rate went up; the clock rate would be 20 percent higher than the thing I'd just bought, and that was very frustrating. But this used to happen all the time. Every short period of time, clock rates would get faster and faster, and as the clock rate gets faster, the code executes faster, generally, modulo memory bottlenecks, which we'll talk about now. Now, another limitation on speed, one that you still have even now, is what's called the Von Neumann bottleneck: the delay in accessing memory. If you think about the way a processor executes code, the CPU is executing the instructions, and there's memory, and the CPU has to go to memory to get the instructions and also to get the data it wants to use. If you want to compute z = x + y, you've got to go to memory, grab the data out, add the values together, and put the result back into z; you have to access memory. So, the CPU is regularly reading from memory and writing back to memory, and memory is always slower than the CPU.
So even if you crank up the clock speed and make the CPU work a lot faster, the memory is still slow. Memory does speed up slowly over time, but a lot more slowly than clock rates used to. So, you get what's referred to as the Von Neumann bottleneck: you could double your clock speed, but your code only runs a little bit faster, because even at the higher clock speed, you're still waiting on memory. You waste a lot of time just waiting for memory accesses. What people have done about that, though they can do it less now, is build caches: fast memory on the chip, so you don't have to go all the way to main memory, which is too slow; you go to the fast cache instead. Designers pack more and more cache onto the chip, and it speeds things up. That's what traditionally had been done until recently, five, six, seven years ago, something like that. Clock rates would go up, memory and cache capacity would go up, and so speed would go up. Performance on these processors just went up and up and up, and as a programmer, you didn't really have to do anything. You could write your code the same way you always wrote it and expect it to speed up magically, because the processors themselves were being improved. That's how it used to be; that's not how it is now. It's changed for a couple of reasons. The first is that Moore's law, which I'm about to describe, has really died; I don't know what the right term is, but it doesn't really hold anymore. Okay. So, Moore's law basically predicted that transistor density would double every two years; some people say eighteen months, but two years was the number. Transistor density would double every two years. Now, these processors are all packed with transistors, lots of transistors that are used to do computation.
So, if you can double the transistor density, the transistors are getting smaller and smaller, and they switch faster when they're smaller, meaning they can go from high to low faster. So as they get smaller, they get faster, and as the density increases, you get a natural speedup in the transistors. Now, "law" is sort of a bad term for Moore's law; it's not a physical law, it's just an observation. I don't have a picture of this here, but if you plot processor clock rates over time, they were going up very fast, and they've tapered off now. If you look at recent years, you can see it's tapered off; clock rates just can't get much higher than they are. But what we were getting for a while was an exponential increase in density over time, and so a roughly exponential increase in speed. Moore's law was doing everything for us, or most of it; it was giving us the speedup, and programmers didn't have to worry about things. Now, of course, in order to make Moore's law happen, we hardware designers had to work our butts off. It's not magic that transistors got smaller every few years; that was work, hard work on the part of hardware designers, figuring out how to make these transistors smaller and still accurately manufactured. Very hard work, but they consistently did it. So, software people had an easy time of it, but that's not how it is anymore. That type of thing has gone away. So software, in order to continue getting speedups over time, has to do something else.

Module 1: Why Use Concurrency? Topic 1.3: Power Wall.
So, as I was saying, the speedup you get from Moore's law, the density increase that leads to performance improvement, can't continue. You might say, "Why? Why can't that just go on forever?" The reason is that these transistors consume power. Sure, the density of transistors can go up and up and up on these processors, but each transistor consumes a chunk of power, and power has now become a critical issue; they call it the power wall. As you increase the number of transistors on a chip, increasing the density, that naturally leads to increased power consumption on the chip. Back in the old days, power consumption was really low and people didn't use battery power so much. Nowadays, though, everything's portable, running off a battery, and the density has gotten so darn high that power use has gotten high. And even if you are plugged into the wall and have access to power, high power leads to high temperature. If something is consuming a lot of power, it's going to be physically hot. I don't know if you've ever opened up a desktop computer, but if you look inside at the motherboard, you'll see a bunch of cooling hardware. This is a very common thing. Say the main processor is an i7: on the board there will be the i7, and screwed on top of it will be a heat sink, a block of aluminum fins to let the heat dissipate, plus a fan to blow the heat away and air cool it. This is necessary because these chips run at such high power that they heat up, and you need the cooling. If you don't have the heat sink to dissipate the heat and the fan to blow it away, it'll hurt the chip.
Eventually, the chip just melts, physically melts. That is basically what's called the power wall. Even if you could put more transistors on there, you have to be careful about power, and specifically about power's impact on temperature. Temperature is probably the biggest wall, but power is also important on its own: if you want portable devices running on a battery, you don't want to drain that battery instantly. Still, temperature is probably the biggest limitation, because you will melt the chip if you don't cool it. Just to note, air cooling is the standard; anybody with a desktop, laptop, or server air cools it. Now, you could go to the extreme and liquid cool it; supercomputers actually do this. They're plugged into a liquid cooling system with pipes, often using not plain water but some specialized cooling fluid pumped through the machine, which gives much better cooling. But nobody wants to plug their laptop or desktop into a liquid cooling system. For a supercomputer, that's okay; in practice, though, people want air cooling, so air cooling is the best we're going to do. So you have a limit where you can only dissipate so much heat. To be a little more specific about the limitations due to power use, here is a generic equation for dynamic power: P = alpha * C * F * V^2. The alpha is the percent of time spent switching. What that means is that these transistors consume what's called dynamic power when they switch: when they go from zero to one or one to zero, they consume power. If they're just holding constant, not switching at all, they don't consume dynamic power.
So, that alpha is a number from zero to one indicating how often the transistors are switching. Note that if you design your system well, they're switching a heck of a lot; you presumably want to use the transistors to do computation, so alpha should be fairly high. C is the capacitance. We won't go into detail about what that is, but it's related to the size of the transistor: capacitance goes down as the transistor shrinks, which is a good thing, so power goes down to some extent. F is the clock frequency. That's what you want to increase, right? To make your device work faster, you want to increase the frequency, but note that if you increase the frequency, you're increasing the power. Then there's V squared. V is the voltage swing from low to high. What that means is that whenever a transistor goes from zero to one or one to zero, those binary values are actually analog voltages. A zero is typically zero volts, and a one might be five volts; an Arduino, for instance, swings from zero to five volts. Now, a real processor will reduce that, and V is important because, notice, it's a squared factor. If you reduce the voltage, you can significantly reduce the power, so maybe you use a voltage swing of 0 to 1.3 volts, or 0 to 1.1 volts, something like that. Voltage is the first thing you want to reduce if you want to save power. Now, Dennard scaling is another thing; it's paired together with Moore's law, and it's what gave us those speedups over time. The idea with Dennard scaling is that the voltage swing should scale with the transistor size: as the transistors get smaller and you get more density, more transistors on a chip, you would also like to scale down the voltage at the same time.
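To see why the squared voltage term dominates, here's a quick back-of-the-envelope calculation of the dynamic power equation P = alpha * C * F * V^2. The specific alpha, capacitance, and frequency values are made up for illustration; only the ratio matters.

```go
package main

import "fmt"

// dynamicPower computes P = alpha * C * F * V^2, the dynamic power
// equation from the lecture. Units are arbitrary; we only care
// about relative values.
func dynamicPower(alpha, c, f, v float64) float64 {
	return alpha * c * f * v * v
}

func main() {
	base := dynamicPower(0.5, 1.0, 1.0, 5.0)  // 5 V swing (Arduino-style)
	lower := dynamicPower(0.5, 1.0, 1.0, 1.1) // 1.1 V swing (modern processor)
	// Dropping the swing from 5 V to 1.1 V cuts dynamic power by the
	// square of the ratio: (5/1.1)^2, roughly 20x.
	fmt.Printf("power ratio: %.1f\n", base/lower) // prints "power ratio: 20.7"
}
```

This is exactly why voltage is the first knob designers reach for: halving frequency only halves power, but halving voltage quarters it.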
Because, for the power reason I just gave you, voltage has a big impact in that equation, so if you can scale down the voltage, you can keep power consumption, and therefore temperature, within limits. That's what you want: you would like to have Dennard scaling. The problem is that Dennard scaling can't continue forever. The voltage can't go too low, for physical reasons. The first reason is that the voltage swing between low and high has to be bigger than the threshold voltage of the transistor. Transistors have what's called a threshold voltage: below a certain voltage, they cannot switch on, so you've got to have at least enough voltage to hit that threshold. Now, as you shrink the transistor you can manipulate the threshold voltage, but it can only go so low physically. Another problem is noise. One good thing about having a big voltage swing, say from zero to five volts, is noise tolerance. Say there's some kind of noise in your system that alters your voltage by plus or minus 0.5 volts. Instead of five volts, you read 4.5 volts. That's okay, because you know the noise is only 0.5 volts, so even though the voltage is 4.5, you know it had to have been a high. If instead of zero volts you read 0.5 volts, you say, "Look, it's not exactly zero, but it's close," so you can still tell the difference. The reason that's okay is that the swing from zero to five volts is big relative to the noise. But if your voltage swing comes down to, say, one volt, with zero volts as low and one volt as high, then with that same 0.5 volts of noise, a reading of 0.5 volts could be either a high or a low; you can't tell, so you can't recover from the error. You become less noise tolerant, and that's a big problem, because there's always noise in any practical system. For these reasons, Dennard scaling can't continue.
You can't keep scaling the voltage down; there's a limit to it. In addition, none of this considers leakage power. The equation I showed for power is for what's called dynamic power: the power the transistor uses when it switches from low to high or high to low. There's another kind of power called leakage power, which the transistor leaks away even when it's not switching. Now, in the old days, leakage power was pretty low compared to dynamic power, mostly because everything was big. Leakage happens when insulators are thin: basically, you have one conductor, another conductor, and an insulator in between, and if the insulator isn't thick enough, current leaks from one conductor to the other. In the old days, everything was relatively big; when I say big, I mean the scaling was different; it was still microscopic, but relatively big, so you had thick insulators and it was hard for leakage to happen. But as you scale everything down, these insulators become thinner and thinner, and leakage can occur. So leakage power has been growing over time, and scaling the voltage down doesn't save you from leakage power; leakage power has actually increased over time. So for these reasons, you can't continue Dennard scaling: you can't keep scaling the voltage down, so that power equation keeps going up, and that's what they mean by the power wall. Basically, we're at a place where if you run the thing any faster, the temperature is going to go so high that the device will actually melt. So what happens? There's our power equation again: you can't increase the frequency, not without melting things. So what do designers do in order to improve performance even though they can't increase the frequency?
Beyond just improving performance, say you're Intel and you want to sell chips. If you come up with a generation of chips running at four gigahertz and the next generation is also at four gigahertz, people may not buy the new chip; they need some reason to buy it, some improvement. So what do designers do? They increase the number of cores on the chip. This is how you get multi-core systems. You've probably heard of this: a processor core basically executes a program, roughly speaking, and you can have multiple of them; an i7 might have four processor cores, for example, and the number of cores varies. Designers are still increasing density; they're just putting more of this replicated hardware on the chip. But they don't increase the clock frequency; they keep it roughly the same. Clock frequencies still go up slowly, much more slowly than they used to, but the number of cores continues to increase. Now, the thing about having a multi-core system with lots of cores is that you have to have parallel, slash concurrent, execution to exploit it. If you've got four cores in your processor and you can't run anything in parallel, then what's the point of having four cores? You're using one core and the other three are sitting idle. In order to exploit these multi-core systems and get speedup, you have to be able to take your program, divide it among a bunch of cores, and execute different code concurrently on the different cores. Okay, so this is where parallel execution becomes necessary. In order to keep achieving speedup in the presence of multi-core systems, you've got to be able to exploit this parallel hardware, so you need to be able to write concurrent code just to exploit it. That is really the motivation.
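Dividing a program among cores, as described above, can be sketched in Go by splitting a task into one chunk per worker and running each chunk in its own goroutine. The task (counting even numbers) and chunking scheme are made up for illustration; the Go runtime schedules the goroutines onto the available cores.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// countEvens splits a slice into one chunk per worker and counts the
// even numbers concurrently. Each worker writes to its own slot in
// counts, so no locking is needed; the totals are combined at the end.
func countEvens(nums []int, workers int) int {
	counts := make([]int, workers)
	var wg sync.WaitGroup
	chunk := (len(nums) + workers - 1) / workers // ceiling division
	for w := 0; w < workers; w++ {
		lo, hi := w*chunk, (w+1)*chunk
		if lo > len(nums) {
			lo = len(nums)
		}
		if hi > len(nums) {
			hi = len(nums)
		}
		wg.Add(1)
		go func(idx, lo, hi int) {
			defer wg.Done()
			for _, n := range nums[lo:hi] {
				if n%2 == 0 {
					counts[idx]++
				}
			}
		}(w, lo, hi)
	}
	wg.Wait()
	total := 0
	for _, c := range counts {
		total += c
	}
	return total
}

func main() {
	nums := make([]int, 1000)
	for i := range nums {
		nums[i] = i
	}
	// One worker per available core: if the machine has four cores,
	// four chunks can genuinely run in parallel.
	fmt.Println(countEvens(nums, runtime.NumCPU())) // prints 500
}
```

Note that the programmer, not the compiler, decided how to decompose the task here, which is exactly the point made in the next part of the lecture.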
This is why concurrency is so important nowadays: the programmer has to tell the system how the program is going to be divided among the cores. That's really what concurrent programming is doing. It's saying, "Look, here's a big piece of code. You can put this on one core, this on another core, this on another core; these things can all run together." That's what the programmer is doing when writing concurrent code, and you have to do that. Actually, it's funny: there are things called parallelizing compilers. There's a big research field, or there was anyway, and still is, in parallelizing compilers, and what they're supposed to do is take sequential code, a regular C program or whatever language, and parallelize it; they do the concurrent programming automatically for you. They say, "Look, here's a big piece of code. I'm going to chop it up into these different tasks that can all run concurrently." That is an extremely hard problem, and, I hate to say it, but it doesn't work that well. I know saying that will offend a bunch of researchers, but it doesn't work that well, and since you can't automate it easily, you need the programmer to actually do it. That's what concurrent programming is: the programmer saying, "Look, I can decompose this task in the following way." It's hard, but it's important now, because if the programmer doesn't do it, it's not going to get done; everything runs on one core, and then what's the point of having four cores if you're running everything on one core? Thank you.