Welcome to the perspective unit of the second week in Nand to Tetris. During this week we built an ALU, and in the next week we're going to put this ALU on the side and focus on building some memory systems, leading up to a RAM unit. But this is something that we'll do next week. Let us now look back and bring up some questions that typically come up when we talk about combinational logic and ALU design. The first question is: we've already built about 20 chips in this course. Are these chips standard? Namely, are they typically being used in many other computer systems, or are they specialized to the computer that you build in this particular course? Noam? >> Most of the gates that we build in the course are completely standard. The half adder, the full adder, the adder: these gates are completely standard, as are the gates we built last week, like the multiplexor, the Or and Xor gates, and so on. What is not completely standard is our ALU, which is extremely simple. In this course we clearly emphasize simplicity over almost anything else, because we want to fit everything into one course. So for that reason our ALU is extremely simplified in its implementation, and in that respect it's a bit unique among usual ALUs. >> The next question is directly related to the previous one, and the question is: how come the ALU that we built does not feature more operations, like multiplication and division? Well, the answer is that indeed there is no problem to write HDL code that specifies chips that carry out multiplication and division by operating directly on zeros and ones, on bits. And in fact, there are some very elegant and very nice algorithms to do just that. But in general, when you build a computer system, the overall functionality that the system provides is divided between the hardware and the operating system that runs on top of it. So it is the designer's freedom to decide how much functionality to put in each one of these layers.
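The standard gates mentioned above are just small Boolean functions, so they can be sketched in a few lines of Python (a simulation of the logic, not the course's HDL):

```python
def half_adder(a, b):
    """Add two bits; return (sum, carry)."""
    return a ^ b, a & b

def full_adder(a, b, c):
    """Add three bits; return (sum, carry).
    Built, as in the course, from two half adders and an Or."""
    s1, c1 = half_adder(a, b)
    s2, c2 = half_adder(s1, c)
    return s2, c1 | c2
```

For example, `full_adder(1, 1, 1)` yields `(1, 1)`: sum bit 1 and carry bit 1, i.e. binary 11 = 3.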
When we designed the Hack computer, which is the computer that we'll continue to build in this course, we decided, as Noam explained before, that the ALU will be extremely simple, and that operations like division and multiplication will be delegated to the software that runs on top of this computer. And indeed, in the second part of this course, in Nand to Tetris Part Two, we are going to design, among other things, an operating system. The operating system will have several libraries, and one of these libraries is going to be called Math. And the Math library will feature all sorts of very useful mathematical operations, including multiplication and division. So at the end of the day, the programmer who writes programs that have to run on this computer will not feel the difference. The programmer will not really care if a certain algebraic operation is being done by the operating system or by the hardware. It will be completely transparent for the high-level programmer. But of course there's some tradeoff here. Typically, when you implement an operation in hardware, it runs much faster, but it's costly to design, and it also costs money to manufacture the more complex hardware unit. So once again it's a matter of tradeoff, of cost-effectiveness. And that's how we decided to build the Hack computer: a simple ALU, and many extensions later, when we build the operating system. We hope that we convinced you that the ALU that we designed in this course is indeed simple. And now the next question is: is this ALU actually efficient? >> Almost everything that we did in this construction is completely efficient, so there isn't much more to say about it. But there is one component where some important optimization is still possible, and it's probably worthwhile to talk for a moment about the kinds of optimizations that we're talking about, and this is the adder.
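To make the idea of delegating multiplication to software concrete, here is one classic algorithm such an OS library might use: shift-and-add multiplication, which needs nothing beyond addition, doubling, and bit tests. This Python sketch is only an illustration of the technique (the function name is ours, and the actual Math library in Part Two may be written differently):

```python
def multiply(x, y):
    """Multiply two non-negative integers using only addition,
    doubling, and bit tests -- operations an add-only ALU supports."""
    result = 0
    shifted_x = x
    while y > 0:
        if y & 1:                            # current bit of y is 1
            result = result + shifted_x
        shifted_x = shifted_x + shifted_x    # double x (shift left)
        y = y >> 1                           # move to the next bit
    return result
```

The loop runs once per bit of y, so multiplying two n-bit numbers costs on the order of n additions rather than y repeated additions.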
So let us see what the main problem with the adder is, and how one may decide to improve upon it. Let us recall the implementation of the adder. The adder was constructed with a sequence of full adders inside it. Each full adder got some inputs from the input of the adder gate, and then, importantly, one of its outputs, the carry, went on to the next full adder. Similarly, the carry output from that full adder went down to the next one, and so on. Okay. So this chain of connections is really the problematic one, the one that we may wish to optimize. So let us look at the distance, if you wish: how many gates must the signal traverse between the input and the final output of the last full adder? It needs to go inside the first full adder, go through a few gates, then through another few gates, then another few gates, and so on. So assume that there are, like, three or four gates in each full adder. If this is, say, a 32-bit adder, altogether we have 32 times 3 or 4 gate delays from input to output. And when you actually run such hardware in a system, you will have to take this delay into account, because it actually takes time for the signal to traverse all these gates, for all the capacitors in the implementation of these full adders to completely load, and so on. So having such a long chain is not considered a good thing, because it actually increases the delay. So what could you do instead? Really, if you want to shorten the chain, then maybe you can compute this thing, the carry that goes into one of the full adders at the top, at the most significant bits, in a different way, not through this long chain. And in fact, if you actually look at the logic that's computed here, there are more efficient ways, ways that have less delay. And what you could do is what's called carry look-ahead.
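The ripple-carry structure just described can be simulated in a few lines of Python; the loop makes the carry chain explicit, and it is exactly this bit-by-bit dependency that makes the worst-case delay grow linearly with the number of bits:

```python
def ripple_carry_add(a_bits, b_bits):
    """Add two bit lists (least significant bit first).
    Each iteration is one full adder; its carry output feeds
    the next iteration, just like the hardware chain."""
    carry = 0
    out = []
    for a, b in zip(a_bits, b_bits):
        out.append(a ^ b ^ carry)                  # sum bit
        carry = (a & b) | (carry & (a ^ b))        # carry to next stage
    return out, carry
```

For instance, adding 5 (`[1, 0, 1, 0]`) and 3 (`[1, 1, 0, 0]`) yields 8 (`[0, 0, 0, 1]`), with the carry rippling through all four stages along the way.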
Compute this carry independently of this long chain. That may duplicate some effort, if you wish, but at least you're minimizing delay. So instead of just having the carry be computed by this long chain, have the carry be computed separately for each one of the full adders, in the least delayed way possible. That is called carry look-ahead, and it would allow you to run this chip at a much faster rate, because the delays are shorter. >> All right, moving along. The last question that we want to focus on is: why do we recommend using built-in chips in project two, instead of the chips that we actually built in project one? Well, first of all, you're welcome to use the chips that you built. We definitely don't prevent it, and it makes a lot of sense to use the chips that you actually built, because this is what this course is all about. We are building a complex machine which consists of various layers of construction, and it is perfectly reasonable that when you build a certain layer, you will use layers that you built before. However, there are very good reasons why you should want to use the built-in chips that we supply, rather than the chips that you've built yourself. And the most important of these reasons is the notion of, we can call it, local failures. The idea is that if you use built-in chips as your chip parts, and some problem raises its ugly head in the present project, then you are guaranteed that this problem can be attributed to bugs and problems that were created in this project only, and not in previous projects. So this is also sometimes called the notion of unit testing: you test each unit separately from the rest of the system. And this principle of unit testing goes hand in hand with other very important principles, like abstraction and modularity. And taken together, one of the things that these principles imply is that once you've finished building a certain module, you can put it away.
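The carry look-ahead idea can be sketched in Python as well. Each bit position produces a "generate" signal (it creates a carry by itself) and a "propagate" signal (it passes an incoming carry along); every carry is then a direct Boolean function of these signals. The sketch below computes the recurrence in a loop for clarity, whereas real look-ahead hardware expands it into a flat, shallow expression so that all carries are ready after only a couple of gate levels:

```python
def carry_lookahead_add(a_bits, b_bits):
    """Add two bit lists (least significant bit first) using
    generate/propagate signals instead of a ripple chain."""
    g = [a & b for a, b in zip(a_bits, b_bits)]   # generate: both bits 1
    p = [a | b for a, b in zip(a_bits, b_bits)]   # propagate: at least one bit 1
    carries = [0]
    for i in range(len(a_bits)):
        # c[i+1] = g[i] OR (p[i] AND c[i]); hardware flattens this
        # recurrence so no carry waits on the previous full adder.
        carries.append(g[i] | (p[i] & carries[i]))
    sums = [a ^ b ^ c for a, b, c in zip(a_bits, b_bits, carries)]
    return sums, carries[-1]
```

The outputs agree with the ripple-carry version; only the wiring, and hence the delay, differs.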
You can stop worrying about its implementation, and use only the interface, or the API, of this module when you build more complex functionality. This really is the only way to manage complex projects. And by adhering to these principles, which we find extremely important, we can really take this super-ambitious project of building a modern computer from first principles and do it in only seven weeks. >> Shimon? >> Yes. >> I think we should also confess that our simulator is not that efficient. Especially when we get to more complex projects, if you actually layer the chips that were constructed in previous projects, our simulator will have to face a huge number of layers and will simply be slow. If you do it using just the finished interfaces of the previous chips, just the specifications of the previous chips, then our simulator will be fast and nice to work with. >> Yes. In fact, that's another very important technical reason why we want to use built-in chips, and this problem will come up later in the course when we build memory systems and more complex functionality. And indeed, this is another very good reason why it makes a lot of sense to use software-based built-in chips.