Congratulations, we have finished building a modern general-purpose computer system, hardware and software, from the ground up. So now that the task has been completed, we can look back and ask ourselves: how can we possibly make this computer better? How can we make it faster, more useful, more versatile? Well, that's exactly what we're going to do in this module, module seven. We'll present several extension and optimization ideas. And I bet that you guys can come up with similar ideas of your own. Each one of these ideas can become the basis of a follow-up Nand to Tetris extension project. And in order to carry out each such project, you must first begin with something called design, D-E-S-I-G-N, design.
You see, a good design holds the key to every successful hardware and software implementation project. And so far in the course, the designers of Nand to Tetris were Norm and myself. We provided all the design documents, all the APIs, the extended implementation contracts, the test programs, the works. You guys played the role of system developers, working from our design. And indeed, Norm and I worked mighty hard to come up with well-designed architectures and APIs. And we did it because we believe that implementing well-designed systems is the best way to learn, or at least begin to learn, how to become a system designer, also known as a system architect. That's why we asked you to implement all these designs. If you want to become a poet, you better start by reading some good poetry.
So what makes a good design good? Well, we can start with the obvious things: modularity, simplicity, elegance, clarity, beauty. We didn't really talk during the course about all these things, explicitly at least, because talk is cheap. Instead, we asked you to implement our designs, and by doing so we hope that you've developed an intuitive appreciation of what it takes to be modular, simple, elegant, clear, and beautiful.
Now you see, once you get all these things in place, then the next design objective happens almost automatically. And this objective, which is probably the most important virtue of any hardware and software project or system, is the ability to modify and extend the system with minimum hassle. This is very relevant because, you see, that's what we want to do in this module. We want to talk about how to optimize and extend the Hack/Jack system. And it's gratifying to know that the system that we are about to modify is well designed. It's gratifying because well-designed systems lend themselves naturally to such modifications. So we'll be able to do everything in a manageable and predictable way.
For example, let's take multiplication and division. Presently, the only arithmetic operations that the Hack ALU can perform are addition and subtraction. Therefore, even the trivial operation of multiplying by 2 requires calling an OS function, which is very inefficient. Now when you think about how to allocate your efforts among various possible optimization projects, you should always think about impact. How can you get the biggest bang for the buck? Now, multiplication and division happen all the time in computer programs. And therefore, it clearly pays off to try to optimize them. So let's focus on how we can optimize multiplication, because division is very similar. Well, I think it was in unit 6-2, but I'm not sure, we presented a highly efficient bitwise multiplication algorithm. It's also in chapter 12 in the book. So we presented this algorithm, and then we went on to implement it using the Jack language. Now if we want to speed up multiplication, we can take the very same algorithm and implement it not as an operating system function written in Jack, but rather as a hardware chip specified in HDL. Once you have such a chip fully tested and in place, you can integrate it into the Hack ALU. But this will also require extending the Hack instruction set.
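As a reminder of what a bitwise multiplication algorithm looks like, here is a minimal Python sketch of a shift-and-add scheme. It is an illustration only, not the course's Jack code: the function name `multiply`, the `width` parameter, and the wrap-around on overflow are assumptions made for this example. It uses only additions, bit tests, and doubling, which is why the same idea can be expressed either as an OS function or as a chip.

```python
def multiply(x: int, y: int, width: int = 16) -> int:
    """Shift-and-add multiplication on `width`-bit words.

    Assumption for this sketch: operands are given as unsigned
    `width`-bit patterns and the result wraps around on overflow.
    """
    mask = (1 << width) - 1
    total = 0
    shifted_x = x & mask                      # holds x * 2^i in round i
    for i in range(width):
        if (y >> i) & 1:                      # is bit i of y set?
            total = (total + shifted_x) & mask
        shifted_x = (shifted_x << 1) & mask   # doubling, i.e. shifted_x + shifted_x
    return total
```

The loop runs once per bit of the word, so the number of additions is bounded by the word width rather than by the size of the operands.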
You have to decide on some binary multiplication opcode; otherwise you won't be able to tell the ALU to multiply. Now if you've extended the binary machine language, then you also have to extend the assembly language. You have to come up with some symbolic mnemonic for representing multiplication. And this, in turn, requires that you also modify the assembler that we built in Nand to Tetris Part I. You see, that's what design is all about. You have to decide what you want to do, and then you actually have to do it. So as you see, refactoring multiplication into the hardware, so to speak, entails a very nice sort of cross-sectional surgery, because it cuts through and affects multiple layers in our computer architecture. And because these layers are highly modular and well specified, every one of these modifications can be done and unit tested separately.
Another thing which is missing in our ALU is bit shifting. Bit shifting can really speed up things like multiplication and division, as well as low-level graphical operations that manipulate pixels on the screen. And so the trick, once again, if you want to implement bit shifting in hardware, is to design new chips and new instructions that carry out left shift and right shift operations at the hardware level. And once we have these hardware capabilities, we should consider modifying our translators to take advantage of them, to exploit them cleverly. For example, whenever the compiler has to write code that multiplies or divides a number by a power of 2, it can do so by generating code that uses bit shifting instead of standard multiplication or division. You see, that's an example of what people mean when they talk about optimizing compilers. And by the way, in this module, when we say compiler, we mean both the compiler and the VM translator.
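To illustrate the power-of-2 idea, here is a small Python sketch of the decision a compiler could make when it multiplies by a known constant. Note the assumptions: `shiftleft` is a hypothetical VM command that does not exist in the current VM language (it stands for whatever shift instruction your extension project defines), and the function names are made up for this example. The fallback branch emits the usual call to the OS routine `Math.multiply`.

```python
from typing import List


def is_power_of_two(n: int) -> bool:
    """True for 1, 2, 4, 8, ... (exactly one bit set)."""
    return n > 0 and (n & (n - 1)) == 0


def compile_multiply_by_constant(k: int) -> List[str]:
    """Return VM-style commands that multiply the stack's top value by k."""
    if is_power_of_two(k):
        shift_count = k.bit_length() - 1      # k == 2 ** shift_count
        # Hypothetical command: one left shift doubles the top of the stack.
        return ["shiftleft"] * shift_count
    # General case: fall back to the OS multiplication routine.
    return [f"push constant {k}", "call Math.multiply 2"]
```

For instance, multiplying by 8 becomes three shifts, while multiplying by 6 still goes through `Math.multiply`. Whether this decision belongs in the compiler or in the VM translator is part of the design work.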
So you may have to modify either the compiler, or the VM translator, or both of them. This is something that you have to figure out yourself in each one of these optimization projects. And by the way, the VM translator can really be optimized in many different ways, because right now it simply translates commands from VM to assembly without trying to generate tight code. And therefore, even the simplest operations, like adding 1 or subtracting 1, generate several assembly instructions, where in fact you could do it with far fewer. So that's maybe an example of yet another optimization project.
Now, multiplication, division, bit shifting, VM optimization, all these things are examples of optimizing existing functionality. How about creating new functionality? For example, how about adding mass storage and network access to our computer? That's what we'll discuss in the next unit.
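To make the point about the VM translator concrete, here is a rough Python sketch of one possible optimization: recognizing the pair `push constant 1` followed by `add` (or `sub`) and emitting a short in-place update of the stack's top value. The function names are made up for this example, the naive templates are one common way of translating these commands and may differ from your own translator, and the sketch assumes nothing jumps in between the two commands.

```python
from typing import List

# One common naive translation of the commands used in this sketch.
PUSH_CONSTANT = ["@{n}", "D=A", "@SP", "A=M", "M=D", "@SP", "M=M+1"]
BINARY_OP = {
    "add": ["@SP", "AM=M-1", "D=M", "A=A-1", "M=D+M"],
    "sub": ["@SP", "AM=M-1", "D=M", "A=A-1", "M=M-D"],
}
# Fused forms: update the top of the stack in place.
FUSED = {
    "add": ["@SP", "A=M-1", "M=M+1"],
    "sub": ["@SP", "A=M-1", "M=M-1"],
}


def translate_naive(command: str) -> List[str]:
    """Translate a single VM command with no optimization."""
    parts = command.split()
    if parts[:2] == ["push", "constant"]:
        return [line.format(n=parts[2]) for line in PUSH_CONSTANT]
    if command in BINARY_OP:
        return list(BINARY_OP[command])
    raise ValueError(f"command not covered by this sketch: {command}")


def translate(commands: List[str]) -> List[str]:
    """Translate a command list, fusing 'push constant 1' + add/sub."""
    out: List[str] = []
    i = 0
    while i < len(commands):
        nxt = commands[i + 1] if i + 1 < len(commands) else None
        if commands[i] == "push constant 1" and nxt in FUSED:
            out += FUSED[nxt]
            i += 2
        else:
            out += translate_naive(commands[i])
            i += 1
    return out
```

With these templates, adding 1 takes 12 assembly instructions in the naive translation and 3 in the fused one.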