So far we've be talking about running time, now we have to talk about the memory requirements of our programs as well. While the basics are we want to know how many bits the program use or bytes, 8 bytes at a time. And actually we'll be talking in terms of millions of bytes or billions of bytes. And actually surprisingly there is a controversy about even these basic definitions. Computer scientist think of a million bits as 2 to the 20th and a billion is 2 to the 30th because that's the number of possible things that you can fit into the 30 bytes and everything is consistent with our calculations. Other scientists stick to one million and one billion for lot of reasons. I will usually use a 2 to the 20th to mean a megabyte. Now on old computers for many years use the 32 bytes machine so that pointers were four bytes. Just in recent years, we've mostly switched to a model where machines are 64 bits, and pointers are 8 bytes. That allows us to address much more memory, but pointers use much more space. And actually, this transition caused a lot of problems initially, because programs were using way more space than people thought they should. You're not going to have to go through this kind of transition the way that we did because 64-bits is definitely enough to address anything that you might need to address. Two to the 64th is really a huge number. So in terms of bytes we have to start out with typical memory usage. Now again this is very dependent on machine and implementation but these numbers are reasonable and are found on typical implementations. So a boolean, it would be nice if a boolean just took a bit, because it's just true or false, but actually usually we have to count for a byte for a boolean. One byte is a byte, character nowadays is 2 bytes, 16 bit characters. Not that long ago, we used 8 bits for chars. Integer, a regular int is 4. Bytes or 32 bits and float is also 4 bytes, long int is 8 and a double is 8. Usually we use doubles for a floating point and ints for integers in most applications. So that's for primitive types. And then for arrays there is a certain amount of overhead for making an array and then if there's N items it's whatever the cost of the primitive type times N. So, in array of doubles is say 8N + 24. And two-dimensional array. Then well, we can go ahead and compute the exact thing, but now it's time to use the tilde notation. Even for arrays, we could say a double is tilde 8N, for one-dimensional. For two-dimensional, two-dimensional array of doubles is tilde 8 M N. And there's extra terms for the overhead but for large M and N that's going to be pretty accurate. So that's our basic usage for primitive types and arrays in a typical job implementation. Now a lot of our programs use objects like Linklys and so forth, so we also have to factor in object overhead. The cost of reference and also there's padding built-in in typical implementations to make it so that each object has to use a multiple of 8 bytes. So for example if you had a date object that had three int instance variables, then that object would take a total of 32 bytes each int takes 4 bytes. Object overhead is 16 bytes. I need 4 bytes for padding, so it's a total of 32 bytes. So, and the other one that often comes up is a string. And a string is a little bit more complicated than an array but the typical implementation of a string in Java has a reference out to an array of characters and then it's got int values for offset, count and hash value and then some padding. And adding it all together. The [COUGH] cost of a string is about two N plus 64 bytes. So these are the basics that we need to analyze the memory usage for a typical Java program. So for data type value, if it's a primitive type it's 4 for an int, 8 for a double and so forth. If it's a reference it's going to be 8 bytes, that's what a pointer takes. Array, 24 bytes plus the memory for each entry, and an object, 16 bytes. Plus the memory for the incidence variable plus if there's an inner class that's another eight bytes as we talked about with nodes for linked lists, and then there's the padding. So then we have to think about who's responsible for referenced objects In some cases. And we'll take care of that when we get to these situations. So as an example, a simple example of memory use analysis, let's take a look at how much memory our WeightedQuickUnionUF function from a few lectures ago uses as a function of N. And there's only a couple of memory elements, and each one of them are easily analyzed using the basics that we just gave. It's an objects so the 16 bytes of object overhead. There's two int arrays. Each one of them have an array overhead of 24 plus 4N for the N entries. Each of the N entries takes four bytes. And then there's four bytes for the count, and there's four bytes for the padding, and if you add it all together. It's 8N + 88 which tilde 8N. Again, all that saying is when N is large all we are going to care about in terms of analyzing the memory is that we've got [COUGH] two N integers. Two arrays of size N, each one of which takes 4 bytes for grand total of 8 N bytes. Okay, so in summary, we really can figure out how many times we have to turn the crank on modern computers. We can do it with empirical analysis where we actually execute the programs to do experiments and use assume power law, formulate hypothesis and make predictions. But we can do more, we can do mathematical analysis where we can identify the most costly operations, analyze the frequency of execution of those operations. And using the tilde notation to simplify analysis, we can actually explain the behavior, not just predict it. And this is a fine example of the use of the scientific method to understand artifacts that we're studying, algorithms. Our mathematical models are usually independent of a particular computer system and even applied to machines that are not yet built. But we always validate our mathematical models by running experiments on real machines, so that we can be confident when we're making predictions and analyzing algorithms.