0:09

While the basics are we want to know how many bits the program use or

Â bytes, 8 bytes at a time.

Â And actually we'll be talking in terms of millions of bytes or billions of bytes.

Â And actually surprisingly there is a controversy about even these basic

Â definitions.

Â Computer scientist think of a million bits as 2 to the 20th and

Â a billion is 2 to the 30th because that's the number of possible things that you

Â can fit into the 30 bytes and everything is consistent with our calculations.

Â Other scientists stick to one million and one billion for lot of reasons.

Â I will usually use a 2 to the 20th to mean a megabyte.

Â Now on old computers for many years use the 32 bytes machine so

Â that pointers were four bytes.

Â 1:02

Just in recent years, we've mostly switched to a model where machines

Â are 64 bits, and pointers are 8 bytes.

Â That allows us to address much more memory, but pointers use much more space.

Â And actually, this transition caused a lot of problems initially,

Â because programs were using way more space than people thought they should.

Â You're not going to have to go through this kind of transition the way that we

Â did because 64-bits is definitely enough to address anything that you might

Â need to address.

Â Two to the 64th is really a huge number.

Â 1:41

So in terms of bytes we have to start out with typical memory usage.

Â Now again this is very dependent on machine and implementation but

Â these numbers are reasonable and are found on typical implementations.

Â So a boolean, it would be nice if a boolean just took a bit, because it's just

Â true or false, but actually usually we have to count for a byte for a boolean.

Â One byte is a byte, character nowadays is 2 bytes, 16 bit characters.

Â Not that long ago, we used 8 bits for chars.

Â Integer, a regular int is 4.

Â Bytes or 32 bits and float is also 4 bytes, long int is 8 and a double is 8.

Â Usually we use doubles for a floating point and ints for

Â integers in most applications.

Â So that's for primitive types.

Â And then for arrays there is a certain amount of overhead for making an array and

Â then if there's N items it's whatever the cost of the primitive type times N.

Â So, in array of doubles is say 8N + 24.

Â And two-dimensional array.

Â Then well, we can go ahead and compute the exact thing,

Â but now it's time to use the tilde notation.

Â Even for arrays, we could say a double is tilde 8N, for one-dimensional.

Â For two-dimensional, two-dimensional array of doubles is tilde 8 M N.

Â And there's extra terms for the overhead but for

Â large M and N that's going to be pretty accurate.

Â So that's our basic usage for primitive types and

Â arrays in a typical job implementation.

Â 3:50

So for example if you had a date object that had three int instance variables,

Â then that object would take a total of 32 bytes each int takes 4 bytes.

Â Object overhead is 16 bytes.

Â I need 4 bytes for padding, so it's a total of 32 bytes.

Â 4:11

So, and the other one that often comes up is a string.

Â And a string is a little bit more complicated than an array but

Â the typical implementation of a string in Java has a reference

Â out to an array of characters and then it's got int values for

Â offset, count and hash value and then some padding.

Â And adding it all together.

Â The [COUGH] cost of a string is about two N plus 64 bytes.

Â 4:48

So these are the basics that we need to analyze

Â the memory usage for a typical Java program.

Â So for data type value, if it's a primitive type it's 4 for an int,

Â 8 for a double and so forth.

Â If it's a reference it's going to be 8 bytes, that's what a pointer takes.

Â Array, 24 bytes plus the memory for each entry, and an object, 16 bytes.

Â Plus the memory for the incidence variable plus if there's an inner class that's

Â another eight bytes as we talked about with nodes for linked lists, and

Â then there's the padding.

Â 5:27

So then we have to think about who's responsible for

Â referenced objects In some cases.

Â And we'll take care of that when we get to these situations.

Â So as an example, a simple example of memory use analysis,

Â let's take a look at how much memory our WeightedQuickUnionUF

Â function from a few lectures ago uses as a function of N.

Â And there's only a couple of memory elements, and

Â each one of them are easily analyzed using the basics that we just gave.

Â It's an objects so the 16 bytes of object overhead.

Â There's two int arrays.

Â Each one of them have an array overhead

Â of 24 plus 4N for the N entries.

Â Each of the N entries takes four bytes.

Â And then there's four bytes for the count, and there's four bytes for the padding,

Â and if you add it all together.

Â It's 8N + 88 which tilde 8N.

Â Again, all that saying is when N is large all we are going to care about

Â in terms of analyzing the memory is that we've got [COUGH] two N integers.

Â 7:06

We can do it with empirical analysis where we actually execute the programs to do

Â experiments and use assume power law, formulate hypothesis and make predictions.

Â But we can do more, we can do mathematical analysis where we can identify the most

Â costly operations, analyze the frequency of execution of those operations.

Â And using the tilde notation to simplify analysis,

Â we can actually explain the behavior, not just predict it.

Â And this is a fine example of the use of the scientific method to

Â understand artifacts that we're studying, algorithms.

Â Our mathematical models are usually independent of a particular

Â computer system and even applied to machines that are not yet built.

Â But we always validate our mathematical models by running experiments on real

Â machines, so that we can be confident when we're making predictions and

Â analyzing algorithms.

Â