Hi, folks. So, let's take a look at the diameter of random graphs. In particular, remember we just had a theorem that we stated about the structure of a random graph, these Erdős–Rényi G(n,p) graphs, which are also known as Bernoulli random graphs or Poisson random graphs. We'll talk about some of the definitions in a bit. But if you recall the basic statement of the theorem, it was that if the degree was growing at least with the log of n, so at least one plus epsilon times log n, and if the degree wasn't becoming too large, so that the degree relative to n is vanishing, then with a probability going to one, the ratio of the average distance to log n over log d is going to one, and the same actually holds for the diameter.

And if you recall the basic ideas of how this worked, we first went through a tree calculation: if everybody had exactly this degree d and you're moving out in tree-like form from a root node, then after l steps you would reach roughly d minus one to the l power other nodes. To reach the n minus one others, l would have to be roughly on the order of log n over log d. The difficulty with this was that not everybody has the same degree. So suppose now that instead we had a tree where the degree was randomly varying, and not everybody had exactly the same degree. What we can show first is that the fraction of nodes that have nearly the average degree is going to one, and this is true as long as the expected degree is at least on the order of log n. In particular, what we're going to use to prove this are what are known as Chernoff bounds.

Let me just state what Chernoff bounds are. I'm not going to try and prove them here; you can find proofs in many different places. Wikipedia nowadays has a lot of nice statements of different math theorems, and you can find background there. The important thing here is that we're looking at a binomial random variable: you're flipping coins with some probability p and counting how many come up heads. So imagine I'm looking at a particular node forming all of its links, and we ask how many links it ends up with. Well, the probability that the number of links it ends up with is between a third of the expected number and three times the expected number is at least one minus e raised to the minus expected number. So if I'm expected to have a hundred friends, then the chance that I'll have somewhere between 100 over three and 300 friends is at least one minus e to the minus 100, which is very close to one. Right? If you expect me to have 100 friends, then with probability very close to one I'm going to have at least 33 friends and fewer than 300. So Chernoff bounds say, "Okay, there's a very high probability that you're going to be within a factor of three of the expectation." Okay, so that's very useful. How is that useful? Well, the probability that node i's degree is going to be between d over three and three times d, where d is the expected degree, is at least one minus e to the minus d. Suppose we then want everybody's degree to be within that factor of three.
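To pin down the two calculations just described, here they are written out. The constants in Chernoff bounds vary from source to source, so take the factor-of-three version below as the form used in this lecture, not the only statement of the bound.

```latex
% Tree calculation: with every node having degree d, a root reaches roughly
% (d-1)^l nodes at step l, so covering all n-1 others requires
\[
  (d-1)^{\ell} \approx n
  \quad\Longrightarrow\quad
  \ell \approx \frac{\log n}{\log(d-1)} \approx \frac{\log n}{\log d}.
\]
% Chernoff bound as used here: node i's degree d_i is a binomial random
% variable with expectation d, and
\[
  \Pr\!\left[\frac{d}{3} \le d_i \le 3d\right] \;\ge\; 1 - e^{-d}.
\]
```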
What we're going to end up with is we can just raise this to the nth power. We've got n different nodes, and this is the probability that any one of them is within a factor of three, so raising it to the nth power is an approximation for the probability that all the nodes are within a factor of three. Why is that only an approximation? Well, the degrees of different nodes are not quite independent, only approximately independent. So if we actually want to prove this theorem rigorously, there's a lot of detail that I'm going to omit. Here, we're going to go through the basic ideas of why this is true. You can check Bollobás's book if you want the gory details of the proof, or you can go through Chung and Lu's proof. So there are different proofs that will give you more detail; I'm going to go through just the basic point.

Okay, so these Chernoff bounds give us a high probability that every node's degree is going to be within a factor of three of the expectation. What we've shown is that the probability that all degrees are within a factor of three of the expectation is at least this expression: one minus e to the minus d, raised to the nth power. Then, given that d is at least one plus epsilon times log n, if we plug that in, we end up with the probability of all degrees being within a factor of three being at least one minus one over n to the one plus epsilon, raised to the nth power. And using our approximation for this, we know that this is converging to one as n goes off to infinity. So the probability that all degrees are within a factor of three is getting closer and closer to one as n gets large.

So what we've shown is that if the degree is at least on the order of log n, then the probability that all the degrees are within a factor of three is going to one. And so, if we had a tree expanding with these degrees being randomly distributed, but within a factor of three of the average with probability going to one, then we can bound the distance it takes to get from one node to all the others: the shortest it could be is if every node ended up with three times the average degree, and the longest it could be is if every node ended up with only a third of the average degree. So that bounds the distance between log n over the log of 3d and log n over the log of d over three. And in particular, for d that's growing so that d isn't too small, the log of 3d and the log of d over three are both basically proportional to the log of d. Right? One is the log of d plus the log of three, the other is the log of d minus the log of three, and for large d the log-of-three part is washing out. So this tells us that as n grows, this distance is going to be proportional to log of n over log of d. Whether we're off by a factor of three really doesn't matter much in the average distance calculation, because that's proportional to logarithms, and logarithms wash out these factors. Okay, so that's a very nice point. And then we have this other part, that some links may be doubling back.
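The step where "this is converging to one" can be filled in with a standard limit. Here is a short derivation under the stated assumption that d is at least (1 + ε) log n, using the approximation (1 - x)^n ≈ e^(-nx) for small x:

```latex
% Since d \ge (1+\varepsilon)\log n, we have e^{-d} \le n^{-(1+\varepsilon)}, so
\[
  \Pr[\text{all degrees within a factor of 3}]
  \;\ge\; \bigl(1 - e^{-d}\bigr)^{n}
  \;\ge\; \Bigl(1 - n^{-(1+\varepsilon)}\Bigr)^{n}
  \;\approx\; e^{-n^{-\varepsilon}} \;\longrightarrow\; 1.
\]
% And the factor-of-three degree bounds wash out of the distance estimate:
\[
  \frac{\log n}{\log(3d)} \;\lesssim\; \ell \;\lesssim\; \frac{\log n}{\log(d/3)},
  \qquad
  \frac{\log(3d)}{\log d} \;=\; 1 + \frac{\log 3}{\log d} \;\longrightarrow\; 1.
\]
```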
And the part that's going to help with the expansion is that these links are being put down at random, and most of the nodes aren't really reached until the last step of this expansion process. After k steps, we've reached about d to the k nodes, and there are still n minus d to the k unreached. If we haven't yet expanded all the way outwards, then there are still many more nodes out there than nodes that have already been reached. So if we think about a given node and ask where its links are going, then given that the links are placed independently at random, most of them are going to reach new nodes rather than nodes that have already been reached. It's not until you get very close to having reached all the nodes that you have to start worrying about things doubling back at any substantial rate.

Just to get a feeling for this, suppose that each person had 100 friends. Then at the first stage you would reach 100 nodes, at the second stage 10,000, then a million, then 100 million, and so forth. So things are expanding outwards, and there are always more nodes that haven't been reached than ones that have. And the average distance is actually going to turn out to be very close to the diameter, because most of the nodes are being reached at the last step. That is also going to be part of the proof, in making sure that these things aren't doubling back at too high a rate.

So this is a heuristic treatment of why these things are true. But what it does is give you an idea of how Chernoff bounds can be used in these kinds of proofs. Basically, proofs of properties of random graphs work by understanding the different structures that underlie them, and then proving that things happen with high probability using variations on laws of large numbers, showing that as things become very, very large, properties hold with probabilities going to one. That's the basic structure that underlies a lot of proofs in random graph theory, and it gives us an idea of why we're seeing the kind of theorem that we saw, with diameter and average path length being proportional to log n over log d. So now let's take a look at some actual average path lengths and distances and see whether this seems to be accurate.
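Before turning to real data, one way to check the theorem numerically is to simulate. Here is a minimal sketch, assuming the networkx library is available; the parameter choices, such as setting the expected degree to 2 log n, are illustrative assumptions rather than values from the lecture.

```python
import math
import networkx as nx

# Compare average distance and diameter in G(n,p) against log(n)/log(d),
# where d = (n-1)p is the expected degree.
for n in [1000, 2000, 4000]:
    d = 2 * math.log(n)   # illustrative choice, above the (1+eps)*log n threshold
    p = d / (n - 1)
    G = nx.gnp_random_graph(n, p, seed=0)

    # The theorem concerns the connected regime; with d ~ 2 log n the graph
    # is connected with high probability, but guard against a rare miss by
    # restricting to the largest component.
    if not nx.is_connected(G):
        G = G.subgraph(max(nx.connected_components(G), key=len)).copy()

    avg_dist = nx.average_shortest_path_length(G)
    diam = nx.diameter(G)
    print(f"n={n}: avg distance = {avg_dist:.2f}, diameter = {diam}, "
          f"log(n)/log(d) = {math.log(n) / math.log(d):.2f}")
```

If the intuition above is right, the average distance and the diameter should both track log(n)/log(d) and sit close to each other as n grows.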