Okay, so let's talk a little bit about the distributions of degrees in a network, and why, why do we care about this? Well obviously, when we are trying to characterize networks in terms of basic properties that they have knowing whether the most nodes have very similar degrees or very different degrees can give us a lot of information. So for instance you know, whether everybody has one or two links or some of the nodes have eight and others have one, that's going to give us a very different kind of network. And that different kind of network's going to have different properties in terms of its difusion properties, its path links and other kinds of things. So, you know, what we looked at in terms of the GMP graph had, most nodes had order of magnitude, similar degrees to each other. let's look at this in, in a little bit more detail. So when we look at say these Erdos-Renyi GNP random graphs. well the, the chance that a given node has exactly d links, is just follows what's known as a binomial distribution. Right? So the chance that I have exactly d links present and d the rest of them not present, well I have chance p of a given link being present raised to the dth power, then 1 minus p raised to the n minus 1 minus d power. And how many different ways could I have actually picked the, the other d nodes that I'm connected to? well there's n minus 1 factorial over d factorial times n minus 1 minus d factorial. So basically this is what's known as n minus 1 choose d. that's the combinatorics of how many different ways I could have exactly d neighbors out of a population of n minus 1 possible other neighbors. Okay, so that, that's a binomial distribution. And these networks are also known sometimes as Bernoulli because you know, you're flipping coins, binomial or, or Poisson random graphs. Why Poisson? Well, if you want to approximate this if you want to approximate this formula for large n and relatively small p. the, when you look at 1 minus p raised to the n minus 1 minus d power, this looks roughly like e to the minus n minus 1, p. And this combinatoric choose the expression looks roughly like n minus 1 to the d over d facotiral. So when you, when you bring these two factors together it's roughly n minus 1 to the d that you're left with, and this is well approximated by e to the minus, n minus 1 to the p. So, you know, the binomial approximation here, by a Poisson distribution means that we can represent approximately the probability that a given node has exactly d neighbors by, this formula here which is known as a Poisson distribution. Okay, so lets have a look at some of these random networks, lets, this is actually a random network I drew on 50 nodes with a probability 0.02 of having any given pair of nodes connected. And this is one realization that was drawn in what's known as UCINET, you can find that on the web if you want a network package. It's one of several that we'll talk about on the course. so I drew this using UCINET. random graph 50 nodes 0.02. So let's have a look at the actual degree distribution, compared to a Poisson approximation. So if we have a Poisson approximation, where you had an expected probability of links of 0.02. Then you would get the square dots here, the realized frequency is by the diamonds. And actually, these two are very difficult to tell apart. So the frequency distribution of how many nodes have degree 1, degree 2, degree 3 and so forth. There's a couple of to few nodes that, that had degree two, but everywhere else we're pretty much on the graph. you know, a couple things to note, out of our picture that we had, when we go back here there's a bunch of isolated nodes. it looks pretty much, the rest looks sort of tree like, there's no cycles in it. so we see several components, no component has more then a small fraction, we're just starting to see one large component emerge. if we go and instead bump this probability up to 0.08 on 50 nodes, then we get a much more connected graph. We still have one isolated node, and we end up with y'know, more nodes having many more connections. What's the actual distribution look like? Well, now when we look at the actual, the Poisson approximation, we have a curve here. And then when we look at the actual realized frequency, we end up with something which is not exactly on target. the piece is a little larger, so the approximation's not quite as great, but it's still fairly close. And if you had more nodes and looked at a random graph on a larger set, then you know, these things would tend to coincide even more. But again, the poisson gives us a reasonable approximation of what what we would have expected to be realized. so this is a poisson distribution. What we'd see if we saw links formed indipendently at random. what thing, what sort of interesting is that when you look at some networks they tend to have what are known as Fat tails. And one of the early looks at this in the network context was by Price in 1969. And he was looking at citation networks, and so, look, looking at, at papers that cited other papers, and articles that cited other scientific articles, and then looking at how many sites given articles have, and then you, you know, you could just put down sites at random, or you see what you, you actually saw in the data. And he saw what were known as Fat tails. And [COUGH], what that means is, effectively you saw too many articles that had lots of citations and too many that had very few citations, so if you look at the extremes you saw those being over represented. And then numbers that were close to the average, you saw fewer having sort of a middle amount of of citations, you either got very highly cited or very low citation. this is related to a lot of other things. There, there's a lot of things that have Fat tails. these distributions go back to Pareto in the 1890s so things like wealth distributions distribution of cities, populations around the world, words usage. There's a whole series of different things that have these kinds of Fat tails. And in particular there's a well know paper by Albert, Jeong and Barabasi in the late 1990s where what they looked at was webpages and connections to webpages in the Notre Dame part of the world wide web at that point in time. And what is plotted here, is a comparison between the in, in the blue are the actual data of the world wide web connectedness, and in the pink color here is what we would see if, if the links had been formed uniformly at random. And the idea of a of a Fat tails here are the idea that when we look at the actual representation, we've got, so this is log of frequency, log of degree. we have a higher frequency of really high degree nodes, and a higher frequency of really low degree nodes, and then the middle degree nodes are under represented relative to things. So we have this Fat tails, and in fact the distribution in this log log plot is almost linear. so, you know, a line is a reasonably good match for what's going on in these data. we'll talk about that in more detail as the course pers goes along. But the important thing here, is that there are the distribution is a little bit different than what we'd have gotten if we had just thrown down the lengths uniformly at random. Kay, so this is known, roughly is what's known as a as scale free distribution. the probability that a, given link has degree d, can be thought of as proportional to the degree raised to some power. And, part of the reason that this is known as scale free is if we, you know, double the degree, then we, we, we scale up, the relative probabilities, scale up so that. the we'll still get the likelihoods, will be proportional to each other as, as we are increasing degrees. And in particular if you take log of each side of this what do we end up with? We end up with the log of, of the probability, the frequency of having a given degree is going to be proportional to, some factor times the log of the degree. and so it, it, if it was fitting this, this distribution exactly, we should have a line. And indeed, when we look at what we saw here and, and you know, we get a reasonable approximation by a line in those data. Okay, let's have a look at a completely different kind of network and see whether we also see Fat tails or not. So this is, Bearman, Moody, and Stovel's data, the high school romance data, that we talked about just, a few moments ago. and here, we're looking at one high school, nodes are colored by, gender, male, female. And a link between two nodes means that they had a romantic relationship during a time period that was captured in, in the surveys in, in this high school. And, in particular, you know, here we see a different kind of structure. We've got a bunch of, of, you know, 63 different diads, so just two people connected to each other. We see one fairly large component which has a couple of cycles in it, but looks almost like a tree. you see some other different shapes here, a number of, of different configurations. So, a bunch of small components, one larger component. And actually, there's some isolated nodes that are omitted from the picture. let's look at the deg, degree distribution of this one, and compare it to the uniformly at random. So, here again, log of degree, log of the frequency. And in this case when we look at the actual data compared to the uniform at random, the fit of, if we actually try and match these up, the fit in terms of matching up the data is a very close match from just having. The uniform at random compared to a power distribution, doesn't fit this as well, so this doesn't seem to have the fat tails that the other one did. What does that tell us? It tells us that degree distribution's of different networks can have different properties. And if we're looking at some, they might have something that looks much more uniform at random, what this tells us about romance in high schools. but it it's saying that this is looking much more like a a purely at random graph whereas the other looked like it had something else going on in terms of its degree distribution looking more like a power distribution. So, that gives us a a brief look at degree distributions. We're going to be visiting those a lot more in the course, they're [UNKNOWN] a very important way of characterizing a network. And understanding how dense the network is and also what the sort of variation is in degrees across the population. so let's have a look at some other definitions that are going to be important in networks.