0:00

Welcome to lecture ten of Networks, Friends, Money and Bytes.

Â In today's lecture, we'll try to formulate and answer this question.

Â Does the internet have an Achilles' heel? And of course the term Achilles' heel here

Â comes from the metaphor and we need to properly define what exactly do we mean an

Â Achilles' heel. Now, there's no shortage of vulnerability

Â of the Internet. In the second half of this course, we'll

Â be focusing a lot on the underlying architectural principles behind the

Â Internet. And we'll see that there are many holes

Â for security breach, for privacy intrusion, and also for unreliability.

Â 0:50

However, a Achilles' heel here refers to a reported existence of a particular type of

Â Internet security issue back in 2000. And this specific vulnerability has the

Â following, Issues.

Â There is a highly connected router in the middle of the network can be easily

Â detected and if you destruct this router then the whole network will split into

Â many parts. Later in chapter thirteen, we'll talk a

Â lot more about routing and routers on the internet.

Â But for today, we can just say that in the Internet, there are many nodes and leads

Â of various kinds amd a lot of these nodes are what's called routers,

Â They route the packets across the Internet.

Â What this is saying is that, in the middle of the Internet, there're some very highly

Â connected routers. And if you destruct them because they sit in the middle of the

Â internet, where the many connections to the rest of the network.

Â If you take this one down, the entire network will break apart in many

Â distinctive pieces. It turns out that, if we define this kind

Â of specific vulnerability as the Achilles' heel,

Â 2:19

Then Internet does not have an Achilles' heel.

Â And yet, back in 2000, it was reported that there is an Achiles' heel.

Â It was on the cover of Nature Magazine, and was a very alerting, report,

Â Except, it does not fit the reality of the Internet.

Â So why would there be such a report? And what can we learn from the way to

Â refute this wrong statement? Well, before we start answering that

Â question, we have to highlight that topology is only part of the story.

Â The story here is about a specific kind of robustness,

Â 3:11

Or the network called the Internet. And as we have talked about, in networks, we have

Â to think about both the topology, Often represented as a graph with various

Â matrices associated with it, as well as the functionalities, which are captured by

Â a variety of mathematical models. So there are many functionalities,

Â For example, network protection. There are many network protocols, that

Â we'll talk about later in the course that can detect failure and then try to recover

Â from the failures. And these are functionalities sit on top

Â of the underlying topology, And therefore it's not just about what the

Â graph looks like. It's, it's also about the control plane

Â that sends a control signal to measure, detect, and recover from a variety of

Â failures, including router failures. On the flip side, often because of the

Â bugs or the time to converge in the network management plane that caused a lot

Â of failure issues. So both robustness and protection and

Â failures actually stem off and from the functionalities.

Â Robustness from the good things about functionality filler from the bugs and bad

Â design in the functionalities, rather than associate it with the underlying graphs.

Â Now having said that, let's assume actually incorrectly that, robustness is

Â about the graph only, for the rest of today's lecture, and let's say what we can

Â say there. Now, since we are focusing just on the

Â graphs today, I have to talk about three different kinds of graphs of the Internet.

Â One is the graph of webpages connected by hyperlinks, just like what we saw in

Â lecture two, or three, I should say with Google's PageRank.

Â Algorithm in ranking webpages based on this connectivity pattern.

Â Let's graph type number one. Graph type number two is graphs of the so

Â called autonomous systems, And.

Â 5:35

These are business entities, For example, AT&T, British Telecom, Reliance in India,

Â and so on. Okay? Sometimes they are smaller entities,

Â smaller companies, or even campuses, for example, Princeton University.

Â These are different entities, profit or non-profit organizations, and they each

Â control a part of the overall network of networks that we call the Internet.

Â These individual autonomous systems, large and small ones, are connected by physical

Â as well as business relationships. That's the second type of graph.

Â The third type of graph, which is the focus today, is the graph of routers

Â connected by physical, often fiber, links. So if I just give you a picture of a

Â graph, This could be a graph where the nodes are

Â webpages and links are hyperlinks. It could be a graph where the nodes are

Â companies and organizations and the links are business relationships.

Â It could also be graphs where the nodes are actual physical routers and switches

Â and the links are physical fiber or copper wireless links.

Â When we talk about the robustness with respect to attack on hardware of the

Â Internet, we are talking about this third graph, the graph of routers.

Â 7:21

Discovering the graph topology itself it is a very challenging task.

Â In fact for the Internet, no one has a precise map of the Internet.

Â You may think somewhere in the world that there is a place that stores the map of

Â the Internet. Actually, there is no such map.

Â If we're talking about the AS, autonomous system relation graph, many links are

Â missing. These business or physical links are not

Â reported or recorded. There also whats called Internet exchange

Â points. For example, thousands of these just in

Â Europe. As pairing shortcuts, they decide that

Â instead of going through some bigger autonomous system, they will form a short

Â cut themselves, say between two smaller and medium-sized autonomous systems and

Â often these are not reported. So, it's widely reported that it could be

Â more than half of the links actually missing and therefore we are not even

Â close to having precise view of the AS graph.

Â But what about the router level graph with physical links that we are focusing on

Â today? Often, the topology is estimated or

Â inferred based on some measurements. These are limited measurements called say

Â traceroutes. But it is also widely reported that because of the protocol

Â details, traceroutes often lead to bias the sampling.

Â In other words, in a huge network, , with many nodes and many links, a lot of links

Â are not properly recorded by this measurement methodology.

Â And in fact, near the edge of the network, for example, near your, corporation, near

Â on a campus, or near your home, High-rise buildings as opposed to in the

Â backbone of the network. There's often no scalable measurement

Â platforms. So at the router level, the graph is also

Â not known for sure. There are many missing links.

Â But let's assume again incorrectly, that we can measure the degrees accurately and

Â see what we can say. So let's skip the important fact that

Â robustness is more than just about a graph,

Â And skip the fact that these graphs, we do not yet have a good view.

Â And just go straight to say, suppose we believe that we can measure the degrees

Â and that's all that matters. The topology is all that matters.

Â Let's see what we can say. Alright.

Â So what is the degree distribution we observe for router level graphs?

Â Well, we can tabulate these degrees into a histogram.

Â And we can, therefore construct the probability distribution.

Â Let's say probability that a node's degree x is assuming a certain specific value

Â small x. And often, people look at the tail distribution, which is the

Â probability that the node degree x is bigger than or equal to a specific value

Â small x. It turns out that,

Â If you look at the either AS or router level, although the router level graph is

Â as well focusing right now. The tail distribution roughly follows x to

Â the power of minus alpha with sum constant k in front of it.

Â This alpha here is a constant parameter that describes that the decay of the tail

Â distribution and this approximate symbol basically says that for sufficiently large

Â 12:14

As oppose to homeogenious networks, So homogeneous in what and scale-free in

Â what sense? Well, it turns out that a lot of

Â distributions, such as Gausian distribution or normal that we are so

Â familiar with, or exponential distribution, their tail distribution

Â decays much faster than this x to the power minus alpha.

Â And therefore, they do define a specific scale,

Â For example, for Gausian distribution, The scale is determined basically by the

Â mean and the standard deviation. And, it is not easy to get a node with a

Â degree [INAUDIBLE] degree that is way off from many standard deviation from the

Â mean. So that's why we call that homogeneous

Â networks, Whereas for long tail distribution, the

Â tail drops much more slowly, And for Gaussian, this would be basically

Â zero, But for long tail, this is still, still

Â signifigantly above zero. In this sense, it's called inhomogeneous

Â cuz there can be some nodes with a very high degree compared to the mean.

Â It's also called scale free, Because if you zoom into the tail, you see

Â that it also behaves like a heavy tail. So there's no characteristic scale.

Â No characteristic scale of, the node degrees.

Â So here is a cartoon to illustrate, this is Gaussian and this is long tail.

Â 14:07

Okay, there is a scale with respect to Gaussian, there is no scale, the scale can

Â be arbitrary. It is very difficult to be a way off from

Â the mean, whereas this can be way off in a heavy tail, in a long tail distribution.

Â Now, what kind of distribution is long tail?

Â A specific special case is called a Pareto distribution,

Â The same Pareto as Pareto optimal that we were talking about in lecture one.

Â So, Pareto distribution has the following tail,

Â Which is the probability that our random variable x, say the node degree, is bigger

Â or equal to a specific small x equals x over k, for some constant k to the power

Â of minus alpha. That we can take the differentiation of

Â one minus this tail, which is the cumulative distribution, differentiate

Â this expression that will lead us to the probability distribution function, PDF,

Â for Pareto distribution, that says the probability that a Pareto distributed

Â random variable equals, sorry, equals x is alpha, k to the power alpha, just a

Â constant, times x to the power one plus alpha.

Â Okay? As you can see, both the tail distribution and the distribution itself

Â follows a long tail, X to the power minus something.

Â Okay? Either minus alpha or minus one plus alpha with some constants in front of it.

Â Now, in contrast, for a Gaussian, rather than a Pareto distribution, that PDF, is

Â as you may recall, one over square root two pi, standard deviation of sigma times

Â exponential minus x minus the mean mu2 squared over two sigma2.

Â Squared. Hm, okay? So for Pareto distribution, it follows

Â this. For Gaussian it follows this. We can easily convince ourselves

Â numerically that, if you plug some x in here, you will see that this is way

Â smaller than this, as x deviates much away from the mean. Okay?

Â This is exponential minus x squared, this is only x to the some power, one plus

Â alpha, negative. And a visualization for the Pareto distribution is a straight line

Â in the log-log curve.

Â