At the end of week two,

we considered a few examples of different families of probability distributions.

You may recall things like the Bernoulli, the binomial, and

the Poisson distributions which we covered.

Now, one of the most important distributions

in statistics is something called the normal distribution.

Now, there's actually a bit of a distinction between those Bernoulli,

binomial and Poisson considered previously, and the normal distribution,

whereby the previous ones were what we call discrete distributions,

such that the values which a variable could take tended to be integer values only.

Now, we've introduced the concept of measurable variables within

this week of the course.

And the normal distribution is a distribution appropriate for measurable

variables which could be measured along a continuum, along a continuous interval.

So I'd just like to show you a few examples of the normal distribution,

both from a more empirical sampling perspective,

as well as the theoretical characteristics of the normal distribution.

So many of you may have come across the normal distribution but may not have

known its name.

But it's this sort of familiar bell-shaped curve.

So if we return briefly to the previous section where we introduced

the hypothetical returns on two stocks, the red stock and the black stock,

in those histograms

you could see a reasonable approximation to a bell-shaped curve.

Indeed, in finance, quite often it is assumed that the returns of stocks,

shares, equities tend to follow a normal distribution.

However, you may recall back in week one of the course I spoke about the importance

of simplifying assumptions in modeling, but one should always be cautious

about whether these assumptions are truly borne out in reality.

And in fact, there's some debate about whether the returns on stocks truly follow

a normal distribution, or whether in reality the tail areas of the returns of stocks might

perhaps be slightly fatter than those of a normal distribution, indicating that extreme

returns are more likely than a normal distribution might predict.
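
To make the idea of fatter tails concrete, here is a minimal sketch comparing how likely an extreme value is under a normal distribution and under one fatter-tailed alternative. The Laplace distribution is used purely as an illustrative assumption (it is not claimed to be the right model for stock returns), and the function names normal_tail and laplace_tail are made up for this example. Both distributions are scaled to have mean 0 and variance 1.

```python
import math
from statistics import NormalDist


def normal_tail(k: float) -> float:
    """P(|Z| > k) for a standard normal variable Z."""
    return 2.0 * (1.0 - NormalDist().cdf(k))


def laplace_tail(k: float) -> float:
    """P(|X| > k) for a Laplace variable X with mean 0 and variance 1."""
    # A Laplace distribution with scale b has variance 2 * b**2,
    # so variance 1 corresponds to b = 1 / sqrt(2).
    b = 1.0 / math.sqrt(2.0)
    return math.exp(-k / b)


# Compare the chance of landing more than k standard deviations from the mean.
for k in (2, 3, 4):
    print(f"k={k}: normal={normal_tail(k):.6f}  fat-tailed={laplace_tail(k):.6f}")
```

At four standard deviations the normal probability is about 0.00006, whereas this fatter-tailed alternative gives about 0.0035, which is the sense in which a normal assumption can understate extreme returns.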

But ignoring those technicalities, and putting them to one side, one can say that

if these were stock returns, they follow a normal distribution reasonably well.

Another example, let's take one from medicine: suppose we collected some data

on the diastolic blood pressure of a large sample of patients and

produced a histogram of this, remembering that histograms are great

visual displays for a single measurable variable.

And if we look at a histogram of such diastolic blood pressures across a large

sample of patients, one can see that a normal bell-shaped

curve could easily be superimposed on top of that histogram.
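
One way to check how well a bell-shaped curve sits on top of a histogram is to compare the share of observations in each bin with the share a fitted normal distribution would put there. The sketch below does this on simulated readings, not real patient data: the mean of 80 and standard deviation of 10 (in mmHg), the sample size, the bin edges, and the helper name compare_bins are all assumptions made for illustration.

```python
import random
from statistics import NormalDist, mean, stdev

random.seed(0)  # fixed seed so the sketch is reproducible

# Hypothetical sample: 5,000 simulated diastolic readings (assumed mean 80, sd 10).
readings = [random.gauss(80, 10) for _ in range(5000)]

# Fit a normal distribution using the sample mean and sample standard deviation.
fitted = NormalDist(mean(readings), stdev(readings))


def compare_bins(data, dist, lo=50, hi=110, width=10):
    """Return (bin_start, observed_share, normal_share) for each histogram bin."""
    rows = []
    for start in range(lo, hi, width):
        observed = sum(start <= x < start + width for x in data) / len(data)
        expected = dist.cdf(start + width) - dist.cdf(start)
        rows.append((start, observed, expected))
    return rows


rows = compare_bins(readings, fitted)
for start, observed, expected in rows:
    bar = "#" * round(observed * 100)  # crude text histogram bar
    print(f"{start:3d}-{start + 10:3d}  obs={observed:.3f}  normal={expected:.3f}  {bar}")
```

If the two columns are close in every bin, the normal curve is a reasonable description of the sample; large gaps in particular bins would suggest otherwise.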

Now, in this case, we are thinking about drawing a sample of observations

from a wider population. This is not the diastolic blood pressure of every human

being, but rather a random sample drawn from that wider population, and we're going to

discuss more of sampling from populations in the next week of the course.

But ideally any sample of observations we observe should be fairly

representative of that wider population.

So it might be reasonable to assume,

that's our idea of making an assumption again, that perhaps the diastolic

blood pressure of all people follows a normal distribution.

And this large random sample we've drawn from that population roughly reflects

those same population characteristics within our sample or histogram.

So there could be many situations in life where a normal distribution could

reasonably be assumed, perhaps just one more variation of that.

If we return to our look at GDP per capita and the histogram produced for

this across a large sample of countries, here

we see a distribution which is very much non-normal.

Indeed, what we would call a heavily skewed distribution.

A large proportion of countries have a very low GDP per capita, with one or

two, perhaps, outliers which on a GDP per capita basis are very wealthy indeed.

So income here, on a GDP per capita basis,

clearly does not tend to follow a normal distribution.

Indeed, if we weren't looking across countries but

were looking within a country itself, I think it's fair to say that a similarly

heavily positively skewed distribution could be expected to be found.

Namely, the vast majority of people in a country earning fairly low or

modest incomes, with just a few select ones earning a very high income.

These might be, for example,

professional footballers or the CEOs of some top companies, among others.

So although income clearly does not tend to follow a normal distribution,

sometimes we may be able to apply an appropriate transformation to

a variable in order to make it sort of converge more to normality.

Now, these are little tricks in modeling which you may come across in more advanced

courses.

But quite often, we may not work with income directly, but

the logarithm of income.

And sometimes this so-called log transformation, taking the original

variable and applying the logarithm to it, can turn some distributions which

appear very non-normal into ones which seem more normal. And indeed,

those very much interested in this could perhaps read up a little bit more on

so-called log-normal distributions.
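
The effect of a log transformation can be sketched on simulated data. The incomes below are hypothetical, drawn from a log-normal distribution with assumed parameters (10 and 1 on the log scale), and the skewness helper is a simple moment-based measure written for this example. A value near zero indicates a roughly symmetric distribution, while a large positive value indicates a heavy right tail.

```python
import math
import random
from statistics import mean, median, pstdev


def skewness(values):
    """Moment-based skewness: the average cubed standardized deviation."""
    m = mean(values)
    s = pstdev(values)
    return sum(((v - m) / s) ** 3 for v in values) / len(values)


random.seed(1)  # fixed seed so the sketch is reproducible

# Hypothetical incomes: log-normal with assumed log-scale mean 10 and sd 1.
incomes = [random.lognormvariate(10, 1) for _ in range(10000)]
log_incomes = [math.log(v) for v in incomes]

print(f"raw incomes: mean={mean(incomes):.0f}  median={median(incomes):.0f}  "
      f"skewness={skewness(incomes):.2f}")
print(f"log incomes: mean={mean(log_incomes):.2f}  median={median(log_incomes):.2f}  "
      f"skewness={skewness(log_incomes):.2f}")
```

On the raw scale the mean sits well above the median and the skewness is strongly positive, while on the log scale the mean and median nearly coincide and the skewness is close to zero.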

So in short, there are many situations in the real world where we could potentially

assume normality, either directly or perhaps through some log transformation.

But up to now, we've just considered some sample data sets.

I'd like to end this section

by considering the more theoretical characteristics of a normal distribution.

Now, we introduced, within this week, for example,

the letter X to denote some random variable.

So we could consider X perhaps being stock returns. The height of human beings

would perhaps be another great example where we might assume normality.

Think of all the people you know.

You will perhaps know a few very tall people, a few very short people.

But pretty much everyone else is of more moderate height.

And if you drew a histogram of the heights of everyone you know,

you'd be likely to get something resembling a normal distribution.