and if I'm willing to assume that the data are roughly normally distributed in the sample

because they come from a population of

data points that are normally distributed or approximately normally distributed,

I can use just those two values to get this.

Another way to think about this 31 percent is that

the probability that any male in the population has

a blood pressure measurement more than 0.5 standard deviations

above the mean is 0.31 or 31 percent.

So, if I were to select a male at random

from this population, 31 percent of the time he would have a blood pressure greater

than 0.5 standard deviations above the mean, or greater than 130 millimeters of mercury.
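To make that 31 percent concrete, here is a minimal sketch in Python. It uses the standard library's `statistics.NormalDist` as a stand-in for whatever normal-curve calculator you prefer; the variable names are just for illustration, and the only input taken from the lecture is the 0.5 standard deviations.

```python
from statistics import NormalDist

# Standard normal curve: mean 0, standard deviation 1.
standard_normal = NormalDist(mu=0.0, sigma=1.0)

# From the lecture: 130 mmHg sits 0.5 standard deviations above the mean.
z = 0.5

# cdf(z) is the proportion at or below z, so the upper tail is 1 - cdf(z).
upper_tail = 1.0 - standard_normal.cdf(z)

print(f"Proportion more than {z} SDs above the mean: {upper_tail:.2f}")
```

The printed proportion should round to the 0.31 quoted above.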

So, the result of this type of computation, where we converted the systolic blood pressure of

130 to the number of standard deviations above or below the sample mean,

is sometimes called a z-score.

There is nothing special about a z-score.

It is simply a measure of the relative distance and direction of

a single observation in the data distribution relative to the mean of the distribution.

This distance is converted to units of standard deviation.

This is akin to converting kilometers to miles,

which changes units of distance, or dollars to rupees,

which changes units of currency.
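Written as a formula, the conversion just described looks like this. The symbols are my own notation for illustration and are not from the slides: $x$ is the single observation, $\bar{x}$ the sample mean, and $s$ the sample standard deviation.

```latex
z = \frac{x - \bar{x}}{s}
```

The sign of $z$ gives the direction (above or below the mean), and its size gives the distance in standard deviations.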

So, let me just give you a silly example to illustrate that this is all we are doing.

Suppose you're an American who is apartment hunting in an unnamed European city.

You wish to find an apartment within walking distance, plus or minus 1.5 miles,

of a large organic supermarket on Main Boulevard, which runs East to West.

You are only considering apartments on Main Boulevard, so

you're only considering the East and West directions.

We've ruled out North and South.

So, kind of a silly example but let's think about this.

The supermarket is two kilometers West of the main city square,

and you are interested in three apartments.

Apartment one is six kilometers West of the city square,

apartment two is 0.75 kilometers West of the city square,

and apartment three is one kilometer East of the city square.

So, here's a schematic looking at apartment number one.

So, here's the supermarket as we've noted.

CS stands for city square.

The distance between these two is two kilometers,

and the distance between apartment one and the city square is six kilometers.

So, certainly we can figure out the distance between

apartment one and the supermarket by taking

its observed distance from the city square and subtracting off

the distance of our reference point, the supermarket, from the city square.

So, apartment one is four kilometers from the supermarket.

This is what we call the raw distance, but I don't

know or understand the metric system, so

I would need to convert this into units I am comfortable with, in this case

miles, to make sure that this is within my desired walking distance,

or to rule it out if not.

So, how many miles is four kilometers?

Well, one mile is roughly 1.6 kilometers,

so four kilometers is roughly equal to four kilometers

divided by 1.6 kilometers per mile, or 2.5 miles.

So, my apartment in terms of miles is 2.5 miles West of the supermarket.

So, I've done two things.

First, I've gotten the distance between my apartment and the supermarket

by subtracting the distance of the supermarket from the city square

from the distance of my observation

from the city square, and then I've converted it to miles so that I can

evaluate whether it's close or not by my criteria.

Similarly, apartment two is 0.78 miles East of the supermarket,

and apartment three is 1.88 miles East of the supermarket.
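Here is a small sketch of the apartment arithmetic in Python, just to show the two steps (subtract the reference point, then change units). The sign convention (East positive, West negative) and all names are assumptions made for this illustration; the 1.6 is the rough conversion factor used above.

```python
KM_PER_MILE = 1.6      # rough conversion factor from the lecture
MAX_WALK_MILES = 1.5   # desired walking distance, in either direction

# Positions along Main Boulevard in km from the city square.
# Assumed sign convention: East is positive, West is negative.
supermarket_km = -2.0
apartments_km = {
    "apartment 1": -6.0,
    "apartment 2": -0.75,
    "apartment 3": 1.0,
}

def miles_from_supermarket(position_km):
    """Subtract the reference point, then convert km to miles."""
    raw_km = position_km - supermarket_km
    return raw_km / KM_PER_MILE

distances_miles = {
    name: miles_from_supermarket(position)
    for name, position in apartments_km.items()
}

for name, miles in distances_miles.items():
    direction = "East" if miles > 0 else "West"
    walkable = abs(miles) <= MAX_WALK_MILES
    print(f"{name}: {abs(miles):.2f} miles {direction}, walkable: {walkable}")
```

Under these numbers only apartment two falls within plus or minus 1.5 miles of the supermarket.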

So, in some sense the z-score is the statistical mile.

It allows us to convert observations from

different distributions with different measurement scales to comparable units.

When dealing with data that follow an approximately normal distribution,

these z-scores tell us everything we need to know about the relative positioning of

individual observations in the distribution of all observations.

We can compute z-scores for data

arising from any type of distribution; it doesn't have to be normal.

However, for data from non-normal distributions,

the z-scores will still inform us about relative positions,

but this may not translate into

correct percentile information, and we'll

look at some examples of this in the next section.

So let's look at another example.

Here's a basic histogram of weights for 236 Nepali children,

males and females combined, at one year old.

The mean of this sample is 7.1 kilograms and the median is 7.0,

so very similar in value, and we can see that

this histogram, the distribution of these 236 weights,

is almost perfectly symmetric,

not quite, and bell-shaped in some sense.

So, it may be worth assuming that these data come from

a population of approximately normally distributed child weights.

Here I've superimposed a normal curve with the same mean

of 7.1 and the same standard deviation of 1.2

over the histogram, and we can see it doesn't line up perfectly but it's not a bad fit.

You can imagine that if we had a larger sample it would fill out some of these gaps,

and we would be able to better ascertain the fit.

So, I'm going to use only the sample mean and standard deviation assuming that

these data come from a population of approximately normally distributed weights,

and let's estimate a range of weights for

most 95 percent of Nepali children who are 12 months-old,

using only the data from these samples.

So again, to estimate the middle 95 percent, the range we would give is

the interval between the 2.5th percentile and the 97.5th percentile.

That interval would contain

the middle 95 percent, and I can estimate it

because I've assumed these data come

from a population with an approximately normal distribution.

I can take the mean,

in this case the sample mean, and subtract 2 standard deviations, and that gives me

a value of 4.7 kilograms for the 2.5th percentile,

and doing the same thing but adding two standard deviations to

the mean gives me the estimated 97.5th percentile.

So, based on that computation we estimate that most,

the middle 95 percent, of Nepali children who are 12 months old

have weights between 4.7 and 9.5 kilograms.

So, again we're estimating based on an

assumed underlying normal distribution in the population.

The value here is 9.5.

The value here is 4.7.

These values cut off the 95 percent in the middle,

and we're left with 2.5 percent that have weights greater than

9.5 and 2.5 percent with weights less than or equal to 4.7.
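As a rough sketch of that computation in Python: the mean of 7.1 and standard deviation of 1.2 come from the sample above, while the variable names are just for illustration. The second half is an addition of mine, not part of the lecture's calculation. It asks a normal curve for its exact 2.5th and 97.5th percentiles using `statistics.NormalDist.inv_cdf`, to show how close the "2 standard deviations" rule of thumb is.

```python
from statistics import NormalDist

mean_kg = 7.1  # sample mean weight
sd_kg = 1.2    # sample standard deviation

# Rule of thumb from the lecture: mean plus or minus 2 standard deviations.
lower_kg = mean_kg - 2 * sd_kg
upper_kg = mean_kg + 2 * sd_kg
print(f"Mean +/- 2 SD: {lower_kg:.1f} to {upper_kg:.1f} kg")

# For comparison (my addition): percentiles of a normal curve with the
# same mean and SD, which sit about 1.96 SDs from the mean.
weights = NormalDist(mu=mean_kg, sigma=sd_kg)
exact_lower_kg = weights.inv_cdf(0.025)
exact_upper_kg = weights.inv_cdf(0.975)
print(f"Normal-curve percentiles: {exact_lower_kg:.2f} to {exact_upper_kg:.2f} kg")
```

Both versions rely on the normality assumption; neither looks at the individual 236 observations.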

Just FYI, because I actually have all 236 observations,

I could find with the computer the 2.5th and

97.5th percentiles of these 236 values,

and those are 4.4 and 9.7 respectively.

So, very similar to what we see when we

use the normality assumption and just the mean and standard deviation.

Suppose a mother brings her child to a pediatrician for

his or her 12-month checkup,

and wants to evaluate where the child's weight is

relative to the population of 12-month-olds in Nepal.

Her child is five kilograms, and she wants to know whether this child is close to average weight,

way above average, or way below.

She wants to place him or her relative to other 12-month-old children in Nepal.

So the information we're trying to ascertain looks like this.

We want to figure out, using the sample data and

assuming that these data come from a population that is normally distributed,

the percentage of

children who are smaller in weight than this child,

who have weights less than or equal to five kilograms.

So again, what we're going to do is create this z-score idea.

We're going to translate this measurement of

five kilograms into units of standard deviation so we can

find out how this child's weight compares to

the mean of all such children in terms of standard deviations.

So, what we do is take the observed child's weight of five, subtract the mean for

all children in our sample of 7.1, and divide by

the standard deviation, 1.2 kilograms per standard deviation.

This child is small,

weighs less than the mean, by 2.1 kilograms.

So, the difference in kilograms is negative 2.1.

When we standardize that we find out that this child's weight is 1.75

standard deviations below the sample mean of 7.1 kilograms.
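The standardization just described can be sketched in a few lines of Python. The helper name `z_score` and the variable names are hypothetical, chosen only for this illustration; the numbers are the ones from the example.

```python
def z_score(observation, mean, sd):
    """Distance of an observation from the mean, in standard deviations."""
    return (observation - mean) / sd

child_weight_kg = 5.0
sample_mean_kg = 7.1
sample_sd_kg = 1.2

difference_kg = child_weight_kg - sample_mean_kg   # raw distance in kilograms
child_z = z_score(child_weight_kg, sample_mean_kg, sample_sd_kg)

print(f"Difference from mean: {difference_kg:.1f} kg")
print(f"z-score: {child_z:.2f}")
```

The negative sign carries the direction: this child is below the mean.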

So, the original question asked by the parent,

"How does my child's weight compare to the other children

of the same age?" can be rephrased as,

"What percentage of observations in a normal curve are

more than 1.75 SDs below its mean?"

Well, we're going to go to our friend pnorm, and actually

pnorm is perfectly set up for this because that's what it calculates.

We know that if we type pnorm(-1.75),

what it's going to give us is the proportion of observations in

the standard normal curve that are more than

1.75 standard deviations below zero, below the mean.
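If you don't have pnorm at hand, here is a hedged equivalent in Python. It assumes the standard library's `statistics.NormalDist`, whose `cdf` method returns the proportion of a normal curve at or below a value, which is the same lower-tail quantity described above. The variable names are for illustration only.

```python
from statistics import NormalDist

z = -1.75  # the child's weight in standard deviations from the mean

# Lower-tail proportion of the standard normal curve at or below z.
proportion_below = NormalDist().cdf(z)
print(f"Proportion at or below z = {z}: {proportion_below:.3f}")

# Same question asked on the original kilogram scale, without
# standardizing first (mean 7.1 kg, SD 1.2 kg from the sample).
proportion_below_kg = NormalDist(mu=7.1, sigma=1.2).cdf(5.0)
print(f"Proportion at or below 5 kg: {proportion_below_kg:.3f}")
```

Both lines should agree, at roughly 4 percent, since standardizing only changes the units and not the position of the child in the distribution.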