Let me elaborate on that point. In the discrete case, where x can only take so many values (one, two, three, four), this definition of conditional probability is exactly the definition that we used for events, where A is the event that X = x and B is the event that Y = y. So there's no confusion: it exactly agrees with our definition of conditional probability. The continuous one is a little bit harder, in that it's harder to motivate why this is the definition.
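The discrete-case agreement can be checked directly. Here's a minimal sketch in Python; the joint mass function is made up purely for illustration. It shows that the event-based formula P(A | B) = P(A and B) / P(B) and the conditional pmf formula p(x | y) = p(x, y) / p(y) give the same answer.

```python
# Hypothetical joint pmf, just for illustration: X in {1, 2}, Y in {1, 2}.
from fractions import Fraction

joint = {
    (1, 1): Fraction(1, 8), (1, 2): Fraction(3, 8),
    (2, 1): Fraction(1, 4), (2, 2): Fraction(1, 4),
}

# Event-based definition: P(A | B) = P(A and B) / P(B),
# with A = {X = 1} and B = {Y = 2}.
p_B = sum(p for (x, y), p in joint.items() if y == 2)   # P(Y = 2)
p_A_and_B = joint[(1, 2)]                               # P(X = 1, Y = 2)
event_based = p_A_and_B / p_B

# Conditional pmf definition: p(x | y) = p(x, y) / p(y).
pmf_based = joint[(1, 2)] / sum(joint[(x, 2)] for x in (1, 2))

print(event_based, pmf_based)  # both are 3/5
```

The two computations are literally the same arithmetic, which is the point: in the discrete case the conditional mass function is just conditional probability for events.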

The event that X takes on a specific value, or that Y takes on a specific value, has probability zero for continuous random variables. So that kind of fails our basic premise from conditional probability for events: the event that we're conditioning on has to have probability greater than zero. Now, note that we're not talking about

conditional probabilities, we're talking about the construction of the conditional

densities which govern the behavior of conditional probabilities.

So we haven't violated that rule from earlier, but it still kind of seems to break the spirit of the rule. So how do we get at this idea? How can we have a meaningful definition of the probabilistic behavior of a random variable, given that another random variable takes on a specific value?

Well, here's the motivation that I like. Imagine you define the event A that the random variable X is less than or equal to a specific value little x, and the event B that the random variable Y lies in the interval from y to y plus some small amount, say epsilon. Now A and B are events that have

positive probability. And we can apply our standard definition

of conditional probability to talk about the probability of the event A given that

the event B has occurred, right? That would just follow from our standard

definition. So let's actually formulate this. The probability of A given B is the probability that X is less than or equal to little x, given that Y is in the interval from y to y + epsilon. Now in this case, nothing has probability zero, so we can just directly apply the conditional probability formula.

And I don't think this is terribly important for this class; I just wanted this argument to be here for those who want to see it. But you can follow through the arithmetic (it's not really calculus here) and get that this construction yields the conditional distribution function associated with X, given that Y = y, as we let epsilon get smaller and smaller. So as the conditioning event gets closer and closer to conditioning on Y being the specific value y, we limit to the conditional distribution function associated with X. And then remember that density functions are derivatives of distribution functions, so if we just take the derivative of this, then we get the conditional density function.
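That derivative step can be sketched numerically. Here's a small Python check using the example density that comes up later in the lecture, f(x, y) = ye^(-xy - y), whose conditional distribution function works out to F(x | Y = y) = 1 - e^(-xy) (that closed form is my computation, stated here as an assumption): differentiating the conditional CDF in x recovers the conditional density.

```python
import math

# Conditional CDF from the lecture's example joint density f(x, y) = y*exp(-x*y - y):
# F(x | Y = y) = 1 - exp(-x*y)   (closed form assumed here)
def F_cond(x, y):
    return 1 - math.exp(-x * y)

# The conditional density f(x | y) = y*exp(-x*y) should be its derivative in x.
def f_cond(x, y):
    return y * math.exp(-x * y)

# Central-difference numerical derivative of F in x.
x, y, h = 0.7, 2.0, 1e-6
deriv = (F_cond(x + h, y) - F_cond(x - h, y)) / (2 * h)

print(deriv, f_cond(x, y))  # the two agree to high precision
```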

So we can see right here that if we differentiate this conditional

distribution function, we get exactly the definition of the conditional density that

we gave you before, f(x, y) / f(y). So if you're interested in this at this

level, then you can go through those arguments carefully. And to be fair, these only cover the definition in the continuous case, where we have differentiable distribution functions.
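For those who do want to see the limiting argument in action, here is a numeric sketch in Python. It uses the example density from later in the lecture, f(x, y) = ye^(-xy - y); the closed-form expressions for the two probabilities are my own computation, stated as assumptions. The ratio P(X <= x, Y in [y, y + eps]) / P(Y in [y, y + eps]) approaches the conditional CDF 1 - e^(-xy) as epsilon shrinks.

```python
import math

# Joint density from the lecture's example: f(x, y) = y*exp(-x*y - y), x, y > 0.
# Marginal: f(y) = exp(-y).  The integrals below are done in closed form (assumed).

def p_B(y0, eps):
    """P(Y in [y0, y0 + eps]) = integral of exp(-t) dt over [y0, y0 + eps]."""
    return math.exp(-y0) - math.exp(-(y0 + eps))

def p_A_and_B(x, y0, eps):
    """P(X <= x, Y in [y0, y0 + eps]) = integral of (1 - exp(-x*t)) * exp(-t) dt."""
    return p_B(y0, eps) - (math.exp(-y0 * (1 + x))
                           - math.exp(-(y0 + eps) * (1 + x))) / (1 + x)

x, y0 = 1.0, 2.0
limit = 1 - math.exp(-x * y0)  # the conditional CDF F(x | Y = y0)
for eps in (1.0, 0.1, 0.01, 0.001):
    ratio = p_A_and_B(x, y0, eps) / p_B(y0, eps)
    print(eps, ratio)  # approaches the limit as eps shrinks
print(limit)
```

Nothing in this computation ever conditions on a probability-zero event: for every positive epsilon, both events have positive probability, and the conditional CDF appears only as the limit.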

But this is more than enough for our case. If you're interested in it at a deeper

level even than this, where you have mixed continuous and discrete densities, then

you can take an advanced probability course somewhere; but, for our purposes,

this is enough. And so, just to summarize: we have the conditional probability definition associated with events, which governs all of our thinking about conditional probabilities. That's the probability of A given B equals the probability of A intersect B divided by the probability of B. And then, when you're talking about random variables, what we want is the probability behavior of a random variable X, given that the random variable Y has taken on a specific value: it's the joint density or mass function divided by the marginal. It has a nice parallel with the probability associated with events, and here we've gone through the arguments to show how we get from those statements about events to this definition for mass functions and density functions. So conditional densities actually have a

very nice geometric interpretation. If you have a joint density f(x, y), that's a surface: f yields the z value over the xy-plane. So f(x, y) is a surface, and the volume under the surface has to be one for it to be a joint density.

Well, what does it mean to get the conditional density of X given that Y takes a particular value? The event that Y takes a particular value, let's say y equals five, corresponds to a plane. That plane slices through this surface and yields a function. That function is just f(x, y) evaluated at the point five, f(x, 5), okay? So we have this surface.

We have this plane, the y = 5 plane, that cuts through the surface, and then we have the function sitting on that plane, f(x, 5). And that is exactly the conditional density, except that now it doesn't integrate to one. So we have to normalize it to something that integrates to one. Well, that's exactly what we divide by there: the marginal f(5). Let's go through a specific example.

We have f(x, y) = ye^(-xy - y), for x and y both greater than zero. Now, for the marginal density associated with y, let's just perform the integral. We integrate the joint density function over x from zero to infinity, because we want the marginal associated with y. You can perform the integral; it works out to be e^(-y). And then our conditional density, f(x) given y, is the joint density divided by the marginal, f(x, y) / f(y).

So just churn through the calculations and you get ye^(-xy). And so if you wanted to know the conditional density governing the behavior of the random variable X, given that Y is, say, three, then that density would be 3e^(-3x). Okay, so you just plug in y = three.

So now, if you plug in any possible value of y, this function will give you the associated density function for the random variable X, conditioned on the information that Y takes on that specific value.
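The whole example, including the slicing picture, can be checked numerically. Here's a short Python sketch (the Riemann-sum grid and the upper cutoff are my choices, not part of the lecture): the area under the y = 5 slice of the joint density matches the marginal e^(-5), and dividing the slice by that marginal reproduces the conditional density 5e^(-5x).

```python
import math

# The example's joint density: f(x, y) = y*exp(-x*y - y) for x, y > 0.
def f_joint(x, y):
    return y * math.exp(-x * y - y)

y0 = 5.0
dx = 1e-4  # crude Riemann-sum grid, integrating x over (0, 20]

# 1. The area under the y = 5 slice should equal the marginal f(5) = exp(-5).
marginal = sum(f_joint(i * dx, y0) for i in range(1, 200_000)) * dx
print(marginal, math.exp(-y0))  # the two are close

# 2. Dividing the slice by the marginal gives the conditional density
#    f(x | y = 5) = 5*exp(-5x); check one point.
x = 0.3
cond = f_joint(x, y0) / math.exp(-y0)
print(cond, 5 * math.exp(-5 * x))  # the two agree
```

The first check is exactly the geometric statement from before: the slice by itself is not a density, but its area is the marginal, so dividing by the marginal renormalizes it to integrate to one.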