Proposition 2.5 is a very powerful result that will be used over and over again throughout this course. It says that for random variables X, Y and Z, X is independent of Z conditioning on Y if and only if p(x,y,z) can be factorized as a(x,y) times b(y,z) for all x, y, and z such that p(y) is bigger than zero. Here a(x,y) and b(y,z) are not necessarily probability distributions: a is a function that depends only on x and y, and b is a function that depends only on y and z.

We are going to prove Proposition 2.5 in two parts. First we prove the only if part. Assume that p(x,y,z) takes the form in Definition 2.4. For all x and for all y such that p(y) is bigger than zero, we let a(x,y) equal p(x,y) divided by p(y). For all y such that p(y) is bigger than zero, and for all z, we let b(y,z) equal p(y,z). Then, by the first case in Definition 2.4, p(x,y,z) is equal to p(x,y) times p(y,z) divided by p(y), which is a(x,y) times b(y,z), and therefore the only if part is proved. We note, however, that the choice of a(x,y) and b(y,z) is not unique. For example, one can as well choose a(x,y) equals p(x,y) and b(y,z) equals p(y,z) divided by p(y).

Next, we prove the if part of Proposition 2.5. Refer to Definition 2.4 for X and Z being independent conditioning on Y. First, we assume that p(x,y,z) can be factorized as a(x,y) times b(y,z) for all x, y, and z such that p(y) is bigger than zero. Then for such x, y and z, we have p(x,y) equals summation z p(x,y,z), where p(x,y,z) is equal to a(x,y) times b(y,z) by our assumption. Now a(x,y) does not depend on z, so it can be moved outside the summation, and we have a(x,y) times summation z b(y,z). Similarly, p(y,z) is obtained by summing p(x,y,z) over all x, where p(x,y,z) is equal to a(x,y) times b(y,z). Again, b(y,z) does not depend on x, so we can move it outside the summation to obtain b(y,z) times summation x a(x,y). Furthermore, p(y) is obtained by summing p(y,z) over all z.
Here we use the expression for p(y,z) that we have just obtained. Taking this expression and summing over all z, we obtain summation x a(x,y) times summation z b(y,z). By our assumption, p(y) is bigger than zero, which implies that neither of these summations is equal to zero.

Now consider the fraction p(x,y) times p(y,z) divided by p(y). We are going to use the expressions that we have obtained for p(x,y), for p(y,z), and for p(y). Substituting these expressions, we obtain a rather complicated fraction which in fact is very easy to obtain: the numerator is a(x,y) times summation z b(y,z), times b(y,z) times summation x a(x,y), and the denominator is summation x a(x,y) times summation z b(y,z). All we have done is move each of the three expressions over. Now we observe that summation x a(x,y) upstairs can be cancelled with summation x a(x,y) downstairs. This is possible because we have seen that this summation is not equal to zero. At the same time, we also observe that summation z b(y,z) upstairs can be cancelled with summation z b(y,z) downstairs. What is left is a(x,y) times b(y,z), which by our assumption is equal to p(x,y,z). So we have shown that p(x,y,z) is equal to p(x,y) times p(y,z) divided by p(y), and this is precisely the first case in Definition 2.4.

Finally, we consider the case where p(y) is equal to zero. For this case we observe that p(x,y,z) is lower bounded by zero and upper bounded by p(y), which by our assumption is equal to zero. Therefore p(x,y,z) is equal to zero, and this is precisely the second case in Definition 2.4. Hence we have shown that X is independent of Z conditioning on Y according to Definition 2.4.
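
As a sanity check on the proof, the proposition can be tried out numerically on small alphabets. The sketch below is not part of the lecture: the function names, the alphabets, and the randomly chosen values of a(x,y) and b(y,z) are all assumptions made for illustration. It builds p(x,y,z) proportional to a(x,y) times b(y,z), where the normalizing constant can be absorbed into a so the factorized form is preserved, and then tests the two cases of Definition 2.4 directly. One value of y is deliberately given probability zero so that the second case is exercised as well.

```python
import itertools
import random

def joint_from_factors(a, b, xs, ys, zs):
    """Build p(x,y,z) proportional to a(x,y) * b(y,z), normalized to sum to 1."""
    raw = {(x, y, z): a[x, y] * b[y, z]
           for x, y, z in itertools.product(xs, ys, zs)}
    total = sum(raw.values())
    return {key: value / total for key, value in raw.items()}

def is_conditionally_independent(p, xs, ys, zs, tol=1e-12):
    """Check Definition 2.4: p(x,y,z) = p(x,y) p(y,z) / p(y) if p(y) > 0, else 0."""
    p_xy = {(x, y): sum(p[x, y, z] for z in zs) for x in xs for y in ys}
    p_yz = {(y, z): sum(p[x, y, z] for x in xs) for y in ys for z in zs}
    p_y = {y: sum(p_yz[y, z] for z in zs) for y in ys}
    for x, y, z in itertools.product(xs, ys, zs):
        expected = p_xy[x, y] * p_yz[y, z] / p_y[y] if p_y[y] > 0 else 0.0
        if abs(p[x, y, z] - expected) > tol:
            return False
    return True

# Hypothetical alphabets and factors, chosen only for this illustration.
xs, ys, zs = [0, 1], [0, 1, 2], [0, 1, 2]
rng = random.Random(0)
a = {(x, y): rng.uniform(0.1, 1.0) for x in xs for y in ys}
b = {(y, z): rng.uniform(0.1, 1.0) for y in ys for z in zs}
for x in xs:
    a[x, 2] = 0.0  # forces p(y=2) = 0, the second case of Definition 2.4

p = joint_from_factors(a, b, xs, ys, zs)
factorized_ok = is_conditionally_independent(p, xs, ys, zs)

# A distribution that does not factorize: X = Z is a fair bit, Y is constant.
q_xs, q_ys, q_zs = [0, 1], [0], [0, 1]
q = {(x, 0, z): (0.5 if x == z else 0.0) for x in q_xs for z in q_zs}
dependent_ok = is_conditionally_independent(q, q_xs, q_ys, q_zs)

print(factorized_ok, dependent_ok)
```

The second distribution is included as a contrast: with Y constant, conditioning on Y tells us nothing, and X equal to Z is clearly not independent of Z, so the check should fail for it while passing for the factorized one.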