So far in our development of signal processing theory, we have assumed that all signals are deterministic, that is, known in advance. We have written things like x of n equals sine of 0.2n; we described the signal analytically, and in so doing, with a simple formula, we captured all the sample values. But interesting signals in the practice of signal processing are not known in advance. For instance, here I am talking to you, and the samples that represent my voice signal are not known in advance, otherwise it would be pretty boring to listen to what I'm saying. Maybe it is anyway. But even if we don't know the exact values of the samples, we usually know something about the signals. For instance, we know that my voice is a speech signal, so this signal will have certain properties even though the exact values of the samples are not known. So here we're going to perform a paradigm shift in the way we think about signals. Instead of considering them deterministic objects, we will look at them in a probabilistic sense. We will consider them stochastic signals that we describe in terms of their statistical properties. Now the question is, can we do signal processing with random signals? The answer is yes. We will see that everything we have developed so far, both in terms of intuition and in terms of algorithmic procedures, still applies to random signals. On top of that, we will be able to introduce a new chapter in signal processing called adaptive signal processing: we will use the statistical properties of the signals to change the way our filters process the information. But first let's start with a brief recap of the main concepts in probability theory. We will not be exhaustive; we will just point out the fundamental ideas and concepts that will be necessary in the rest of these lectures. So a random variable is defined as a mapping from a random event to a value that belongs to the set of real numbers.
So for example, imagine tossing a coin: you can map heads to the number 0 and tails to the number 1. In this case, you have a discrete random variable that encodes the face of the coin that lands on top. You can do the same with a die. If you have a cubic die with six faces, you have another discrete random variable that takes values from 1 to 6. You can also have a continuous-valued random variable. You can imagine, for instance, measuring the voltage across a circuit, and the continuous random variable encodes this unknown voltage that you're measuring. So a random variable encodes an event, but we're interested in measuring the probability that an event occurs. To do so, we define the cumulative distribution function, or CDF for short, denoted F_X(alpha), and this measures the probability that the random variable takes values less than or equal to alpha. Alpha is a real number, again. The key property of the CDF is that if you take the limit of this function as alpha goes to infinity, this limit is equal to one. That means that as you span all the possible values that the random variable can take, the total probability converges to one, which is the full 100 percent probability. You can differentiate the CDF and obtain the probability density function. This is a function that, for a value alpha of the argument, gives you the probability density of the random variable at alpha. If you integrate the probability density function, you go back to the cumulative distribution function. So let's make an example. Suppose you want to measure the temperature of melting ice. You take a beaker full of ice, you put a thermometer in it, and you repeat this experiment several times. The continuous random variable that models this event is the measured temperature that you read on your thermometer. You know that this should be zero degrees Celsius.
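As a minimal sketch, the CDF of the coin-toss variable can be written directly in Python (the function name `coin_cdf` is just an illustrative choice):

```python
# A fair coin mapped to real values: heads -> 0, tails -> 1,
# each outcome with probability one half.
def coin_cdf(alpha):
    """F_X(alpha) = P(X <= alpha) for the coin-toss random variable."""
    if alpha < 0:
        return 0.0   # no probability mass below 0
    elif alpha < 1:
        return 0.5   # only the event "heads" (value 0) is included
    else:
        return 1.0   # both outcomes are included

# The key property: the CDF converges to 1 as alpha grows.
print(coin_cdf(-1.0), coin_cdf(0.5), coin_cdf(100.0))  # 0.0 0.5 1.0
```

Note how the CDF of a discrete variable is piecewise constant, a point we will return to with the die example.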
But many things can affect the experiment: changes in barometric pressure, whether the water is purified or not, whether the thermometer has some inaccuracy, and so on and so forth. So if you repeat this experiment many times, the value of the random variable, the measured temperature of melting ice, will never be exactly zero, but will vary, sometimes taking positive values and sometimes negative values. If you plot the cumulative distribution function for this temperature variable, you would probably get a curve like this. What this means is that it's very unlikely that your thermometer will measure something like minus five degrees Celsius. As you move up the scale, you see that zero corresponds to 50 percent probability: half of your measurements will be slightly below zero, and half of your measurements will be slightly above zero. But as you go up in temperature, the curve flattens out, which means that these temperatures are again very unlikely. This is easier to see if you differentiate the CDF and turn it into a PDF, a probability density function: you get the standard Gaussian distribution that is the norm in repeated physical experiments. The most likely temperature that we can measure is zero, and as we move towards higher or lower temperatures, the probability decreases very fast. If we move on to a discrete random variable like a die toss, the cumulative distribution function is always piecewise constant, because the random variable can take no values outside the set of discrete possible values. So here, for a die, you have six possible outcomes, and so you have six steps in the CDF. You can still differentiate such a function if you use the Dirac delta formalism. You will get a Dirac delta at each discontinuity point, and the amplitude of each Dirac delta here is one sixth.
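The thermometer experiment can be sketched with the Gaussian CDF, which is expressible through the error function. This is only an illustration: the choice of a 0.5-degree standard deviation is an assumption for the sketch, not a physical constant.

```python
import math

def gaussian_cdf(x, mu=0.0, sigma=1.0):
    """CDF of a Gaussian with mean mu and standard deviation sigma,
    written in terms of the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

# Measured temperature of melting ice, modeled as Gaussian with mean 0 C
# and an (assumed, illustrative) standard deviation of 0.5 C.
sigma = 0.5
print(gaussian_cdf(0.0, 0.0, sigma))   # 0.5: half the readings fall below zero
print(gaussian_cdf(-5.0, 0.0, sigma))  # essentially 0: -5 C is very unlikely
```

The two printed values mirror the two observations in the text: the 50 percent crossing at zero, and the near-zero probability of readings far from the mean.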
Now remember, if you integrate the PDF over an interval, you get the probability that the underlying random variable takes values over that interval. So if I integrate between 2.5 and 4.5, for instance, I'm actually just considering the discrete events of the die showing face number 3 or number 4, and of course the result is one third. The expectation of a random variable is a way to express concisely some information about the random variable itself. What we do by taking the expectation of the random variable is compute its mean value. Now there's a very important theorem in probability theory, the expectation theorem, which says that if we take the expected value of a function of a random variable, we can compute it just by knowing the probability density function: we compute the integral from minus infinity to plus infinity of the function of a dummy variable x, which we use for the integration, times the PDF of the random variable itself. This will be very useful when we compute, for instance, the moments of a random variable. These are concise scalar descriptors of the random variable. At order 1 you have the mean; at order 2 you measure the spread of the random variable; and higher orders have different applications that we will not really be concerned with. So the raw moments are the expected values of increasing powers of the random variable itself. For n equal to 1 you have the mean that we have seen before. You also have the central moments, which are moments of the random variable centered around its mean. By the expectation theorem, you can compute these by taking the integral of the dummy variable minus the mean of the random variable, raised to the power that corresponds to the order of the moment you're computing, times the probability density function. A very special case is the variance of a random variable, which measures how spread out the PDF is around the mean of the random variable.
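For a discrete variable, the integrals above collapse into sums over the probability masses, so the die makes a compact worked example: integrating the delta-train PDF over (2.5, 4.5) picks up the masses at faces 3 and 4, and the first raw moment and second central moment come out of the same recipe.

```python
# A fair die: probability mass 1/6 at each of the values 1..6.
faces = range(1, 7)
p = 1.0 / 6.0

# Integrating the PDF (a train of Dirac deltas) over (2.5, 4.5)
# just sums the masses at faces 3 and 4:
prob_3_or_4 = sum(p for k in faces if 2.5 < k < 4.5)
print(prob_3_or_4)  # 1/3

# Raw moment of order 1 (the mean): E[X] = sum of k * P(X = k).
mean = sum(k * p for k in faces)
# Central moment of order 2 (the variance): E[(X - mean)^2].
variance = sum((k - mean) ** 2 * p for k in faces)
print(mean, variance)  # 3.5 and 35/12, about 2.9167
```

The same pattern, with sums replaced by integrals against the PDF, is exactly what the expectation theorem states for continuous variables.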
For example, a Gaussian random variable like the one we saw before in the experiment about the temperature of ice is defined by two parameters: the mean, which centers the Gaussian curve at a given point on the real axis, and the variance, which measures the spread of the bell curve. These two parameters enter explicitly in the definition of the PDF of the random variable, as you can see here. If you have a uniform random variable, so a continuous-valued variable that takes values over a certain interval from A to B with uniform probability for all possible values on the interval, then the mean is of course the midpoint of the interval, which is A plus B divided by 2. The variance is computed as the square of B minus A, divided by 12. This is an important result that we will see again when we talk about quantization in a future lecture. Of course, having a single random variable is hardly any good, so we will probably have several to play with, and once we have several random variables we are interested in the relations between them. Of course, we would like to keep things as simple as possible, so it would be nice, as in the case of the moments, to have some simple scalar descriptors of the interplay between two random variables. The first thing we want to compute is what is called the cross-correlation, a measure of how intertwined two random variables are. This is computed as the expectation of the product of the two random variables. The covariance is very similar to the cross-correlation, but each random variable is recentered around its mean. If both variables are zero-mean, then the cross-correlation and the covariance coincide. Now, to compute either the cross-correlation or the covariance, we need to know the joint probability density function of the two variables. This could be very complicated to obtain or estimate if the interplay between the variables is complicated. So we always try to shoot for the simpler cases, and we have two such cases.
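The uniform-variable formulas can be sanity-checked with a quick Monte Carlo sketch; the interval endpoints and sample count below are arbitrary choices for illustration.

```python
import random

# Uniform random variable on [A, B]: mean (A + B) / 2, variance (B - A)^2 / 12.
A, B = 2.0, 10.0
mean_theory = (A + B) / 2.0        # 6.0
var_theory = (B - A) ** 2 / 12.0   # 64/12, about 5.333

# Monte Carlo estimate from repeated draws (sample size is arbitrary).
random.seed(0)
samples = [random.uniform(A, B) for _ in range(200_000)]
mean_est = sum(samples) / len(samples)
var_est = sum((s - mean_est) ** 2 for s in samples) / len(samples)
print(mean_est, var_est)  # close to 6.0 and 5.33
```

The (B - A) squared over 12 result is the same one that reappears in the analysis of quantization noise mentioned in the text.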
If the variables are uncorrelated, then their cross-correlation is simply equal to the product of their means. This implies that the two variables have no linear relationship between them. A stronger condition is the independence of the variables, which means that the joint probability density function is actually given by the product of the two marginal probability density functions of the variables. This means that there is no relationship at all between the variables.
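A classic sketch makes the gap between the two conditions concrete: take X uniform on {-1, 0, 1} and Y = X squared. Then E[XY] = E[X cubed] = 0 = E[X] times E[Y], so the variables are uncorrelated, yet Y is completely determined by X, so they are certainly not independent. (The construction is a standard textbook example, simulated here for illustration.)

```python
import random

random.seed(1)
N = 100_000
# X uniform on {-1, 0, 1}; Y = X^2 is a deterministic function of X.
xs = [random.choice([-1, 0, 1]) for _ in range(N)]
ys = [x * x for x in xs]

def mean(v):
    return sum(v) / len(v)

# Cross-correlation E[XY] versus the product of the means E[X] * E[Y]:
cross_corr = mean([x * y for x, y in zip(xs, ys)])  # near 0
product_of_means = mean(xs) * mean(ys)              # also near 0
print(cross_corr, product_of_means)
# The two estimates agree, so X and Y are uncorrelated; but knowing X
# fixes Y exactly, so the joint PDF does not factor into the marginals.
```

This is why independence is the stronger condition: it rules out any relationship, not just a linear one.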