[MUSIC] Remember this box? We wanted to put particles inside it and find out what happens when we heat it up or cool it down, that is, when we change the temperature. What happens when we expand it or contract it, that is, when we change the pressure? And what happens if we poke a hole and drop particles inside it, that is, when we change the number of particles inside the box? Statistical thermodynamics is the machinery that allows us to examine these different cases, and in this module we will learn its foundational principles. By the end of this module, you should be able to develop the idea of ensembles and write down the thermodynamic quantity that governs each of these ensembles.

Now, classical mechanics and classical thermodynamics give us a way to describe macroscopic objects. Classical thermodynamics predicts the conditions for equilibrium and stability for macroscopic systems, but it does not give a molecular-level description of the thermodynamic behavior. The key to understanding how molecules give rise to bulk behavior is to establish a statistical description of the molecular states. We build this statistical description by first introducing a few key concepts in statistics and then deriving the statistical definition of entropy, which forms the basis for the development of statistical thermodynamics.

We begin our discussion of statistics with the simple example of flipping coins. For any given experiment, the probability of getting heads is p_H = 1/2 and the probability of getting tails is p_T = 1/2. Since each experiment is statistically independent of the previous experiments, the probability of getting any specific trajectory is the product of the probabilities. That is, the probability of getting exactly HHTH is p_H × p_H × p_T × p_H. Now, let's ask a different question.
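The product rule for independent flips can be sketched in a few lines of code. This is a minimal illustration, not part of the lecture; the function name `sequence_probability` is my own.

```python
# Probability of one specific ordered coin-flip trajectory: since flips
# are independent, the probability of a sequence is the product of the
# per-flip probabilities.
p_heads = 0.5
p_tails = 0.5

def sequence_probability(seq):
    """Probability of an exact ordered outcome such as 'HHTH'."""
    prob = 1.0
    for flip in seq:
        prob *= p_heads if flip == 'H' else p_tails
    return prob

print(sequence_probability('HHTH'))  # p_H * p_H * p_T * p_H = 0.5**4 = 0.0625
```

Because every flip here has probability 1/2, any specific length-4 trajectory has the same probability, 1/16.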
Say we want to know the probability of getting N_H heads and N_T tails, regardless of order. Then we need to include a factor associated with the number of permutations of N_H and N_T. To illustrate this, let's adopt the view of a random walk, where a head is a positive step and a tail is a negative step. That is, the outcome HHTH results in an end position x that is two steps away from the origin. Now let's count the number of ways M(x, N) of ending up at a location x after N steps, and assume that the probability is proportional to this number M. We can separate the N steps into positive steps N_H and negative steps N_T. The total number of steps is simply the sum, N = N_H + N_T, and the end location is the difference, x = N_H − N_T. The total number of ways is the combinatorial number of ways of distributing N_H positive and N_T negative steps among the N steps, which is given by the standard combinatorial term M(x, N) = N! / (N_H! N_T!). Now we can visualize a few different examples of the random walker for different N.

The simple example of coin flipping, or dice throwing, has equal probabilities for all outcomes, that is, a flat distribution. Generally, we're interested in the properties of systems with more complex probability distributions. Let's consider an experiment that can have integer outcomes n ranging from zero to infinity. For a given experiment, the probability that the outcome is n is P(n), with P(n) ≥ 0 for all n, and each experiment is uncorrelated from the previous experiments. Our goal now is to evaluate averages from the distribution and to define several properties of the distribution. To proceed, we use a particular distribution as a model probability distribution, namely the Poisson distribution, P(n) = aⁿ e⁻ᵃ / n!. The Poisson distribution has a parameter a, which is a characteristic spread in the probability distribution.
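The random-walk counting above can be checked directly. A minimal sketch, with function names of my own choosing; it uses the relation N_H = (N + x)/2 implied by N = N_H + N_T and x = N_H − N_T.

```python
from math import comb

def walk_count(x, n):
    """Number of n-step walks (+1/-1 steps) ending at position x.
    Since n_h + n_t = n and n_h - n_t = x, we have n_h = (n + x) / 2."""
    if (n + x) % 2 != 0 or abs(x) > n:
        return 0  # unreachable position (wrong parity or too far)
    n_h = (n + x) // 2
    return comb(n, n_h)  # n! / (n_h! * n_t!)

def walk_probability(x, n, p=0.5):
    """Probability of ending at x after n steps with head probability p."""
    n_h = (n + x) // 2
    return walk_count(x, n) * p**n_h * (1 - p)**(n - n_h)

# HHTH-type walks: 4 steps ending at x = +2 means n_h = 3, so C(4, 3) = 4
print(walk_count(2, 4))        # 4
print(walk_probability(2, 4))  # 4 * 0.5**4 = 0.25
```

Summing `walk_probability(x, n)` over all x from −n to n recovers 1, confirming the counting is complete.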
Now let's define the mth moment of the probability distribution as ⟨nᵐ⟩ = Σₙ nᵐ P(n). The Poisson distribution has a mean that is given by the first moment of the probability distribution, and after a little bit of algebra we find that this is simply ⟨n⟩ = a. Now, it turns out that the higher-order moments get considerably more difficult to evaluate. To ease these calculations, a popular technique called the generating function is extremely useful. Let's define the generating function G(k) as a sum over n of P(n), with each probability weighted by an exponential of −nk: G(k) = Σₙ P(n) e⁻ⁿᵏ. Note that k here is a dummy variable. Why is this advantageous? Well, it turns out that once we know the generating function, the moments of the probability distribution can be evaluated using the simple relation ⟨nᵐ⟩ = (−1)ᵐ dᵐG/dkᵐ evaluated at k = 0. Can we evaluate the generating function for the Poisson distribution? It turns out that it is very easy to carry out the summation, giving G(k) = exp[a(e⁻ᵏ − 1)]. This allows us to derive the moments of the Poisson distribution quite easily.

There are many practical examples that are described by the Poisson distribution. For instance, a telecommunications example would be the number of telephone calls arriving in a system. A biological example would be the number of mutations on a strand of DNA per unit length. A real-life example would be the number of cars arriving at a traffic light.

The probability distribution, in a sense, tells us the amount of confusion that is involved in determining the state of the system. A key concept called information entropy was introduced by Claude Shannon in a seminal paper in 1948. This defines the information entropy as a sum of the probability times the logarithm of the probability, S = −Σₙ P(n) ln P(n). Since the probabilities are less than one, a negative sign is introduced in order to get a positive number. This equation says that the information entropy is maximal when all the probabilities are equal.
That is, the system can be in any one of the states with equal likelihood. We'll end the discussion on statistics with an extremely important theorem, called the central limit theorem. Before we discuss it, let's introduce the Gaussian distribution, which is given by the probability density function P(x) = (1/√(2πσ²)) exp[−(x − μ)²/(2σ²)]. Now consider n independent random variables x₁, x₂, and so on up to xₙ, selected from a probability distribution with variance σ² = ⟨x²⟩ − ⟨x⟩². Let's define y as the sum of x₁, x₂, up to xₙ, divided by n; this is the mean of the selected stochastic variables. The central limit theorem tells us that for sufficiently large n, the probability distribution for y approaches a Gaussian with a standard deviation σ/√n, regardless of the distribution that we picked for x. For example, let's take a look at the comparison between the discrete random-walk statistics associated with flipping a coin and a Gaussian distribution. As the number of steps increases, the distribution for the discrete random walk gets closer and closer to a Gaussian.
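The coin-flip example of the central limit theorem can be tested by simulation. A minimal sketch with sample sizes of my own choosing: each xᵢ is ±1 (a coin flip), so σ² = ⟨x²⟩ − ⟨x⟩² = 1, and the sample mean y should have a standard deviation near σ/√n = 1/√250 ≈ 0.063.

```python
import random
from math import sqrt

random.seed(0)  # fixed seed so the simulation is reproducible

n = 250         # coin flips (steps) per walk
samples = 2000  # number of independent walks

# Each step x_i is +1 or -1 with equal probability: <x> = 0, sigma^2 = 1.
# y = (x_1 + ... + x_n) / n is the mean of the steps.
ys = []
for _ in range(samples):
    s = sum(random.choice((-1, 1)) for _ in range(n))
    ys.append(s / n)

mean_y = sum(ys) / samples
std_y = sqrt(sum((y - mean_y)**2 for y in ys) / samples)

# Central limit theorem: y is approximately Gaussian with mean 0 and
# standard deviation sigma / sqrt(n) = 1 / sqrt(250) ~ 0.063.
print(mean_y, std_y)
```

Histogramming `ys` would show the bell shape directly; increasing `n` narrows the Gaussian, matching the comparison plot described in the lecture.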