Welcome to module four. Every November, Memorial Stadium hosts the Illinois State High School football championship games. High schools are assigned to divisions, and as a result, a number of championship games are played over a two-day period: one championship game per division. The process by which schools are assigned to a division is based in part on the size of the community that the school serves. Smaller communities have fewer people, so their schools are less likely to be able to compete against bigger schools, which serve bigger communities. But some schools are small and private and draw students from a wider geographic area. To solve the division problem, descriptive statistics can be used to create divisions that will be fair and competitive.

This module focuses on these techniques and the general problem of descriptive statistical analysis. This includes learning how to migrate from a spreadsheet like Excel to a programming solution using Python. First, you will learn how to perform many common Excel tasks in Python by using the pandas DataFrame. Second, you will learn about the NumPy module, which provides support for fast numerical operations. NumPy introduces the array data structure, which is similar to a list but holds only data of the same type, typically numerical, like all integers or all real numbers.

Next, you will learn about descriptive statistics. This includes how to calculate important measures from a set of values, such as a list of students. These measures include location, which is a typical or average value for a set of data; spread, which states how far the data are spread around that typical value or location; shape, which quantifies whether the data are uniformly spread or skewed in some manner; and distribution, which specifies the ranges that contain different amounts of the data, for example, 10% or 75%.

We could write our own functions to calculate these quantities, but it is usually better to use existing functions.
The reason is that these functions have usually already been tested and debugged. They have also probably been optimized to run fast, and they are generally well documented. In addition, if you're going to share your code with others, they'll probably have that standard function as well.

Finally, in this module we're going to introduce more advanced features in the pandas module that make working with large, heterogeneous data easier. These include masking data to make complex selections from a DataFrame, stacking or combining data from different DataFrames together, grouping or aggregating data so that you can compute aggregated statistics, and pivot tables, which are a powerful data summarization technique.

By the end of this module, you'll be able to perform basic statistical analysis and analytic summarization of data by using the common modules in the Python data analytics stack. Good luck.
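
As a rough preview of the techniques named in this introduction, here is a minimal sketch that uses existing NumPy and pandas functions rather than hand-written ones. The school names, divisions, and enrollment numbers are hypothetical and invented purely for illustration; they are not real Illinois data, and the column names are assumptions.

```python
import numpy as np
import pandas as pd

# Hypothetical schools and enrollments, invented for illustration only.
schools = pd.DataFrame({
    "school": ["A", "B", "C", "D", "E", "F"],
    "kind": ["public", "public", "private", "public", "private", "public"],
    "division": [1, 1, 1, 2, 2, 2],
    "enrollment": [200, 300, 250, 1200, 900, 1500],
})

# A NumPy array holds values of a single type (here, integers).
enrollment = schools["enrollment"].to_numpy()

# Location: a typical or average value.
mean_enrollment = np.mean(enrollment)
median_enrollment = np.median(enrollment)

# Spread: how far the data lie from that typical value.
std_enrollment = np.std(enrollment)

# Shape: skewness is positive when the data have a longer right tail.
skewness = schools["enrollment"].skew()

# Distribution: values below which 10% and 75% of the data fall.
p10, p75 = np.percentile(enrollment, [10, 75])

# Masking: a Boolean mask selects rows that meet several conditions.
mask = (schools["kind"] == "public") & (schools["enrollment"] > 1000)
large_public = schools[mask]

# Stacking: combine rows from two DataFrames into one.
extra = pd.DataFrame({
    "school": ["G"],
    "kind": ["private"],
    "division": [1],
    "enrollment": [150],
})
combined = pd.concat([schools, extra], ignore_index=True)

# Grouping: compute an aggregated statistic per division.
mean_by_division = schools.groupby("division")["enrollment"].mean()

# Pivot table: summarize enrollment by division (rows) and kind (columns).
pivot = schools.pivot_table(
    values="enrollment", index="division", columns="kind", aggfunc="mean"
)

print(mean_by_division)
print(pivot)
```

Each step here is a single call to an existing function, which is the point made above: these functions are already tested, fast, and documented, and anyone you share the code with is likely to have them as well.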