Descriptive statistics focus on graphical and numerical procedures that are useful to summarize and process the data. Descriptive statistics describe, show, or summarize data in a meaningful way. They are very important because they help in visualizing what the data tell us. By contrast, if we simply present the raw data, it might sometimes be very hard to see what the data show. However, and this is the main difference from inferential statistics, they do not allow us to draw conclusions beyond the data we analyze. They do not allow us to reach conclusions regarding any hypothesis we might have made. They are simply a way to describe our data.
Inferential statistics focus on using the data to make predictions, forecasts, and estimates, in order to make more informed decisions. Inferential statistics are techniques that allow us to use the samples under analysis to make generalizations about the population from which the samples were drawn. It is therefore important that the sample represents the population. We will see in lecture four what the methods of inferential statistics are, namely the estimation of parameters and the testing of statistical hypotheses.
A variable can be defined as a specific characteristic, such as the age or weight of an individual or object, and it can be classified in various ways. If we classify a variable based on the type and amount of information contained in the data, we can speak about either categorical or numerical data. Why is it important to classify the data properly? Because it is the first condition for selecting the correct statistical procedures needed to analyze the data.
Categorical variables produce responses that belong to groups or categories. For example, responses to yes or no questions are categorical. Questions such as "Do you smoke?" or "Do you own a TV?" are limited to yes or no answers.
Categorical variables also include questions on gender or marital status, and questions with a range of choices, such as strongly disagree to strongly agree. For example, consider a faculty evaluation form where students are to respond to statements such as the following: the instructor in this course was an effective teacher. One, strongly disagree. Two, slightly disagree. Three, neither agree nor disagree. Four, slightly agree. Five, strongly agree.
Numerical variables include both discrete and continuous variables. A discrete numerical variable may have a finite number of values, and it usually produces a response that comes from a counting process. Examples of discrete numerical variables include the number of students enrolled in an academic course, or the number of financial instruments in an investor's portfolio. A continuous numerical variable may take on any value within a given range of real numbers. Generally, a continuous numerical variable arises from a measurement process rather than a counting process. An example of a continuous numerical variable could be the time that a student spends in the library. What is important to notice here is that in such cases the recorded value could deviate from the true value by a certain amount, depending on the precision of the measurement instrument used.
Categorical variables can be described by using frequency distribution tables and graphs such as bar charts, pie charts, and Pareto diagrams, which are also commonly used to describe data collected from surveys and questionnaires. A frequency distribution is a table used to organize the data. The left column, called classes or groups, includes all the possible responses on the variable being studied. The right column is a list of the frequencies, or numbers of observations, for each class. A relative frequency distribution is obtained by dividing each frequency by the total number of observations and multiplying the resulting proportion by 100%.
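To make the two tables concrete, here is a minimal Python sketch of a frequency distribution and its relative frequencies. The function name `frequency_table` and the ten yes or no answers are made up for illustration; they are not data from the lecture.

```python
from collections import Counter


def frequency_table(responses):
    """Return rows of (class, frequency, relative frequency in percent)."""
    if not responses:
        raise ValueError("at least one observation is required")
    counts = Counter(responses)   # frequency of each class
    total = len(responses)        # total number of observations
    # Relative frequency: frequency divided by the total, times 100%.
    return [(cls, freq, 100.0 * freq / total) for cls, freq in counts.items()]


# Hypothetical answers to the question "Do you own a TV?"
responses = ["yes", "no", "yes", "yes", "no", "yes", "yes", "yes", "no", "yes"]

table = frequency_table(responses)
for cls, freq, rel in table:
    print(f"{cls:<5}{freq:>4}{rel:>8.1f}%")
```

The left column of the printed table holds the classes, the middle column the frequencies, and the right column the relative frequencies, which add up to 100%.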
The classes used to construct the frequency distribution table of a categorical variable are simply the possible responses to that variable. Both bar charts and pie charts are commonly used to describe categorical data. If our attention is focused on the frequency of each category, then we will probably draw a bar chart, where the height of each rectangle represents the frequency of its category. If our attention is focused on the proportion of the frequencies in each category, then we will more likely use a pie chart, which represents the division of a whole into the parts that constitute it. The circle, which is the pie, represents the total, while the segments, which are the pieces of the pie cut from its center, represent shares of that total. In a pie chart, each segment is proportional to the corresponding frequency.
The Pareto diagram is used by those who need to identify the major causes of problems and attempt to correct them quickly, at minimum cost. The Pareto diagram is a special bar chart that displays the frequency of defect causes. This diagram is named after the Italian economist Vilfredo Pareto. He noted that, in most cases, a small number of factors are responsible for most of the problems. As we can see in our figure, the bars in this kind of diagram, from left to right, emphasize the most frequent causes of defects. In fact, the bar at the left indicates the most frequent cause, and the bars to the right indicate causes with decreasing frequencies. Pareto's principle is also called the 80-20 rule. For example, a member of a team might think that 80% of the work on a project was done by only 20% of the team members.
Time series data are data that are measured at successive points in time. They refer to the same object, which is observed, that is measured, over time. For example, financial data can be measured every day, every hour, every minute, and so forth.
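Going back for a moment to the Pareto diagram: the ordering of its bars can be sketched in a few lines of Python. The defect causes and their counts below are invented for illustration, and the cumulative percentage column is an addition that is commonly shown alongside the bars, not something taken from the lecture figure.

```python
from collections import Counter


def pareto_table(defects):
    """Return rows of (cause, frequency, cumulative percent), most frequent first."""
    counts = Counter(defects)
    total = sum(counts.values())
    rows = []
    cumulative = 0
    # most_common() lists the causes by decreasing frequency,
    # which is the left-to-right order of the bars.
    for cause, freq in counts.most_common():
        cumulative += freq
        rows.append((cause, freq, 100.0 * cumulative / total))
    return rows


# Hypothetical defect log: one entry per defective item.
defects = ["scratch"] * 12 + ["dent"] * 5 + ["misalignment"] * 2 + ["other"] * 1

rows = pareto_table(defects)
for cause, freq, cum in rows:
    print(f"{cause:<14}{freq:>4}{cum:>8.1f}%")
```

In this made-up log, the first two causes already account for 85% of the defects, which is the kind of concentration the diagram is meant to reveal.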
The graph of time series data is called a line chart, or time series plot. More formally, we can say that a time series is a set of measurements ordered over time. In a time series, the sequence of the observations is very important. In the time series plot, the series of data is plotted at various time intervals. We measure time along the horizontal axis, and the numerical quantity of interest along the vertical axis. Each measurement yields a point on the graph. Each point is an observation. The time series plot is obtained by joining points that are adjacent in time with straight lines. Examples of time series data are annual interest rates, the daily closing prices of shares of common stock, the daily exchange rates between various world currencies, and so on.
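The construction of the plot can also be sketched in Python. This minimal example only prepares the geometry: it orders the observations by time and pairs each point with the next one, which gives the straight segments a plotting library would then draw. The function name `time_series_segments` and the closing prices are hypothetical.

```python
from datetime import date


def time_series_segments(observations):
    """Order (time, value) observations by time and join adjacent points."""
    # The sequence matters in a time series, so sort on the time component.
    points = sorted(observations, key=lambda obs: obs[0])
    # Each segment joins one point to the next point in time.
    segments = list(zip(points, points[1:]))
    return points, segments


# Hypothetical daily closing prices, deliberately listed out of order.
observations = [
    (date(2024, 1, 3), 101.5),
    (date(2024, 1, 1), 100.0),
    (date(2024, 1, 2), 102.25),
    (date(2024, 1, 4), 99.75),
]

points, segments = time_series_segments(observations)
for (t0, v0), (t1, v1) in segments:
    print(f"{t0} ({v0}) -> {t1} ({v1})")
```

Time goes on the horizontal axis and the price on the vertical axis, so each pair in `points` is one point on the graph, and four observations give three segments.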