Now that we are familiar with some commonly used terms to describe data, let's look at what data exploration is and why it's important. After this video, you will be able to explain why data exploration is necessary, articulate the objectives of data exploration, and list the categories of techniques for exploring data.
Data exploration means doing some preliminary investigation of your dataset. The goal is to gain a better understanding of the data that you have to work with. If you understand the characteristics of your data, you can make optimal use of it in whatever subsequent processing and analysis you do with the data. Note that data exploration is also called exploratory data analysis, or EDA for short.
How do you go about exploring data? There are two main categories of techniques to explore your data, one based on summary statistics and the other based on visualization methods.
Summary statistics provide important information that summarizes a set of data values. There are many such statistics. Many of them you have probably heard of before, such as mean, median, and standard deviation. These are some very commonly used summary statistics. A summary statistic provides a single quantity that summarizes some aspect of the dataset. For example, the mean is a single value that describes the average value of the dataset, no matter how large that dataset is. You can think of the mean as an indicator of where your dataset is centrally located on a number line. Thus, summary statistics provide a simple and quick way to summarize a dataset.
Data visualization techniques allow you to look at your data graphically. There are several types of plots that you can use to visualize your data. Some examples are histogram, line plot, and scatter plot. Each type of plot serves a different purpose. We will cover the use of plots to visualize your data in an upcoming lecture.
What should you look for when exploring your data?
You use statistics and visual methods to summarize and describe your dataset, and some of the things you'll want to look for are correlations, general trends, and outliers.
Correlations provide information about the relationship between variables in your data. By looking at correlations, you may be able to determine that two variables are highly correlated. This means they provide the same or similar information about your data. Since they contain redundant information, this suggests that you may want to remove one of the variables to make the analysis simpler.
Trends in your data will reveal characteristics of your data. For example, you can see where the majority of the data values lie, whether your data is skewed or not, what the most frequent value or values are in a dataset, etc. Looking at trends in your data can also reveal that a variable is moving in a certain direction, such as sales revenue increasing or decreasing over the years. Calculating the minimum, the maximum, and the range of the data values are basic steps in exploring your data.
Determining outliers is also very important. Outliers can indicate potential problems with the data and may need to be eliminated in some applications. In other applications, outliers represent interesting data points that should be looked at more closely. In either case, outliers usually require further examination.
In summary, what you get by exploring your data is a better understanding of the complexity of the data so you can work with it more effectively. Better understanding in turn will guide the rest of the process and lead to more informed analysis. Summary statistics and visualization techniques are essential in exploring your data. These should be used together to examine a dataset. In the next two lectures, we will look at specific methods that you can apply to explore your data.
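As a rough illustration of the summary-statistics side of exploration described above, here is a minimal Python sketch using only the standard library. The function names (`summarize`, `pearson`, `iqr_outliers`) and the sample temperature readings are made up for this example and do not come from the lecture. The lecture also does not name a specific outlier rule; the 1.5 × IQR rule used here is one common convention, chosen as an assumption.

```python
import math
import statistics


def summarize(values):
    """Return basic summary statistics for a list of numbers."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),       # sample standard deviation
        "min": min(values),
        "max": max(values),
        "range": max(values) - min(values),
        "modes": statistics.multimode(values),   # most frequent value(s)
    }


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def iqr_outliers(values, k=1.5):
    """Flag values more than k * IQR outside the quartiles (assumed rule)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


if __name__ == "__main__":
    # Hypothetical readings; the last value is deliberately unusual.
    temps_c = [18, 20, 21, 19, 22, 20, 45]
    # The same readings in Fahrenheit: a fully redundant second variable.
    temps_f = [c * 9 / 5 + 32 for c in temps_c]

    print(summarize(temps_c))
    print("correlation:", pearson(temps_c, temps_f))
    print("possible outliers:", iqr_outliers(temps_c))
```

In this sketch the two temperature columns carry the same information, so their correlation comes out at essentially 1, which is the kind of redundancy that suggests dropping one variable. The flagged value is only a candidate outlier; whether to remove it or study it more closely still depends on the application.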