The focus of this week is how you're going to apply these machine learning algorithms to uncover hidden insights in a particular data set. We're going to look at what you need to do, and what you need to be thinking about, in any machine learning and analytics process. There's a series of steps that you need to go through, and we're going to learn about these steps. So, there's testing, there's validation. We're going to learn how bias and variance can affect the outcomes of your machine learning algorithm in your analytics process. We'll learn about learning curves as a way of measuring the effectiveness of a particular algorithm. We'll learn about a process called cross-validation, for creating data sets and training your algorithm or algorithms.

We'll also get into what we get out of studying big data sets. What do we learn from that? Processing this data, visualizing this data, and predicting the future, which is oftentimes what we want to do with machine learning and predictive analytics. I have an example I'm going to share with you of a hypothesis I proposed and am still working on. I started it over a year ago, and I've been so busy over the last year that I haven't had any time to get any further with it. But I'll present my results to you.

The learning outcomes for this segment are to understand what big data is and why we want to look at it; to be able to describe the testing and validation process; to understand how bias and variance can affect your results; to be able to describe the cross-validation process; to understand the importance of properly preparing your data; to be able to describe what smart data is and the characteristics of good data (I'll get into that and talk a little bit about it); and to learn ways of visualizing data.

So, this is from Wikipedia: big data is a term for data sets that are so large or so complex that traditional data processing application software is inadequate to deal with them.
Challenges include capturing all of this data and storing all of this data. We looked at the Lustre file system and the Hadoop file system, which are designed specifically to store enormous data sets. Then there's analysis of that data, and data curation, which is taking care of the data. Think about it like a pet, maybe: you've got to take care of this data. Data might age, and you might need to get rid of some data after a while. It's interesting working in the storage business, because it looks to me (and I've probably said this before in this class) like we as human beings, and our planet, don't want to throw anything away. So we just keep creating and collecting more and more data.

Other challenges include searching that data, sharing it, transferring it, visualizing it, making queries on it, updating information, privacy policies, and so forth. There are many aspects to this. One of them is the study of how to analyze these huge amounts of data using machine learning algorithms, and this is the aspect that we're going to focus on this week.