In this video, we'll describe the importance of understanding your data's structure. Understanding your data's structure can help inform the rest of your data science project, especially the machine learning workflow. A few things are always important to understand: your data's record level, the shape of your data, the columns you have, the types of those columns, and the summary statistics of those columns. We'll walk through each of these at a high level and explain why it's important.

The record level of your data is the level of measurement of each row, or record, in your dataset. For example, if you're collecting data at a frequency where each row corresponds to a single unique day, your dataset would be said to be at the day level. Note that the record level can be defined by multiple columns at the same time. As an example, imagine that each row of the dataset corresponds to a single day, but the days are not unique. There might be another categorical column involved, such as country. It's common to have daily measurements for each country, and in that case we would say the dataset is at the day-country level. That is, its records are unique by the combination of day and country. It's important to understand what level your dataset is at, because the granularity of the measurements informs what types of problems you can solve. For example, if your data is only at the month level, you likely can't answer more granular questions about weeks, days, hours, and so on.

The shape of your data is also important, and it's more straightforward than the record level. The shape is the dimensions of the dataset: the number of rows and the number of columns. This tells you how much information you have to work with, especially when you consider it alongside the record level and what you learn about individual columns.
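To make the record level and shape concrete, here is a minimal sketch in plain Python. The toy records, their `day`, `country`, and `sales` columns, and the helper names `is_unique_at` and `shape` are all hypothetical. The idea it illustrates is that a dataset is at a given record level when the combination of those key columns never repeats.

```python
from collections import Counter

# Hypothetical toy records: one row per (day, country) pair.
records = [
    {"day": "2023-01-01", "country": "US", "sales": 120.0},
    {"day": "2023-01-01", "country": "CA", "sales": 80.0},
    {"day": "2023-01-02", "country": "US", "sales": 95.0},
    {"day": "2023-01-02", "country": "CA", "sales": 70.0},
]


def is_unique_at(rows, keys):
    """Return True if no two rows share the same values for all `keys`."""
    counts = Counter(tuple(row[k] for k in keys) for row in rows)
    return all(n == 1 for n in counts.values())


def shape(rows):
    """Return (number of rows, number of columns).

    Assumes every row has the same columns, as in a rectangular table.
    """
    n_cols = len(rows[0]) if rows else 0
    return len(rows), n_cols


print(is_unique_at(records, ["day"]))             # False: each day appears once per country
print(is_unique_at(records, ["day", "country"]))  # True: the data is at the day-country level
print(shape(records))                             # (4, 3)
```

If the data is already in a pandas DataFrame, the same checks would typically be `df.duplicated(subset=["day", "country"]).any()`, which should be `False` at the day-country level, and `df.shape`.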
For example, if you were trying to build a regression model but had only a few features or a few hundred rows, you would know ahead of time that a simpler model might be a better initial choice.

Next, as we mentioned, the actual columns in your dataset are important too. They tell you what information is available for each individual measurement, or row. Continuing our record level example, we know that our hypothetical dataset is at the day level, but we don't yet know what information is collected for each day. By looking at the columns, either their values or their names if they're specified, we can learn what information is included. It could be anything from weather-related measurements to sales income.

Because the columns of our dataset contain so much information, it's important to explore them and their values a bit further. For example, we'll want to know the types of these columns. Even if our day column represents a date, it might be stored in a few different ways: perhaps as a quoted string, or as a dedicated date-time type. Knowing this is important because it affects how we manipulate, transform, and aggregate our data for analysis.

Finally, we also want to know more about the actual values in these columns. We should calculate and review summary statistics, like measures of central tendency and spread. These can help tell us which types of machine learning algorithms might be successful and how we might need to clean our data ahead of time.

While all of this exploration is helpful, it's not sufficient on its own. There are situations, like Anscombe's quartet, where summary statistics are nearly identical across datasets but the actual values look very different from one another. A good way to avoid being misled in these situations is to visualize your data directly, in whatever way suits the measurement level and column types.
Join us in the next video for a demonstration of Anscombe's quartet and of how data visualization can help us learn more about our data.