Now, let's look at question 3. We want to determine the following. First, the number of each species present: we want the count of each of our three species, and for that we'll use the .value_counts method, which is available on a pandas Series, so on an individual column. Second, the mean, median, and quantiles (the 25th percentile, 50th percentile, and so on), as well as the range, from the minimum to the maximum, for each petal and sepal measurement. The hints tell us that the .describe method includes the median, but it isn't called median; it's labeled as the 50th percentile (50%). It also doesn't include the range, so we'll have to compute that ourselves. It does have the min and the max, as we'll see in just a second. So the first thing we want to do is get the value counts. We take our DataFrame, data, pull out the species column, and call value_counts. Running that method, we see that we have 50 of each of the three species. Now, I'm going to break the rest out step by step. Let's look at data.describe() and see what the describe method outputs. It's a method available on a pandas DataFrame, and we see that it gives the count, mean, standard deviation, min, 25th, 50th, and 75th percentiles, and max. We only want the mean, the median, the quantiles, and the range, so we're going to have to filter this down. We set stats_df equal to this output, so we're working within a pandas DataFrame. Next, I'm going to copy over this line of code and show you just its output. Before we set it equal to anything, we locate the max row and subtract the min row for each of the columns. So we're using .loc and specifying just the row label.
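The first two steps can be sketched as follows. This is a minimal sketch, not the notebook's code: the small DataFrame below is a hypothetical stand-in with made-up iris-like values (in the lecture, data is the full iris DataFrame with 50 rows per species), and only the column names are taken from the walkthrough.

```python
import pandas as pd

# Hypothetical toy stand-in for the iris DataFrame (made-up values, 2 rows per species)
data = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 7.0, 6.4, 6.3, 5.8],
    "sepal_width":  [3.5, 3.0, 3.2, 3.2, 3.3, 2.7],
    "petal_length": [1.4, 1.4, 4.7, 4.5, 6.0, 5.1],
    "petal_width":  [0.2, 0.2, 1.4, 1.5, 2.5, 1.9],
    "species": ["setosa", "setosa", "versicolor", "versicolor", "virginica", "virginica"],
})

# value_counts is a Series method, so select the single species column first
species_counts = data["species"].value_counts()
print(species_counts)

# describe() summarizes the numeric columns only; the median shows up as the "50%" row
stats_df = data.describe()
print(stats_df)
```

Note that describe() has count, mean, std, min, 25%, 50%, 75%, and max rows, but no range row, which is why the range has to be added by hand.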
Sorry, we hadn't defined stats_df yet. Once we run that line, it becomes clear: stats_df.loc with a row label such as mean gives us the mean for each column. To add the range, we pop in a new row using .loc, setting stats_df.loc['range'] equal to the max minus the min, so range becomes the new index value. At this point, if we look at stats_df, we see that we've added the range row. Next, we specify the rows we actually want to look at: the mean, the 25th percentile, the 50th percentile, the 75th percentile, and the range. Then we use .loc to pull out those out_fields. Just to show you how that works, we take stats_df and call .loc with our out_fields, which says we only want the mean, 25th percentile, 50th percentile, 75th percentile, and range. We're selecting only those rows, and since we didn't specify any columns in our .loc, we get all of the columns. The last thing we do is rename, and I'm not going to run this step separately as it's simple enough. We use the rename method to rename the 50% row to median, and running all of that, we see median replacing the 50th percentile. This gives us all of the answers asked for in question 3. Moving on to question 4, we're asked to calculate the following for each species, in a separate DataFrame: the mean of each measurement (sepal length, sepal width, petal length, and petal width), and then the median of each of these. What we want to do here is leverage the groupby method that's available for pandas DataFrames to group by species and calculate each statistic.
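The range row, the row selection, and the rename for question 3 can be sketched like this. It is a minimal sketch under the same assumption as before: the DataFrame is a hypothetical toy stand-in with made-up values, while out_fields, stats_df, and the row labels follow the walkthrough.

```python
import pandas as pd

# Hypothetical toy stand-in for the iris DataFrame (made-up values)
data = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 7.0, 6.4, 6.3, 5.8],
    "sepal_width":  [3.5, 3.0, 3.2, 3.2, 3.3, 2.7],
    "petal_length": [1.4, 1.4, 4.7, 4.5, 6.0, 5.1],
    "petal_width":  [0.2, 0.2, 1.4, 1.5, 2.5, 1.9],
    "species": ["setosa", "setosa", "versicolor", "versicolor", "virginica", "virginica"],
})
stats_df = data.describe()

# Assigning to a new row label with .loc appends a "range" row (max minus min per column)
stats_df.loc["range"] = stats_df.loc["max"] - stats_df.loc["min"]

# Keep only the requested rows; no column selector, so every column is kept
out_fields = ["mean", "25%", "50%", "75%", "range"]
stats_df = stats_df.loc[out_fields]

# describe() labels the median "50%", so rename that row
stats_df = stats_df.rename(index={"50%": "median"})
print(stats_df)
```

Passing index= to rename targets row labels; passing columns= instead would rename column labels.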
Once we're able to find the mean and median, we also want to show how to calculate both statistics in a single table. Later on, we'll show you how to specify aggregations per feature, so you're not forced to compute the mean and median for all of your features; for one of those features, you might want a different aggregation. Starting with just the mean, we call data.groupby, specifying that we're grouping by species, and then call .mean() to get the mean of each of our features, grouped by species. We can get the median by calling .median() instead, since that method is available to us as well. Now that we have both of these, how can we get them in a single table? We're going to show you two different ways of doing so. The first uses a set of strings that pandas recognizes as aggregation names. We call data.groupby, group by species, and call .agg. Rather than calling .mean() and .median() separately, we call .agg and pass in the aggregations we want to use; here, that's mean and median. The other option is to pass in any aggregation function you like. Let's run the first version to see what it looks like. We get multi-level columns: for sepal length, we have both the mean and the median, and the same holds for each of the four features. What's special about the second approach, passing a function rather than a string, is this: let's say you wanted the product of all the values. If we pass in the string product, it isn't recognized here, and you see that we get an error. But NumPy has a product function, np.prod, which multiplies all the values together, and we can pass that in instead.
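Here is a minimal sketch of the groupby steps for question 4, again on a hypothetical toy DataFrame with made-up values rather than the real iris data. It computes the per-species mean and median separately, then both at once with .agg and a list of recognized aggregation names.

```python
import pandas as pd

# Hypothetical toy stand-in for the iris DataFrame (made-up values)
data = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 7.0, 6.4, 6.3, 5.8],
    "sepal_width":  [3.5, 3.0, 3.2, 3.2, 3.3, 2.7],
    "petal_length": [1.4, 1.4, 4.7, 4.5, 6.0, 5.1],
    "petal_width":  [0.2, 0.2, 1.4, 1.5, 2.5, 1.9],
    "species": ["setosa", "setosa", "versicolor", "versicolor", "virginica", "virginica"],
})

# One statistic at a time; species becomes the row index
means = data.groupby("species").mean()
medians = data.groupby("species").median()

# Both statistics in one table: columns become a (feature, statistic) MultiIndex
both = data.groupby("species").agg(["mean", "median"])
print(both)
```

Individual cells of the combined table are addressed with tuples, e.g. both.loc["setosa", ("sepal_length", "mean")].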
Just so we don't have so many columns, we pass only np.prod and run this. The numbers can obviously get large, since we're multiplying all of the values together, but the point is that passing functions gives us more flexibility in the aggregation methods we can use. So that the notebook matches what you're looking at, I'll bring this back to what it was before. Next, what if we want to aggregate different fields differently? We can pass in a dictionary saying, for each of our fields, whether we want the mean, the median, or the max. To build it, we use a dictionary comprehension over the fields in data.columns: for each field, the key is the field name and the value is the list of mean and median. We only include fields that aren't species, since we don't want a numerical summary of species; that's the field we group by. If we look at agg_dict as it's currently formed, we have sepal_length mapped to mean and median, and likewise each of the four features mapped to the list of mean and median. Next, we replace the value for petal_length with just the max. Now that agg_dict is defined, we set agg_dict['petal_length'] equal to max, replacing the previous value: before, petal_length was mean and median; now it's just max. We print it with pprint, which is just a prettier way of printing out structures like this dictionary. Finally, we call data.groupby, group by species, and use .agg, passing in our dictionary, which tells pandas how to aggregate each feature. We get the mean and median for each feature except petal_length, for which we get the max.
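Both of the more flexible options can be sketched as follows: passing a NumPy function to .agg, and passing a per-column dictionary built with a comprehension. This is a minimal sketch assuming a hypothetical toy DataFrame with made-up values; agg_dict mirrors the name used in the walkthrough. Depending on your pandas version, passing np.prod may emit a FutureWarning suggesting the string "prod" instead; the result is the same.

```python
from pprint import pprint

import numpy as np
import pandas as pd

# Hypothetical toy stand-in for the iris DataFrame (made-up values)
data = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 7.0, 6.4, 6.3, 5.8],
    "sepal_width":  [3.5, 3.0, 3.2, 3.2, 3.3, 2.7],
    "petal_length": [1.4, 1.4, 4.7, 4.5, 6.0, 5.1],
    "petal_width":  [0.2, 0.2, 1.4, 1.5, 2.5, 1.9],
    "species": ["setosa", "setosa", "versicolor", "versicolor", "virginica", "virginica"],
})

# Passing a function: np.prod multiplies every value within each species group
products = data.groupby("species").agg(np.prod)

# Per-column aggregations: mean and median for every feature except the group key
agg_dict = {field: ["mean", "median"] for field in data.columns if field != "species"}
agg_dict["petal_length"] = "max"  # petal_length gets only the max
pprint(agg_dict)

summary = data.groupby("species").agg(agg_dict)
print(summary)
```

Because the other fields map to lists, the result still has (feature, statistic) columns, and petal_length appears as a single ("petal_length", "max") column.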
We see that here: for petal_length we got the max, and we got the mean and median for each of the other three features. Let's go into question 5, and then we'll take a break before going into question 6. Here we start with Matplotlib: we want to make a scatter plot of sepal_length versus sepal_width, label the axes, and give the plot a title. As we discussed in lecture, we import matplotlib.pyplot as plt, which is the most common practice when working with Matplotlib. Then we run the magic command %matplotlib inline so that our graphs show up within our Jupyter Notebook. Next, we create our axes, which is essentially the container, or box, that holds our plot. There's also the figure, which holds all the information about the overall graphic, but right now we can focus on just calling plt.axes(). So we set ax equal to plt.axes(), which gives us an empty box. Within that box, we want a scatter plot, so we call ax.scatter, passing data.sepal_length as the x-values and data.sepal_width as the y-values. When we're working with axes, we can call ax.set and pass in a number of keyword arguments to label the x-axis and y-axis and set the overall title, which is what we do here with xlabel, ylabel, and title. I run this, and you see our title, sepal_width as the y-label, and sepal_length as the x-label. We're going to take a break here, and when we come back, we'll jump into question 6 and talk about making a histogram.
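The question 5 scatter plot can be sketched like this. It is a minimal sketch: the DataFrame holds hypothetical made-up values, the title string is illustrative since the walkthrough doesn't state it, and the Agg backend is an assumption so the script runs outside a notebook (in Jupyter you would use %matplotlib inline instead).

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so this runs outside Jupyter
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical toy stand-in for the iris DataFrame (made-up values)
data = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 7.0, 6.4, 6.3, 5.8],
    "sepal_width":  [3.5, 3.0, 3.2, 3.2, 3.3, 2.7],
})

# plt.axes() creates an empty set of axes (and an implicit figure to hold it)
ax = plt.axes()

# Add the scatter plot: sepal_length on x, sepal_width on y
ax.scatter(data.sepal_length, data.sepal_width)

# ax.set accepts several properties at once as keyword arguments
ax.set(xlabel="sepal_length", ylabel="sepal_width",
       title="Sepal length vs. sepal width")  # illustrative title
```

Using ax.set keeps all the labeling in one call; the equivalent individual calls are ax.set_xlabel, ax.set_ylabel, and ax.set_title.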