Welcome back for the final two questions of our EDA lab. For question nine, we want to make a single box plot where the features are separated along the x-axis, so sepal length, sepal width, and so on. Then, for each of those features, we also want three separate box plots, one for each of the species, setosa, versicolor, and so on. I would say look at the Seaborn boxplot documentation, which will give you a clear understanding of the arguments that we're about to go through, but it should become clear as we go through it as well.
Also note that Seaborn is very picky about data format. For this plot to work, the data frame has to be in long format rather than wide format. What that means is that each row has to contain a single data point. We see the example here: we have a species, a measurement type, and the measurement value, so you can only have one value in each row. We're going to use the stack method to get there. We have the documentation linked here as well, and it's worth looking into for future use.
So let's break down the code that we have here in the solution. I'm creating a cell above to start off with a reminder of what our data currently looks like. We see that sepal length, sepal width, petal length, and petal width are each their own column, and we're going to have to move all of those into a single column. To do that, we first have to make sure that species isn't included as one of those columns, which is why we set it as the index. So rather than being an actual column, when I run this, species is now the index, and that's this data.set_index('species'). The next step is the stack method, and this will be clear once we run it, so I add .stack(). I'm actually going to run the next piece of code as well, to put the result into a data frame, because it will be easier to see.
That's .to_frame(), and you see here that it took all four columns and stacked them into the index, so now we have a multi-level index. For each species we have sepal length and then the value for that combination of species and sepal length, which was once in the first row of the original. Let's look at the original just to see how it all stacked. You see this first value, the combination of setosa and sepal length, is 5.1, and that's how we get this 5.1 over here. The sepal width from that same first row became the second row, for the combination of setosa and sepal width, and so on and so forth. So the first four values are the first row, the next four values, starting at 4.9 for setosa and sepal length, are the second row, and so on. We took a data frame that was once 150 rows and made it 600 rows in long format.
Now we want each of these to be its own column, so we run reset_index. Let's look at what this step does: now we see each of them as its own column, so they're no longer in the index. The final step is just to rename those to useful column names. Right now we have 0 and level_1, so we rename 0 as size and level_1 as measurement. Then when we look at the head, we see that we now have the long format that's necessary for Seaborn's box plots.
Finally, when we use sns.set_style, sns.set_context, and sns.set_palette, all we're doing is setting the look of our Seaborn plots. You can look at the documentation for each of these functions, but they only change the way the plot is displayed, so it's not necessary to run these lines of code to get the output that we're eventually looking for.
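The reshaping steps walked through above (set_index, stack, to_frame, reset_index, rename) can be sketched roughly as follows. This is a minimal sketch, not the notebook's exact cell: the tiny data frame is an illustrative stand-in for the 150-row iris data, and the names data and plot_data are assumed here for illustration.

```python
import pandas as pd

# Illustrative stand-in for the lab's iris data frame (assumed values,
# only four rows so the reshaping is easy to follow by eye).
data = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 7.0, 6.3],
    "sepal_width": [3.5, 3.0, 3.2, 3.3],
    "petal_length": [1.4, 1.4, 4.7, 6.0],
    "petal_width": [0.2, 0.2, 1.4, 2.5],
    "species": ["setosa", "setosa", "versicolor", "virginica"],
})

plot_data = (
    data.set_index("species")   # keep species out of the stacked columns
        .stack()                # four measurement columns become an inner index level
        .to_frame()             # Series -> DataFrame with a single column named 0
        .reset_index()          # index levels become columns: species, level_1
        .rename(columns={0: "size", "level_1": "measurement"})
)

print(plot_data.head())
```

With four wide rows and four measurement columns, the long version has sixteen rows, in the same way that 150 rows became 600 in the lab.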
Next we set the figure using plt.figure. The figure holds all of the information about our actual plot, whereas everything we've been working with so far has been the axes, which is just the bounding box. The figsize argument sets the size, so figsize equals (6, 4), and we've seen that passed in as an argument before.
Then we can use the Seaborn boxplot function. Our x values are the different measurements, so along the x-axis we have sepal length, sepal width, petal length, and petal width. Our y-axis is the size, so the actual values that we're looking at, and that gives us the range of the different sizes. Then we use hue to split up each x-axis group by species, and the data that we want to use is the new plot data that we created just before. We run that and see the output: for sepal length we have the setosa, the versicolor, and the virginica, and then the same thing for sepal width, petal length, and petal width. So we have a box plot for each of the features, split by each of the species.
And then finally, for question 10, we want to make a pair plot with Seaborn to examine the correlation between each of the measurements. We saw this before in the slides, and as mentioned here in the notebook, it requires just a single line of code for a very complex plot, and that's the power of Seaborn. If we wanted to do this in something like Matplotlib, it would require multiple lines of code and take a lot longer, for something that probably wouldn't look nearly as nice. Here again, set_context is just a way to set how our actual plot is going to be displayed.
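To recap the question nine box plot before moving on, here is a minimal sketch of the call described above. The sample rows are an illustrative stand-in for the iris data, and the specific values passed to set_style, set_context, and set_palette are assumptions, since only the function names are mentioned; they affect appearance only.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Illustrative stand-in for the iris data (assumed values, two rows per species).
data = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 7.0, 6.4, 6.3, 5.8],
    "sepal_width": [3.5, 3.0, 3.2, 3.2, 3.3, 2.7],
    "petal_length": [1.4, 1.4, 4.7, 4.5, 6.0, 5.1],
    "petal_width": [0.2, 0.2, 1.4, 1.5, 2.5, 1.9],
    "species": ["setosa", "setosa", "versicolor", "versicolor",
                "virginica", "virginica"],
})

# Wide -> long, following the same steps as in the walkthrough.
plot_data = (
    data.set_index("species").stack().to_frame().reset_index()
        .rename(columns={0: "size", "level_1": "measurement"})
)

# Optional styling; these particular values are assumptions.
sns.set_style("white")
sns.set_context("notebook")
sns.set_palette("dark")

fig = plt.figure(figsize=(6, 4))
# One group per measurement on the x-axis, one box per species within each group.
ax = sns.boxplot(x="measurement", y="size", hue="species", data=plot_data)
fig.savefig("boxplot.png")
```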
We see here that we have set_context with talk, and there's also set_context with notebook and several others; again, you can look up this functionality online. Then we run the one line of code, sns.pairplot. All we do is pass in our data, and it looks at the correlations between each of our features, and then we split it by species. So I'm going to run this just to see what it looks like. The last piece of code before this actually outputs is that we set this equal to plot, so we have an object that's equal to our pair plot. It has a legend attribute, and if we don't want it telling us that green is setosa and so on and so forth, we can remove that legend.
And then here we see the plot that it gives us. Where the two features are the same, so sepal length by sepal length, the scatter plot would have been a perfectly straight diagonal line. Because that's not going to give us any information, it instead shows us a histogram, which gives us the distribution for each of the three species. For each of the other pairs, we see a scatter plot split according to species, so we're able to see the relationship. Most of the time, as one of those features increases, the other increases as well, which makes sense. If you have a larger petal length, you would also expect a larger petal width, which is why you see a very strong upward trend.
All right, that's it for this lab, and I look forward to seeing you back in lecture.