Okay, now let's briefly talk about some of the data visualization libraries that are available to us. We have matplotlib, which will be the main library used to create plots and graphs in Python. It has a lot of flexibility and features you can tweak, and it's the foundation for doing visualizations in Python. Now, pandas, which we've seen so far, actually offers convenient wrapper methods around the matplotlib library. So plots can be created pretty easily on the fly compared to working with the raw version of matplotlib. That's very convenient, but it's less flexible and less powerful than working with matplotlib directly, although often it's good enough. And finally, Seaborn is another library built on top of matplotlib, and it creates very pretty plots. It has shorthand methods to create statistically interesting plots, such as linear model plots, pairwise correlation plots, and many others, which would otherwise take a long time to build using just matplotlib. Also, Seaborn's style preferences can be adopted by matplotlib: once Seaborn's theme is applied, if you go back to using plain matplotlib, your plots will still look like they're in Seaborn's visualization format. In older versions of Seaborn, simply importing it applied this style; in current versions you call sns.set_theme() to apply it. Now let's dive into actually creating some of these plots. We'll start off here with a basic scatter plot in matplotlib. Whenever we're working with matplotlib, we'll usually import the pyplot sub-module, matplotlib.pyplot, and bring it in as plt. And whenever we want to use matplotlib in our Jupyter notebooks, there's something to note that often trips people up: before you start doing any plotting, for those plots to actually show up in your notebook, you have to run a line, which we'll see in our notebooks, %matplotlib inline, and that will allow the plots to actually show.
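A minimal sketch of the setup described above, assuming matplotlib and a recent Seaborn (0.11 or later, where sns.set_theme exists) are installed. The %matplotlib inline magic only works in IPython/Jupyter, so it appears here as a comment; the matplotlib.use("Agg") line is our own addition so the snippet also runs as a plain script. The before/after check illustrates that, in current Seaborn, matplotlib's look changes when you apply the theme rather than on import.

```python
# In a Jupyter notebook, run this IPython magic once before plotting so figures
# render inline (it is not plain Python, so it is shown here only as a comment):
#   %matplotlib inline
import matplotlib
matplotlib.use("Agg")  # off-screen backend so this sketch also runs as a script
import matplotlib.pyplot as plt  # the pyplot sub-module, conventionally imported as plt
import seaborn as sns            # Seaborn, conventionally imported as sns

grid_before = plt.rcParams["axes.grid"]  # matplotlib's own default: no grid lines

# Apply Seaborn's default theme ("darkgrid") to all later matplotlib plots.
sns.set_theme()
grid_after = plt.rcParams["axes.grid"]  # now True: Seaborn's style is in effect

print(grid_before, grid_after)
```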
So here we've imported matplotlib.pyplot as plt, and we're then going to run plt.plot. And what do we want to plot? We want to plot the sepal length, so we're pulling in that sepal length column, and we're also pulling in sepal width; remember, we're making a scatter plot, so we're looking at the relationship between the two. The ls argument here stands for line style, and we want there to be no lines, so we set it equal to an empty string. And we set marker equal to o, which is just a circular dot; there are other options, such as the caret (^) for triangles, x for x's, and so on. And this is what our output will look like given the code that we see here on the left: we see the scatter plot of sepal length against sepal width. Now, what about scatter plots with multiple layers, where maybe we want to differentiate the groups by color? We can plot them one after the other: as long as we don't reinitialize our figure, every call to plt.plot draws on the same plot, one layer after the next. So here we're using plt.plot, and again our x values are the sepal length, our y values the sepal width, our line style blank, and our marker o. This time we also pass in a label, and our label here is equal to sepal. Then we do the same thing for the petal length and petal width, passing in the same arguments we mentioned before, and this time we label it petal. Matplotlib will automatically choose a different color for each layer. And if you call plt.legend after this, you'll end up with the plot you see on the right, where the green indicates the scatter plot for petal length and width, and the blue indicates sepal length and sepal width. We also have the option of plotting a histogram; here we only want to pass in one column. We can pass an argument for the number of bins we'd like, and if we run this, we'll see the output here on the right, which shows the distribution of that column.
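The three matplotlib plots just described (a basic scatter plot, a layered scatter plot with a legend, and a histogram) can be sketched as follows. This is not the lecture's own notebook code: the column names (sepal_length and so on) and the random stand-in data are assumptions made so the snippet runs without downloading the iris dataset, and bins=25 is an arbitrary choice.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; in a notebook use %matplotlib inline instead
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical stand-in for the iris data: random values under iris-style column names.
rng = np.random.default_rng(0)
data = pd.DataFrame({
    "sepal_length": rng.normal(5.8, 0.8, 150),
    "sepal_width": rng.normal(3.0, 0.4, 150),
    "petal_length": rng.normal(3.8, 1.7, 150),
    "petal_width": rng.normal(1.2, 0.8, 150),
})

# 1) Basic scatter plot: ls="" removes the connecting line, marker="o" draws dots.
plt.figure()
plt.plot(data.sepal_length, data.sepal_width, ls="", marker="o")

# 2) Layered scatter plot: successive plt.plot calls draw on the same axes,
#    each in the next default color; the labels feed the legend.
plt.figure()
plt.plot(data.sepal_length, data.sepal_width, ls="", marker="o", label="sepal")
plt.plot(data.petal_length, data.petal_width, ls="", marker="o", label="petal")
plt.legend()
layered_ax = plt.gca()  # keep a handle on the layered plot's axes

# 3) Histogram of a single column; bins sets the number of bars.
plt.figure()
counts, bin_edges, patches = plt.hist(data.sepal_length, bins=25)
```

Swapping marker="o" for "^" or "x" changes the dots to triangles or x's.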
So let's look at some ways we can customize plots. Here we're using plt.subplots so that we have fig and ax as separate objects. This is just a different, more object-oriented way of calling our plotting functions: fig is the figure, which holds essentially everything about our plot as a whole, and ax is the axes, the actual box where we draw our plot. Generally speaking, if we use this fig, ax syntax, we can replace what we had with plt with something very similar, just calling ax rather than plt. So here we're calling a horizontal bar plot, ax.barh, with y positions from zero to nine via np.arange(10), and we're going to look at the sepal width for just the first ten values, so each of those widths gets its own bar. Now we can set specific y ticks, specific tick labels, and specific axis labels, as we'll see here, using the ax methods. We'll use set_yticks, and we say we want our y ticks to run from 0.4 up to 10.4, counting by 1 (so 0.4, 1.4, ..., 9.4). This puts each tick in the middle of its bar rather than at the integer positions, which would be the default. That offset assumes the bars are aligned at their bottom edge, which was matplotlib's default before version 2.0; with today's default of align='center', the bars are already centered on 0 through 9. Then we set our y tick labels, and we just want those to be the values 1 through 10; remember that a Python range includes the first value but not the last, so the range we pass ends at 11. And then, using ax.set, we can pass in many more properties at once, just as we could have set the y ticks and y tick labels, and we see that with the x label. So we set xlabel to the string "x label", ylabel to the string "y label", and the title to "Title", and we see the output here on the right: we have the title, the y label, and the x label as we mentioned, tick values 1 through 10, and each bar representing the sepal width for one of the first 10 rows.
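A sketch of the object-oriented fig, ax version of this bar plot. The ten random widths are a hypothetical stand-in for the first ten sepal widths. Passing align="edge" explicitly is our assumption, used to reproduce the older matplotlib layout that the 0.4 tick offset was written for; with the current default (align="center"), you would place the ticks at np.arange(10) instead.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend; in a notebook use %matplotlib inline
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical stand-in for the first ten sepal widths (random values, not iris data).
rng = np.random.default_rng(0)
first_ten_widths = rng.normal(3.0, 0.4, 10)

fig, ax = plt.subplots()  # fig: the whole figure; ax: the box we draw into

# Horizontal bars at y = 0..9. With align="edge" each bar spans y to y + 0.8
# (0.8 is the default bar height), so its center sits at y + 0.4.
ax.barh(np.arange(10), first_ten_widths, align="edge")

# Ticks at 0.4, 1.4, ..., 9.4 line up with the bar centers.
ax.set_yticks(np.arange(10) + 0.4)
# arange excludes its stop value, so this yields the labels 1..10.
ax.set_yticklabels(np.arange(1, 11))

# ax.set accepts several properties in one call.
ax.set(xlabel="x label", ylabel="y label", title="Title")
```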
Now let's talk about how we can use the pandas syntax for plotting. We'll start here with a groupby object, beginning with our pandas DataFrame. We group by the different species, our target column, and take the mean of each of our features: petal length, petal width, sepal length, and sepal width. We then call .plot; the default .plot is a line graph, as we saw earlier. Before, we set the line style to blank to remove the lines; here we won't do that. We want a specific color for each of our features, such as sepal length and sepal width, so we pass the colors red, blue, black, and green. For the tick labels, which we'll see along the bottom of the graph in a second, we set the font size equal to 10, and the figure size, the size of the figure itself, will be 4 by 4. And what does that look like? We see the output here, where our x values are the different species, starting with setosa, then versicolor, then virginica. And we can see the average value of each feature for each species, starting with sepal length in red, then sepal width in blue, following the colors we specified earlier. A very useful feature of Seaborn is the pair plot. So we're going to start plotting in Seaborn now, and we import it with import seaborn as sns; that's the common way of importing Seaborn. Once we have Seaborn, we call sns.pairplot, and all we have to pass in is the data itself; that would work on its own. We'll also pass in hue equal to species, which breaks the points out by color, and size, which defines the size of each panel (newer versions of Seaborn call this argument height). So let's look at that plot. We see that it creates a scatter plot between each feature and every other feature.
So we see sepal length compared to sepal width in the first row, second column, and we're able to see that relationship. And because we set hue equal to species, we're able to see that relationship broken down by color according to the different species. Now, obviously, if you look along the diagonal, a scatter plot of one variable against itself would just be a straight line, so instead the diagonal shows a histogram of each feature. So let's zoom in on what one scatter plot from the pair plot actually looks like. We're able to see a visualization of the correlations between features as well as the histograms of all of our different columns, which is what makes this so powerful. Focusing in on the scatter plot, we see petal length versus sepal width in that top left corner, and we see the relationship, and we're also able to break it down by species. In a business setting, we might want to see the effect of different features, such as ad spend, on revenue, and we could see that relationship as well as the distributions of ad spend and of revenue in our data frame. Another useful feature of Seaborn is the hexbin plot. We'll see what it looks like in just a second; the way we call it is with sns.jointplot (again, sns refers to Seaborn), passing kind='hex'. We specify what our x values and our y values are going to be. The result is similar to a scatter plot, but it actually shows the density of the points, that is, where most of the points fall. So we see much darker hexagons where the values are much denser, compared to the lighter hexagons where there aren't as many values.
So it also shows where most of these values actually fall, and we can also see the histograms of both sepal length and sepal width along the top and the right side of our graph. And again, thinking about the relationship between ad spend and revenue, we'd be able to see the distribution of each, as well as where the two most often coincide. Finally, if we want to really break the data apart and slice it across different categories, somewhat similar to a groupby, we can use a FacetGrid. We pass in our data, and then our column variable, which is what the col argument is for; we want a different species in each column. There's also a row option, but we're just going to stick with columns here, and we set margin_titles equal to True, which, when a row variable is used, draws the row titles in the right margin; each column gets its own title either way. That just sets up the FacetGrid object; to actually plot something, we call that object's .map method. And we map a histogram of sepal width onto each of the species. Then we set up another plot using sns.FacetGrid and do the same map, except this time for sepal length, and we get this output, where we see the histograms broken down with each column corresponding to one of the species. And we can zoom in here and see the histograms for setosa for both sepal width and sepal length. So let's do a quick recap of what we just learned. In this section, we discussed different approaches to EDA and how to have that initial conversation with your dataset. We discussed the different techniques available, such as summary statistics like the mean, median, min, max, and correlation, as well as different visualization techniques. We touched on how to look into a sample of your data, what it means to actually look at a sample, and some of the things to take into account.
We discussed producing a wide range of visualizations and how to get a better feel for your data. From here, we're going to go in and take an actual lab and see how it's done in a Jupyter notebook. I look forward to seeing you there.