Now, for me personally this is a very exciting part: visualizing the data. I love to see data visualized as plots and graphs; that really gives me a good understanding of what is going on. When we combine StatPlots, the statistical plots package, with DataFrames, it really is a powerful combination, and at least for now it works really well in Julia 1.0 inside of JuliaBox for me. We're going to do that by using macros. A macro is just code that produces code. It is a very short and powerful tool built into Julia for us to use and exploit, and it is going to bring these two packages together for us, DataFrames and StatPlots, in one very easy to write piece of code, so that we can get these beautiful plots.

So, the first thing that I want to do is just to look at the distribution of the ages, but I want this plot to show it according to the two groups. So, I want to see the distribution of ages for group A patients and group B patients. Now, what else do I want in here? I want a nice title on the top of my plot, a nice title on the x-axis and a nice title on the y-axis. I also want a little legend, put in one of the corners, so that if there are different colors for the two lines that we are going to produce, I can see which one is which.

So, let's see how I can go about this. The type of graph that I want is what's called a smooth histogram: it is just going to draw a smooth curve to show us this distribution. The StatPlots function for that is density, for density plots. We use this macro @df, for dataframe, and the first thing after @df, not an argument as such, is the actual dataframe. So, it says: take the dataframe called data and produce for us a density plot. Use of that macro just makes this so easy to do.
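To make the idea of the macro concrete, here is a minimal sketch for a current Julia session. It assumes the package is loaded under its present name, StatsPlots (it was called StatPlots at the time of Julia 1.0), and it builds a small stand-in dataframe with a hypothetical Age column rather than the simulated data from earlier. The two calls are meant to draw the same curve, because @df swaps the symbol for the matching column.

```julia
using DataFrames, StatsPlots, Random

Random.seed!(1)   # fixed seed, only so the sketch is repeatable

# Stand-in data (assumption): 100 ages between 18 and 80
data = DataFrame(Age = rand(18:80, 100))

# With the macro: the symbol :Age is replaced by the Age column of data
p_macro = @df data density(:Age)

# Without the macro: pass the column yourself
p_plain = density(data.Age)
```

The macro starts to pay off once several columns and a grouping variable appear in the same call.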
The first argument is the symbol version of the variable that we're interested in, that's the age column. So, we use the symbol version of it, that's colon age. Now, I want this distribution of the ages, but I want it grouped by one of the categorical variables. So I'm saying group equals and then the symbol version of my treatment column, so colon treatment. Then some of the keyword arguments. There's title, and I put that in double quotes, so that's a string: distribution of ages by treatment group. xlab is the x-axis label, and I want it to say age; ylab is the y-axis label, and I want it to say distribution. I want a legend in this case and I want to put it somewhere. There are a couple of choices: topright, topleft, bottomright and bottomleft. We use symbol form here as well, so it's colon topright.

So, let's execute this. The first time that you run this, it has to compile, so be patient; the plot will appear shortly. There we go, a fantastically beautiful plot. Let's have a look at this. We have our two axes. The x-axis is age, and we see zero to 100. Remember the data was from 18 to 80, and we see this smooth distribution: instead of a histogram, it smooths this out for us. On the y-axis we see the distribution. We can clearly see from the numbers here that it's a density, so the area under each curve will be one. We can clearly see the two different lines, and A and B in the top right is our legend. So, the blue one is the age distribution for group A patients, and this reddish color is the age distribution for the patients in group B.

That is fantastic. Such a simple line of code. Well, I broke it into three lines so it fits in nicely and we can all see it. But what a fantastic plot, and it's right here with some simple coding. Let's do the same again: I want the age distribution, but this time by the result. Exactly the same.
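The call just described can be sketched as follows. The column names Age and Treatment and the stand-in data are assumptions, since the transcript does not show the exact spelling, and the package is loaded under its current name, StatsPlots.

```julia
using DataFrames, StatsPlots, Random

Random.seed!(1)   # fixed seed, only so the sketch is repeatable
n = 100

# Stand-in data (assumption): ages from 18 to 80 and two treatment groups
data = DataFrame(
    Age = rand(18:80, n),
    Treatment = repeat(["A", "B"], outer = n ÷ 2),
)

# Density plot of age, one curve per treatment group
p = @df data density(:Age,
    group = :Treatment,
    title = "Distribution of ages by treatment group",
    xlab = "Age", ylab = "Distribution",
    legend = :topright)
```

Grouping by another categorical column is the same call with a different symbol after group and a different title.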
I'm going to use my @df macro, then data, the dataframe I'm interested in. I'm interested in the density plot of the ages. This time I'm just going to group by result. I've given it a bit of a different title, but otherwise it's exactly the same. As you can well imagine, we're now going to see three lines, because the sample space for that categorical variable had three elements in it: improved, static and worse. We can see the legend on the top right. I mean, that is so clear. We can see the distribution of the ages for these groups.

Now, why did I go through all of this? Why don't I just use density in its normal way, because you can just call density with some other arguments? I love this macro way for the following reason, and it is why I'm showing it to you in this short little module, this honors module. I'm calling the @df macro again for my data dataframe, a density plot of age, but look at what I do to the group now. I'm creating a tuple. So, it's inside of parentheses, and I'm saying treatment and result. I'm putting two categorical variables there, and that is the reason why I designed the two categorical variables right at the beginning, when we created the simulated data. So, it is going to look at both treatment and result, and I want you to guess what's going to happen here. I'm sure you know. This is what makes it so powerful.

So, look at this. title is still the same keyword argument, but I'm changing the title to distributions of ages by treatment and result groups. I still have age as my x label and distribution as my y label, and I put the legend on the top right. But have a look at this. There we go. Because I put two categorical variables here, it actually separated all of them out for me. It knows that there's A and B. It knows that there's improved, static and worse. So, it made A improved, A static and A worse, and B improved, B static and B worse, and that's just fantastic.
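A minimal sketch of the tuple grouping described above. The column names, the level names and the stand-in data are assumptions; the data is laid out so that all six treatment and result combinations occur.

```julia
using DataFrames, StatsPlots, Random

Random.seed!(1)   # fixed seed, only so the sketch is repeatable
n = 120

# Stand-in data (assumption). Treatment cycles every 2 rows and Result every
# 3 rows, so each of the 6 combinations appears 20 times.
data = DataFrame(
    Age = rand(18:80, n),
    Treatment = repeat(["A", "B"], outer = n ÷ 2),
    Result = repeat(["Improved", "Static", "Worse"], outer = n ÷ 3),
)

# A tuple of symbols groups by both categorical columns at once
p = @df data density(:Age,
    group = (:Treatment, :Result),
    title = "Distributions of ages by treatment and result groups",
    xlab = "Age", ylab = "Distribution",
    legend = :topright)
```

The point of the tuple is that no manual splitting of the dataframe into six subsets is needed before plotting.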
It is just such a powerful tool to draw all that information out in one single go. That's absolutely why I love to use this @df macro together with StatPlots here, because that was so easy for me to draw out. You can well imagine some data analysis where this would be so powerful: you don't have to tease out and wrangle the data, pulling it apart so that all these data point values are separate, before you can plot them. You plot them all in one go, boom, right there. That is just fantastic. I really think that is phenomenal.

Now, let's move on from there. I want to show you three other plots. The first one is the box-and-whisker plot. I just want to show it to you because it's something we use very commonly, in my world at least. So, again, use the macro; don't try to use anything else. It's so powerful. So, @df data, and this time boxplot is my function for box-and-whisker plots. Now, look at the first argument; this is the other way round. For boxplot, treatment comes first. That's a categorical variable, and I know I'm going to get A and B from there. That's going to be my x-axis. On my y-axis, secondly, I'm going to have the white cell count. So, it's going to create box-and-whisker plots of the white cell count. My lab is white cell count. My title is white cell count by treatment group. My xlab is groups and my ylab is white cell count.

Let's have a look at it. There we go. We see white cell count by treatment group. We see groups A and B, and we see this little label here. That's my lab; it gives the series a name, WCC, white cell count. I still get my x-axis label and my y-axis label, all beautifully done. Let's do one more. I'm going to create white cell count by result group this time, everything else being the same, and you can imagine what's going to happen there.
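The two box-and-whisker calls can be sketched like this. The column names, the level names and the white cell count values are stand-ins chosen for illustration, not the simulated data from the course.

```julia
using DataFrames, StatsPlots, Random

Random.seed!(1)   # fixed seed, only so the sketch is repeatable
n = 120

# Stand-in data (assumption): hypothetical white cell counts around 8
data = DataFrame(
    Treatment = repeat(["A", "B"], outer = n ÷ 2),
    Result = repeat(["Improved", "Static", "Worse"], outer = n ÷ 3),
    WCC = 8 .+ 2 .* randn(n),
)

# Categorical variable first (x-axis), numerical variable second (y-axis)
p_treatment = @df data boxplot(:Treatment, :WCC,
    lab = "WCC",
    title = "White cell count by treatment group",
    xlab = "Groups", ylab = "WCC")

# Same call, with the result column as the categorical variable
p_result = @df data boxplot(:Result, :WCC,
    lab = "WCC",
    title = "White cell count by result group",
    xlab = "Groups", ylab = "WCC")
```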
I have three groups, improved, static and worse, and it's going to plot those out for me very nicely in pyplot here: my box-and-whisker plots.

Two more plots I want to show you quickly. The first is a correlation plot. So, again, @df data, then corrplot, for correlation plot. What do I want here? The first argument is just all my numerical variables, because I want to see if there's a correlation between pairs of these. I'm going to pass a second argument, grid equals false, to say that I don't want a grid. Let me show you what this looks like, so we can quickly run through it. There we go.

Now, have a look at this. We see these three panels at the bottom here. This one here, see where my mouse is moving, is white cell count against age. So, that is the correlation between those two things: as the ages get older, the white cell count comes down. The same goes for CRP against age, in this sample at least, and then we see CRP against white cell count. So, we see the correlation between the pairs. On the main diagonal we just see a little histogram. We created smooth histograms earlier, but this is just the ordinary histogram. Then there is a heat map on the top. So, that's what a correlation plot is going to give you. In one go, you can very quickly see the correlation between pairs of your numerical variables, and all you have to do is select them out. Age, white cell count and CRP were the numerical variables for me. I can put in as many numerical variables as I wish, and this correlation plot is going to do that for me.

The corner plot is sort of the same thing. Again, I'm only going to pass my numerical variables there. This is what it gives us. What we're interested in here really is just white cell count against age, CRP against age and CRP against white cell count. This is the better representation for me, just to have a quick look to see if there are any correlations between these.
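The correlation plot and the corner plot can be sketched as below. The column names and the random stand-in values are assumptions, so the panels will not show the trends seen in the lecture; the numerical columns are passed together as one row of symbols.

```julia
using DataFrames, StatsPlots, Random

Random.seed!(1)   # fixed seed, only so the sketch is repeatable
n = 100

# Stand-in numerical data (assumption): hypothetical values, not the course data
data = DataFrame(
    Age = rand(18:80, n),
    WCC = 8 .+ 2 .* randn(n),
    CRP = 20 .+ 5 .* randn(n),
)

# Pairwise plots for the selected numerical columns, without a grid
p_corr = @df data corrplot([:Age :WCC :CRP], grid = false)

# The corner plot takes the same selection of columns
p_corner = @df data cornerplot([:Age :WCC :CRP], grid = false)
```

Adding another numerical column to the bracketed list adds another row and column of panels.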
So, visualize your data and have some fun. StatPlots really is a powerful package for statistics specifically, and when you combine it with DataFrames, the code is so simple and the visualizations that you get are so powerful. It really draws out the information that is locked inside of the data itself. Next up, I'm going to show you how to do all of the common statistical tests very easily, right inside of Julia.