Hi, let me introduce how to draw histograms. A histogram is used to present a frequency distribution. The layout of a histogram is similar to a bar chart, but with a histogram we present a frequency distribution. The function is the histogram function, and its simplified name is hist. The hist function returns three objects. The first object is the frequency for each bin. The second is bins, which holds the edges of the bins. Lastly, the third object returned by the hist function is patches. Patches are the rectangles drawn for the bins, and they are used for adding properties to a specific histogram bar.
A new concept introduced here is the bin. What is called a bar in a bar chart is called a bin in a histogram. A bin is simply a sub-interval, and its bar is sometimes also called a patch. Think about a continuous distribution, say a normal distribution. You are taking 1,000 samples from that normal distribution. It is continuous. You divide the whole range into a certain number of bins, meaning a certain number of sub-intervals, and for each sub-interval there is a count, a frequency: the number of values generated by the normal distribution that belong to that specific sub-interval. Those data points are counted and used for presenting density or frequency. So a bin is simply a bar presenting the frequency of the data belonging to that specific sub-interval.
Now, let me give you an example. X is a set of randomly chosen numbers from a normal distribution. The mean value is zero, and this is the standard deviation. How many samples are extracted? Two thousand samples. Then three objects are returned from the plt.hist function here: n, bins, patches. Surely, you don't have to use those names. You can assign other names. Instead of n, you may use frequency. Instead of bins, you may use any description that you want. The key point here is that three objects are returned, and the x variable is used for creating the histogram. Bins 10 means that there are 10 sub-intervals.
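As a rough illustration of the three objects that plt.hist returns, here is a minimal sketch. The random seed and the standard deviation of 3 are assumptions made for the example, since the transcript does not state them clearly, and the Agg backend is chosen only so the code runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, an assumption so this runs anywhere
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)             # seed is an assumption, for repeatability
x = rng.normal(loc=0, scale=3, size=2000)  # scale=3 is an assumed standard deviation

# hist returns three objects: counts per bin, bin edges, and the bar artists
n, bins, patches = plt.hist(x, bins=10)

print(n.sum())       # total count equals the number of samples
print(len(bins))     # one more edge than there are bins
print(len(patches))  # one rectangle per bin
```

Any names can be used on the left-hand side, for example frequency, edges, bars, as long as three names are given.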
Alpha is 0.9. The alpha value ranges between 0 and 1. As the alpha value gets smaller, the color becomes more transparent and looks duller. As the alpha value gets larger, the color becomes more opaque and looks darker. Alpha is used for controlling the transparency of the bin color. At this time, color is g, green. Each bin is a rectangle, so edgecolor draws the edge of that rectangle, here in black. Then plt.setp; setp simply means set properties. First, let's comment out those two lines and see how it looks. In order to check the histogram information, I'm using the print function at the bottom. An array of frequencies will be returned here, and an array of edge values, from bins, rounded to one decimal place. The edge values have many decimal places, so in order to make the printing simpler, I round them to one decimal. The patches information will be printed also.
Let me execute this one. Then what do you see? Color green. How many bars? 1, 2, 3, 4, 5, 6, 7, 8, 9, 10. Because there are 10 bins. Each bar is called a bin, so there are 10 bins here. Look at the frequency array: the first bin's frequency is three. The second bin, 34. The third bin, 134. The fourth, around 344. You can get the information about how many data points belong to a specific bin. Here there are 10 bins, but the bins array contains 11 elements. Why? Because the bins array contains the edge information. Minus 10.1 is the far left side, then minus 8.1 is the next edge, at this point. The far right side is 10.1. Because there are 10 bins, you need one more edge, since every bin has a left side and a right side. So 11 elements are contained in this array. The third one, patches. Patches is simply the information about the bars: a BarContainer object of 10 artists. Here 10 artists means one patch for each bin.
This time, I assigned color g. What if you want to change only the color of specific bins? In that case you use this one: plt.setp with patches from 0 to 3.
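The styling and recoloring steps can be sketched roughly as follows. The seed and the standard deviation of 3 are assumptions, so the printed counts and edges will differ from the numbers read out in the lecture. The sketch sets facecolor rather than color so that the black edges stay in place; that choice is also an assumption about the original notebook.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, an assumption so this runs anywhere
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)             # assumed seed
x = rng.normal(loc=0, scale=3, size=2000)  # assumed standard deviation of 3

# green, slightly transparent bars with black rectangle edges
n, bins, patches = plt.hist(x, bins=10, alpha=0.9, color="g", edgecolor="black")

print(n)                  # frequency in each of the 10 bins
print(np.round(bins, 1))  # 11 edge values, rounded to one decimal place
print(patches)            # the container holding the 10 bar artists

# a slice 0:3 selects the bars at index 0, 1, 2 (the end index is excluded)
plt.setp(patches[0:3], facecolor="r")
# a single index selects one bar
plt.setp(patches[4], facecolor="b")
```

The bar at index 3 is left untouched by both calls, so it keeps its green face.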
Patches from 0 to 3 is a slice, so the patches at index 0, 1, and 2 will have this color r, red; the end index 3 is not included. Let me execute this one. The first three patches have red color right now. What if you choose one specific patch or bin? In that case, you specify the bin index. That patch will be blue, not green. This is blue. The index rule is the same one that we have used so far in Python coding: 0, 1, 2, 3, 4. Here you are pinpointing index 4, which is why this one becomes blue, and 0 to 3 means that the patches at index 0, 1, 2 have red color.
Now, you can also control the limits of the x-axis. Xlim takes a min value and a max value. The min value here is minus 10.9. The max value is 11.6. If you use this one, there's no empty space between the first bin and the starting point of the x-axis. Let me execute this one and see what happens. As I explained, the 10 bins are compactly located within this canvas.
This is one way of creating a histogram. Here's another case. What if you draw multiple distributions? For the x variable, the mean value is minus 7, the standard deviation is the same, 3, and there are 1,000 random drawings from the distribution. The second one has mean value zero, and the third one has mean value seven. It means that the three variables each have 1,000 drawings from a normal distribution, but the mean values are different. So those three normal distributions will be somewhat overlapping. Bins at this time: the left side value and the right side value, the minimum value and the maximum value. The linspace function is used. As I said before, linspace is inclusive: both end values are included. How many bins? Fifty bins. Figure next: we are creating subplots, but at this time the whole canvas is used for just one graph. We are placing multiple histograms on the same graph. In this case, we can place multiple variables within one hist function. How? Place the variables in a list, and then bins. The same bins will be used for the x-axis. The labels are x, y, z because there are three variables, and the r, g, b colors are for each variable, x, y, z.
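Here is a minimal sketch of drawing several distributions with a single hist call. The seed and the variable names data_min and data_max are assumptions. Note that linspace returns edges, not bins, so this sketch asks for 51 points in order to get exactly 50 bins; the original notebook may have passed a different number.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, an assumption so this runs anywhere
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(1)  # assumed seed
x = rng.normal(-7, 3, 1000)
y = rng.normal(0, 3, 1000)
z = rng.normal(7, 3, 1000)

# shared bin edges spanning all three variables; linspace includes both end values
data_min = min(x.min(), y.min(), z.min())
data_max = max(x.max(), y.max(), z.max())
bins = np.linspace(data_min, data_max, 51)  # 51 edges give 50 bins

fig, ax = plt.subplots()  # one figure, one axes object
n, edges, patches = ax.hist([x, y, z], bins=bins,
                            label=["x", "y", "z"], color=["r", "g", "b"])
ax.legend(loc="upper right")  # the labels feed the legend

counts = np.asarray(n)  # one row of counts per variable
print(counts.shape)
```

Because the same edges are used for every variable, the three sets of bars line up on the x-axis and can be compared bin by bin.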
Let's see the outcome. It looks like this. Legend location is upper right, that's why it is here. Labels are used for creating the legend, and the r, g, b colors are used. In this example, I'm showing you how to draw multiple histograms together. You could create three histograms separately, for example x_1 hist, x_2 hist, but at this time you are using only one hist function to create three histograms.
Now, let me show you another case of showing a distribution. This is the probability density function of the normal distribution. If you provide an x value, then a certain probability density value is returned. In this formula, Mu is the mean value, Sigma is the standard deviation, and x is the x value. You are creating a function that calculates and returns this density value when an x value is provided. But depending on Mu and Sigma, this value changes. That's why this function takes three values: mean, standard deviation, and x. The code simply describes the formula up here. If you look carefully, it is exactly the same as that formula.
Now, Mu, the mean value, is 100, Sigma is 10, and sample is 1,000. For pdf, you are sampling by providing the Mu, Sigma, and sample information, and after generating 1,000 samples, we create a histogram. Bins: the minimum value is 60, the maximum value is 140, and how many bins? Eighty bins at this time. Think about how I set the minimum and maximum. The mean value is 100. Almost all of a normal distribution lies within three Sigma of the mean. Three Sigma is 30. If you subtract 30 from 100, it is 70. For safety, I'm subtracting 10 more, so 60, and it is balanced: the mean value is in the middle, so subtracting 40, adding 40. Within that range between minimum and maximum, 80 bins will be created and counted. Fig, ax, histogram. Here's pdf, the samples created, then bins, and color g. For plotting, bins is used as the x value and it is plugged into this function.
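The density function and the histogram with a line on top might look roughly like this. The function name normal_pdf, the variable name data, and the seed are assumptions. The sketch also multiplies the density by the bin width as well as the sample size, which is the general conversion from density to expected count; with 80 bins between 60 and 140 the width is 1, so this gives the same curve as multiplying by the sample size alone.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, an assumption so this runs anywhere
import matplotlib.pyplot as plt
import numpy as np

def normal_pdf(mu, sigma, x):
    """Normal density: exp(-(x - mu)^2 / (2 sigma^2)) / (sigma sqrt(2 pi))."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

mu, sigma, sample = 100, 10, 1000
rng = np.random.default_rng(2)          # assumed seed
data = rng.normal(mu, sigma, sample)    # the sampled values

bins = np.linspace(60, 140, 81)         # 81 edges give 80 bins of width 1

fig, ax = plt.subplots()
n, edges, patches = ax.hist(data, bins=bins, color="g")

# expected count per bin = density * sample size * bin width
width = edges[1] - edges[0]
ax.plot(edges, normal_pdf(mu, sigma, edges) * sample * width, color="r")
```

The function has to be defined before the plotting cell runs, which is the error the lecture runs into next.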
The same Mu and Sigma values are used, then the y value is returned, and this plot here adds a line graph. It is placing a line graph on top of the histogram. Let me show you the outcome. This one, one error. What is the error? This function should be executed first, and then we rerun. Then it looks like this one. This is a histogram based on the sampled data. Then the red line is the [inaudible] value following the normal distribution based on the same parameter values, mean 100 and Sigma 10. Then it looks like this one.
What if we increase bins, like 100? Then the histogram becomes lower. For example, the maximum value is 40. What about 80? The histogram value becomes larger. What about 40 or 60? Now, the red line graph is far lower than the histogram maximum value. Now, the maximum value is 40. Why is this happening? There are 1,000 samples created. If you reduce the bin number, the frequency in each bin will increase. But what if we increase the bin number? Then obviously the frequency belonging to each bin will decrease, because the total frequency is fixed at 1,000. That's why, by changing the bin number, you can somehow adjust the height of the histogram. Actually, this function returns a probability density, but in order to convert the values into frequency, I multiplied by sample here. This is the probability and we are multiplying by the sample size, then the height is calculated. As long as the sample size is not changing, the location of the line graph does not change. But depending on the bin number, the frequency of our histogram changes.
Now, let me give you another example of drawing a histogram. At this time we are using the iris data. Setosa, versicolor, virginica: they're sliced by row. Setosa contains only setosa information. Versicolor contains only versicolor information. A names list is created and a kinds list is created. The kinds list contains the subsets of the iris data: setosa, versicolor, and virginica.
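The effect of the bin number on the bar heights, discussed before the iris example, can be checked numerically. This is a sketch using np.histogram instead of plt.hist so that only the counts are computed; the seed and the bin numbers 40, 80, 160 are assumptions chosen so that each setting splits the previous bins exactly in half.

```python
import numpy as np

mu, sigma, sample = 100, 10, 1000
rng = np.random.default_rng(3)        # assumed seed
data = rng.normal(mu, sigma, sample)

# density at the mean, the highest point of the normal curve
peak_density = 1.0 / (sigma * np.sqrt(2 * np.pi))
line_peak = sample * peak_density     # height of the line when scaled by sample size only

peaks, totals = [], []
for k in (40, 80, 160):
    counts, edges = np.histogram(data, bins=k, range=(60, 140))
    width = edges[1] - edges[0]
    peaks.append(counts.max())
    totals.append(counts.sum())
    # expected tallest bar = density * sample size * bin width
    print(k, width, counts.max(), round(line_peak * width, 1))

print(round(line_peak, 1))  # stays the same whatever the bin number
```

The total count is the same for every bin number, so more bins means fewer samples per bin. The line scaled by sample size alone peaks near 40, which lines up with the histogram only when the bins are one unit wide.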
Now, we are creating histograms for each kind: figure, canvas, ax here. We are creating only one figure, so only one ax object is created, and we add hist to it. At this time, we are adding three variables, the sepal lengths from each kind: setosa, versicolor, and virginica. The number of bins is 20. For label, we are using the names list created here, and the colors are r, g, b. The remaining command lines are all explained already. Let me execute. Then the outcome is this. Setosa, versicolor, virginica. From this figure, what do you see? The three kinds are overlapping if you use the sepal length. It is not easy to separate one kind from the others based on sepal length, because the three kinds heavily overlap. This is what you can see from this figure.
Let me give you a review question, true or false: by adjusting the number of bins, we can optimize the presentation of a histogram. There's a negative relationship between the number of bins and the frequency belonging to each bin. That's what I already explained. The answer is true.
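For reference, the iris histogram described earlier could be sketched as follows. The lecture does not say how the data was loaded, so loading it through scikit-learn's load_iris is an assumption, as is slicing with the target array instead of by row position; the names and kinds lists follow the description in the lecture.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, an assumption so this runs anywhere
import matplotlib.pyplot as plt
import numpy as np
from sklearn.datasets import load_iris  # assumed data source

iris = load_iris()
sepal_length = iris.data[:, 0]          # first column is sepal length in cm

names = list(iris.target_names)         # setosa, versicolor, virginica
# one subset of sepal lengths per kind
kinds = [sepal_length[iris.target == i] for i in range(len(names))]

fig, ax = plt.subplots()                # one figure, one axes object
n, edges, patches = ax.hist(kinds, bins=20, label=names, color=["r", "g", "b"])
ax.legend(loc="upper right")

counts = np.asarray(n)                  # one row of counts per kind
print(counts.sum(axis=1))               # samples counted per kind
```

Changing bins=20 to a larger or smaller number is a quick way to see the negative relationship from the review question on real data.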