In this part of the module, I want to show you how to implement association rules in Rattle. The data for association rules can come in two formats. One is what we call the Basket format, the other is the DataFrame format. Don't worry too much about the difference. For the Basket format, we will use the DVD_Transaction data in Rattle. For the DataFrame format, we will give you a subset of the Groceries data on which you can practice.

So, here it is. The DVD_Transaction data comes with Rattle. What you can simply do is load Rattle from R, don't select any data file, and hit Execute. Rattle then tries to open the weather data. Instead of working with that, go to the file chooser and open it. When you do, you will find the DVD data right there, next to the weather data. So, you don't have to search for where the weather data is stored in order to get to the DVD_Transaction data.

The way you do a market basket analysis is to use Associate, choose Basket, and ask it for a frequency plot. That tells you how frequently the various items are being bought. Then you ask it to create the rules. That's it. It does it for you. You can then sort the rules by either lift or confidence.

Confidence you already know. Confidence is the support of the whole rule, that is, the items already in your basket together with your recommendation, divided by the support of the items already in your basket. So, for example, look at the rule with a support of 60 percent. There are 10 transactions in this dataset, think of it like that. That means there are six transactions in this database in which people have got both Patriot and Gladiator, which are movies. Now, the confidence is 100 percent. What it means is that everybody who has seen Patriot has also seen Gladiator. What we're asking Rattle to do here is sort the rules by confidence, and we may like to recommend some of these. Of the people who have bought The Sixth Sense, 83 percent have also seen Gladiator.
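To make the support and confidence figures concrete, here is a minimal Python sketch (Rattle itself runs in R; Python is used here only for illustration). The ten baskets are hypothetical: they are made up so that they reproduce the percentages quoted in the lecture, and they are not read from Rattle's DVD_Transaction file. The helper names `support` and `confidence` are also just illustrative.

```python
from collections import Counter

# Hypothetical baskets, one set of titles per transaction (assumption:
# invented to match the lecture's numbers, not Rattle's actual file).
transactions = [
    {"Gladiator", "Patriot", "Braveheart"},
    {"Gladiator", "Patriot", "Sixth Sense"},
    {"Gladiator", "Patriot", "Sixth Sense"},
    {"Gladiator", "Patriot", "Sixth Sense"},
    {"Gladiator", "Patriot"},
    {"Gladiator", "Patriot", "Sixth Sense"},
    {"Gladiator", "Sixth Sense", "Green Mile", "Lord of the Rings"},
    {"Sixth Sense", "Green Mile", "Harry Potter1"},
    {"Harry Potter1", "Harry Potter2"},
    {"LOTR1", "LOTR2"},
]

def support(itemset, baskets):
    """Fraction of baskets that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= basket for basket in baskets) / len(baskets)

def confidence(lhs, rhs, baskets):
    """support(lhs and rhs together) divided by support(lhs)."""
    return support(set(lhs) | set(rhs), baskets) / support(lhs, baskets)

# Item frequencies, the numbers behind a frequency plot.
counts = Counter(item for basket in transactions for item in basket)
item_frequency = {item: c / len(transactions) for item, c in counts.items()}

print(f"support(Patriot, Gladiator)        = {support({'Patriot', 'Gladiator'}, transactions):.2f}")
print(f"confidence(Patriot -> Gladiator)     = {confidence({'Patriot'}, {'Gladiator'}, transactions):.2f}")
print(f"confidence(Sixth Sense -> Gladiator) = {confidence({'Sixth Sense'}, {'Gladiator'}, transactions):.2f}")
```

On this made-up data the three printed values are 0.60, 1.00 and 0.83, matching the 60 percent support, 100 percent confidence and 83 percent confidence mentioned in the lecture.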
So, you might like to recommend Gladiator to somebody who has just seen The Sixth Sense, and they will probably feel good about it.

Another way is to sort the rules by lift. Lift is again the same kind of concept. It asks: how helpful is your rule? So, for example, here it is. The rule is Gladiator, Green Mile implies Lord of the Rings. This rule has a confidence of one. What a confidence of one means is that everyone who has watched Gladiator and The Green Mile has also watched The Lord of the Rings, so you can recommend The Lord of the Rings to them and they'll probably like that video. But how much better are we doing because we know they watched Gladiator and The Green Mile, compared with not knowing it? If we didn't know, it's a random hit or miss, and only 10 percent of the people have seen The Lord of the Rings. But the moment you know they have watched Gladiator and The Green Mile, this probability goes up to one. So, the lift is the probability that they watch The Lord of the Rings given this information, divided by the probability that they watch it without this information. Here that is 1 divided by 0.1, a lift of 10. So, the lift is basically the ratio of improvement in your prediction because you have the information. So, it's pretty cool.

On top of that, Rattle also produces a very nice plot for you. What does this plot do? It graphs the connections between movies, and each connection has a size and a color. The color gives you the lift and the size gives you the support. So, look at this one: Green Mile implies Lord of the Rings has low support but high lift. It is the sixth rule on the previous slide. So, here you go. The support is only 10 percent, but the lift you're getting is five. So, it's a very visual representation. What you really want are rules that have a reasonable support and a deep color. Those are the ones I want to recommend to people. You can keep improving on this.
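The lift calculation can be sketched in a few lines of Python. Written with raw counts, lift is n times the number of baskets containing both sides of the rule, divided by the product of the counts of each side, which is the same thing as confidence divided by the support of the recommended item. The ten baskets below are hypothetical (invented to match the lecture's numbers, not loaded from Rattle), and `count` and `lift` are illustrative helper names.

```python
# Hypothetical baskets (assumption: invented for illustration so that the
# quoted supports and lifts come out the same as in the lecture).
transactions = [
    {"Gladiator", "Patriot", "Braveheart"},
    {"Gladiator", "Patriot", "Sixth Sense"},
    {"Gladiator", "Patriot", "Sixth Sense"},
    {"Gladiator", "Patriot", "Sixth Sense"},
    {"Gladiator", "Patriot"},
    {"Gladiator", "Patriot", "Sixth Sense"},
    {"Gladiator", "Sixth Sense", "Green Mile", "Lord of the Rings"},
    {"Sixth Sense", "Green Mile", "Harry Potter1"},
    {"Harry Potter1", "Harry Potter2"},
    {"LOTR1", "LOTR2"},
]

def count(itemset):
    """Number of baskets that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= basket for basket in transactions)

def lift(lhs, rhs):
    """P(rhs | lhs) / P(rhs), computed from raw basket counts."""
    n = len(transactions)
    both = count(set(lhs) | set(rhs))
    return n * both / (count(lhs) * count(rhs))

# Knowing Gladiator and Green Mile: confidence 1 against a 10 percent baseline.
print(lift({"Gladiator", "Green Mile"}, {"Lord of the Rings"}))

# Knowing only Green Mile: support 10 percent, confidence one half.
print(lift({"Green Mile"}, {"Lord of the Rings"}))
```

On this made-up data the two lines print 10.0 and 5.0. The second value is the low-support, high-lift rule pointed out on the plot.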
So, this method, the frequent items [inaudible] idea, is one of the cool innovations in using data to recommend things for people to buy, to see, and so on.

Here's a small exercise. Do it yourself. In this exercise, use the Groceries subset data we are giving you to predict, based on what people have already put in their basket, what additional items to recommend to them. We are giving you a subset of the data so that it's easy for you to analyze.
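If you want to see the logic of the exercise outside Rattle, here is a rough Python sketch of the idea: enumerate simple rules, keep those with enough support and confidence, and recommend the items whose rules have a lift above one. Everything in it is an assumption made for illustration: the six grocery baskets are invented (they are not the Groceries subset), the thresholds are arbitrary, and the brute-force search is a stand-in for whatever rule-mining algorithm Rattle runs for you.

```python
from itertools import combinations

# Hypothetical grocery baskets, invented for illustration only.
baskets = [
    {"milk", "bread", "butter"},
    {"milk", "bread"},
    {"bread", "butter", "jam"},
    {"milk", "bread", "butter", "jam"},
    {"milk", "eggs"},
    {"bread", "butter"},
]

def count(itemset):
    """Number of baskets that contain every item in itemset."""
    return sum(itemset <= basket for basket in baskets)

def mine_rules(min_support=0.3, min_confidence=0.5, max_lhs=2):
    """Brute-force rules lhs -> item, with at most max_lhs items in lhs."""
    n = len(baskets)
    items = sorted(set().union(*baskets))
    rules = []
    for size in range(1, max_lhs + 1):
        for combo in combinations(items, size):
            lhs = frozenset(combo)
            lhs_count = count(lhs)
            if lhs_count == 0:
                continue  # this combination never occurs together
            for rhs in items:
                if rhs in lhs:
                    continue
                both = count(lhs | {rhs})
                sup = both / n
                conf = both / lhs_count
                if sup >= min_support and conf >= min_confidence:
                    rule_lift = conf / (count({rhs}) / n)
                    rules.append((lhs, rhs, sup, conf, rule_lift))
    return rules

def recommend(basket, rules):
    """Items not yet in the basket, ranked by the best lift above 1."""
    basket = set(basket)
    best = {}
    for lhs, rhs, sup, conf, rule_lift in rules:
        if lhs <= basket and rhs not in basket and rule_lift > 1:
            best[rhs] = max(best.get(rhs, 0.0), rule_lift)
    return sorted(best, key=lambda item: (-best[item], item))

rules = mine_rules()
print(recommend({"bread"}, rules))
print(recommend({"bread", "butter"}, rules))
```

With these invented baskets, someone holding only bread is pointed to butter, and someone holding bread and butter is pointed to jam. Milk is never recommended here because its rules have a lift below one, which is the "deep color" test from the plot expressed as a number.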