0:00

In this third module we're going to cover a complete predictive modelling and scoring application from A to Z, so bear with me.

The first step is, as usual, to load the data, rename the columns correctly, and compute the date of purchase, the year of purchase, and the number of days between each purchase and January 1st, 2016, which will later be used to compute recency.

I'm going pretty quickly over all these things. The first real step is to extract all the predictors that we're going to use in our predictive model. Remember that, as we said, the predictors are variables computed as of a year ago, and they will be used to predict what happened over the last 12 months. So, exactly as we did in the previous tutorial, we're going to compute everything we know about customers at the end of 2014. These are the predictors. And then we're going to look at what they did in 2015. These are the target variables we are going to predict.
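The predictor/target split described above could be sketched as follows, using base R's aggregate() with a small made-up purchase table; all column names here are assumptions, not the course's actual data.

```r
# Hypothetical transaction data: one row per purchase (column names assumed)
purchases <- data.frame(
  customer_id     = c(1, 1, 2, 3, 3),
  purchase_amount = c(30, 50, 20, 100, 60),
  days_since      = c(100, 500, 700, 380, 800)  # days before Jan 1, 2016
)

# Predictors: only what we knew a year ago, i.e. purchases before Jan 1, 2015
old <- purchases[purchases$days_since > 365, ]
recency   <- aggregate(days_since ~ customer_id, old, min)
first_buy <- aggregate(days_since ~ customer_id, old, max)
freq      <- aggregate(purchase_amount ~ customer_id, old, length)
avg_amt   <- aggregate(purchase_amount ~ customer_id, old, mean)
max_amt   <- aggregate(purchase_amount ~ customer_id, old, max)

customers_2014 <- data.frame(
  customer_id    = recency$customer_id,
  recency        = recency$days_since - 365,   # measured from Jan 1, 2015
  first_purchase = first_buy$days_since - 365,
  frequency      = freq$purchase_amount,
  avg_amount     = avg_amt$purchase_amount,
  max_amount     = max_amt$purchase_amount
)

# Targets: what the same customers did over the last 12 months (2015)
recent       <- purchases[purchases$days_since <= 365, ]
revenue_2015 <- aggregate(purchase_amount ~ customer_id, recent, sum)
```

The course computes these quantities with SQL-style queries; the aggregate() calls above are a base-R stand-in for the same idea.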

The next step is to merge these two together. Again, exactly as in the previous tutorial for module number two, we're going to merge the customers from 2014 with the revenue they generated in 2015, making sure that all the customers in the first data set remain in the merged data. We're going to call this data set the in-sample data, meaning that we're going to run in-sample predictions on it; later on, we'll run out-of-sample predictions on the customers of 2015. So we merge everything, transform the not-applicable values into zeros, and we obtain the revenue in 2015, which is how much money each customer spent in 2015. Many of them spent zero.

We're going to create a new variable indicating whether they spent anything at all, and call it active_2015. Basically, we look at the revenue in 2015: if it's above zero, it's a yes; if it's zero, it's a no. And we'll store that value as numeric, so instead of storing TRUE/FALSE, we store ones and zeros. Pretty standard stuff. We execute everything, and the next step is just to look at what we've created.
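The merge-and-flag steps just described might look like this sketch; the two input tables are tiny made-up stand-ins, and the column names are assumptions.

```r
# Assumed inputs: customers_2014 (predictors) and revenue_2015 (target),
# both keyed on customer_id
customers_2014 <- data.frame(customer_id = 1:3,
                             recency     = c(35, 335, 15),
                             frequency   = c(1, 1, 2))
revenue_2015   <- data.frame(customer_id = 1, revenue_2015 = 30)

# Left join: every customer from the 2014 side stays in the data
in_sample <- merge(customers_2014, revenue_2015,
                   by = "customer_id", all.x = TRUE)

# Customers with no 2015 purchase get NA revenue; treat that as zero
in_sample$revenue_2015[is.na(in_sample$revenue_2015)] <- 0

# active_2015 = 1 if they spent anything, 0 otherwise (stored as numeric)
in_sample$active_2015 <- as.numeric(in_sample$revenue_2015 > 0)
```

The all.x = TRUE argument is what guarantees that customers who bought nothing in 2015 are kept, which is exactly why the NA-to-zero step is needed.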

Â 2:52

This is the data on which we're going to calibrate our predictive models. We have recency, first purchase, frequency, average purchase amount, and maximum amount spent, which is a new variable we're going to use [INAUDIBLE], and then how much they've spent in 2015, and whether or not they've spent anything. So obviously, if the revenue is zero here, you have a zero there; if it is a positive value, you have a one over here.

Â 3:25

Now we are going to calibrate the first model, the probability model: the likelihood that a customer will be active in 2015 or not. We are going to use the nnet library, which contains the function multinom. multinom stands for multinomial model. It's an extremely useful model for predicting outcomes that can only be zero or one and nothing else, which is exactly what we need. So we're not going to use a traditional linear model, as we will later on; we use a binary model where the output can either be zero or one.

Â 4:08

And here is how it works. The output of the model, which I call prob.model, is the output of the multinom function, with a formula stating that active_2015 is a function of recency, first purchase, frequency, average amount, and maximum amount, the fifth variable we have introduced just for fun. And the data is the calibration data we created, called in_sample. Executing this will fit, that is calibrate, the entire model on the data set. Then we are going to extract the coefficients and the standard errors of these coefficients, and output not only the coefficients and the standard errors, but the ratio of the two as well.
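The calibration and coefficient-extraction steps just described might look like the sketch below. The data is simulated here (the real in_sample comes from the merge earlier), so the coefficient values will differ from the lecture's; only the structure of the calls matches.

```r
library(nnet)  # ships with standard R installations
set.seed(1)

# Simulated stand-in for the in_sample calibration data
in_sample <- data.frame(
  recency        = runif(200, 0, 3000),
  first_purchase = runif(200, 0, 4000),
  frequency      = rpois(200, 3) + 1,
  avg_amount     = runif(200, 5, 200),
  max_amount     = runif(200, 5, 500)
)
# Simulate activity that falls with recency and rises with frequency
p <- 1 / (1 + exp(0.002 * in_sample$recency - 0.3 * in_sample$frequency))
in_sample$active_2015 <- rbinom(200, 1, p)

prob.model <- multinom(active_2015 ~ recency + first_purchase + frequency +
                         avg_amount + max_amount,
                       data = in_sample, trace = FALSE)

# Coefficients, standard errors, and their ratio (a rough z-statistic)
coefs <- summary(prob.model)$coefficients
std   <- summary(prob.model)$standard.errors
print(coefs / std)
```

With a binary outcome, multinom returns one coefficient vector, so the ratio is simply an element-wise division.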

Let me execute these first. So the model converged. Now look at the sign of the recency parameter, for instance: it's negative, which makes perfect sense, right? The larger the recency, meaning the more days have elapsed between the last purchase and today, the less likely you are to make any other purchase in the future. If your last purchase was three, four, or ten years ago, it's extremely unlikely that you'll make another purchase any time soon. So the sign of the parameter is negative: the higher the recency, the lower the probability. Frequency, on the other hand, has a positive parameter, meaning that the more purchases you've made in the past, the more likely you are to make additional purchases in the future, which also makes perfect sense.

Now, these two parameter values are the most interesting, and you can see why by looking at the ratio between each coefficient and its standard error, which usually indicates to what extent each parameter is significant or not. If that ratio is above two, or below minus two, it's usually a good sign. As you can see here, the ratio for recency is huge, minus 32, way below minus two, so it's highly significant. The ratio of the frequency coefficient to its standard error is extremely high as well, close to 15. But all the others are pretty close to zero, or at least nowhere near as large as these two. So the impact of first purchase, average amount, and maximum amount on the predictions is actually pretty limited.

So now we have created our probability model, and we have stored everything we need in that variable to make predictions later on. What we'd like to do next is predict, if you are going to be active, how much you are going to spend with that specific retailer over the year 2015. The issue here, as we discussed in the previous video, is that such a model can only be calibrated on those customers who actually purchased something. So we need to take a sub-sample containing only the customers who were active in 2015, so we can calibrate the model and estimate how much they spend.

Â 7:51

And fit the model. What we are going to do is take the in-sample data, look at the variable active_2015, find which entries are equal to one, and store the index of those customers in a variable we'll call z for the time being. z will be a vector indicating which customers have been active in 2015, and only on those customers will we calibrate the second, monetary model. So we run that. If you look at the head of the data, keeping only the index z we've retained, as you can see all these customers have active_2015 equal to one, which is exactly what we wanted, and all have spent something in 2015. And, finally, if you look at the values of active_2015, everything is equal to one.
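Selecting the active customers could be sketched like this, with a small made-up in_sample table standing in for the real one:

```r
# Hypothetical in_sample data with revenue and activity flag per customer
in_sample <- data.frame(customer_id  = 1:5,
                        revenue_2015 = c(30, 0, 0, 120, 0),
                        active_2015  = c(1, 0, 0, 1, 0))

# z: index of the customers who were active in 2015
z <- which(in_sample$active_2015 == 1)

head(in_sample[z, ])               # every row shown has active_2015 == 1
summary(in_sample[z, ]$revenue_2015)  # revenue range among active customers
```

Subsetting with in_sample[z, ] is what restricts the upcoming monetary model to buyers only.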

Â 8:52

We only have active customers in there. In terms of revenue, these customers have spent anything between $5 and $4,500 with that retailer over the year 2015. Now, what we are going to do is calibrate the monetary model, meaning we're going to predict how much they spent in 2015 based on only two things: the average amount they usually spend, and the maximum amount they have spent. So we have two different predictors. Here we're not going to use the multinom function, because the output is not zero-or-one; it can be anything. Instead we use lm, which stands for linear model, to fit a linear model matching revenue in 2015 as closely as it can based on the predictors average amount and maximum amount. And the data here is not the entire sample, but only those customers who can be found in the index z, so only those customers who actually spent something.
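The monetary-model call might look like the sketch below; the data is simulated (the real call would use data = in_sample[z, ]), and the column names are assumptions.

```r
set.seed(2)
# Simulated active customers: revenue roughly proportional to average amount
actives <- data.frame(avg_amount = runif(100, 5, 200))
actives$max_amount   <- actives$avg_amount * runif(100, 1, 3)
actives$revenue_2015 <- 1.5 * actives$avg_amount + rnorm(100, 0, 10)

# Linear model of 2015 revenue on the two monetary predictors
amount.model <- lm(revenue_2015 ~ avg_amount + max_amount, data = actives)
summary(amount.model)$r.squared   # goodness of fit of the linear model
```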

Â 10:19

The R-squared value is 0.60, which is basically a signal of the fit of the model. But we have a slight issue here. Let's plot, on one hand, how much was actually spent by customers, and on the other, the fitted values of the amount model we've just created through linear regression, which are the values predicted by the model. If we plot that, well, the chart looks terribly ugly. The reason is that most customers have spent pretty small amounts, 50, 60, 70, 100, even $200, while a few outliers have spent huge amounts, up to three or four thousand dollars. So basically the model is trying to fit a line through a cloud of points that no line can fit well. So what we're going to do, very much like what we did for the segmentation model, is, instead of calibrating the model on the raw amounts, recalibrate it on the log of the amounts, and then plot

12:33

the log of the revenue of 2015 against the fitted values of the model we've just created.
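This log recalibration could be sketched as follows, again on simulated data with assumed column names; on the log scale the multiplicative spread of spending becomes additive, so the outliers no longer dominate the fit.

```r
set.seed(3)
# Hypothetical active-customer data spanning a wide range of amounts
actives <- data.frame(avg_amount = exp(runif(100, 2, 6)))
actives$max_amount   <- actives$avg_amount * runif(100, 1, 3)
actives$revenue_2015 <- actives$avg_amount * exp(rnorm(100, 0, 0.3))

# Recalibrate the monetary model on the log scale
amount.model <- lm(log(revenue_2015) ~ log(avg_amount) + log(max_amount),
                   data = actives)

# Compare actual vs. fitted values, both on the log scale:
# plot(x = log(actives$revenue_2015), y = amount.model$fitted.values)
summary(amount.model)$r.squared
```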

To summarize, we have just calibrated two models: the first one, over here, to predict the likelihood that someone will be active; and the second one, over here, to predict how much they will spend if they are active in 2015. Now, the end game of this exercise is to apply the models to predict the future. So what we are going to do is look at today's behavior, today's data, and extract exactly the same information about today's customers as we used as predictors a year ago. That is, we're going to extract recency, first purchase, average amount, maximum amount, and everything else for the 2015 customers, about whom, of course, we have no idea who will be active next year or how much they'll spend in 2016. But we can try to predict that with our models. Once you create that data, it is what is usually called the out-of-sample data set.

Â 13:47

You have all your customers at the end of 2015, 18,417 people, and you are going to predict their probability of being active in 2016, based on the object we've created, our probability model. The new data is customers_2015, and the type of prediction we are going to make is "probs", the actual probabilities that someone will be active or not. So we create a new column, prob_predicted, the predicted probability, in the data set we have, and that column will contain the predictions from the probability model. We're going to create another column, called revenue_predicted, where the predictions come from the amount model.

Here we apply that model to the entire data set. However, remember that the amount model is actually predicting the log of the revenue. So if you'd like the actual revenue, you need to take the exponential of the predictions, since the exponential is the inverse of the log, and vice versa. You predict the log of the amount using the amount model, and then you exponentiate that to get the actual predicted revenue. And the score, the actual score of your customers, is the product of the predicted probability and the predicted amount. So if you have a 10% chance of buying for $100, your score will be 10% of 100, which is 10. So we run these three lines
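The three scoring lines might look like this sketch. The two toy models fitted at the top are stand-ins for the real prob.model and amount.model built earlier, and every column name is an assumption.

```r
library(nnet)
set.seed(4)

# Toy stand-ins for the two fitted models
train <- data.frame(recency = runif(300, 0, 3000), frequency = rpois(300, 3) + 1)
p <- 1 / (1 + exp(0.002 * train$recency - 0.3 * train$frequency))
train$active_2015 <- rbinom(300, 1, p)
prob.model <- multinom(active_2015 ~ recency + frequency,
                       data = train, trace = FALSE)

amt <- data.frame(avg_amount = exp(runif(300, 2, 5)))
amt$revenue_2015 <- amt$avg_amount * exp(rnorm(300, 0, 0.3))
amount.model <- lm(log(revenue_2015) ~ log(avg_amount), data = amt)

# The out-of-sample customers we want to score (columns assumed)
customers_2015 <- data.frame(recency    = c(30, 2000),
                             frequency  = c(5, 1),
                             avg_amount = c(80, 20))

# The three scoring lines: probability, expected amount, and their product
customers_2015$prob_predicted    <- predict(prob.model, newdata = customers_2015,
                                            type = "probs")
customers_2015$revenue_predicted <- exp(predict(amount.model,
                                                newdata = customers_2015))
customers_2015$score_predicted   <- customers_2015$prob_predicted *
                                    customers_2015$revenue_predicted
```

Note the exp() wrapped around the amount-model predictions: that is the step that undoes the log taken at calibration time.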

Â 15:47

And then we summarize the results. If we look at the predicted probabilities, on average, customers in the database have a 22.5% chance of being active. Some customers are predicted to be almost certainly active, with probabilities close to one; others are predicted to be almost certainly inactive, with probabilities close to zero; and many are in between. If you look at the predicted revenue, that is, how much they are going to spend next year if they are active, the average is $65, and it ranges between $6 and $38,000. And of course the minimum is $6 and not zero, because this model assumes that you will spend something. The overall score is a function of both the probability and the revenue together, and the score has a mean of 18.8.

What does that mean? From a managerial point of view, that value is extremely important. It means that, on average, every customer in this database will spend $18.80 next year. Some will spend zero, a lot of them will spend zero; some will spend maybe $333; some will spend 50; and so on and so forth; but on average it will be 18.8. Some have a score very near zero: we don't expect anything coming from them. Some have an extremely high score, meaning they're potentially extremely profitable for the firm. And if you look at the histogram, of course, most people are concentrated around here; you could create a histogram of the log to look into more detail. But what we'll do is a slightly different exercise.

Â 17:52

We'll take the customers here, look at their predicted score, and only retain those with a score above $50. If you do that, it will create a vector of the people with a score above $50, which contains a total of 1,323 customers. So in this list of 18,000 customers, about 1,300 have a predicted score of $50 or more, and you can see which ones they are: here you have the indices of all the customers with a predicted score above 50. And if you'd like to apply this to targeting, if you'd like to identify the customers on which you should spend the most marketing dollars, those customers are the ones with the highest scores, obviously.
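The targeting selection is a one-liner; here is a sketch with a hypothetical scored customer base (the score_predicted column name follows the scoring step above):

```r
# Hypothetical scored customer base
customers_2015 <- data.frame(customer_id     = 1:6,
                             score_predicted = c(3, 75, 0.4, 120, 51, 12))

# Retain only customers whose expected revenue next year exceeds $50
z <- which(customers_2015$score_predicted > 50)

length(z)            # how many customers pass the threshold
customers_2015[z, ]  # the targets for the marketing campaign
```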
