0:00

This lecture is about bagging, which is short for bootstrap aggregating.

The basic idea is that when you fit complicated models, sometimes if you average those models together you get a smoother model fit that gives you a better balance between potential bias in your fit and variance in your fit.

0:19

So bootstrap aggregating has a very simple idea.

The basic idea is to take your data and take resamples of the data set. This is similar to the idea of bootstrapping, which you would have learned about in the inference class that is part of the Data Science Specialization. After you resample the cases with replacement, you recalculate your prediction function on that resampled data. Then you either average the predictions from all of the repeated predictors that you built, or take a majority vote, or something like that when you're doing classification.

0:50

The result is that you get a similar bias to what you would get from fitting any one of those models individually, but reduced variability, because you've averaged a bunch of different predictors together. This is most useful for non-linear functions. We'll show an example with smoothing, but it's also very useful for things like predicting with trees.

1:22

And then I look at the data set, and I can see it has four variables: ozone, radiation, temperature, and wind. So the idea is that I'm going to try to predict temperature as a function of ozone.


So the first thing we can do is show you an example of how this works. The basic idea is, I'm going to create a matrix here, and it's going to have 10 rows and 155 columns. Then I'm going to resample the data set.

Â 2:00

Then I'm going to create a new data set, ozone0, which is the resampled data set for that particular iteration of the loop; that's just the subset of the data set corresponding to our random sample. Then I reorder the data set each time by the ozone variable, and you'll see why in just a minute. Then I fit a loess curve each time. A loess is a kind of smooth curve that you can fit through the data; it's very similar to the spline model fits that we saw in a previous example on modeling with linear regression. So the basic idea is we're fitting a smooth curve relating temperature to the ozone variable. Temperature is the outcome and ozone is the predictor, and each time I use the resampled data set as the data set I'm building that predictor on. And I use a common span each time, the span being a measure of how smooth that fit will be.

2:51

I then predict, for every single loess curve, the outcome for a new data set with the exact same values; I always predict for ozone values 1 to 155. So the ith row of this ll object is now the prediction from the loess curve fit to the ith resample of the ozone data. So what have I done here? I've resampled my data set ten different times, fit a smooth curve through it those ten different times, and then what I'm going to do is average those values.
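The loop just described can be sketched in R. This is a hedged sketch: the lecture uses the ozone data from the ElemStatLearn package, but since that package may not be installed, this version substitutes R's built-in airquality data, which has analogous variables (Ozone and Temp).

```r
# Sketch of the resample-and-smooth loop described above, using the
# built-in airquality data in place of the lecture's ozone data.
data(airquality)
aq <- airquality[!is.na(airquality$Ozone), ]   # drop rows with missing ozone

set.seed(125)
ll <- matrix(NA, nrow = 10, ncol = 155)        # 10 resampled fits x 155 grid points
for (i in 1:10) {
  ss  <- sample(1:nrow(aq), replace = TRUE)    # resample cases with replacement
  aq0 <- aq[ss, ]
  aq0 <- aq0[order(aq0$Ozone), ]               # reorder by the predictor each time
  fit0 <- loess(Temp ~ Ozone, data = aq0, span = 0.2)
  # predict every curve on the same grid of ozone values, 1 to 155
  ll[i, ] <- predict(fit0, newdata = data.frame(Ozone = 1:155))
}
bagged <- apply(ll, 2, mean, na.rm = TRUE)     # the bagged fit: average the curves
```

Predictions outside the range of a given resample come back as NA, which is why the average uses na.rm = TRUE.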

3:23

So, here's what it looks like in this plot. I've plotted ozone on the x-axis (the observed ozone values) versus temperature on the y-axis (the observed temperature values), and each black dot represents an observation. Each gray line represents the fit from one resampled data set. You can see the gray lines have a lot of curviness to them. They capture a lot of the variability in the data set, but they also maybe over-capture some of it; they're a little bit too curvy. Once I've averaged those lines together, I get something that's a little bit smoother and closer to the middle of the data set. That's the red line. So the red line is the bagged loess curve: it's basically the average of multiple loess curves fitted to the same data set, where I've resampled it every time.

4:07

There's a proof that shows that the bagging estimate will always have lower variability but similar bias compared to the individual model fits. In the caret package, some models already perform bagging for you: if you're using the train function, you can set method to bagEarth, treebag, or bagFDA. Those are specific bagged models that the caret package will fit for you.
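As a hedged sketch of those built-in methods, a treebag fit might look like the following (again substituting the built-in airquality data for the lecture's ozone data; method = "treebag" relies on the ipred package being installed):

```r
# Fit a bagged tree model with caret's train(); "treebag" is one of the
# built-in bagged methods mentioned above.
library(caret)
aq <- na.omit(airquality)
treebagFit <- train(Temp ~ Ozone, data = aq, method = "treebag")

# Predict temperature at a couple of hypothetical ozone values
predict(treebagFit, newdata = data.frame(Ozone = c(25, 100)))
```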

4:34

Alternatively, you can build your own bagging function in caret. This is a bit of an advanced use, so I recommend that you read the documentation carefully if you're going to try this yourself. The idea is that you take your predictor variables and put them into one data frame; so I'm going to make the predictors be a data frame that contains the ozone data. Then you have your outcome variable; here it's going to be just the temperature variable from the data set. I pass these to the bag function in the caret package. So I tell it: I want to use the predictors from that data frame, this is my outcome, and this is the number of replications, that is, the number of subsamples I'd like to take from the data set.

Then bagControl tells it something about how I'm going to fit the model. fit is the function that will be applied to fit the model every time; this could be a call to the train function in the caret package. predict is the way that, given a particular model fit, we'll be able to predict new values; this could be, for example, a call to the predict function on a trained model. And aggregate is the way that we'll put the predictions together; for example, it could average the predictions across all the different replicated samples.
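Putting those pieces together, the call described above looks roughly like this. This is a sketch based on the ctreeBag helper that ships with caret (the party package supplies the underlying ctree function), with airquality again standing in for the lecture's ozone data:

```r
# Custom bagging with caret's bag() function, using the ctreeBag helper
# that ships with caret for the fit / predict / aggregate pieces.
library(caret)
library(party)
aq <- na.omit(airquality)
predictors  <- data.frame(ozone = aq$Ozone)  # predictors in one data frame
temperature <- aq$Temp                       # outcome variable

# B is the number of bootstrap replications; bagControl supplies the
# three functions discussed above.
treebag <- bag(predictors, temperature, B = 10,
               bagControl = bagControl(fit = ctreeBag$fit,
                                       predict = ctreeBag$pred,
                                       aggregate = ctreeBag$aggregate))
```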

You can see that if you look at this custom bagged version of the conditional regression trees, it gets some of the benefit that I was showing you in the previous slide with bagged loess. The idea here is I'm plotting ozone again on the x-axis versus temperature on the y-axis. The little gray dots represent actual observed values. The red dots represent the fit from a single conditional regression tree, and you can see that, for example, it doesn't capture the trend that's going on down here very well; the red line is just flat, even though there appears to be a trend upward in the data points. But when I average over ten different bagged model fits with these conditional regression trees, I see that there's an increase here in the values in the blue fit, which is the fit from the bagged regression.

6:41

So we're going to look a little bit at the different parts of the bagging function. In this particular case I'm using the ctreeBag function, which you can look at if you've loaded the caret package in R. For the fit part, it takes the data frame that we've passed and the outcome that we've passed, and it basically uses the ctree function to train a conditional regression tree on the data set. The last command that's called is the ctree command, so it returns the model fit from the ctree function.

The prediction takes in the object (this is going to be an object from the ctree model fit) and a new data set x, and it's going to get a new prediction. What you can see here is that it basically calculates, each time, the tree response, the outcome, from the object and the new data. It then calculates a probability matrix and returns either the observed levels that it predicts or just the predicted response for the variable.
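If you want to see these definitions for yourself, each piece of ctreeBag is an ordinary R function that you can print at the console once caret is loaded:

```r
# Each component of ctreeBag is a plain R function you can inspect directly.
library(caret)
ctreeBag$fit        # trains a conditional inference tree (party::ctree)
ctreeBag$pred       # predicts from a fitted tree on new data
ctreeBag$aggregate  # combines predictions across the bootstrap fits
```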

7:47

The aggregation then takes those values and averages them together, or puts them together in some way. What this is doing is basically getting the prediction from every single one of these model fits, across a large number of observations. It then binds them together into one data matrix, with each row being equal to the prediction from one of the models. Then it takes the median at every value; in other words, it takes the median prediction from each of the different model fits across all the bootstrap samples.
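The median-at-every-value idea can be sketched on its own. This is a simplified stand-in for ctreeBag's aggregate function (the real one also handles probability matrices); it assumes the predictions arrive as a list of numeric vectors, one per fitted model:

```r
# Simplified sketch of median aggregation: stack the per-model prediction
# vectors into a matrix and take the median in every column.
aggregateByMedian <- function(x) {
  preds <- do.call(rbind, x)   # one row per model's predictions
  apply(preds, 2, median)      # median prediction at each point
}

fits <- list(c(70, 75, 80), c(72, 77, 82), c(71, 76, 84))
aggregateByMedian(fits)        # medians: 71, 76, 82
```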

8:24

So bagging is very useful for non-linear models, and it's widely used. It's often used with trees, and you can think of random forests as an extension of this idea, which we'll talk about in a future lecture. Several models perform bagging in caret's main train function, like I told you about in a previous slide, and you can also build your own specific bagging functions for any classification or prediction algorithm you'd like to take a look at. For further resources, I've linked to a couple of different tutorials on bagging and boosting, as well as The Elements of Statistical Learning, which has a lot more detail about how bagging works. But remember that the basic idea is to resample your data, refit your non-linear model, and then average those model fits together over the resamples to get a smoother model fit than you would have gotten from any individual fit on its own.
