0:33

So the first thing that came to my mind when thinking about extending linear models was how we can fit complicated functions using regression models.

We've seen a little bit of how to do this by adding squared, cubic, or other polynomial terms into our regression model, so that we can fit low-order polynomial functions. However, imagine you have a complicated function like a sine curve, or something that looks maybe like a sine curve. How would you fit that non-parametrically using a linear model? So we might specify our model as Y equals f of X plus epsilon, where f might be a complicated function.

Well, it turns out that there is a pretty easy way to do this, and what I'm going to show you is a simple first step. There is an entire literature on how to do this, and we're just going to go over some of the basics.

So, the first step would be to use something called regression splines. Our model for Y = f(X) + epsilon is going to be Y equal to an intercept, beta naught, plus a regular slope term, beta 1 times X, plus a sum of terms of the form gamma k times (X minus xi k) with a little plus at the bottom, where the xi k, for k equal 1 up to d, are the knot points. And I'll show you in a second what these actually mean.

So the function with a little plus, that just returns the argument if that argument's positive, and it returns zero otherwise.

So what we're going to do is imagine a stick: rather than breaking it apart, you bend it, which just creates a kink in it. Okay? So imagine we want to fit our function with a bunch of those kinks.
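To make the "little plus" concrete, here is a small sketch. The lecture's examples are in R; this is a hypothetical Python version, and the names `plus` and `knot_term` are my own:

```python
def plus(u):
    """The 'little plus' function: return u if u is positive, else 0."""
    return u if u > 0 else 0.0

def knot_term(x, knot):
    """One spline building block: (x - knot) with the little plus."""
    return plus(x - knot)

# The term is flat (zero) before the knot and grows linearly after it,
# which is exactly the kink in the stick.
print(knot_term(1.0, 3.0))  # before the knot -> 0.0
print(knot_term(5.0, 3.0))  # past the knot   -> 2.0
```

Each knot point contributes one such term to the regression, so the fitted line is allowed to change slope at every knot.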

Â 3:04

We're going to actually mathematically model those kinks, and that's what these little plus functions do. And I argue down here that, if you're interested, you can prove to yourself that this function is continuous. It doesn't have any jumps; you could draw it without ever lifting up your piece of chalk. Okay?

And so I think the easiest way to show this would be to go through an example. In this example I just simulate data that's a sine curve plus noise, and you can see that in the blue points on the plot.

Â 4:02

And then I show exactly here how to create my knot terms, just executing that function that I had on the previous page. You can replicate this function, and you can see it's pretty easy. So now I have a matrix of these knot terms.

I then create my X matrix by adding an intercept term that's just a constant, my slope term, which is just x by itself, and then these spline terms. Then I just fit a linear model: Y on this collection of predictors. I took out the intercept from the model formula because I included the intercept in my X matrix. Okay? And then I show my fitted plot, and you can see it looks pretty good; it actually fits the sine curve very well.
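Putting those pieces together, a rough Python translation of this example might look like the following (the lecture's code is in R; the knot count, noise level, and random seed here are my choices, not the lecture's):

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate a sine curve plus noise, as in the lecture's example plot.
x = np.linspace(0, 4 * np.pi, 500)
y = np.sin(x) + rng.normal(scale=0.2, size=x.size)

# Pick knot points spread over the range of x.
knots = np.linspace(0, 4 * np.pi, 17)[1:-1]

# Design matrix: an intercept column, the slope term x, and one
# (x - knot)_+ column per knot point.
spline_terms = np.maximum(x[:, None] - knots[None, :], 0)
X = np.column_stack([np.ones_like(x), x, spline_terms])

# Ordinary least squares: plain linear regression, nothing fancier.
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
fitted = X @ beta

# The piecewise-linear fit tracks the sine curve closely.
print(float(np.mean((y - fitted) ** 2)))
```

Plotting `fitted` against `x` reproduces the kinked-but-close fit described above.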

The only thing that's a little bothersome is that we know the sine curve we're trying to fit is likely smooth, whereas this fit is very

Â 4:56

sharp at these knot points.

And the way that happens mathematically is that the function we've used is continuous, but it's not continuously differentiable; if we took the derivative, the derivative would not be continuous. So we can actually do a pretty simple trick to get a continuous derivative at the knot points and make them nice and smooth looking, so it doesn't look like we're bending the stick at each one of these points. Instead, it looks like we're drawing a smooth curve. It's very easy: adding squared terms is all we have to do.

Â 5:31

So now our function is y equals beta naught, plus beta 1 x, plus a squared term, beta 2 x squared, and then you see we're using the same function as before, only we're squaring it. Okay, so instead of just using x minus the knot point with a little plus symbol, we're squaring it, which just means: if x is further along than the knot point, then return that difference and square it; otherwise, just return zero. So that's what that function is. I give you the code here, and you can see it's identical to before, with the exception that I'm squaring the term in the eventuality that x is further along than each of the knot points.

Â 6:16

And I've added an x squared term to the X matrix. Okay? And now when I fit it, you can see the curve is nice and smooth. And notice, all we've done in this process is just ordinary regression. So we fit a pretty complicated function with just ordinary regression, by putting in a bunch of knot points.
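The smooth version changes only two things in the sketch above: an x squared column, and squared knot terms. A hypothetical Python version (again, the lecture's code is in R, and the simulation settings are mine):

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 4 * np.pi, 500)
y = np.sin(x) + rng.normal(scale=0.2, size=x.size)
knots = np.linspace(0, 4 * np.pi, 17)[1:-1]

# Same design as before, plus an x^2 column, and with the knot terms
# squared; squaring makes the fitted curve continuously differentiable,
# so the kinks at the knot points disappear.
sq_terms = np.maximum(x[:, None] - knots[None, :], 0) ** 2
X = np.column_stack([np.ones_like(x), x, x ** 2, sq_terms])

beta, *_ = np.linalg.lstsq(X, y, rcond=None)
fitted = X @ beta  # a smooth fit, yet still just ordinary regression
```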

Now, this is a pretty basic version of splines, called regression splines, and there are some problems: we have to know exactly where to put the knot points; there's potentially a problem if we put in too many knot points; and there's potentially a problem if we put in too few. There are some solutions for that, and in fact I have some notes on the next page that cover this.

So the first thing I'd like to point out in these notes is that this collection of spline terms is called a basis. That means it's a collection of building blocks for functions, and this is just one way to create a set of building blocks for functions. There are a lot of other ways, and in fact we're going to cover another way here in a minute, but regression splines are one of the more popular ones, and there are ways you can extend them to make them even more useful. I would just say, people spend a lot of time thinking about these things, and there are a lot of different kinds of bases you can look at, and each basis has its strengths and weaknesses. Some of the most notable bases are the Fourier basis, which we'll talk about a little bit in a slide, the wavelet basis, and then spline bases like the ones we're considering here, and there are a lot of different kinds of spline bases.

Another thing is, if you want to fit data that looks like a hockey stick, you can have just one knot point. If you know exactly where that knot point is, you can add it in and get a hockey-stick-like model out of this. Now, that might be controversial: you have to know that it's really a hockey stick, with an abrupt change like that. But if you really know that, and you want to do that, you could fit a hockey stick model with this approach.
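As a sketch of that one-knot idea, here is a hypothetical, noise-free Python example (the values and the knot location are made up for illustration): the slope changes by 1.5 at x = 3, and ordinary least squares with a single knot term recovers the intercept, the slope, and the change in slope exactly.

```python
import numpy as np

# Noise-free "hockey stick": slope 0.5 before x = 3, then the slope
# abruptly increases by 1.5.
x = np.linspace(0, 6, 200)
y = 1.0 + 0.5 * x + 1.5 * np.maximum(x - 3.0, 0)

# One knot term, placed where we believe the abrupt change happens.
X = np.column_stack([np.ones_like(x), x, np.maximum(x - 3.0, 0)])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

print(beta)  # recovers the intercept 1.0, slope 0.5, and slope change 1.5
```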

Â 8:31

There's nothing particular about doing this in just linear models. We could do it in generalized linear models as well: just specify the regression splines in the linear predictor, and that's all it takes. So, in addition to showing you how to fit nonlinear functions in linear models, we've basically just shown you how to do it in generalized linear models too.

Â 8:52

So, like I said, a very consistent problem with this is that we don't know where the knot points should go; there's a problem with putting in too few, and a problem with putting in too many. The modern solution to this problem is to put in lots and lots and lots of knot points and add a so-called regularization term. The regularization term penalizes the coefficients on the spline terms, penalizing larger coefficients, so it keeps the effective number of parameters in check. That's a more advanced topic, but if you take some more advanced linear models or regression modeling classes, they'll cover it.
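A minimal sketch of that penalization idea in Python, assuming for simplicity that every coefficient is penalized (real penalized-spline software is more careful, for instance leaving the intercept and slope unpenalized, and it chooses the penalty automatically):

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(0, 4 * np.pi, 500)
y = np.sin(x) + rng.normal(scale=0.2, size=x.size)

# Lots and lots of knots: far more than we would dare use unpenalized.
knots = np.linspace(0, 4 * np.pi, 52)[1:-1]
X = np.column_stack([np.ones_like(x), x,
                     np.maximum(x[:, None] - knots[None, :], 0)])

def ridge(X, y, lam):
    """Penalized least squares: solve (X'X + lam * I) beta = X'y."""
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

# A larger penalty shrinks the coefficients toward zero, keeping the
# effective number of parameters in check.
b_small = ridge(X, y, 0.1)
b_large = ridge(X, y, 100.0)
print(np.linalg.norm(b_large) < np.linalg.norm(b_small))  # True
```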

Â 9:33

So anyway, now you have some basics of how you could fit an interesting nonlinear function using linear models and generalized linear models. For the time being, you'll have to play around with the knot placement and how many knots you put in, and then I would suggest taking some other classes in linear models to extend this.

So let's move on to the next example, the final example for the class. I wanted to talk about how you can make another basis, the Fourier basis, and how you could try to model harmonics using the Fourier basis.

Â 10:09

So I thought of this simple exercise. Imagine a scale; a major scale is the kind of thing you've heard of: do re mi fa sol la ti do, just like that, the standard. That's an octave, eight notes. And a chord is three notes played together; if you hear someone strum a guitar or play three notes on the piano, they're playing a chord.

Â 10:42

So, I wondered if someone play a chord continuously.

Â For well, a little bit of time and we had it digitized and recorded and

Â brought in to R.

Â Would R be able to figure out what chord it, what the notes were?

Â So that would tell exactly what chord it is.

Â All it got was a sound.

Â It couldn't tell us what chord it was.

Â So here, I took this data, or I generated the data, but

Â I got the frequencies for the various notes off a webpage,

Â and this is the frequencies if you start on the middle C key on a piano and

Â go up one octave to the C key again.

Okay. So that's probably, I don't know, maybe the most important part of the piano.

And so what I've done is I've taken these notes and created digitally sampled time points. So here the time goes from zero to two, sampled every 0.001, so let's say it's two seconds of data. And I created three notes, a C, an E, and a G, by just creating a sine wave at each of those frequencies.

R doesn't really have built-in facilities for playing these, but it's actually pretty easy to do in MATLAB: if you were to take these frequencies, save them in a wave file, and play them, each one would sound like a very digital tone at the right pitch for that key, [SOUND] like that. Okay?

And so to build our chord, you just add these three notes together. So there I build my chord by adding my C, E, and G notes together. That's a C major chord. And now what I want to know is: here I have a chord, and if I could save it as a wave file and play it, it would sound like a chord.

Â 12:45

And what I want to know is, can R detect what chord it is? Okay, so what I did is I created a basis that was all the sine functions for every note, so eight sine functions, okay? And then I fit a linear model with my chord as the outcome.

Â 13:05

The sine functions as my predictors, and I got rid of the intercept because

Â everything is centered to have means here, okay.

Â So let's see what happens when I plot the coefficients.

Â So here I've plotted the coefficients and I've connected them with a line, and

Â you see that R does it correctly and it estimates the c, the e and

Â the g chord as having very large coefficients and

Â all the rest of of them are kind of small, relatively speaking.

Â So if you were to do this, you would be able to guess the chord.
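The whole experiment fits in a few lines. Here is a Python stand-in for what the lecture describes (the lecture's code is in R; the note frequencies are the standard equal-tempered values for middle C up one octave, and the dictionary names are mine):

```python
import numpy as np

# Standard equal-tempered frequencies (Hz), middle C up one octave.
notes = {"C4": 261.63, "D4": 293.66, "E4": 329.63, "F4": 349.23,
         "G4": 392.00, "A4": 440.00, "B4": 493.88, "C5": 523.25}

# Two seconds of time, digitally sampled every 0.001 seconds.
t = np.arange(0, 2, 0.001)

# A chord is three notes played together: C major is C + E + G.
chord = (np.sin(2 * np.pi * notes["C4"] * t)
         + np.sin(2 * np.pi * notes["E4"] * t)
         + np.sin(2 * np.pi * notes["G4"] * t))

# One sine predictor per note; no intercept, since everything is
# already centered around zero.
basis = np.column_stack([np.sin(2 * np.pi * f * t) for f in notes.values()])
coef, *_ = np.linalg.lstsq(basis, chord, rcond=None)

# The three notes in the chord should get coefficients near 1,
# and the rest should be near 0.
for name, c in zip(notes, coef):
    print(name, round(float(c), 2))
```

Because sines at well-separated frequencies are nearly orthogonal over two seconds, the large coefficients fall exactly on C, E, and G.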

Â 13:35

Now, this only covers the sine part, but there could also be a cosine component to each note as well. And there is a fairly automatic way to fit all the possible sine and cosine terms for, you know, two seconds of digitally sampled music, and that's called the discrete Fourier transform. All the discrete Fourier transform is doing is exactly what we're doing here: it's fitting a linear model, but with not just these sine terms. It has all of the sine and cosine terms, all of the possible ones that that time series will allow at the rate at which it's been sampled. So it's a so-called completely saturated model: if you have a thousand sampled time points, it has a thousand coefficients in the model.

Â 14:32

And so the discrete Fourier transform really just fits this linear model. It does it in a kind of interesting way, with complex numbers and that sort of thing, but that's all the discrete Fourier transform is doing, and it fits both the sine terms and the cosine terms.

If you were to plot the output of the discrete Fourier transform, I'll just show you how to do it right here: a is just the FFT of my chord, and because the FFT does this calculation with imaginary numbers, I'm plotting the squared real components of the discrete Fourier transform. And you can see it loads on three very specific frequencies. You could back-calculate what those frequencies are, and they are the frequencies of C, E, and G: the C major chord.
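Instead of plotting the spectrum, a sketch can find the dominant frequencies numerically. This hypothetical Python version uses the magnitude-squared spectrum and back-calculates frequency as bin index divided by the two-second duration (the simple peak-picking rule is my own addition, not the lecture's):

```python
import numpy as np

notes = {"C": 261.63, "E": 329.63, "G": 392.00}
t = np.arange(0, 2, 0.001)  # 2 seconds sampled at 1000 Hz
chord = sum(np.sin(2 * np.pi * f * t) for f in notes.values())

a = np.fft.fft(chord)
power = np.abs(a) ** 2  # the spectrum (the lecture plots squared components)

# Keep frequencies below the Nyquist limit and greedily pick the three
# largest peaks, insisting they sit in well-separated bins.
half = power[: len(power) // 2]
peaks = []
for idx in np.argsort(half)[::-1]:
    if all(abs(idx - p) > 10 for p in peaks):
        peaks.append(int(idx))
    if len(peaks) == 3:
        break

# Back-calculate frequency: bin index divided by the 2-second duration.
freqs = sorted(p / 2.0 for p in peaks)
print(freqs)  # close to 261.63, 329.63, and 392.00 Hz: C, E, and G
```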

And that's actually a very common thing to do in music and sound processing. If you were to take a section of a sound signal, do a discrete Fourier transform, and plot the so-called spectrum, which is what this is, that is what sound engineers do all the time. And ultimately, under the hood, what they're doing is a linear model with a lot of sine and cosine terms as the regressors.

Â 15:48

I should say that the discovery of a fast way to compute the Discrete Fourier Transform was a major advance. The reason people do the calculation with the Fourier transform rather than the linear model is that a very famous statistician named Tukey co-discovered a way to do the Fourier transform very quickly, the so-called fast Fourier transform.

Â 16:14

That's what everyone uses. I think you can prove it's about as fast as you can take the Fourier transform. It's much faster than fitting the linear model directly, and that's why, if you were a sound engineer, you wouldn't think of this as a linear model problem; you would think of it as a Fourier transform problem, and you would take fast Fourier transforms.

So that's the end of the class, but I hope in this lecture what you've seen is, one, a case where we can fit pretty complicated functions with a couple of lines of R code, very easily. And secondly, an instance where we can do something that at first blush we might have thought wasn't really in the scope of what linear models can do: diagnosing the various notes of a chord. That's something we just wouldn't naturally think a linear model could do. We showed that not only are both of those things possible with linear models, they're actually kind of easy with linear models. There weren't reams of code; there were, you know, six or seven lines at the most for each of these examples, and that includes generating the data. So that just goes to show how powerful these techniques are.

So I hope you've enjoyed the class, and I especially hope that you take the knowledge from this class and build on it. If you were to build on it, my first suggestion would be to learn a little bit more about generalized linear models, because we only touched on them in this class.

And the second thing would be to approach correlated data and longitudinal data, another extension of linear models that's quite important. Everything we've done assumes independent errors and data that were exchangeable at some level. A very important setting is when that doesn't happen: if we take measurements on a large collection of siblings, the measurements within a pair of siblings are going to be closer together than measurements between different pairs. Handling that correlation is a big part of generalizing linear models to a broad collection of important topics. So if I were to suggest two ways to go further with this material, that would be it: generalized linear models, and longitudinal, multi-level data.

Â 18:23

So thanks again for taking the class, and make sure to look for us, me, Jeff Leek, and Roger Peng, on Twitter for any interesting new stuff coming out of the data science lab here, our data science program. We have some other programs coming out along the way, and we'll post them all on Twitter and other social media. If you like what we've done in the Data Science Specialization, we've got a lot more coming. So thanks again.
