Hello, welcome to lecture four on Interest Rate Instruments. As I said at the end of lecture three, we will be doing some data-driven analysis on LIBOR and swap rates. I call it part 1 because there will definitely be a part 2 after it. Once more before we get to anything: as stated in the agenda, the goal here is, because these rates are highly correlated, especially LIBOR and swap rates, to do some analysis before we get into any model calibration or any model-based fit. At the end of lecture three we did cross-correlations and Q-Q plots, and from those graphs it seems it's the right time for us to start introducing some kind of linear relationship. So I'm starting with a so-called simple linear relationship between swap rates. Everything I do here can be done in exactly the same way for LIBOR rates, and I'm going to leave that to you as an assignment: mimic exactly what I'm doing here for swap rates, but do it for LIBORs. You have all these codes, and I'm going to go through them as well, but the LIBOR version is left to you.

Then here is the setup. Without loss of generality, I'm assuming the following relationship between, say, the US 5-year swap rate and the 2-year: 5-year = a0 + a1 × 2-year. Or I can do 30 versus 10, which is 30-year = a0 + a1 × 10-year. The goal is to find a0 and a1. Or you may ask, why not build it against a few other ones? The linear relationship doesn't have to be between just two rates; I can build 30 against 2, 5, and 10 together. For simplicity, I will write it as a single generic equation, and I want to make sure it's clear what I'm doing: the coefficients are the same a0, a1, a2, and a3.
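Transcribing the relationships just described into standard notation (my own write-up of the equations the lecture states verbally):

```latex
% Pairwise: one swap rate regressed on another, e.g. the 5y on the 2y
x^{5}_{t} \approx a_0 + a_1\, x^{2}_{t}
\qquad\text{or}\qquad
x^{30}_{t} \approx a_0 + a_1\, x^{10}_{t}

% Multi-rate: the 30y regressed on the 2y, 5y, and 10y together
x^{30}_{t} \approx a_0 + a_1 x^{2}_{t} + a_2 x^{5}_{t} + a_3 x^{10}_{t}

% Generic form used in the lecture
\hat{y}_t = f(x_t;\theta) = a_0 + \sum_{i=1}^{3} a_i\, x^{(i)}_t,
\qquad \theta = (a_0, a_1, a_2, a_3)
```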
And because I want to make it more generic, I'm calling the regressors the first one, the second one, and the third one. Note that as I go through the time series, that is, for various different times t, the coefficients a0, a1, a2, and a3 are not changing; that's exactly the relationship. For short, I write it as f(x_t; θ), where θ here is nothing but (a0, a1, a2, a3), and x is nothing but the vector (x1_t, x2_t, x3_t); that's the way you should look at it. Good, and what comes out of this is what I call ŷ_t. The reason for the hat is that it's an approximation to the true value y_t. So what I want to do is take what comes out of the model versus what comes out of the market, and plot them against each other. Now, you guys should be familiar with this; if you're not, go through it: this is simply a linear regression problem, and there are many options for finding those coefficients, that is, finding θ, which is (a0, a1) in the case of having just two, or (a0, a1, a2, a3) in the case of regressing one rate against three. The options for finding those coefficients are: linear regression using least squares, or using a gradient-free or gradient-based optimizer to minimize the following objective function. It's up to you which one to use. And as you'll see, we will actually be using all three, linear regression, gradient-free, and gradient-based, because that also serves as a reminder of how to do model calibration. Now, the objective function would definitely be something of this form: you take what comes out of the model versus what the market gives, set up this kind of objective, and replace the one with the other.
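A minimal sketch of the three routes just mentioned, on synthetic data (the data and variable names below are my own illustration, not the lecture's actual code): closed-form least squares, a gradient-free optimizer (Nelder-Mead), and a gradient-based one (BFGS), all minimizing J(θ) = (1/2T) Σ_t (y_t − ŷ_t)².

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)

# Synthetic "rates": y_t = a0 + a1 * x_t + noise, with known a0, a1
T = 500
x = rng.normal(2.0, 0.5, T)                   # stand-in for the 2y swap rate
y = 0.3 + 1.2 * x + rng.normal(0, 0.01, T)    # stand-in for the 5y swap rate

X = np.column_stack([np.ones(T), x])          # design matrix with intercept

# Route 1: least squares in closed form
theta_ls, *_ = np.linalg.lstsq(X, y, rcond=None)

# Objective J(theta) = (1/2T) * sum_t (y_t - yhat_t)^2
def objective(theta):
    resid = y - X @ theta
    return 0.5 * np.mean(resid ** 2)

# Route 2: gradient-free optimizer
theta_gf = minimize(objective, x0=np.zeros(2), method="Nelder-Mead").x

# Route 3: gradient-based optimizer
theta_gb = minimize(objective, x0=np.zeros(2), method="BFGS").x

print(theta_ls, theta_gf, theta_gb)           # all three should agree closely
```

On a linear model with a quadratic objective, all three routes recover essentially the same θ; the optimizer-based routes matter once the model or objective stops being linear-quadratic, which is why they double as calibration practice.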
We've already seen this kind of objective function during model calibration, but I just want to tell you that you can write it in various different ways. You can simply write it, if you wish, in this form: (1/2T) Σ_{t=1}^{T} |y_t − ŷ_t|^p for p greater than or equal to 1; you could even drop the 1/2T because it doesn't change the minimizer, we're just writing it that way to look at an average. You can also make it weighted; remember we talked about that as well. If the weight doesn't change with time, just write it as a scalar w. There are various different ways you can write it, good.

I have a very simple sample code for linear regression in Python. The data that I'm providing to you, which I call swapLiborData, is a comma-separated file; columns 6 to 11 are the swap rates, and column 13 is the 30-year one that I'll do the regression against. Anyway, I'm going to go through the code and you will see how it works. I'm simply utilizing the linear regression model in Python: I run the regression, get the coefficients out, and I can also get the intercept. I also get the R-squared, which is a good measure of fit, just to know how good the fit is. What I'm doing first is regressing the 5-year against the 2-year, for the first half of the data, the second half, and the overall dataset. So first I regress the 5-year against the 2-year from January 2nd, 2014 to May 24th, 2016; I'm being very consistent, exactly the same periods we used for the cross-correlations. Writing the 5-year as x5_t = a0 + a1 x2_t, these are the two coefficients coming out of the regression.
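A hedged sketch of the kind of script described here. The file name swapLiborData and the column layout come from the lecture, but since the file itself isn't reproduced here, the sketch fabricates a small frame of the same flavor so it runs end to end; the column names are illustrative assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# In the lecture the data comes from a CSV, e.g.:
#   df = pd.read_csv("swapLiborData.csv", parse_dates=["Date"])
# Here we fabricate a small frame with the same flavor so the sketch runs.
rng = np.random.default_rng(1)
dates = pd.bdate_range("2014-01-02", "2016-05-24")
swap2 = 0.6 + 0.001 * np.arange(len(dates)) + rng.normal(0, 0.02, len(dates))
swap5 = 0.4 + 1.1 * swap2 + rng.normal(0, 0.01, len(dates))
df = pd.DataFrame({"Date": dates, "USSwap2Y": swap2, "USSwap5Y": swap5})

# Regress the 5y on the 2y, as in the first-half example
X = df[["USSwap2Y"]].to_numpy()
y = df["USSwap5Y"].to_numpy()
reg = LinearRegression().fit(X, y)

a0 = reg.intercept_        # the intercept a_0
a1 = reg.coef_[0]          # the slope a_1
r2 = reg.score(X, y)       # R-squared, the goodness-of-fit measure
print(a0, a1, r2)
```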
Now, I also need to look at how good my fit is, and the R-squared is 4%. The best R-squared you can get is, of course, 100%, and I'm far from 100; it's actually not doing a very good job. If I go back, substitute these two coefficients, and create the constructed series versus the true one, this is what you're seeing, and you can see why the R-squared was so bad. The blue one is the constructed one, the so-called ŷ (to be consistent I should have labeled it ŷ), and the red one is the actual y. As you see, they are not even close; the regression is not doing a good job here. Let's look at the second part now: the exact same regression, but this time from May 25th, 2016 to October 11th, 2018. Doing the exact same thing, ŷ_t = a0 + a1 x_t, where x is the 2-year and y is the 5-year (I'm not writing it there), these are the coefficients I'm getting, and look at the R-squared: 97%, pretty good. Now, substituting back, that means grabbing the fitted coefficients and plotting the constructed series against the original one, this is what you get. As you see, it's doing a very nice job, original versus constructed. We miss a bit here, but that's fine. These things could actually be used for signal trading; that's not what our goal is, but people typically use this to extract certain features for signal trading. Now, what if we do it for the entire data? I'm now bringing the entire data in, again 5 versus 2; as I said, I did the first half and the second half, but not yet the entire data. These are the coefficients I'm getting from this Python code, which I'm going to go through in detail, and this time I'm seeing an R-squared of 77%.
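To make the R-squared numbers concrete: once a0 and a1 are in hand, the constructed series is ŷ_t = a0 + a1 x_t, and R² = 1 − SS_res/SS_tot. A small sketch (synthetic numbers, not the lecture's data) verifying the hand computation against sklearn's score:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(2)
x = rng.normal(1.5, 0.3, 300)                 # stand-in regressor
y = 0.2 + 0.9 * x + rng.normal(0, 0.05, 300)  # stand-in target

X = x.reshape(-1, 1)
reg = LinearRegression().fit(X, y)

# Constructed series y_hat, then R^2 "by hand"
y_hat = reg.intercept_ + reg.coef_[0] * x
ss_res = np.sum((y - y_hat) ** 2)             # residual sum of squares
ss_tot = np.sum((y - y.mean()) ** 2)          # total sum of squares
r2_manual = 1.0 - ss_res / ss_tot

print(r2_manual, reg.score(X, y))             # the two should match
```

Plotting y_hat against y is exactly the "constructed versus original" picture the lecture walks through.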
It's better than the first half, but you still see that you're missing many parts of it, specifically here and there, while doing a better job over here, no question about that. Now, the next step would be bringing more in; by more I mean, as opposed to regressing one rate against just one other, I would regress the 30-year against several. But first, let's not jump ahead; I'm not doing the multi-regressor version yet, my apologies. I'm regressing 30 against 15, the exact same way as 5 against 2, for the first half, the second half, and the entire dataset. For the first half of the dataset, these are the coefficients I'm getting, and look at the R-squared: pretty good, 99.5%. Then, using the θ that I have, I again construct a time series: this one built from the 15-year, and this is the one which corresponds to the 30-year swap. And as you see, plotting one against the other, the constructed series versus the original one, you're doing a very, very nice job there. Let's go to the second half, from May 25th, 2016 to October 11th, 2018: these are the coefficients I'm getting for a0 and a1, and the R-squared is 99.5% again. If you plot them against each other, you see it's doing a very nice job. The last one is 30 versus 15 for the entire data. When you do it for the entire data, you'll see the R-squared is still pretty good, but it goes down, from 99.5% to 95.4%. And again, if you plot them against each other, you'll see that some parts are slightly missed here and there, but the movements are very similar. The very, very last one that I do, and you can do it for various different combinations, I just did it for one and leave it to you to wrap up the other two periods, is 30 versus 2, 5, and 10 for the first half. The reason for choosing the first half is that, as you recognize, the curve was very, very steep in the first half.
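Since the same regression is rerun for the first half, the second half, and the full sample, it's convenient to wrap it in a helper that takes a date range. This is my own illustrative wrapper on synthetic data (names and numbers are assumptions), mirroring how only the start and end dates change from run to run:

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

def regress_over(df, x_col, y_col, start, end):
    """Fit y = a0 + a1*x on rows with start <= Date <= end; return (a0, a1, R^2)."""
    sub = df[(df["Date"] >= start) & (df["Date"] <= end)]
    X = sub[[x_col]].to_numpy()
    y = sub[y_col].to_numpy()
    reg = LinearRegression().fit(X, y)
    return reg.intercept_, reg.coef_[0], reg.score(X, y)

# Synthetic 15y/30y-style series just to exercise the helper
rng = np.random.default_rng(3)
dates = pd.bdate_range("2014-01-02", "2018-10-11")
s15 = 1.0 + 0.0005 * np.arange(len(dates)) + rng.normal(0, 0.02, len(dates))
s30 = 0.5 + 1.05 * s15 + rng.normal(0, 0.01, len(dates))
df = pd.DataFrame({"Date": dates, "Swap15Y": s15, "Swap30Y": s30})

# First half, second half, and the entire sample, as in the lecture
fits = {
    "first":  regress_over(df, "Swap15Y", "Swap30Y", "2014-01-02", "2016-05-24"),
    "second": regress_over(df, "Swap15Y", "Swap30Y", "2016-05-25", "2018-10-11"),
    "all":    regress_over(df, "Swap15Y", "Swap30Y", "2014-01-02", "2018-10-11"),
}
for name, (a0, a1, r2) in fits.items():
    print(name, round(a0, 3), round(a1, 3), round(r2, 3))
```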
I want to show you that even in a case where there was not that much correlation between the two rates, if you bring in all three, you can still do a very nice job. You see that if I write ŷ_t for the 30-year as a0 + a1 x2_t + a2 x5_t + a3 x10_t, where the regressors are the 2-year, 5-year, and 10-year, you get an R-squared of 99.4%, because I'm having all three against one. We're doing a good job here, and this is the plot of one against the other. You can do the exact same thing for 30 versus 2, 5, and 10 for the second half and for the entire data. I'll leave it as an exercise, but it should be a very, very simple exercise anyway. I will be going through the code for this part anyway, and you'll see how quickly you can do it by simply changing the start and the end date. Thank you.
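The multi-rate regression is the same call with three columns instead of one. A sketch under the same synthetic-data assumptions as before (the stand-in rates and their coefficients are mine, not the lecture's):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(4)
T = 600
x2 = rng.normal(0.8, 0.2, T)                       # stand-in 2y rate
x5 = 0.9 * x2 + rng.normal(0, 0.05, T)             # stand-in 5y rate
x10 = 0.8 * x5 + rng.normal(0, 0.05, T)            # stand-in 10y rate
y30 = 0.1 + 0.2 * x2 + 0.3 * x5 + 0.6 * x10 + rng.normal(0, 0.01, T)

X = np.column_stack([x2, x5, x10])                 # three regressors at once
reg = LinearRegression().fit(X, y30)

a0 = reg.intercept_
a1, a2, a3 = reg.coef_                             # one coefficient per rate
print(a0, a1, a2, a3, reg.score(X, y30))
```

The only change relative to the pairwise case is the shape of X, which is why rerunning it for other periods reduces to changing the start and end dates.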