A major problem in estimating the term structure is the high dimensionality of the discount curve. In this part we aim at finding the basis shapes of the yield curve by looking at time series of yield curves. The technique we'll use is Principal Component Analysis(PCA). This is one of the most important dimension reduction techniques in multivariate data analysis. The key mathematical principle underlying PCA is the spectral theorem from linear algebra. It states that any real symmetric end by end matrix Q can be decomposed as shown here into the product of A which is an orthogonal matrix whose columns are the normalized Eigen vectors of Q. They from an orthonormal basis of R-n. And L which is the diagonal matrix containing the Eigen values of Q we'll always assume without loss of generality that these eigen values are ordered in this decreasing sense and times A-transpose. Now let X be an n-dimensional random vector with mean denoted by μ and co variance matrix denoted by Q. Note that the co variance is a symmetric and positive semi definite matrix. We can think of X as being high dimensional and is large, it could be a model for the shape of the yield curve. In this figure I illustrate the case where the dimension is 2. We see the mean vector μ and this ellipse indicates that the support of the vector X is somewhat scattered with the shape. Formally speaking, this ellipse is a level set of the co variance matrix acting as a quadratic form on R-2. The spectral theorem applied to the co variance matrix Q gives us the matrices A and L. L is diagonal with the eigen values which are non negative on the diagonal, and A stacks the eigen vectors of Q as normalized column vectors a-1 up to a-n. a-1 corresponding to the largest eigen value is giving us the largest variability of X. The principal component transform of X is given by projecting X onto these eigen vectors. In other words, we get the component Y-i by projecting the de-centred μ onto the eigen vector a-i. We call Y-i the ith principal component of X and we call a-i the ith vector of loadings of X. In matrix notation, these rates are shown here. We can re express X in terms of its principle components and that reads like this here. As we see in this figure here Y-1 is now capturing the major variability of X and Y-2 is what remains. Indeed this is confirmed on the following page one can formally show, and that is an elementary exercise, that the principal components have mean 0 and they have a co variance matrix which is diagonal. It is actually equal to L. That means The principal components are uncorrelated and have variances given by 'λ-i's. Because the 'λ-i's are ordered it follows that Y-1 has maximal variance among all standardized linear combinations of X as shown here. Going on Y-2 has maximal variance among all such linear combinations which are orthogonal to A-1, the first eigen vector. And generally speaking Y-i has maximal variance among all such linear combinations which are orthogonal to the first i-1 linear combinations. Observe that the sum of all variances of all the components X-i is preserved under the principal component transformed. If we thus choose a number K which is smaller than N, and sum up the first 'λ-i's from 1 to K, and hold this against the sum of all the eigen values, we thus obtain the amount of variability in X explained by the first K principle components. As such, we could think of X being an N-dimensional model for the yield curve, or the daily changes of the yield curve. If then the first K principle components explain a significant amount of variability in X, it is appropriate to approximate X by the first K principle components as shown here. This is now a lower dimensional model which however captures the variability of the original model quite well. The loadings a-1to a-k are in turn, the main components of the stochastic yield curve movements. Before we look into an example, Let us recall the sample analog of what we just did for the random variable X. Assume we have a time series of observations that we stack into the matrix X whose T-th column is a sample realization of a random vector X(t). We assume that this random vector X(t) is distributed as X with mean μ and co variance matrix Q. We then estimate the mean vector of X by the empirical mean of the data and the co variance matrix Q by the empirical n x n co variance matrix as shown here. Applying the spectral theorem to the empirical co variance matrix gives us the empirical counter parts eigen values λ and eigen vectors â. indicated now with hats. The sample principal component decomposition of X is then given as shown here with the empirical principal components y and the loadings â-i. These empirical principal components are uncorrelated again by construction. The empirical mean and co variance matrix μ̂ and Q̂ are standard estimates for the true parameters μ and Q if the observations of X(t) are serially uncorrelated as shown here. If this kind of stationarity of the time series X(t) is in doubt the standard practice is to differentiate and to consider the increments. We are now ready for the example and consider monthly changes of the yield curve of the Swiss government bonds over the ten years from August 2005 until July 2015. For 8 times to maturities τ-i ranging from 2 years up to 30 years, running the PCA on this time series gives us the following first 3 yield curve loadings. Remember each loading is a vector with 8 components and we'll represent the vector as a function with values given by its components.. So the first loading is this black line. The second loading is the blue line, and the third loading is the red line. If we look at their shapes, we see that the first loading is roughly flat. It affects the parallel shifts and can thus be dubbed the level. The second loading is upward sloping It affects the tilting of the yield curve and the third is hump-shaped, it affects the flexing of the yield curve. It has become common to speak of these shapes as level, slope, and curvature of the yield curve. The explained variance of these first 3 principle components is as shown here. The first principle component explains roughly 77% of the variance, the second explains 18%, the third explains 3%. In sum, the first three principle components explain more than 98% of the variance of the yield curve. As a consequence the yield curve (movements) can be approximated by a linear combination of the first three loadings level, slope, curvature, with small relative error. In other words, a 2 or 3 factor model will do a very good job in fitting the time series of the yield curve.