We are now ready to develop a Spectral Alignment Algorithm, and we will capitalize on the observation that a modification corresponds to either block insertion, or block deletion in the Peptide vector. And, therefore, maybe we can us an analogy with sequence alignment, where "alignment" corresponds to the minimum of insertion, deletion, and substitutions of amino acids. So, our goal is to figure out whether we can find similarities between Peptide vector, and Spectrum vector by finding a longest path in a properly constructed spectral alignment graph. The only question is how to construct this graph. Let's start. In the beginning, let's construct first a graph that I call Southeast(m,m+delta), which consists of a grid with m+1 rows and m+delta+1 columns. Edges in this graph will simply correspond to all edges going in the Southeast direction. And given that peptide a_1,...,a_m of mass m and its modified variant peptide^mod a'_1,...,a'_n of mass m+delta, we can construct a path formed by this Peptide, Peptide and Peptide', in the graph Southeast(m,m+delta), and such path is shown in this graph. So this blue path corresponds to a modified Peptide with two modifications, shown in blue here, and this red path corresponds to a Peptide with three modifications, shown in red here. So far, I failed to explain how to construct a path between an unmodified Peptide, in green here, and its modified version, shown in blue here. Let me do this now. So this path will start at the the node (0,0), then move to the node with the coordinate equal to the masses of prefixes of length 1 of this Peptide, then move to the coordinate corresponding to masses of prefixes of length 2, and finally, move to the node with coordinate corresponding to the masses of the entire unmodified and modified Peptides. We will also distinguish between two types of edges in the resulting graph. The ones that are showed by dashed edges are non-diagonal edges, and these are edges corresponding to modifications. They are non-diagonal because modification has happened. And this is an example of a diagonal edge. There is no modification, and the masses of amino acids in the unmodified Peptide and in modified Peptide are the same. So, we will consider different paths in the Southeast(m,m+delta) and, in fact, any Peptide with mass m+delta will correspond to a path in the Southeast graph. Moreover, we can substitute these paths by a Spectral vector, with mass m+delta. And therefore, any possible interpretation of a Spectral vector with mass m+delta will correspond to a path in this graph. Therefore, I will rename this Southeast graph into the graph that is denoted Southeast(Peptide,Spectrum), and moreover, I will remove all light rows from this graph, because I don't need them. All paths corresponding to a pair of unmodified and modified Peptides traverse only dark rows in this graph. And therefore, all light rows can be safely removed. The resulting graph will be called PSMGraph(Peptide,Spectrum), which is defined as the graph where the weights of all nodes in column i are equal to the i-th coordinate in the Spectrum vector, and the rows in this graph, it's important to note, have indices not equal to 0, 1, 2, 3, but equal to 0, mass(a1), mass(a1a2), ..., and masses of prefixes of the unmodified Peptide. And, in this formulation, the score between the modified Peptide, and Spectrum is simply the weight of the path for the modified Peptide. And, therefore, we have reduced the spectral alignment problem to the following problem in the PSMGraph: Given PSMGraph(Peptide,Spectrum) and an integer k, we want to find a maximum-weight path in this graph with at most k non-diagonal edges. Well, we already know how to search for a maximum-weight path. But how would you search for a maximum-weight path with an additional constraint that it has at most k non-diagonal edges. To find an optimal path in the PSMGraph with an additional constraints on the number of non-diagonal edges k, we will construct yet another graph, which is called the Spectral Alignment Graph. This graph will have k+1 layers, and each layer will be identical to the PSMGraph. Weights of nodes in the Spectral Alignment Graph, will be simply inherited from the nodes in the PSMGraph. Now, this graph on the right side of the slide, Spectrum Alignment Graph, is currently disconnected. There are no edges connecting the different layers of this graph. We will connect layers of this graph in the following way: Diagonal edges from the PSMGraph will stay on the same layer. For example, the blue edge stays on the same layer in this case. But non-diagonal edges from the PSMGraph will move to the next layer, like in the case of the non-diagonal red edge, on the right of the slide. Now, for the case of the next pair of edges, the same rule applies. And finally, for every path in the original PSMGraph, we will have a path in the Spectral Alignment Graph going through multiple layers, and the spectral alignment problem is now reformulated as finding paths in the k-layer Spectral Alignment Graph, and our goal is to find a maximum-weight path in this graph. Before you implement the Spectral Alignment challenge, let me clarify some implementation details for you. I will introduce the notion of the diff array. The diff array will simply represent differences in the indexes of consecutive rows in the PSMGraph. For example, diff(2) = 2-0 = 2 diff(5) = 5-2 = 3 And here, all the elements of the diff array are presented. Now, to solve a longest path problem in the spectral alignment graph, we will define the variable score(i,j,t), which is the maximum weight of paths from (0,0,0), the source of this graph, to the node (i,j,t). And node (i,j,t) has just one predecessor in the same layer, and we can compute the coordinates of this node using the diff array. And it has j predecessors in the previous layer, and we also can compute the coordinates of these nodes using the diff array. Therefore, score(i,j,k) can be defined as the s_j + maximum over all predecessors, score(predecessor) To be more precise, we will use the diff array to specify the exact element of this recurrence. And, of course, the maximum score of all Peptide with at most k modifications is simply the maximum value of score(m,m+delta,t) for all possible t, represented by green nodes in this graph. After you learned about so many algorithms from computational proteomics, the time has come to address the challenge problem for this chapter. In the same paper where Azara presented T-Rex peptides, they also presented mastodon peptides, and we want you to re-analyze them from the perspective of statistical significance. It actually has never been done. There is nothing surprising about finding collagen peptides in mastodons because mastodons roamed the Earth, just 10,000 years ago. They went extinct because of the climate change and over-hunting by humans armed with stone weapons. So, we want you to analyze the mastodon peptides reported in this paper and decide which of them are statistically significant. We want you do find other statistically significant mastodon peptides that this paper possibly missed. And we want you to find hemoglobin peptides, if you can, matching spectra from mastodons.