We are now ready to start sequencing real spectra. But before we do this, we need to agree on how to score a peptide against a spectrum. After transforming both the peptide and the spectrum into a peptide vector and a spectral vector, this will be easy: when the peptide vector and the spectral vector have the same dimension, we can score the peptide against the spectrum as simply the dot product of the two vectors. Therefore, the Peptide Sequencing Problem can be defined as follows: given a spectral vector, find a peptide vector with maximum score against this spectral vector. The input to this problem is a spectral vector Spectrum, and the output is an amino acid string Peptide that maximizes the score between Peptide and Spectrum among all possible peptides. To solve this problem, we will construct a directed acyclic graph (DAG) for a given spectral vector. For a spectral vector <s_1, ..., s_m>, we will construct a graph on m+1 nodes labeled 0, 1, ..., m, and we will assign weight s_i to node i. How will we construct the edges in this graph? We will connect node i to node j if j-i is equal to the mass of an amino acid. In our toy alphabet of two amino acids X and Z, with masses 4 and 5, this will be the set of edges corresponding to amino acid X, and this will be the set of edges corresponding to amino acid Z. A peptide then corresponds to a path in this graph, and the score between Peptide and Spectrum will be defined as simply the sum of the weights of the nodes in this path. In this case, it will be 22. The conclusion from this construction is that every peptide corresponds to a path from source to sink in the constructed DAG, and the score between Peptide and Spectrum is simply the sum of the weights of the nodes this path visits. We have therefore just reframed the Peptide Sequencing Problem as the problem of finding a maximum-weight path in a node-weighted DAG.
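To make the equivalence between the dot-product score and the path score concrete, here is a minimal Python sketch. It assumes that a peptide vector has a 1 at each prefix mass of the peptide and 0 elsewhere, which this passage does not spell out. The function names and the toy spectral vector are hypothetical, and the vector is not the one shown in the lecture.

```python
# Toy amino acid masses from the two-letter alphabet used in the lecture.
MASS = {"X": 4, "Z": 5}

def peptide_vector(peptide):
    """Binary vector with a 1 at each prefix mass (assumed definition)."""
    total_mass = sum(MASS[aa] for aa in peptide)
    vector = [0] * total_mass
    prefix_mass = 0
    for aa in peptide:
        prefix_mass += MASS[aa]
        vector[prefix_mass - 1] = 1  # coordinates are 1-based: s_1, ..., s_m
    return vector

def score(peptide, spectral_vector):
    """Dot product of the peptide vector and the spectral vector."""
    pv = peptide_vector(peptide)
    if len(pv) != len(spectral_vector):
        raise ValueError("peptide mass must equal the spectral vector length")
    return sum(p * s for p, s in zip(pv, spectral_vector))

def path_score(peptide, spectral_vector):
    """Sum of node weights along the peptide's path from node 0 to node m."""
    node = 0
    total = 0
    for aa in peptide:
        node += MASS[aa]                    # follow the edge for this amino acid
        total += spectral_vector[node - 1]  # node i carries weight s_i
    return total

# Hypothetical spectral vector of length 13 (not taken from the lecture).
spectrum = [0, 0, 0, 4, -2, 0, 0, 3, 1, 0, 0, 0, 5]

print(score("XXZ", spectrum), path_score("XXZ", spectrum))  # 12 12
```

For the peptide XXZ, the path visits nodes 4, 8, and 13, so both functions add up s_4 + s_8 + s_13 = 4 + 3 + 5 = 12.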
We already know how to find a maximum-weight path in an edge-weighted graph, and it will be a good exercise for you to modify the algorithm that we previously learned for edge-weighted graphs into an algorithm for node-weighted graphs. We are now ready to attack the problem of finding an unknown peptide that generated a given spectrum. This is a spectrum that was generated by a mass spectrometer from an unknown peptide, and our goal is to solve the reverse problem: given a spectrum, find the peptide that generated it. Now that we have learned about spectral vectors, it looks like solving this problem will be easy. Indeed, it may appear that the only thing we need to do is to find the de novo reconstruction of the spectrum, that is, the peptide with the highest score against this spectrum, after first, of course, transforming the spectrum into a spectral vector. In this case, the solution of the Peptide Sequencing Problem generates this peptide, which, as it turns out, is incorrect. The reason is that, despite two decades of research, scoring functions that reliably assign the highest score to the biologically correct peptide remain unknown. It looks like we have backed ourselves into a corner. We learned how to sequence a peptide as the highest-scoring peptide, but, as it turns out, this highest-scoring peptide is often not the biologically correct one. However, imagine for a second that you know the proteome of the species. If you know the proteome, what is the point of finding the highest-scoring peptide among all peptides? Shouldn't we instead find the highest-scoring peptide in the proteome? And it turns out that, although the correct peptide may not score highest among all peptides, it typically scores highest among all peptides in the proteome.
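The idea of searching only the proteome can be sketched as a scan over all substrings of the proteome whose mass equals the length of the spectral vector. This is a minimal illustration, not a production search tool: the function name `identify_peptide`, the toy proteome strings, and the toy spectral vector are all hypothetical, and the score is assumed to be the sum of the spectral vector entries at the peptide's prefix masses.

```python
# Toy amino acid masses from the two-letter alphabet used in the lecture.
MASS = {"X": 4, "Z": 5}

def identify_peptide(spectral_vector, proteome):
    """Return the highest-scoring substring of proteome whose mass is m.

    Returns (None, -inf) if no substring has the required mass.
    """
    m = len(spectral_vector)
    best_peptide, best_score = None, float("-inf")
    for start in range(len(proteome)):
        mass = 0
        total = 0
        for end in range(start, len(proteome)):
            mass += MASS[proteome[end]]
            if mass > m:
                break                           # too heavy, try the next start
            total += spectral_vector[mass - 1]  # weight at this prefix mass
            if mass == m:
                if total > best_score:
                    best_peptide = proteome[start:end + 1]
                    best_score = total
                break
    return best_peptide, best_score

# Hypothetical spectral vector and proteomes (not taken from the lecture).
spectrum = [0, 0, 0, 4, -2, 0, 0, 3, 1, 0, 0, 0, 5]

print(identify_peptide(spectrum, "XZZXXZX"))  # ('XXZ', 12)
print(identify_peptide(spectrum, "ZZXZX"))    # ('XZX', 10)
```

The second call shows how the answer depends on the proteome: XXZ scores higher against this vector, but it does not occur in the proteome ZZXZX, so the search returns XZX instead.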
The claim that the correct peptide typically scores highest among all peptides in the proteome comes with some fine print: "if the resulting score is sufficiently high." We will talk more about this important condition later, but it looks like we have reduced the problem of interpreting a spectrum to peptide identification, that is, finding the highest-scoring peptide occurring in a proteome, rather than peptide sequencing. If you compare peptide sequencing with peptide identification, you will find something quite surprising. In Peptide Sequencing, we found the best-scoring peptide as simply the longest (maximum-weight) path in the constructed DAG. In Peptide Identification, on the other hand, we find the solution by simply finding the highest-scoring peptide in the proteome. Hopefully, these two solutions will result in the same peptide, but this is often not the case. Interestingly, Peptide Sequencing is not unlike Peptide Identification: it is Peptide Identification in the case when the proteome consists of all possible peptides, which, for peptides of length n, means 20^n peptides. Which approach do you think is faster, Peptide Identification or Peptide Sequencing? It looks like Peptide Identification should be faster, because the space we have to explore is much smaller. However, the reality is that peptide sequencing algorithms are much faster, even though their search space is much larger. The reason is that peptide sequencing eliminates the time-consuming scan of the proteome by modeling the problem as a longest path problem in a DAG. However, since the scoring function is imperfect, peptide sequencing remains inaccurate: state-of-the-art algorithms for peptide sequencing identify only 30% of peptides correctly.
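The speed of peptide sequencing comes from dynamic programming over the DAG. Below is one possible Python sketch of a maximum-weight path search in the node-weighted graph, adapted from the standard longest-path recurrence. It is a sketch under stated assumptions, not the course's reference solution: node 0 is assumed to have weight 0, and the function name `sequence_peptide`, the toy alphabet, and the toy spectral vector are hypothetical.

```python
# Toy amino acid masses from the two-letter alphabet used in the lecture.
MASS = {"X": 4, "Z": 5}

def sequence_peptide(spectral_vector):
    """Maximum-weight path from node 0 to node m in the node-weighted DAG.

    Returns (peptide, score), or (None, -inf) if node m is unreachable.
    """
    m = len(spectral_vector)
    weight = [0] + list(spectral_vector)  # assumption: node 0 has weight 0
    neg_inf = float("-inf")
    best = [neg_inf] * (m + 1)  # best[j] = max score of a path from 0 to j
    back = [None] * (m + 1)     # back[j] = (previous node, amino acid used)
    best[0] = 0
    for j in range(1, m + 1):   # nodes in increasing order are topologically sorted
        for aa, mass in MASS.items():
            i = j - mass        # edge i -> j exists if j - i is an amino acid mass
            if i >= 0 and best[i] > neg_inf and best[i] + weight[j] > best[j]:
                best[j] = best[i] + weight[j]
                back[j] = (i, aa)
    if best[m] == neg_inf:
        return None, neg_inf
    # Walk the backpointers from the sink to the source to spell the peptide.
    peptide = []
    node = m
    while node != 0:
        node, aa = back[node]
        peptide.append(aa)
    return "".join(reversed(peptide)), best[m]

# Hypothetical spectral vector of length 13 (not taken from the lecture).
spectrum = [0, 0, 0, 4, -2, 0, 0, 3, 1, 0, 0, 0, 5]

print(sequence_peptide(spectrum))  # ('XXZ', 12)
```

Each node is visited once and looks back along one edge per amino acid, so the work grows with m times the alphabet size, without ever enumerating the peptides themselves or scanning a proteome.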