At the end of the last section, I gave you a 4 x 4 distance matrix, and I asked you to apply the algorithm from the last section in order to reconstruct the simple tree fitting it. I hope that you were driven crazy by this exercise. Things were probably coming along just fine, And then you encountered an edge with negative weight, and everything went haywire. Maybe you think I was mean enough to give you a non-additive matrix, but there does exist a simple tree fitting this matrix. If you don't believe me, here it is. So there must be something wrong with our proposed algorithm. If we look at the matrix itself, then the minimum element is supposed to correspond to neighbors j and k. But there's just one tiny problem with this, which is that j and k are not neighbors in the tree. So hopefully you will see why I tricked you with leading us down the wrong path. When we're designing an algorithm, we must be mathematically rigorous. Just because a statement may seem to be true, doesn't necessarily make it true. And you will see later that leading you down the wrong path on a search for neighbors wasn't completely a lost cause either. For now, let's not lose hope for a method that will reconstruct a tree that fits an additive distance matrix. The idea that we had of using recursion was a good idea, but rather than trying to reduce the size of the tree by finding a pair of neighbors, let's instead try and compute the length of limbs or the edges that are connected to a tree's leaves. Now, we found a formula for the length of a limb in the previous section, but that was assuming that i and j were neighbors. However, this formula is at least in the ballpark. To compute the limb length of i, we will need to take the minimum value of this expression as we let j and k range over all leaves of the tree. If you're interested in seeing a proof of this theorem, just see the main text. So, note that here we denote the length of the limb of j using the notation LimbLength(j). Now that we have this theorem, you can use it compute the length of any limb of the tree, which we can state as a computational problem and you can code yourself. When we return to our toy distance matrix, let's pretend that we don't know what the tree looks like. We can substitute chimp for i here in the Limb Length Theorem. Now we just need to compute the formula in the Limb Length Theorem for all other choices of leaves j and k. We can take these leaves to be human and seal, in which case when we plug in the matrix values, we get 1. We could take these leaves to be human and whale, in which case we get 1 again when we plug in the matrix values. Or we could take these leaves to be whale and seal, in which case we get 4 when we plug in the matrix values. So the minimum of these three values is 1. And that confirms what we already knew was the limb length of chimp. Now we're ready to introduce a recursive algorithm called AdditivePhylogeny that operates by computing limb length. So let me show you the outline of how this algorithm works on the additive matrix that we encountered at the end of the last section and that I tripped you up on. We begin by selecting an arbitrary leaf, say j. We can then apply the Limb Length Theorem to find that the length of this leaf, excuse me, the length of this limb, which in this case is going to be equal to 2. We can form a matrix Dbald by then subtracting the limb length from each entry in the j-th row and column of D. We will see the purpose of doing this subtraction soon. But note that what we're doing is really compressing j towards its parent so that the limb of j has length 0. This is called a "bald limb". So we'll hang onto j for later, but for now, let's remove the j-th row and column from the bald matrix to form a smaller matrix called Dtrim. We will then call AdditivePhylogeny on this smaller matrix to reconstruct the simple tree that fits this trimmed matrix. Now, to add j back to the tree, we must identify the point in the simple tree fitting Dtrim where j attaches to the tree. Once we find this point, we can attach j by an edge of length 2, the limb length that we computed previously. So here is a summary of AdditivePhylogeny. You may want to take a moment to review it before we continue. If you look at this carefully, and if you were paying close attention to what I was saying, then each step of this algorithm may have been clear, except for step 6, where we need to identify the attachment point, to put j back in the tree of the trimmed matrix to form the tree of D. This is where that bald matrix should come in handy. So remember that Dbald was formed by subtracting the limb length of j from each element in the j-th row and j-th column of D. So we'll apply the Limb Length Theorem then, to this bald matrix, on the length of the limb of j. But we know that the length of the limb of j in the bald matrix is just 0. And so when we take this minimum, we see that we obtain that Dbald_i,j plus Dbald_j,k minus Dbald_i,k over 2 is going to be equal to 0, because 11 plus 10 minus 21 is equal to 0. So if we move Dbald_i,k to the right side of this equation, this means that Dbald_i,k is equal to the sum of Dbald_i,j and Dbald_j,k. So how do we interpret this formula? Well, it means that the attachment point for j must occur somewhere along the path from i to k. And specifically, it means that this attachment point must occur at distance Dbald_i,j, which is equal to 11, from leaf i. So, this time the attachment point occurred along an edge of the tree, meaning that we can form a new internal node along that edge at the attachment point. But I want you to note that the attachment point for j could theoretically have occurred at an existing internal node of this tree, in which case we simply need to just join the limb of j onto this existing node. So, now that know how to find the attachment point, you should be ready to implement AdditivePhylogeny for yourself. Unfortunately though, when we go to apply this algorithm to the spike protein distance matrix taken from coronaviruses, we find that the matrix is not additive; i.e, somewhere along the way, if you apply AdditivePhylogeny, you're going to get a negative edge weight and things are going to go wrong. So, for now, let's just fudge this matrix a little bit, to yield the following matrix that's additive. Just from this distance matrix, I would hope that we should have a pretty strong guess which animal gave us SARS because there's one species that's much closer to humans than the rest of the species. Of course, we should be wary of jumping to this conclusion because we already know that the smallest element of a distance matrix doesn't necessarily correspond to neighboring leaves. But when we run AdditivePhylogeny on this adjusted distance matrix, our suspicions are indeed confirmed. The animal that gave us SARS is the palm civet, a creature so fearsome that I hesitate to even show it to you. Now, despite the undeniable cuteness of the palm civet in this photo, it is indigenous to Southern China where it is harvested for its meat. So it does make sense that at some point during the process, the palm civet infected us with SARS, and that's how it crossed the species barrier. Additional research has revealed, though, that the palm civet coronavirus is very similar to some coronaviruses found in bats, which are notorious disease carriers. So it is very possible that bats gave SARS to palm civets, which then passed the disease on to us. I'd also like to just make a brief aside that you may have heard about the palm civet for a slightly different reason. It feeds on raw coffee cherries, and some people say that when the beans pass through its digestive tract, certain proteins get broken down that reduce the bitter flavor of the coffee. For this reason, these predigested coffee beans, after they pass through the palm civet, have become somewhat of a luxury item. Colloquially called "cat poop coffee", this type of coffee sells for hundreds of dollars per kilogram. But regardless of what your morning coffee tastes like, the fact remains that we had to fudge the coronavirus distance matrix a little bit in order to get an additive matrix. In fact, the majority of distance matrices that we encounter in practice are going to be non-additive. So our question is, how can we construct an evolutionary tree for these non-additive matrices?