In the last section, we saw that the motif enumeration algorithm that we proposed had a flaw. It assumed that the motif that we were looking for must occur in every string in the collection of strings under consideration, which is not necessarily true in biological applications. As an alternative, our plan is to assign a score to a give collection of chambers chosen from the strings. So say that we have selected such a set of chambers which we call Motifs, represented here as a matrix. We indicate the most frequent nucleotide in each column of this matrix by a capital letter, breaking ties arbitrarily. So note that positions two and three in the Motifs matrix are the most conserved, nucleotide g occurs in these positions in every. Whereas position ten, at the end of the Motif matrix, is the least conserved. A common method of visualizing a Motif matrix, is with a motif logo, in which the size of a nucleotide correlates with how often it occurs in the Motif matrix. Taking the capital letter in each column, construct a consensus string for this particular choice of k-mers that will also provide us with a candidate Motif. Our goal then is to choose k-mers from the underlying strings, so that they produce the most conserve Motif matrix. Then simply take the Consensus String as our desired Motif. We therefore are going to define the score of the Motif matrix as simply the number of lowercase letters in the matrix. Counting the number of lowercase letters column by column, we see that the score for this Motif is equal to 30. We can now restate our goal as choosing a collection of k-mers from the underlying strings in order to minimize this score. So, we now have a more appropriate computational problem, which we call the motif finding problem. A brute force algorithm for the motif finding problem would simply consider every possible choice of k-mers to find a Motif matrix, and then simply take a collection of k-mers having the lowest score. But as with any brute force approach, we need to ask ourselves how fast this algorithm is. Assuming that there are say t strings under consideration, all of length n, then there are (n- k +1) t such possibilities for how to form a Motifs matrix. Then scoring the matrix requires k times t steps. So assuming that k is much smaller than n, as this is the case in practice, the overall running time of this approach is approximately O(n to the t * k * t). Now the n to the t term means that this algorithm is useless for practical values of n and t, and so we are going to need a faster algorithm. So to improve on our existing brute force approach, we will rethink the motif finding problem. Specifically, I've noted here that when we choose the motif matrix it automatically gives us a consensus string. Our new approach will be to choose the consensus string first, then look for a motif matrix that scores best against this consensus. How do we achieve this? Well until now, we have been scoring the motif matrix column by column. However, we could just as easily compute the number of lower case symbols in the motif matrix row by row. I want you to note that the number of lower case symbols in a row is equal to the number of symbols in that row that don't match the consensus, which we know as the hamming distance between that row and the consensus string. As a result, the score of the Motif matrix is just the sum of Hamming distances between each string in the matrix and the Consensus string which we're going to write as, d(Consensus(Motifs), Motifs). This discussion now leads us to the Equivalent Motif Finding Problem, in which we are looking for a pattern that will serve as our consensus string, i.e., a pattern that minimizes the distance to motifs over all choices of motifs in the underlying string. It may still seem that we have not made our work any easier though. Since we are now looking for a pattern in addition to a collection of motifs, whereas before we just looked for a collection of motifs. However, we will see in the next section that we don't need to search through every possible choice of motifs in order to find an optimal consensus string, which is what we're looking for.