We will now talk about how this issue of zeros in randomized algorithms can be addressed. And it's not a minor annoyance; it's actually a serious issue, because randomized algorithms do not like working with zeros. We will now learn how to deal with zeros in the profile matrix. In the real world, there is always a possibility that we do not observe one of the possible events. In this case, we assign it the empirical probability zero. But in fact, its real probability is not zero; it's simply that the size of our sample is too small. And randomized algorithms hate zeros, because zeros affect their performance in a serious way. That's why we need to learn how to get rid of zeros. Fortunately, Laplace helped us figure out how to do this. Laplace calculated the probability that the sun will not rise tomorrow, given the fact that it has risen every day for the last 5,000 years. And this probability is roughly one in two million, according to Laplace. His estimate was ridiculed because his opponents did not realize how important this calculation was. If we repeat an experiment that can result in a success or a failure n times and get s successes, then what is the probability that the next repetition will be a success? In other words, if X_1, X_2, ..., X_{n+1} are conditionally independent Boolean random variables (where failure corresponds to 0 and success corresponds to 1), then the traditional way to estimate the probability that X_{n+1} is a success is simply to compute the fraction of successes among all trials, which is s/n. However, since we have prior knowledge that both successes and failures are possible, we actually have two more observations implicitly, even before we started our trials. So, even before we started our trials, we should acknowledge that success and failure are possible, and therefore there are two more "invisible" events. That was Laplace's argument, and therefore he made a correction based on the fact that when we see n events, we actually see n+2 events.
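The two estimates discussed above can be written as a small sketch. The function names here are my own; the arithmetic is just the naive fraction s/n versus Laplace's corrected fraction with one "invisible" success and one "invisible" failure added.

```python
def naive_estimate(s, n):
    """Naive estimate: fraction of observed successes among n trials."""
    return s / n

def laplace_estimate(s, n):
    """Laplace's rule of succession: add one invisible success
    and one invisible failure, giving (s + 1) / (n + 2)."""
    return (s + 1) / (n + 2)

# Sunrise example: if the sun has risen every day for 5,000 years,
# the naive estimate says the probability it rises tomorrow is exactly 1.
# Laplace's estimate leaves a tiny probability that it does not rise,
# roughly one in two million, as in the lecture.
n = s = 5000 * 365
print(naive_estimate(s, n))        # exactly 1.0
print(laplace_estimate(s, n))      # slightly below 1.0
print(1 - laplace_estimate(s, n))  # the tiny chance the sun fails to rise
```

Note that with zero trials (n = s = 0), Laplace's estimate gives 1/2, a sensible prior when we know nothing, whereas the naive estimate is undefined.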
And when we see s successes, we actually see s+1 successes, and the formula changes to (s+1)/(n+2). It may sound counterintuitive, but let's see how Laplace would get rid of zeros in the profile matrix. In this case, what do we need to do to update the profile matrix to get rid of zeros? According to Laplace's rule of succession, we should assume that every event represented in the profile matrix can occur, and therefore we should add 1 to every entry of the count matrix. Here I show how Laplace would update the count matrix by adding 1 to all entries. As a result, the profile matrix will also get updated, all probabilities will get updated, and we will get rid of all zeros. Now, instead of rolling a one-sided die, we can actually roll a seven-sided die. And based on the result of rolling this seven-sided die, we will choose a new instance of the motif in the sequence. Here it is. Afterwards, we iterate, and let's see how our motifs change. Starting from this, we once again continue with the Gibbs sampler: remove one of the sequences, form the motif matrix, build the count matrix, update it with pseudocounts by adding 1 to every entry, compute the profile matrix once again, compute the probability of every k-mer, and once again roll a seven-sided die. The seven-sided die once again changes one of the choices of our motifs, and we start a new iteration. Again, we randomly remove one of the sequences, construct the count matrix, apply Laplace's rule of succession, create the profile, calculate the probabilities again, and roll a seven-sided die. And so, starting from completely random motifs, in just three iterations we have almost captured the correct implanted motifs. This means that the bias introduced by the implanted motif was correctly captured by the Gibbs sampler and led us to the correct motif. We are not quite there yet, because in the second sequence we have not found the correct motif.
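The pseudocount step and the die roll above can be sketched as follows. The toy count matrix and the list of candidate k-mers here are illustrative, not the specific example from the lecture; the point is that after adding 1 to every entry, no k-mer has probability zero, so the "die" we roll has a face for every candidate.

```python
import random

# A toy count matrix for motifs of length 3 over {A, C, G, T}.
# Note the zeros: without pseudocounts, many k-mers get probability 0.
counts = {
    'A': [2, 0, 0],
    'C': [0, 3, 0],
    'G': [1, 0, 3],
    'T': [0, 0, 0],
}

# Laplace's rule of succession: add 1 to every entry of the count matrix.
pseudocounts = {nuc: [c + 1 for c in col] for nuc, col in counts.items()}

# The profile matrix divides each column by its (updated) column total.
k = 3
totals = [sum(pseudocounts[nuc][j] for nuc in 'ACGT') for j in range(k)]
profile = {nuc: [pseudocounts[nuc][j] / totals[j] for j in range(k)]
           for nuc in 'ACGT'}

def kmer_probability(kmer):
    """Probability of a k-mer under the profile matrix (a product
    of one profile entry per position)."""
    p = 1.0
    for j, nuc in enumerate(kmer):
        p *= profile[nuc][j]
    return p

# "Rolling the seven-sided die": pick one of seven candidate k-mers
# with probability proportional to its profile probability.
candidates = ['ACG', 'GCG', 'TTG', 'ACA', 'GCA', 'TCG', 'AAG']
weights = [kmer_probability(km) for km in candidates]
chosen = random.choices(candidates, weights=weights, k=1)[0]
print(chosen)
```

Without the pseudocounts, every candidate containing a zero-count nucleotide at some position would get weight exactly 0, and the "die" would collapse toward the one-sided die mentioned above; with them, every face has a nonzero chance of coming up.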
But surely, at the next iteration, with high probability, we'll find it. We're almost done with covering randomized algorithms, but the challenge of predicting degenerate regulatory signals remains. Remember how randomized motif search did not find the perfect binding sites for this motif? What can we do? Well, there is one possibility: maybe we can develop an even better algorithm for finding motifs. But if you think about this, imagine that you implant not a motif of length 15 with 4 mutations, but a motif of length 15 with 5 mutations. Finding this motif is almost impossible, not because we don't have efficient algorithms, but because random motifs start competing with real motifs. Simply statistically, it may not be possible to find such a motif if we implant it in long DNA strings, for example, into strings of length 1,000. Well, what can be done? Should we give up in this case? Biologists did not give up, and the way this problem is solved today is to resort to a technique called ChIP-sequencing. Essentially, what ChIP-sequencing does is reduce the length of the DNA strings in which we look for regulatory motifs. As I mentioned, when you insert elusive patterns into strings of length 1,000, it may not be possible to find them even with the best available algorithms. But if biologists reduce the region where we look for the motif, say from 1,000 to 100 nucleotides, then suddenly statistics works in our favor, and we can find such motifs. And this is the end of the story about randomized algorithms.