Hi. Welcome back to Sequence and Time Series lectures. In this lecture, we'll focus on what is known as the Time Series Motifs Search problem and we'll learn about algorithms to implement time-series Motif Search. So first of all, let's remember what a time-series Motif is. So, in this slide we have three time series, the first red series, once again if you remember from the earlier lectures, is the popularity of the keyword Big Data over time. So, in this case we are looking at the time series from 2013 up to today, and we see how the term big data, popularity of the term big data changes over time. The second blue time series is the popularity of the term Machine Learning. Once again, we track the popularity of the term Machine Learning over time from 2013 until today. The third time series, Deep Learning, is in this case plotted with a yellow curve, keeps track of the popularity of the keywords Deep Learning over time. As you see here, the time series that we have as examples show certain repeated patterns. For example in the case of the Big Data, we see a repeated pattern that occurs over time, almost identically. We see a similar pattern in the keyword Machine Learning. So, we see that basically almost around the same timeframe, the keyword Machine Learning shows a repeated pattern. We see the same repeated pattern, but much less strongly on the keyword Deep Learning. If you look very closely, we see that the keyword Deep Learning also shows a similar pattern almost at identical time to Big Data and Machine Learning, but not as strongly. So, for example, while we see very slight patterns earlier, those patterns are not very strong. Even at later stages the pattern that we see is not very easy to identify. In the case of Big Data, the pattern is very easy to notice. In the case of Machine Learning, the pattern is very easy to notice. In the case of Deep Learning, the pattern is not easy to notice right. So, these patterns, these repeated patterns, we call them Motifs. So Motif essentially is nothing but a repeated pattern in a time series. So, these repeated patterns are often important to understand because usually the data, the time series that we have, covered the long period and a repeated pattern can tell us the occurrence of the same event or similar events over time. For being able to identify a Motif is important, to be able to detect occurrence of similar events over time. So, Motif Search is an important tool we have in analysis of time series and decision-making based on time series. Of course, the example also shows that, furthermore, for motifs can be easy or hard depending on the time series. So, in the case of the Big Data, this Motif is very well-defined API, easy to identify. Whereas, in the case of Deep Learning, the Motif while it seemed to exist, it is much less powerful, much less strong and correspondingly, it may be more difficult to identify. So, what we'll do in this lecture, we will first give a framework to Search for motifs. Then we will discuss metrics or measures to quantify the quality or strength of a Motif and I will be able to see how this algorithmic framework can be modified to better identify stronger or more interesting motifs in time. Okay. So, let's basically take a look at the General Motif Search Algorithm. For the Motif Search Algorithms are actually very simple. So, the basic algorithm and essentially all water Search algorithms are variants of this very simple algorithm and the general algorithm essentially has two steps. The first step, what we do is we enumerate or identify subsequences of time series and usually we assume that we are given a length. So, given time and given my time series, essentially what I do is, I enumerate it's subsequences of the same length, say W. So, that's the first step, I create a set of subsequences for my time series. In the second step, we apply a clustering algorithm to the set of subsequences to identify groups of similar subsequences. That is, I was given a number of subsequences. I apply a clustering algorithm to identify those subsequences that are similar to each other, and essentially the subsequences that are similar to each other, I referred to as motifs of my time series. If I have lots of subsequences that are similar to each other that means that I have a Motif with lots of entries in it. So, the algorithm is very simple. The enumeration of subsequences, a trivial operation, I just need to take a window size W and I shifted on my time series to identify subsequences of my time series of length W. Then, I need to apply clustering algorithm to identify clusters of time series. Of course, I can apply different clustering algorithms and this is basically where many Motif Search algorithms change. In this class, I'm not going to talk about these clustering algorithms because we learned about clustering in other lectures in this course. So, if you want to basically see what these clustering algorithms might be, I encourage you to revisit some of the other lectures where the different clustering algorithms are discussed. So, this is it. I already learned about the Motif Search. We already know how the Motif Search works. So, the key question however is, as we discussed earlier, not all motifs are equivalent to each other. So, it is possible that for example in this example, the Motif I have A here and the motifs that identify as B, it might be that Motif A is better than Motif B. So, can I somehow quantify how strong Motif A is relative to Motif B? And Maybe, I can also use the difference between the different Motifs to make the Motif Search algorithm more efficient by focusing on stronger Motifs and ignoring less stronger Motifs earlier. Why do I have to do that? Well, because remember what I did is, I took a time series and I found all W length subsequence of it. There may be many, many such subsequences. Which essentially may mean that the clustering step that I am applying to Search our Motif might be expensive. This means that, if I can somehow identify better Motif early on, I might be able to reduce the cost of the clustering algorithm. So, we'll discuss how we might be able to achieve that. Okay. So, let's basically discuss a bit more detail, what do we mean by Motif quality? What's a better Motif? What is a less better Motif? So, let's take a look at two cases, right.? So, here I plot the popularity of the Search term Oscars over time. As you can see, the Search term of Oscars have a very well-defined behaviors in time. For a long period, the Search term almost doesn't appear. It looks like there is a peak. Small peak, tiny peak that occurs before the Oscars. This is probably when the Oscar candidates are announced and then there is a big jump in the term Oscars, possibly, potentially, when right before or right after the actual Oscar Ceremony. So, it is clear that this motif is a good Motif. It is different than its neighborhood, very different pattern and it is very strong. Let's take a look at another search term. In this case I'm looking at the search term Politics. Note that this search term also seem to have a motif. So, I seem to have a pattern that is repeating itself every four years and so. Note that this might indicate that there is something happening every four years that gives us this pattern. However, it is also important to note that, first of all, the patterns that basically I might call motif doesn't look identical to each other. Their amplitudes seem to vary a lot, and also, what I call motif is not necessarily very different from its neighborhood. It's kind of different from its neighborhood, but it's not very different, so it's not easy to isolate. So, this essentially would mean that the Oscars have a better high-quality motif whereas Politics seemed to have a motif but the motif is not a high-quality motif. So, next question is, how can I quantify this? So, what we'll do we'll basically discuss couple of measures that can basically be somehow easier to quantify. One of those measures is the support of a motif. So, in this example, my time series have two motifs: motif A, shown in green and motif B, shown in red. Note that in this case, motif A has a higher support than motif B. Motif A is occurring three times, whereas motif B is occurring only twice. So, motif A has a higher support. It's occurring more frequently than motif B. So, this is one of the measures. Usually we want things that repeat more often or more in quantity. Now, the second measures of motif quality is the distortion. So, let's see. In this example, I have two motifs, motif A and motif B. If I look closely, I will see that both motif A and motif B have the same support because both of them are occurring twice. However, the motif A has a very similar shape. If I compute the distance between the subsequences, I will see that motif A is very close to each other, whereas motif B is less so. Yes, the pattern looks similar but not identical. So, I could say that motif A is more pure or has a smaller distortion than motif B. So, that is another quality measure I have when I am searching for motifs. I usually want to find motifs that have lesser distortion or higher purity. The third measure that I often use when I search for motifs is dynamicity. So, usually we are looking for patterns that shows us that something is happening. Something is happening often translates, not always, but often translates to strong changes on the time series. So, if I basically look at the third example on the slide, I see that once again I have two motifs: motif A and motif B. I can see that motif B has a higher support. It's occurring three times. Motif A is only occurring twice. However, motif B seems to indicate nothing happening much because the time series is more or less flat and there not that much changes. On the other hand, motif A has a much higher dynamicity. Things are happening, things are changing. So, it may be possible that motif A might be more interesting even though it has a lesser support. So, to recap support, distortion, and dynamicity can be used depending on the application as quality measures to select among different motifs. So, which essentially means that we can modify our search algorithm. So, remember, our search algorithms, the way we present it had two steps. The first step was enumeration of all subsequences, and the second step was applying a clustering algorithm to identify groups of similar subsequences. Now, we can introduce a third step. In this case, the third step essentially is the pruning of the clusters. So, I can look at the clusters that I obtained and I can eliminate those clusters that have only few subsequence in it, because those clusters essentially do not have enough support, or I can prune those clusters where the subsequences, yes, they are similar to each other, but they are not very similar to each other. So, I can eliminate those clusters because they are not as well-defined motifs. So, they are motif consisting of higher distortion or lesser purity. Finally, I can prune those clusters where the subsequences do not have large dynamicity because the clusters where the dynamicoty is low may point to regions of the time series by nothing much happening. Yes, it is true that there is a repeated pattern, but the pattern may not tell us much. I might be more interested in patterns even though they may have less of support, patterns that basically where things are happening. So, now our general algorithm has three steps; enumeration of subsequences, applying clustering, and elimination of the clusters. Now, this great, but note that what is happening is that I am eliminating the clusters as the last step. The problem with the elimination of the cluster as last step is that the expansive steps which is the integration of the subsequences and application of the clustering still has to be done using all subsequences, which means that it will still be costly. So, wouldn't it be nice instead of eliminating the clusters later, I could have somehow started applying my pruning operation earlier so that I don't need to enumerate all subsequences or even if I enumerate all subsequences, I can prune away subsequences that are not necessarily interesting. So, usually that's what we do. We don't necessarily enumerate all subsequences, but we'll enumerate subset of the subsequences or even if you enumerate all subsequences before we applied our clustering algorithm, we prune away some of the subsequences so that the clustering can be implemented efficiently. So, how can this work? So, one possibility for example is prune a way in a subsequence that is not very different from its immediate neighborhood. So, why would I do that? Remember, in this case, so we had said that this entry here is not very good, because it is not very different from its immediate neighborhood. So, what is happening in this region, is kind of similar to what's happening here, and that's kind of similar to what's happening here. It's not very different. So, if I try to avoid Motifs, entries like this, one way to do that is to look at the subsequence, and look at the nearby subsequences, and see if my subsequence is different than its neighborhood. In this case, it's clear. The pattern that I have, is very different than this neighborhood. So, this is a very well-defined pattern, and I can identify that, no problem. But maybe in this case, I shouldn't identify indicator politics, maybe I shouldn't identify this as a pattern, because it is actually not very different than what is happening, it's in neighborhood. So, it is not really telling me something very important that is happening. Again, these are mostly heuristics, and you may decide, in depending on the application, you may decide to apply that pruning, or you may decide not to apply that pruning. So, for example in this case, it may be that this is not a good Motif entry. So, I might want to prune it because not very well differentiated from it's neighborhood, or I can say, well you know what, I still want to maintain it even though it is not very different than its neighborhood. That kind of depends on the application, but if I can make this assumption, that can help me prune away large number of subsequences, and reduce the cost of my clustering algorithm. So, the second pruning strategy I can use once again, is to eliminate those subsequences that have low dynamicity. That is in this case, I might prune these away, because they have low dynamicity, that way I can more easily and more quickly identify my motif A, because I pruned away some of the subsequences, and the remaining subsequences, there are fewer of them, so my clustering algorithm will work faster, and also the motif that I will identify will have higher dynamicity, and will point to patterns where things are changing, or things are happening. The third strategy essentially is to prune away subsequences where there are no other similar subsequences to that. Now, that is in a sense saying that, remember what I want to do is high support, if I have a subsequence that there is no other subsequent that is similar to that, it cannot really be part of a cluster with large membership. So, any subsequence that is very distinct from everything else, cannot be part of a motif. So, if I can quickly identify those subsequences that are very different from everything else in my time series, I know that they cannot be part of a motif, and I may want to prune them away. So usually, motif search algorithms, different motif search algorithms, apply different heuristics to reduce the cost of the clustering algorithm. So now, our motif search algorithm has three phases. In the first phase, we enumerate subsequences, but maybe not all subsequences, but we prune away some of the subsequences. Both subsequent data are not going to lead to good motifs, then we apply the clustering algorithm to find similar subsequences, and then we apply pruning on the clusters that we identified. We basically remove clusters with few elements in them, we remove clusters where subsequences are not very similar to each other, and we may remove clusters where the dynamicity is still low. Now, this is it. If I do that, do I solve my motif search problem? But it turned out that this is not the case, because sometimes it may be that the motifs may although look similar in pattern, may have very different amplitudes, average amplitudes for example. If you take a look at the social NBA, and if you basically track it from 2004 untill today, we will see that a pattern, a motif emerges. But the motif have very different size or amplitude from the beginning to the end. So,, the first few instances of NBA show low amplitude, whereas last few instances of the motif have much higher amplitude, which essentially means that I need to be careful when I did my clustering, because if I don't do my clustering carefully, I might not be able to put these in the same cluster because the clustering algorithm may say well, this and this, since they have very big difference, they don't necessarily belong to the same cluster. So, I need to be careful when I apply my clustering, if I want to say, well, this is part of the same motif, I need to basically use a clustering algorithm where the amplitude differences are not considered strongly. So, I might not want to use something like Euclidean distance, where amplitude differences would be highlighted if I want to be able to put this subsequence, and that subsequence in the same cluster. So, I need to be careful. I need to know ahead of time what type of motif I'm searching. Another possibility is that motifs may not have the same temporal length, may not have the same duration. So, let's see an example. So, if I basically look at my keyword Olympics, I will see that there is a pattern, there is a repeated pattern. So, the repeated pattern, However, have instances of different durations. Then my instances are of different amplitudes, but on top of that the duration of my time series is also different. This is because Summer Olympics and Winter Olympics both are Olympics, and both basically shows us pikes in my search. However, they have somewhat different popularity. People started talking about Summer Olympics much earlier than they talk about Winter Olympics. Because Summer Olympics usually have more nations participating, and people start talking about it earlier, there are more sports, and so on, and so on. Which essentially means that if I am trying to basically identify as all of these as part of the same Motif, if that's my goal, if I want to identify all of these same Motif, I should not only allow amplitude differences, but I should also allow window of length differences. Now if you remember, we had said in our search algorithm, that the window of length is given. But if the window of length is given, if I start with a large window then I will not be able to identify this as being part of it, and if I started with a small window, if I give a small window then I might not be able to identify this as part of the Motif. This essentially means that giving a single window may not work if my Motif and trees may have different lengths. Not only this. So, I can also basically have in the same time series, Motifs of different types. So for example, in this case, I can look at my time series and I can say, "Well, you know what? I will not find not only these as my Motifs, the orange region as my Motifs, but will also considers this time window, and then I will find Motifs where I have a strong peak and a small peak, strong peak and a small peak, strong peak, and a small peak." Essentially in this case, my Motif is much longer. If my Motif has a close to four-year timeframe. I might want to identify in the same time series, both the shorter Motifs and the longer Motifs. Of course, I may not know ahead of time that my time series have both short Motifs and long Motif. I may not know ahead of time. So, if I don't know it ahead of the time, essentially, I need to change my algorithm. So, I need to change my algorithm such that I enumerate subsequences of varying length, so I need to be able to create Motifs of different lengths, and when I apply my clustering algorithm I need to make sure that my clustering algorithm works even though my time series may have different lengths. Or my subsequences have different lengths. So, I need to somehow make sure that my clustering algorithm clusters time series that are short and time series that are long in the same cluster, if the overall pattern that they show is identical. Note that this essentially means that the cost of my search will be much larger. Because a) I have many, many more subsequences, because I'm going to have more short subsequences and long subsequences, so I have more subsequences. The clustering algorithm may need to basically consider more subsequences as candidates to be part of the same cluster. So, it may be the case that I have more subsequences, and the clustering is more expansive because I have more alternatives to consider. So in those cases, the Motif Search is more expensive. So, it is more important that I apply effective pruning algorithms, and I apply them early so that the search is cheaper. So once again, it is important for me to decide head of the time what type of Motifs I'm searching and apply the appropriate pruning early on, so that the Motif Search is not very expensive. So, let's summarize. So, we have seen that the Motifs are nothing but repeated patterns in time series, and Motifs are important because they can tell us about repeated events, repeated interesting events that are occurring in the real world. So, we have also seen that Motif Search has essentially four steps. First, we divide the series into subsequences, then we eliminate uninteresting subsequences to reduce the cost of the Motif Search process, we apply a clustering algorithm and then finally, we eliminate uninteresting or not interesting clusters so that the remaining Motifs are interesting. The challenges are that usually we may have many, many subsequences so we need to basically find effective ways to only enumerate interesting subsequences, which essentially means that I need to have a definition of an interesting subsequence. So, that has to be defined on an application to application basis. I need to understand for my application what are interesting subsequences. Finally, I need to identify what is an interesting cluster which essentially means that once again, I need to know my application to decide whether a given cluster is interesting or not. Do I care about support? Do I care about purity? Do I care about all dynamicity? I need to decide that head of the time so that my Motif Search can be effective. So Motif Search once again, very simple algorithm, very important in time series analysis, but we need to have good understanding of our application to make sure that we identify the right Motifs, and we identify them cheaply. Thank you your attention.