There are other heuristics that are more adaptive in terms of the regions of the matrix they fill. The sDTW algorithm, for example, relies on adaptive constraints to decide which parts of the matrix should be filled and which parts can be ignored. Again, I'm not going to go over the details of this algorithm, but the idea is that, most of the time, time series have structural properties, such as these elbows here, that are easy to identify. If I can quickly identify these structural features in both series, I can use the features and their alignment to decide my window size. For example, if these two elbows are, say, ten units apart from each other, that tells me that in that region my misalignments are around ten units. If between these two elbows the offset is 20 units, that tells me that around that part of the matrix the alignments are about 20 units apart. This gives me some insight into what window I should consider when I am filling the matrix. So, with an algorithm like this, I don't need to fix my window, and I don't need to assume that the window starts very small, becomes large, and then becomes small again towards the end. Instead, I find the structural features, like elbows, in both time series, determine which of these features are aligned and how far apart they are in time, and then, based on the alignment between the elbows, decide what window length to allow at different parts of the series. So the sDTW algorithm is a dynamic time warping algorithm that uses what are known as locally relevant constraints to decide which parts of the matrix we should fill.
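To make the idea of locally adaptive windows concrete, here is a minimal Python sketch. It is not the published sDTW algorithm: it simply runs dynamic time warping while filling, in each row, only the cells within that row's radius of the diagonal. The helper radius_from_features is hypothetical; it turns already-matched feature positions (for example, elbows found in both series) into per-row window widths. For simplicity, the sketch assumes both series have the same length and that the feature matching has already been done.

```python
import math


def radius_from_features(matches, n, slack=1):
    """Hypothetical helper: derive per-row window widths from aligned features.

    matches holds (i, j) index pairs, e.g. an elbow at position i in x matched to
    an elbow at position j in y. Each row gets the larger offset |i - j| of the
    nearest matched features on either side, plus some slack.
    """
    matches = sorted(matches)
    radius = []
    for row in range(n):
        before = [abs(i - j) for i, j in matches if i <= row]
        after = [abs(i - j) for i, j in matches if i >= row]
        nearest = before[-1:] + after[:1]  # offsets of the closest features on each side
        radius.append(max(nearest, default=0) + slack)
    return radius


def dtw_local_windows(x, y, radius):
    """DTW that fills only cells with |i - j| <= radius[i - 1] in row i (1-based).

    Returns math.inf if the windows are too narrow to connect the two ends.
    """
    if len(x) != len(y) or len(radius) != len(x):
        raise ValueError("sketch assumes equal lengths and one radius per row")
    n = len(x)
    # cost[i][j]: cheapest warping path aligning x[:i] with y[:j]; unfilled cells stay inf
    cost = [[math.inf] * (n + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        r = radius[i - 1]
        for j in range(max(1, i - r), min(n, i + r) + 1):
            step = abs(x[i - 1] - y[j - 1])
            cost[i][j] = step + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][n]


x = [0, 1, 2, 3, 2, 1, 0, 0]
y = [0, 0, 1, 2, 3, 2, 1, 0]  # same bump, delayed by one step
print(dtw_local_windows(x, y, [0] * 8))  # diagonal only: prints 6.0
print(dtw_local_windows(x, y, radius_from_features([(0, 0), (3, 4), (7, 7)], 8)))  # prints 0.0
```

The point of the per-row radius is that the band can be wide where the matched features say the series are far out of step and narrow elsewhere, so fewer cells are filled than with one fixed window large enough for the worst region.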
So that gives us an adaptive way to reduce the cost of dynamic time warping. Okay, so far we have learned about Euclidean distance, correlation, edit-distance-based comparison of time series, and dynamic time warping. Are there any other alternatives? There is one more alternative that I would like to discuss with you, and it goes back to what we started with at the beginning. Remember, at the beginning we said that there are two different representations of things that are of sequence type. One of them we call a string or sequence, where the symbols come from a symbolic dictionary. The other we call a time series, where the entries take numeric values. Aside from that, there is not much difference between sequences and time series; they are closely related. We had also seen that we could treat a time series like a sequence if needed. When we compute the similarity between sequences, we use edit distance, and we have also seen that we can use edit distance when we are given time series, that is, when the entries are quantitative numeric values. So the question becomes: can we make this relationship more concrete? Since we know that time series are similar to sequences, can we take a numeric time series and explicitly convert it into a sequence-based representation that is more compact? That is, we take a long time series and want to obtain a short sequence that somehow captures the same information. Can we do that? It turns out that we can, and the basic algorithm for converting a long time series into a compact sequence representation is the Symbolic Aggregate ApproXimation, or SAX, algorithm.
In the next few slides, I will describe the key idea behind the Symbolic Aggregate ApproXimation algorithm. In SAX, we are given a time series, and we first decide on a window length, w. Then we split the time series into sub-sequences, or sub-series, each of length w; that is, we chop the time series into w-length pieces. Then, for each w-length piece, we compute the average amplitude. For example, in this case, the first window has an average amplitude close to 0.55. In the second window, as you can see, the values are higher, so the average is also higher: it is close to 1, about 0.95. In the third window the time series drops very quickly, so if I compute the average, I see that it is 0.48, which is lower than 0.95. In the fourth window, both of the entries are negative, and if I compute the average, I see that it is -0.26. In the next window, both of the values are around -1, and in fact the average is -1, and so on. Essentially, I chopped my time series into smaller windows, and for each window I computed an average value. Next, I replace each of these average values with a symbol. For that, we use a symbol table, which takes different ranges of values and assigns a symbol to each range. SAX itself uses a specific table; here I'm just using an example table. For example, I can say that any value between 0.9 and 2 I will replace with symbol A.
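The first step, chopping the series into w-length pieces and averaging each piece, can be sketched in Python as follows. The function name is illustrative, and the handling of a final piece shorter than w is an assumption (it is simply averaged as-is), since that case is not covered here.

```python
def window_averages(series, w):
    """Split series into consecutive length-w pieces and return each piece's mean.

    Assumption: if len(series) is not a multiple of w, the last, shorter piece
    is averaged over however many values it has.
    """
    if w <= 0:
        raise ValueError("window length w must be positive")
    averages = []
    for start in range(0, len(series), w):
        piece = series[start:start + w]
        averages.append(sum(piece) / len(piece))
    return averages


print(window_averages([1, 2, 3, 4, 5], 2))  # prints [1.5, 3.5, 5.0]
```

A series of length n becomes roughly n / w averages, which is where the temporal reduction comes from.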
Any value between 0.3 and 0.9 I will represent with symbol B. Any value between 0 and 0.3 I will represent with symbol C. Any value between -0.3 and 0 I will represent with symbol D, and so on. So what I am doing here is chopping the amplitude axis into ranges and associating a different symbol with each range. This gives me a corresponding symbol for each window, which I obtain by looking up the window's average amplitude. For example, the first window had an average amplitude of 0.55, and 0.55 is in range B, so I can represent that window with symbol B. My second window had an average amplitude of 0.95, which is in range A, so I represent that window with symbol A. My third window had an average amplitude of 0.48, which is once again in range B, so I represent it with symbol B, and so on. So what happened? I took a long time series with many entries and converted it into a compact string. In this case my string is B, A, B, D, F, E, B, a compact string that represents the shape of the time series. I have lost the details, but the string still represents the shape of my time series. Given these compact representations, I can use edit distance or other sequence comparison algorithms. So essentially I take the time series comparison problem and convert it into a sequence comparison problem. In the process, I reduce the temporal resolution by dividing the time series into windows, and I also reduce the amplitude resolution by moving from a potentially continuous amplitude domain to a small alphabet of symbols, with one symbol per window. So the SAX algorithm combines two reduction techniques: I reduce my temporal resolution and I reduce my amplitude resolution. That makes it a lossy algorithm.
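Mapping each window average to a symbol is a lookup into the table of ranges. The sketch below uses the example table from this lecture (A for 0.9 to 2, B for 0.3 to 0.9, C for 0 to 0.3, D for -0.3 to 0). The ranges for E (-0.9 to -0.3) and F (below -0.9) are a hypothetical symmetric extension of the "and so on", consistent with the window whose average is -1 becoming F in the string B, A, B, D, F, E, B. Each range includes its lower end, and values above 2 are simply treated as A. Note that the published SAX method instead z-normalizes the series and chooses breakpoints that split a standard normal distribution into equally likely regions.

```python
from bisect import bisect_right

# Breakpoints between ranges, in increasing order. A-D follow the example table;
# the E and F ranges are a hypothetical symmetric extension.
BREAKPOINTS = [-0.9, -0.3, 0.0, 0.3, 0.9]
SYMBOLS = ["F", "E", "D", "C", "B", "A"]  # lowest range first


def to_symbol(value):
    # bisect_right counts the breakpoints <= value, which is the index of the
    # range containing value (each range includes its lower end)
    return SYMBOLS[bisect_right(BREAKPOINTS, value)]


def symbolize(averages):
    """Turn a list of window averages into a compact SAX-style string."""
    return "".join(to_symbol(a) for a in averages)


# The first five window averages from the example above.
print(symbolize([0.55, 0.95, 0.48, -0.26, -1.0]))  # prints BABDF
```

Using a sorted list of breakpoints with a binary search keeps the lookup correct for any number of symbols; changing the alphabet only means changing the two lists.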
But once again, it has been shown that in practice, using SAX to compare time series gives reasonable results. Of course, dynamic time warping may be more effective, but computing dynamic time warping is very expensive. SAX instead gives a fast approximation of the dynamic time warping results.
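Once both series are strings, comparing them is a sequence problem. Below is a minimal sketch using the standard Levenshtein edit distance with unit costs. The string B, A, B, D, F, E, B is the one from the example above; the second string is made up to represent the same shape delayed by one window. Because insertions and deletions are allowed, the two still come out close, which is loosely analogous to the warping in DTW, while the strings are far shorter than the original series.

```python
def edit_distance(a, b):
    """Levenshtein distance: fewest insertions, deletions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # match or substitute
        prev = curr
    return prev[-1]


# "BBABDFE" is a hypothetical string for the same shape shifted by one window.
print(edit_distance("BABDFEB", "BBABDFE"))  # prints 2
```

With unit substitution costs, replacing A by B costs the same as replacing A by F. Since the symbols stand for ordered amplitude ranges, a substitution cost based on how far apart the ranges are may fit better; the SAX literature defines its own symbol distance (MINDIST) for this purpose.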