Hi. Welcome to the Data Visualization and Data Exploration course. In this unit, I will be discussing sequences and time series. If you remember, in the previous units we covered different datasets and different data types: we talked about vector data, and we talked about sequence data. In this video, we'll focus on time series, and we'll discuss time series modelling specifically. So, what's a time series? If you remember from the earlier lectures, we introduced strings and sequences, and at that time we said that strings and sequences are essentially finite sequences of symbols that are taken from a dictionary. In this example, the dictionary consists of three symbols: a, b, c. Time series are actually very similar. With time series, we again have a finite sequence of data; again we are dealing with sequences. The key difference in this case is that the domain of the sequence, that is, the values that the sequence takes, is numeric. It could be integers, it could be real numbers; instead of a symbolic domain, we are working with a numeric domain. So, in this example, we are looking at a time series that tracks the interest in the term big data, as recorded by a search engine. This is a good example of a time series. We have the time at which we are doing our recording, and we also have a numeric value that we are tracking. So, why are we interested in time series, and what are the typical operations that we do with time series? In the next few slides, I'm going to give a few examples of the typical operations that we want to do with time series data. The first key operation that we would like to be able to do with time series is to compare them, to be able to tell, "Oh, this time series is similar to the other time series, or this time series shows the same patterns that this other time series shows." Let's see an example.
In this slide, we have three time series. The first time series keeps track of the interest in the term big data over time. Again, time is shown on the X-axis of the chart. The second time series tracks the interest in the term machine learning over time. The third time series keeps track of the term deep learning over time. So, the question that an analyst or decision-maker may want to answer, for various reasons, is how similar these are, and how the interest in the different terms evolves over time. We will discuss the basics of how to answer this question later. We'll introduce several measures, metrics and algorithms to compare time series, but I want to highlight that this is not a very easy task. For example, here you can think of this as before 2000 and after 2000. As you will see here, before 2000, big data and deep learning show a similar pattern, but after 2000, machine learning and deep learning seem to show a similar pattern. So, the essential question becomes, how do we quantify the similarities, and how do we tell the decision-maker, or the person who's exploring and visualizing the data, "Hey, this is the pattern that's happening. Hey, these are the similarities and differences between the time series." So, this is our first task: being able to compare time series. The second thing that we would like to be able to do with time series data is to forecast, that is, to look beyond what we have recorded. In this example, we again have the same time series, but these time series are recorded only up to today. So, an important question that most decision-makers and most data scientists would like to be able to answer is, what happens in the future? This is an important question for many reasons.
In this case, for example, a scientist may want to know whether big data or deep learning is going to be an important research topic in the future. So, we might really want to know what's going to happen in the future. For that, we need to develop forecasting algorithms, we need to develop predictive analytics algorithms, and some of those things are covered in the theoretical sequence that we are presenting as part of this program. A third task that we often try to perform on time series data is to search for motifs. Motifs are essentially repeating patterns. In these three time series, you will see that there are certain repeating patterns. For example, this pattern here repeats pretty regularly in this example, and we see that a similar pattern also repeats in the other time series. We see that the same pattern also repeats, maybe a little less strongly, in the third time series as well. So, the question becomes, can we find these repeating patterns, and can we explain them? Because for many applications, these repeating patterns may signal certain events or certain important occurrences. In this case, for example, these repeating patterns seem to correspond to the Christmas/New Year's time frame. Right. So, in this case, we can observe the repeating pattern and we can also explain it. So, the question becomes, if you give me another dataset which shows different characteristics, can we find these repeating patterns? They are not always as easy to find as in this example. By the way, even on web data, it is not true that we'll always have the patterns so easily identifiable.
For example, if you put in another key term here, in this case I select the term time series and track the interest in it, we see that there is a repeating pattern again, but the repeating pattern is actually very different from the repeating patterns of the first three search terms. So, an interesting question becomes, why is that the case? Why are we seeing a different repeating pattern, a different motif, for the search term time series than for the other three search terms? To be able to answer that, the first thing that we need to do is to be able to locate these repeating patterns in the time series. That will be one of the things that we will discuss in this unit. Another important task that we would like to do using time series data is classification of the time series, because time series are usually used to record real-world events. For example, we can use time series to record sensor positions. In this case, there are certain sensors that are placed on the human body, and you might be recording the physical positions of these sensors over time. When we look at the time series recorded while the user is doing different actions, for example walking and running, we can see that the time series show different patterns. So, this set of time series here and this set of time series here show very different patterns. So then, the question becomes, if you give me a fresh time series that doesn't have a label, it doesn't tell me whether the user is walking, it doesn't tell me the user is running, it doesn't tell me the user is jumping, can I look at the time series, and can I classify the given time series as, "Oh, the user is running, oh, the user is jumping, or maybe the user is doing a mixture of those things"? Can I do that?
Classification is again an important problem, and once again, it is one of the machine learning techniques that we are covering in this program. I'm not going to get into the details of the classification task as part of this video sequence, but we have other videos that you can use specifically for the classification task. Okay. So, what is the overall goal? We have introduced what a time series is. We have looked at several important operations, important tasks, and important challenges that we face when using time series in data exploration and data-based decision-making. One final important task that I would like to introduce is time series modelling. In this case, we have a very specific task. We are not necessarily trying to compare different time series to each other, we are not trying to find repeating patterns in the time series, we are not trying to classify the time series; what we want to do is understand the time series. This is usually formulated as, can we discover a closed-form formula, and this formula is often called a model for the time series, that describes the given time series? Because if I can find a closed-form formula for the given time series, then I can have a better understanding of the time series. Maybe I can forecast the future better. We will see that this is a difficult task. Finding a closed-form formula for an arbitrary time series is not easy. So, a simpler problem, which we will start from, is, can we characterize high-level properties of a given time series? We will see that this is going to be a little bit easier, and we'll see that if we can characterize high-level properties of a given time series, it might actually help us find a formula for that time series. So, in the next few slides, we will discuss these high-level characteristics of time series, or what I will call here the types of time series.
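To make the idea of a closed-form model a little more concrete, here is a minimal sketch in Python. It assumes the simplest possible model form, a straight line x(t) = a + b*t, fitted by least squares; the lecture does not prescribe this form, and the function name `fit_linear_trend` and the synthetic series are hypothetical, made up only for illustration.

```python
import math


def fit_linear_trend(values):
    """Least-squares fit of x(t) = a + b*t for t = 0, 1, ..., n-1.

    Returns (a, b). This is only one possible, very simple model form.
    """
    n = len(values)
    t_mean = (n - 1) / 2
    x_mean = sum(values) / n
    # Covariance of t and x, and variance of t (both unnormalized).
    cov = sum((t - t_mean) * (x - x_mean) for t, x in enumerate(values))
    var = sum((t - t_mean) ** 2 for t in range(n))
    slope = cov / var
    intercept = x_mean - slope * t_mean
    return intercept, slope


# Hypothetical series: an upward trend plus a cycle that repeats every 12 steps.
series = [10 + 0.5 * t + 3 * math.sin(2 * math.pi * t / 12) for t in range(120)]

intercept, slope = fit_linear_trend(series)

# Once we have a formula, forecasting is just evaluating it at a future time.
forecast_next = intercept + slope * len(series)
print(f"x(t) ~ {intercept:.2f} + {slope:.2f} * t")
print(f"forecast for t = {len(series)}: {forecast_next:.2f}")
```

Note that this straight-line model captures only the overall direction of the series and ignores the cycle entirely, which is one way to see why finding a good formula for an arbitrary time series is hard.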
So, now we are diving into the time series data, and we are trying to see how different time series look, and whether we can characterize them at a very high level. That's what we'll try to do. Okay. Let's start with the first, simple type of time series. The simplest type of time series is called a Stationary Series. This type of time series shows a similar pattern over time. Now, a similar pattern doesn't mean that it has the same values. For example, this time series here varies wildly over time. So, it doesn't show the same value over time. However, if we analyze the time series, if we look at statistical properties such as the mean, the average value that the time series takes over time, or if we look at the variance of the data over time, we will see that they are essentially constant over time. So, this type of series, where the statistical properties are constant over time, we call Stationary Series. The advantage of stationary series, as we will see later, is that they are easy to analyze. As the time series becomes more complex, as the statistical properties of the time series change over time, the time series becomes harder to study, and it becomes harder to predict. So, we'll call these the Stationary Series. The second type of series, obviously, is the Non-Stationary Series. Right? In the case of non-stationary series, the statistical properties of the time series change over time. They don't stay the same. There are different ways for a time series to be non-stationary. Okay? In this slide, we see two different types of non-stationary behavior. The first non-stationary behavior that we see here is cyclicity. The blue time series here, as you will see, is cyclic. So, its mean varies over time. In this case, it varies periodically; it shows a periodic behavior.
So, we'll call this a cyclic time series. It turns out that in the real world, many datasets show cyclic behavior. So, it's important for us to understand cyclicity, and it's also important for us to understand how to discover, how to capture, and how to use cyclic behavior. The second type of non-stationarity is trend. Trend is usually used to mean a steady change over time in the mean of the data. In this case, as you will see, we have cyclic data, but the cyclic data also shows an increasing trend. The data doesn't simply go up and down over time cyclically; it also has a positive trend, the values are increasing over time. Again, this data is non-stationary, because the statistical properties of the data change over time. And once again, for many applications, it is important to understand whether the data shows a positive or a negative trend. So, we will need to be able to understand and characterize whether a time series shows non-stationary behaviors such as cyclicity or trend. That's not all, because other things can also change over time. In this case, we again have cyclic data. In fact, we have two cyclic series, but these series have a variance, that is, the spread of the data, which changes over time. So, in this case, the two series show the same cyclicity, but the variance of the data, the height of the peaks, changes over time. So, once again the question becomes, can we discover these types of behaviors? Can we represent these types of behaviors? Can we use them to support decision-making? So, this is the third type of non-stationary behavior. Finally, and I'm saying finally, but I don't necessarily mean these are the only non-stationary behaviors; these are just the ones that we'll be discussing in these slides. The final type of non-stationarity that we'll consider is the change in the speed of the data.
That is, how fast the cycles change. In this example, the blue time series has a constant speed: it is cyclic, and the period of the cycles, the frequency of the cycles, stays the same over time. On the other hand, the red time series that we see here is again cyclic. It again has the same mean over time, it again has the same variance over time, but the frequency of the cycles, the period of the cycles, or the speed of the cycles changes over time. Once again, this is a non-stationary behavior. Once again, when we are characterizing a time series, we need to understand its speed. The model, the formula that we want to discover, should have a way to characterize the speed of the time series as well. So, what have we discussed so far? We have learned that some time series are stationary. These time series show constant statistical properties. Their mean and variance especially, and also their speed, are constant over time. We also learned that not all time series are stationary. Many time series, and actually many time series in the real world, are non-stationary. Their statistical properties change over time. So, a particular difficulty, or a particular challenge, which we will be discussing soon, is that most algorithms that we use for forecasting assume that the series can be rendered approximately stationary through mathematical transformations. This is important because forecasting is a difficult challenge, and we know that if a time series is simple, it is much easier to forecast. If everything is static, you know what will happen in the future, because things are static. So, now the question becomes, can we take non-stationary time series data and apply certain transformations to the data to convert this non-stationary series into a stationary series, which is much easier to study, much easier to understand, and much easier to forecast?
Those are some of the techniques we will learn in the upcoming slides.
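As a small preview of what such a transformation can look like, here is a minimal sketch in Python. It assumes one common transformation, first-order differencing (replacing each value by its change from the previous value), and checks stationarity only informally, by comparing the mean and variance over consecutive windows. The helper names `window_stats` and `difference`, the window size, and the synthetic series are hypothetical choices made for illustration, not something specified in the lecture.

```python
import random
from statistics import mean, pvariance


def window_stats(values, window):
    """Mean and variance of consecutive, non-overlapping windows."""
    stats = []
    for start in range(0, len(values) - window + 1, window):
        chunk = values[start:start + window]
        stats.append((mean(chunk), pvariance(chunk)))
    return stats


def difference(values):
    """First-order differencing: d[t] = x[t] - x[t-1]."""
    return [b - a for a, b in zip(values, values[1:])]


random.seed(0)
# Hypothetical non-stationary series: a linear trend plus Gaussian noise.
trending = [0.5 * t + random.gauss(0, 1) for t in range(200)]
differenced = difference(trending)

# The windowed mean of the original series keeps climbing (non-stationary),
# while the windowed mean of the differenced series stays roughly constant.
trend_means = [m for m, _ in window_stats(trending, 50)]
diff_means = [m for m, _ in window_stats(differenced, 50)]

print("window means, original:   ", [round(m, 2) for m in trend_means])
print("window means, differenced:", [round(m, 2) for m in diff_means])
```

Differencing removes a linear trend like the one here; it is not guaranteed to handle the other kinds of non-stationarity discussed above, such as changing variance or changing speed.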