[MUSIC] All right, this part of the course covers theoretical foundations and future trends of CyberGIS and geospatial data science. We will cover three parts; specifically, the first part is going to be theoretical foundations. Let's pose a fundamental question. I understand we've been learning about quite a few technical aspects of CyberGIS and geospatial data science. But with this fundamental question, we want to know: given geospatial analytics you want to perform on any computer, you need to be aware of the computational cost of those analytics. And as we have learned so far in the course, increasingly your geospatial analytics is not going to get done by your personal computing resources. Oftentimes, you need to get your analytics performed using high-performance computers or cloud computing resources. But before committing your geospatial analytics, you need to be aware of how much computational cost your analytics is going to pose to those high-performance computers or cloud computing resources. For instance, in this cartoonish illustration, if your analytics takes a massive amount of resources from your computing environment and you do not even realize it, this is going to cost you a lot of money from, say, your credit card, and you might go bankrupt after running this analytics. So it's important to know beforehand what the computational cost of your analytics is. The question here on the slide is: what is the nature of the computational intensity of spatial analysis and modeling, which is the important underlying methodology for your geospatial analytics, and why is spatial special? We are going to learn how to figure out the computational intensity of your geospatial analytics. This computational intensity question is comparable to the question of computational complexity. The computational complexity of an algorithm estimates upper and lower bounds on the cost of running that algorithm.
A good example, for instance, is: what is the upper bound on how much time an algorithm takes when running on a computer? The notion of computational intensity is different from the definition of computational complexity, in that computational intensity is about the exact cost we need to know for spatial analysis and modeling. Computational intensity goes beyond the upper and lower bounds of running an algorithm on a computer; it is about a more exact estimate of computational cost. For instance, if you run geospatial analytics against some geospatial big datasets on a cloud computing resource, we want to know how much time, how much memory, and how much data storage space should be provided by your cloud computing resource. That's the exact computational cost we need to derive by assessing the computational intensity of your analytics. Now, the concept and theoretical foundation of computational intensity is really the core theoretical foundation of CyberGIS. We visited this diagram at the beginning of our course: you see computational intensity is situated in the center, at the intersection of spatial analysis and modeling, GIS, and cyberinfrastructure. It's really a concept that underpins all three domains by providing exact estimates of computational cost based on the spatial characteristics of your analytics. This illustrative example shows a dataset on the left of the slide, which is a clustered dataset. In a spatial sense, all the measurements are clustered into one region of the spatial domain. Now, suppose you want to conduct, say, a clustering analysis. Let's say you don't know about this cluster and you want to detect it by running your analytics, and certainly you would need to perform your analytics to do so. So, as with the question I posed earlier for computational intensity, you want to figure out, before committing your analytics, in this case cluster detection:
You want to know how much computational cost this analytics is going to bring, say, to your personal computer. Is your personal computer able to accommodate this analytics with this kind of dataset? Now, the right side of this slide shows a rough estimate of the computational intensity based on the different parts of the dataset. So remember, we are taking a spatially explicit view of computational intensity by going to different parts of the spatial domain and seeing how the different parts lead to different levels of computational cost. That's what the visualization on the right side of the slide is trying to show you. And we're going to learn how to estimate this depending on the different parts of your spatial dataset: if we need to break this large dataset into different parts, what is the cost of each part, and how do we, in this case, visualize the computational cost across the spatial domain? That's the notion of the spatial computational domain. So we really want to know the computational intensity across the spatial domain. The spatial domain in this case, shown on the left side of the slide, is matched on the right side to the spatial computational domain with regard to the spatial aspects shown on the left side. And you will learn more, from some examples I'll have later, about how to derive the spatial computational domain shown on the right side of the slide. So why are we doing this? Well, a major need for doing this is to guide our divide and conquer. For geospatial big data analytics, oftentimes your analytics is not going to be accommodated and accomplished using just a single computer. It needs to run in the cloud, and it needs to run in parallel using high-performance computing resources. So the strategy of divide and conquer is necessary. And how do you know that different parts of your spatial domain have different computational intensity?
Knowing that different parts have different computational intensity, which should be accommodated by different computing elements, is important for accomplishing your analytics in an efficient way, while also using the computing resources efficiently. And even more importantly, how do you gather all the results together in a coherent way, so that they correctly match the results you would get if you did not have to do divide and conquer at all? So, theoretically, take a look at this example dataset on the left of the slide. That's a point pattern dataset similar to the earlier example you saw: a distribution of points across your spatial domain. Let's say we want to perform some type of analytics against this dataset, and we want to estimate the computational intensity of this analytics. So how do we do that, theoretically? Well, the basic idea is this. You might be familiar with taking a picture of something; let's say you have a remote sensing satellite taking pictures of the Earth all the time. Some satellites would focus on, say, land use; at other times, some would focus on water features. In this case, we have a camera analogous to the camera on a particular satellite. We would take a picture of the spatial domain, and the result of the picture taking is our computational intensity surface. So the idea here is that this camera performs the transformation: it combines the spatial characteristics of your analytics and your data, and translates them into a computational intensity estimate in a spatially explicit fashion. So, graphically, the idea here is that you have your spatial domain, such as the one with the points distributed across it. But here we also need to create another spatial domain, which is the spatial computational domain. In this graphical representation, the resolution of your spatial computational domain is usually coarser than that of your spatial domain.
Because when you divide and conquer your geospatial analytics, you tend to divide the problem into coarse parts that can be scheduled to run on different computing elements. So the cell size here is the resolution of your spatial computational domain. The process of that camera I mentioned earlier is to transform your spatial data and operations into this computational intensity estimate, which is often done in a spatially explicit fashion. The result of this transformation is a spatial computational domain: the domain with that graphical representation, which can be formally represented through two types of transformation. One is data centric, the other is operation centric. Data centric means we primarily consider the data characteristics. Operation centric means we explicitly consider the operations, spatial operations in particular. And combining these two types together is often necessary to derive your spatial computational domain. The formal definition of the spatial computational domain is done through a function that translates the I-squared (I²) domain to the real number domain, where the real number domain is the representation of computational intensity. The I² domain represents the graphical domain we saw a few slides earlier; it has a particular resolution that determines the structure of the spatial computational domain. And the value of each cell of the spatial computational domain is determined by this function to be that cell's computational intensity. This may sound abstract, but when I give you an example later, you will see that this makes sense for deriving the computational intensity of your geospatial analytics, based on the characteristics of the data, or of the spatial operations applied against the data. So, I alluded to two types of transformations: one is data centric, the other is operation centric.
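To make the data-centric transformation a bit more concrete, here is a minimal sketch in Python. It is not from the course materials: the function name, the 4x4 grid resolution, and the assumption that per-cell cost grows with the square of the point count (as it would for a naive pairwise operation) are all illustrative choices. The sketch maps a clustered point dataset onto a coarse grid of cells, each cell holding a computational intensity estimate.

```python
import numpy as np

def spatial_computational_domain(xs, ys, extent, rows, cols):
    """Data-centric sketch: bin points into a coarse grid and estimate
    per-cell computational intensity. The n**2 cost model is an
    illustrative assumption (e.g., a naive pairwise-distance operation)."""
    xmin, ymin, xmax, ymax = extent
    # Assign each point to a cell of the (coarser) computational domain.
    col = np.clip(((xs - xmin) / (xmax - xmin) * cols).astype(int), 0, cols - 1)
    row = np.clip(((ys - ymin) / (ymax - ymin) * rows).astype(int), 0, rows - 1)
    counts = np.zeros((rows, cols))
    np.add.at(counts, (row, col), 1)  # unbuffered accumulation per cell
    # Data-centric intensity estimate: cost per cell ~ (points in cell)^2.
    return counts ** 2

# A clustered dataset: most points fall in one corner of the unit domain.
rng = np.random.default_rng(0)
xs = np.concatenate([rng.normal(0.2, 0.05, 900), rng.uniform(0, 1, 100)])
ys = np.concatenate([rng.normal(0.2, 0.05, 900), rng.uniform(0, 1, 100)])
surface = spatial_computational_domain(xs, ys, (0.0, 0.0, 1.0, 1.0), 4, 4)
print(surface)  # the cells covering the cluster dominate the surface
```

The printed surface plays the role of the picture taken by the camera in the analogy: a spatially explicit estimate of where the computational cost concentrates.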
So if you consider the combinations of data centric and operation centric, you can easily see there are really four combinations here: both operation and data centric; operation centric but not data centric; data centric but not operation centric; and neither data centric nor operation centric. In this part of the course, we're going to focus on the upper two, and particularly the combination that is both operation and data centric.
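As a hedged illustration of how a spatial computational domain can guide divide and conquer, the sketch below assigns grid cells to computing elements so that each receives roughly equal total intensity, rather than an equal number of cells. The toy cell costs and the greedy longest-first heuristic are assumptions for illustration, not the course's prescribed scheduling method.

```python
def balance_cells(intensity_by_cell, n_workers):
    """Greedily assign cells to workers, most expensive cell first,
    always to the currently least-loaded worker."""
    loads = [0.0] * n_workers
    assignment = [[] for _ in range(n_workers)]
    for cell, cost in sorted(intensity_by_cell.items(), key=lambda kv: -kv[1]):
        w = loads.index(min(loads))  # least-loaded worker so far
        assignment[w].append(cell)
        loads[w] += cost
    return assignment, loads

# Toy intensity surface: one "clustered" cell dominates the cost.
cells = {(0, 0): 90.0, (0, 1): 10.0, (1, 0): 8.0, (1, 1): 2.0}
assignment, loads = balance_cells(cells, 2)
print(assignment, loads)
```

Here the expensive clustered cell ends up alone on one worker while the three cheap cells share the other, which is exactly the kind of intensity-aware partitioning that an equal-area split of the spatial domain would miss.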