In the last session we discussed DB scan, a density-based clustering methods. For the same group of authors, they later invented another interesting algorithm called OPTICS, ordering points to identify clustering structures. Why they wanted to do that? Just because DBscan is sensitive to parameter setting. So OPTICS extended DBscan but make it is much less sensitive to parameter setting because it can find a clustering structure. This algorithm was developed by almost the same group of authors in SIGMOD 1999. We can have a simple operation. Suppose we fix one parameter of DBscan say minimum number of points. Then for density-based clusters with respect to high density, that means you just need a small radius to cover the number of minimum points like five, these dense clusters are completely contained in clusters with respect to a lower density. Simply it says, the dense cluster is deeply nested in the less dense clusters because, less dense cluster needs a bigger radius to cover the same number of minimum points. So if we observe this one we probably can see we may have high density points, it should be processed first or deeper, in order to find a high density cluster first, then we can plot them into a reachability plot for datasets. OPTICS actually stores such a clustering structure using two pieces of information, core distance and the reachability distance. We will introduced in the next slide, but let's look at this reachability plot. If we got this set of datasets, then if we study their reachability distance, since the points belonging to a cluster, have lower reachability distance to their nearest neighbors simply it says you just need a smaller disk to cover the minimum number of points. So your Epsilon disk radius should be smaller. That means the valley should contain the very dense clusters. The deeper the valley, the denser the cluster. So you probably can see if we order them according to their reachability distance, we probably can plot them into a graph. So you can easily find the clustering instructors. So that's the trick of OPTICS. Then let's look at how OPTICS is defining core distance and the reachability distance. We first look at the core distance. For a core distance of an object p like this one is the smallest Epsilon value such that piece Epsilons neighborhood will contain at least the minimum points objects, for example five. Let's see if you say p, you'll try to find its radius to cover five points like one, two, three, four, five points. Now you probably can see this Epsilon should be three millimeters. Simply say three millimeter you will be able to cover five points, that's actually is the definition of the core distance. Simply it says if p is so isolated, you just cannot find enough neighbors to cover the minimum points. Then we say this core distance is undefined because you would stretch so large, you just cannot find enough neighbors. On the other hand, if you can find the minimum number of points, then this minimum number of points distance of p is the core distance. For example here like this p, you just need a three millimeter to cover this, then the core distance is three millimeter. Then we look at reachability distance. The reachability distance is from a core object q. Reachability distance of p, you want to reach this q, what you want to define is the minimum radius value which makes p density reachable from q. If you want to see if p can reach q from q, then you just to see if q is not a core object. Simply it says no matter how big your stretch is, just cannot have enough object to get a minimum number of points, then this reachability distance is undefined. Otherwise, you just to see which one is bigger. Is it the core distance, is it bigger, or the distance from q to p is bigger? For this picture you probably can see is if you look at the reachability distance from q to p rather than from p to q, we just looked at it from q to p. If q1 to p is smaller than the core distance, then the core distance should be the reachability distance. Otherwise, like this q2 try to reach p, then if it's bigger than the core distance, then the reachability distance is the distance between p and q2. So that's essentially is the definition of reachability distance. That mean how big your disk is in order to reach enough number of minimum points. To calculate this, the complexity is O (N log ) if you have index. N is number of points, so it is not too bad. Then based on this idea, we actually work out these clustering structure. The structural graph, the nice thing is this method produces special clustering ordering of the data points with respect to density-based clustering structure. Then the clustering structure actually it contains information equivalent to density-based of clustering corresponding to a broad range of parameter settings. For example, the first thing is you fix the minimum number of points you'll get this plot. With this plot for example, you see there are four major clusters. If you do not think this one is something you actually can find a one, two, three, four clusters. Even you change the number of minimum points, you will still be able to find the same number one, two, three, four, of course, the plot is somehow changed the shape with the different minimum number of points, but still we can find four clusters. Moreover, you can find nested clusters. For example, this bigger cluster has three deeply nested clusters. When you look at this bigger cluster, it contains if you'd see the deeper one you'd see one, two, three deeper clusters corresponding to this. So you probably can see this is a very nice you'd use different lie on the Epsilon. You actually can find the bigger cluster and nested clustering structures. So it is a pretty interesting algorithm and you can actually use automatic or interactive. If you do clustering analysis, you can find intrinsic even hierarchically nested clustering structures. So this is a pretty interesting extension to DBscan, that's the OPTICS algorithm.