In this lecture, we will summarize other methods for clustering and illustrate one of them, so that we can compare its performance with that of the k-means algorithm. Although the k-means algorithm is one of the most widely used methods for clustering, there are other approaches that have been used with remote sensing data. In the next lecture, we will explore a recent clustering method that has been applied to hyperspectral data and to big data sets. But here we will look at another long-standing technique so that we can see its performance relative to the k-means algorithm. By doing so, we will demonstrate that the results of clustering are not unique, a fact that the user needs to be aware of and handle carefully when undertaking unsupervised classification.
As noted in the slide, there are other methods that have been used in remote sensing. First, there is hierarchical clustering, which, when applied to the first example of the last lecture, tends to lead to three and not two clusters. It has been used as a basis for clustering in some big data applications; for details of this algorithm, please see my book. Secondly, histogram peak selection, also known as mountain climbing or density maxima selection, is another technique used for clustering in remote sensing. Thirdly, we have the single-pass clustering algorithm, which is the method we are now going to examine.
The single-pass method is an old technique and had its origins when remote sensing imagery was supplied on sequentially accessible storage media like magnetic tape, for which iteration would be a particularly time-consuming process because of the need to read and rewind the tape. Despite its limitations, it is still sometimes used because of its simplicity and its speed. It starts by assuming that the data is arranged in the usual row and column format. If the image is very large, a random sample is taken of the pixels and the results arranged again by row and column. The algorithm proceeds in the following manner.
The first row is used to obtain an initial set of cluster centers in the following way. The first sample is used as the center of the first cluster. If the second sample is further away from the first by more than a user-specified critical distance, then it is used to start a second cluster. Otherwise, the two samples are assumed to be from the same cluster, in which case they are merged and their mean computed. This process is applied to all the samples or pixels in the first row. At the end of the first row, the multispectral standard deviations of the clusters generated are computed for use in the later rows. Each sample in the second and subsequent rows is checked to see if it lies within a user-specified number of standard deviations of one of the clusters from the first row. If it does, it is added to that cluster and the cluster statistics are recalculated. Otherwise, it is used to start a new cluster and allocated a nominal standard deviation in each band.
In this slide, we show the single-pass method diagrammatically. The left-hand diagram shows how the first four samples are treated in this particular illustration. Only samples 2 and 3 are close enough to be merged. Clearly, sample 2 was too far away from sample 1 and was used to start a new cluster. Also, sample 4, being too far away from the two existing clusters, is used to start another cluster. The right-hand diagram shows how sample n + 1 falls within the prescribed number of standard deviations of cluster 2 and becomes part of that cluster, whereas sample n was too far away and is used to initiate another separate cluster.
The single-pass method is fast and does not require the number of clusters to be pre-specified. It does, however, require the user to specify two parameters: the critical distance used with the first line of samples and the standard deviation multiplier used in the remaining lines.
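To make the steps just described concrete, here is a minimal Python sketch of the single-pass method. It is an illustration under stated assumptions, not the implementation used by any particular package. The names `Cluster`, `single_pass`, `critical_distance`, `std_multiplier` and `nominal_std` are hypothetical. The sketch also assumes Euclidean distance to the cluster mean for the first row, a per-band test for "within a number of standard deviations", and that a cluster with a single member (or zero spread in a band) takes the nominal standard deviation.

```python
import numpy as np


class Cluster:
    """Running sums for one cluster, so its mean and std can be updated cheaply."""

    def __init__(self, pixel):
        self.n = 1
        self.total = pixel.copy()
        self.total_sq = pixel ** 2

    def add(self, pixel):
        self.n += 1
        self.total += pixel
        self.total_sq += pixel ** 2

    def mean(self):
        return self.total / self.n

    def std(self, nominal_std):
        # Assumption: fall back to the nominal value when no spread can be estimated.
        if self.n < 2:
            return np.full(self.total.shape, float(nominal_std))
        var = np.maximum(self.total_sq / self.n - self.mean() ** 2, 0.0)
        std = np.sqrt(var)
        return np.where(std > 0, std, float(nominal_std))


def single_pass(image, critical_distance, std_multiplier, nominal_std):
    """Cluster an image of shape (rows, cols, bands) in one pass over the pixels."""
    image = np.asarray(image, dtype=float)
    clusters = []
    labels = np.empty(image.shape[:2], dtype=int)

    for r, row in enumerate(image):
        for c, pixel in enumerate(row):
            dists = np.array([np.linalg.norm(pixel - cl.mean()) for cl in clusters])
            if r == 0:
                # First row: accept clusters whose mean is within the critical distance.
                ok = dists <= critical_distance
            else:
                # Later rows: accept clusters for which the pixel lies within
                # std_multiplier standard deviations of the mean in every band.
                ok = np.array(
                    [np.all(np.abs(pixel - cl.mean())
                            <= std_multiplier * cl.std(nominal_std))
                     for cl in clusters],
                    dtype=bool,
                )
            if ok.any():
                # Join the nearest acceptable cluster and update its statistics.
                k = int(np.argmin(np.where(ok, dists, np.inf)))
                clusters[k].add(pixel)
            else:
                # Otherwise the pixel starts a new cluster.
                clusters.append(Cluster(pixel))
                k = len(clusters) - 1
            labels[r, c] = k

    means = np.array([cl.mean() for cl in clusters])
    return labels, means
```

Calling `single_pass(image, critical_distance, std_multiplier, nominal_std)` returns a label map and the final cluster means. Because each pixel is visited once and in order, a different row ordering will generally give different clusters.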
Also, since it initiates clustering on the first line of samples, it can be biased by the samples in that line. There is no way to moderate that choice. Variations on the single-pass algorithm exist: some let the user specify the actual initial cluster centers, while others use a critical distance measure for all rows. The MultiSpec package we are going to use operates that way.
We're now going to apply the single-pass algorithm to the dataset we treated in the last lecture with the k-means method. The MultiSpec package was again used for this. It doesn't use the standard deviation method for the second and subsequent lines but applies another critical distance. The critical distances used here were 2,500 and 2,800, respectively, for the first and subsequent lines of data. These numbers seem large, but remember, this sensor has 16-bit radiometric resolution. As noted in the previous slide, the algorithm uses the first line of pixels, or of samples if the image is large, to initiate the cluster centers. In this case, the first line is actually the right-hand column of pixels in the displayed image, seen in the next slide, because after clustering the image and cluster maps were rotated 90 degrees clockwise to bring them into a north-south orientation.
Here we see the results of the application of the single-pass method to the image we analyzed earlier. As with the k-means algorithm, we can see that the clusters, represented by different colors, follow the visual patterns of the classes in the image. The colors here are different from before, and this time there is some confusion between roads and trees. In this slide, we see the cluster centers created by the single-pass algorithm. How do they compare with the clusters generated by the k-means method? Well, let's compare the results of the two algorithms using bispectral plots, as shown here. We see that the sparse vegetation, water, and building classes are about the same for both algorithms,
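The variation that uses a critical distance for every row can be sketched in a few lines of Python. This is only an illustration of the idea as described in the lecture, not MultiSpec's actual code. The function name `single_pass_distance_only` and its parameters are hypothetical, and Euclidean distance to the running cluster mean is an assumption. For scale, a distance of 2,500 is roughly 4% of the 0 to 65,535 range of a single 16-bit band.

```python
import numpy as np


def single_pass_distance_only(image, first_row_distance, later_rows_distance):
    """Single-pass clustering with one critical distance for the first row
    and another for all later rows. image has shape (rows, cols, bands)."""
    image = np.asarray(image, dtype=float)
    sums, counts = [], []          # per-cluster running sum and member count
    labels = np.empty(image.shape[:2], dtype=int)

    for r, row in enumerate(image):
        threshold = first_row_distance if r == 0 else later_rows_distance
        for c, pixel in enumerate(row):
            k = -1
            if sums:
                means = np.array(sums) / np.array(counts)[:, None]
                dists = np.linalg.norm(means - pixel, axis=1)
                nearest = int(np.argmin(dists))
                if dists[nearest] <= threshold:
                    k = nearest
            if k < 0:
                # Too far from every existing cluster: start a new one.
                sums.append(pixel.copy())
                counts.append(1)
                k = len(sums) - 1
            else:
                # Close enough: merge into the nearest cluster and update its mean.
                sums[k] += pixel
                counts[k] += 1
            labels[r, c] = k

    means = np.array(sums) / np.array(counts)[:, None]
    return labels, means
```

With this variant the user still supplies two parameters, but both are distances in the units of the pixel values, so they need to be chosen with the radiometric range of the sensor in mind.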
whereas the two approaches have picked up different combinations of bare surfaces, roads, and trees. In practice, clustering may need to be refined by re-running the algorithm with different sets of parameters until a cluster set is obtained that matches the information classes of interest.
Here we summarize the essential elements of the single-pass algorithm and the fact that unique results are unlikely to occur. The third question here is particularly important. Often in remote sensing, we have the notion that the pixels tend to clump into groups that align well with ground cover classes. That is often not the case. Instead, the spectral domain can look like a continuum with a few density maxima associated with definite classes like water. We will have more to say about that in Module 3.