Often we see analysts naively using machine learning techniques for thematic mapping, without thinking carefully about the objectives of the exercise and whether a careful mix of procedures could produce better results. Here, we want to look at some mixed-algorithm methodologies. Too often, analysts just apply a chosen classifier to an image, using labeled training data for each class, and expect good practical outcomes. Such an approach works well enough for highly stylized images, such as the labeled data sets regularly used to test new algorithms, but it is not optimal for real, heterogeneous image data. In this lecture, we will look at methodologies that work well in practice and yet are often overlooked.

We start with a review of the properties of supervised and unsupervised classification. Remember, in supervised classification the analyst acquires beforehand a labeled set of reference data, that is, training pixels, for each of the classes of interest and, desirably, for all apparent classes in the image. That data is used to estimate the parameters or other constants required to operate the chosen classification algorithm. The algorithm can then be applied to the full image of interest, and testing pixels can be used to assess how well it has performed. The amount of training data will often be less than 1-5 percent of the image pixels. The learning phase, in which the analyst plays an important part by labeling pixels beforehand, is therefore performed on a very small part of the image. Once trained, the classifier can attach labels to all the image pixels, and that is where the significant benefit for thematic mapping occurs. The output from supervised classification consists of a thematic map of class labels, often accompanied by a table of area estimates and, importantly, an error matrix which indicates, by class, the residual error, or accuracy, of the final product.

In unsupervised classification, clustering algorithms are used, typically on a sample of an image, to partition the spectral space into a number of discrete clusters or spectral classes. All image pixels are then labeled as belonging to one of the spectral classes found, typically using a minimum distance assignment. A cluster map can be produced in which the pixels are given labels indicating the cluster to which they belong. Unsupervised classification is a segmentation of the spectral space in the absence of any information fed in by the analyst; that is why it is called unsupervised. Analyst knowledge is used afterwards to attach information class labels to the map found by clustering, often guided by the spatial distribution of the labels shown in the cluster map. Clearly, this is an advantage of unsupervised classification, as we will see in an example. In general, clustering algorithms are computationally expensive to run compared with most supervised classification methods.

We can benefit thematic mapping by bringing together the strengths of the supervised and unsupervised methods into a single classification methodology. Even though that methodology is very old, and was devised when the maximum likelihood rule was the key machine learning classifier used in remote sensing, it teaches us a lot about operational thematic mapping. We will demonstrate the method with a simple example. It uses the best aspects of the unsupervised and supervised approaches: unsupervised clustering is used to discover the spectral classes in an image, and supervised classification, after those spectral classes have been associated with the information classes of interest, produces the thematic map. The methodology is outlined in five steps on the next slide.
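Before turning to that methodology, it may help to make the basic supervised workflow reviewed above concrete. The following is a minimal sketch in Python using scikit-learn, with synthetic data standing in for a real image; the array names, the two-class structure and the sampling fraction are assumptions made purely for illustration, and quadratic discriminant analysis is used as a stand-in for the Gaussian maximum likelihood rule.

```python
import numpy as np
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Stand-in for a (rows x cols x bands) image containing two spectral classes.
rows, cols, bands = 100, 100, 5
image = rng.normal(size=(rows, cols, bands))
image[:, :50] += 3.0                              # make the left half spectrally distinct
pixels = image.reshape(-1, bands)
true_class = (np.arange(rows * cols) % cols < 50).astype(int) + 1

# Analyst-labeled reference pixels: only a few percent of the image.
sample = rng.choice(pixels.shape[0], size=300, replace=False)
X_train, X_test, y_train, y_test = train_test_split(
    pixels[sample], true_class[sample], test_size=0.5, random_state=0)

# Quadratic discriminant analysis plays the role of Gaussian maximum likelihood.
clf = QuadraticDiscriminantAnalysis().fit(X_train, y_train)

# Once trained, the classifier labels every pixel, giving the thematic map ...
thematic_map = clf.predict(pixels).reshape(rows, cols)

# ... and the held-out testing pixels give the error (confusion) matrix.
print(confusion_matrix(y_test, clf.predict(X_test)))
```

In a real exercise the reference labels would come from maps, air photos or field data rather than being synthesized, and the labeled pixels would typically be only the 1-5 percent of the image mentioned above.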
The unsupervised-supervised methodology developed by Fleming and his colleagues works as follows. Step one: we use clustering to determine a set of clusters, or spectral classes, into which the image resolves. This is performed on a representative subset of the data. Spectral class properties, that is, statistics, can be produced from this step. Small classes that might be overlooked by the analyst, such as those consisting of mixtures along class boundaries, and elongated classes like rivers and stream systems, will be picked up. Step two: using available reference data, that is, maps, air photos, the image itself, and even the spatial distribution of clusters in the cluster map, we associate the spectral classes or clusters with information classes. Frequently there will be more than one spectral class for some information classes, because they can look different spectrally to the sensor even though the analyst calls them by the same name. This is an important consideration. Step three: we can then perform a feature selection to see whether all the features, or bands, need to be retained for reliable classification. Depending on the classifier algorithm used and the data set, this step is not always necessary. Step four: we use the supervised algorithm to classify the entire image into the set of spectral classes. The hybrid approach was used initially with maximum likelihood classification, but it can be used with most classifier algorithms. Step five: label each pixel in the classification with the information class corresponding to its spectral class, and use independent testing data to determine the accuracy of the classification product.

We now use two simple examples to highlight the value of unsupervised classification as a means for identifying spectral classes and for generating the signatures of classes for which the acquisition of training data would be difficult. Those results are then used in a supervised algorithm.

To demonstrate the hybrid approach, we use an image segment recorded by the HyVista HyMap sensor over the city of Perth in Western Australia. It is centered on a golf course. The obvious cover types are water, grass (the fairways of the golf course), trees, bare ground including bunkers, a clubhouse, and tracks and roads. Apart from a few cases, the distribution of cover types suggests that it might be hard to generate training fields for all classes of interest. Five bands were used: band 7 in the visible green, band 15 in the visible red, band 29 in the near infrared, and bands 80 and 108, both in the mid infrared range. These last two sit at the infrared maxima of the vegetation and soil curves, midway between the water absorption regions. It was felt that they would assist in discriminating among the bare ground and roadway classes. The three heterogeneous fields shown on the image were used for clustering; among them, they seem to include all cover types. The three fields were aggregated, and the ISODATA clustering algorithm was applied to the combined data set with the goal of finding 10 spectral classes. Note that there are seven information classes in the image. The results for the three cluster regions are shown in the cluster map at the top right.
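As an aside before looking at those results, the five-step methodology can be sketched in code. This is a minimal illustration assuming scikit-learn is available, with k-means standing in for the ISODATA algorithm and with an invented cluster-to-information-class mapping; in a real study the analyst supplies that mapping from reference data, as in step two.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis

rng = np.random.default_rng(1)

# Stand-in for the image: (rows x cols x bands), with some crude spectral structure.
rows, cols, bands = 120, 120, 5
image = rng.normal(size=(rows, cols, bands))
image[:, 40:80] += 2.5
pixels = image.reshape(-1, bands)

# Step 1: cluster a representative subset to discover the spectral classes.
subset = pixels[rng.choice(pixels.shape[0], size=2000, replace=False)]
clusterer = KMeans(n_clusters=10, n_init=10, random_state=0).fit(subset)
spectral_labels = clusterer.predict(subset)

# Step 2: the analyst associates clusters with information classes
# (an arbitrary many-to-one mapping here, purely for illustration).
cluster_to_info = {c: c % 3 for c in range(10)}

# Step 3: optional feature selection would be performed here.

# Step 4: train the supervised classifier on the spectral classes and
# classify the entire image (Gaussian maximum likelihood via QDA).
clf = QuadraticDiscriminantAnalysis(reg_param=0.1).fit(subset, spectral_labels)
spectral_map = clf.predict(pixels)

# Step 5: relabel each pixel with the information class of its spectral class.
info_map = np.vectorize(cluster_to_info.get)(spectral_map).reshape(rows, cols)
```

The accuracy assessment of step five is omitted here, but it would proceed exactly as in the earlier sketch, using independent testing pixels and an error matrix.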
We can see that the spatial distribution of the clusters matches the spatial distribution of the information classes in the image, as is to be expected, although there may be more than one cluster corresponding to each information class. This slide shows the mean vectors of each of the clusters found. By looking at where the clusters lie in the image, and at the shapes of the spectral reflectance characteristics seen in the distribution of the elements of the cluster means, we attached the information class labels shown on the slide. Several important observations can be made here. First, two of the information classes are each represented by two spectral classes or clusters. Secondly, one spectral class, or cluster, has been generated from the water-vegetation mixed pixels along borders. Thirdly, elongated classes such as tracks and roads have been found as separate spectral classes. It would be a little hard to develop training pixels for those classes otherwise.

For verification, this slide shows an infrared versus red bi-spectral plot for the clusters found. The distribution of classes agrees with where they would normally lie in such a plot. Using the set of spectral classes found from clustering, and their mean vectors and covariance matrices, which are generated automatically by the ISODATA algorithm, we get the thematic map shown on the left-hand side of the slide. On the right-hand side the thematic map is colored so that the sets of clusters, or spectral classes, corresponding to each information class have the same color. Here we show the final thematic map alongside the image to allow a comparison. In this simple exercise we did not choose testing data, so we cannot report quantitatively on the map accuracy.

We now look at another simple example of the combined unsupervised-supervised approach. It involves classifying an arid region of Australia used for growing cotton by means of irrigation from a nearby river. The task was to assess the area in hectares sown to cotton as a surrogate for the amount of water used. Field agronomists had assessed the hectarage of cotton crops in the region but required corroborative evidence. The image to be classified consists of just the visible red band and the first of the two near infrared bands of a Landsat multispectral scanner image recorded in February 1991. Although the region is very dry at that time of the year, apart from the crops there is a gallery, or riparian, forest along the river, which provides another vegetation class. Supervised classification was carried out using the minimum distance classifier, a very simple algorithm. On the left of this slide we see a near infrared image of the region to be analyzed. A test sub-image has been identified on which the results are to be evaluated. The Darling River can be seen in the image. The cotton fields are mostly in the test area and appear white in the left-hand image, indicating a high infrared response. An approximately triangular-shaped crop is visible in the bottom south-eastern corner of the image; this is cotton, as are some other scattered fields. The four rectangular selections in the right-hand image, along with a sample of the lower triangular crop, were used to resolve the spectral space into spectral classes by clustering. Here the simple single-pass clustering algorithm was used, and each of the five heterogeneous regions was clustered separately. The results of clustering are shown on the next slide in bi-spectral plot form, in terms of the means of the cluster centers found.
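Since the minimum distance classifier just mentioned is so simple, a short NumPy sketch of it is worth showing before we look at those clustering results. The mean vectors would normally come from the clustering step; the two-band (red, near infrared) numbers below are assumed values chosen only to illustrate the assignment rule.

```python
import numpy as np

def minimum_distance_classify(pixels, cluster_means):
    """Assign each pixel to the spectral class with the nearest mean vector.

    pixels: (n_pixels, n_bands) array; cluster_means: (n_classes, n_bands).
    Returns an (n_pixels,) array of spectral class indices.
    """
    # Squared Euclidean distance from every pixel to every class mean.
    d2 = ((pixels[:, None, :] - cluster_means[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

# Illustrative use: two bands (red, near infrared), three spectral classes.
means = np.array([[20.0, 90.0],    # vigorous crop: low red, high near infrared
                  [35.0, 50.0],    # riparian forest (assumed values)
                  [60.0, 70.0]])   # dry soil (assumed values)
pixels = np.array([[22.0, 85.0], [58.0, 72.0]])
print(minimum_distance_classify(pixels, means))   # -> [0, 2]
```

The same nearest-mean rule is what assigns every image pixel to a cluster when a cluster map is produced after unsupervised classification.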
There were 34 clusters in total, which were then rationalized down to the 10 shown. That was done by associating the clusters with information classes using black-and-white and color air photos, along with photointerpretation of the image itself. Those 10 grouped spectral classes were considered adequate to differentiate the image into its main cover types and thereby avoid errors of commission, which might lead to poor estimates of the area of the cotton crops. When the minimum distance classifier was applied to the test image using the 10 rationalized spectral classes from the previous slide, it was found that the cotton crops accounted for 803 hectares. The field agronomists had estimated 800 hectares. In this exercise it was not necessary to produce a thematic map, since the important result was the area of cotton in the test region.

How can we ensure we have pixels in the training data set representative of all the information classes in an image, and does it matter? For example, consider the river and gallery forest class in the previous irrigated crop example. How could the user select beforehand a representative set of those pixels with which to develop one of the binary classifiers in a support vector machine? If the pixels for that class are not well differentiated, and indeed they may be mixtures of water and trees, then hand selection of the training fields might be difficult. Again, the analyst may wish to consider using a hybrid clustering and support vector machine approach. The clustering step, as with the maximum likelihood and minimum distance classifier examples just presented, would be carried out on representative and heterogeneous parts of the image in order to generate a set of spectral classes with which to work. Those clusters could be aggregated into single information classes before application of the support vector machine, in order to limit the number of binary classifiers needed, although in some cases there may be value in retaining some information class sub-classes where that improves separability.

By way of summary, unsupervised classification based on clustering can be a very effective way of generating training pixels and revealing spectral differences within information classes, that is, spectral classes. Often, more accurate supervised results will be obtained if sets of sub-classes are used for each information class. Rather than using reference data explicitly to train a supervised algorithm, it is used to attach information class labels to clusters; this is the traditional unsupervised classification approach. Even when the analyst is interested in just a small number of classes, sometimes only one, better results will often be obtained if all apparent cover types are represented in the classification, in order to avoid errors of commission caused by large classes spilling over into the small classes. Finally, because the convolutional neural network develops knowledge about a scene as a whole, and particularly about the spatial nature of the information classes, and implements exceedingly complex piecewise linear decision surfaces, it is unclear at this time whether a prior unsupervised clustering step will help improve performance. Answering this question requires a careful assessment of the spectral reflectance characteristics of the cover types involved.
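To close, here is a minimal sketch of the hybrid clustering and support vector machine idea raised above, again using scikit-learn, with k-means standing in for the clustering step; the synthetic data and the grouping of eight clusters into three information classes are invented purely for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import SVC

rng = np.random.default_rng(3)

# Stand-in for image pixels: 5000 samples in four bands, with some structure.
pixels = rng.normal(size=(5000, 4))
pixels[:2500, :2] += 3.0

# Cluster representative, heterogeneous regions to find the spectral classes.
clusters = KMeans(n_clusters=8, n_init=10, random_state=0).fit_predict(pixels)

# Aggregate spectral classes into information classes before training, so
# that fewer binary support vector machines are needed (8 clusters -> 3).
cluster_to_info = {0: 0, 1: 0, 2: 1, 3: 1, 4: 1, 5: 2, 6: 2, 7: 2}
info_labels = np.vectorize(cluster_to_info.get)(clusters)

# SVC builds the one-against-one binary classifiers internally.
svm = SVC(kernel="rbf", gamma="scale").fit(pixels, info_labels)
info_map = svm.predict(pixels)
```

The design point to note is that the clusters are merged into information classes before the support vector machine is trained, which limits the number of binary classifiers it must build, whereas in the five-step methodology the classifier works with the spectral classes and the relabeling to information classes happens afterwards.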