[MUSIC] In this session we're going to study the requirements and challenges of cluster analysis. There are many things to considered for cluster analysis. For example, the first one people are often to consider is partitioning criteria. That means whether we want to get a single level or multiple-level hierarchical partitioning. In many cases, multi-level hierarchical partitioning or what we call hierarchical clustering could be quite desirable. The second thing we want to consider whether how the clusters can separate the objects. It can be hard or exclusive, or soft, non-exclusive. The exclusive means one object, like one customer, can belong to only one region, okay. Non-exclusive, for example, one document may belong to more than one cluster. Then for a similarity measure, there could be many different kinds of similarity measures. For example, distance-based using Euclidean space or a road network or vector space. Or we can use connectivity-based, for example, based on density or contiguity. The fourth one that we are going to discuss is clustering space. That means, whether we want to do full space clustering, or we want to do subspace clustering. Full space means especially in the low dimensional, for example in 2D, two dimensional space, in the area or in the region, whether we want to find the clusters in those region, in the full 2D space. Or only we want to project on the 1D subspace. However in many high dimensional clustering, it is hard to find meaningful clusters in a very high damage space. But it is possible to find interesting clusters in the subspaces. For example, in the corresponding lower dimensional space. So the subspace clustering also becomes very important. Then, for cluster analysis, there are many requirements and challenges. The first, most important thing is the quality of clustering. Means whether we are going to deal with different kinds of attributes, numerical one, categorical data, text, multimedia data, networks, or a mixture of multiple types. Whether we can discover clusters with arbitrary shape, like an oil spill, or whether we will be able to deal with noisy data. Then the second important issue people consider is scalability. That means whether we will be able in a very big amount of data environment, whether we can cluster all the data or we must draw some sample and cluster only on the sample rate. Whether we can handle high dimensional data, or we can only project them in the low dimension space. And whether we can do timely incremental clustering, especially for data streams. Whether we can do stream clustering. Especially whether the cluster results may not be sensitive to the input order of the data. Another important issue we need to handle is constraint-based clustering. That means users usually have their own thinking on the preference or constraints of the cluster they want to derive. They may have domain knowledge. They may know certain subspaces could be more interesting. They may even raise queries. So can we handle clustering given user preference or constraints? The final issue we are going to discuss is interpretability and usability. Simply says the clustering whether they can derive meaningful clusters which can be understood by people, can be used by many applications, can interpret data nicely. All these are a very important issue we are going to study in this lecture. [MUSIC]