We will start with SIFT, which stands for Scale-Invariant Feature Transform. This is one of the first feature detection schemes that was proposed. It uses image transformations in the feature detection and matching process. SIFT's characteristics include that it is highly accurate, which is wonderful. However, its large computational complexity limits its use in real-time applications, and this is an issue. Looking at the process that you see right here: the table that you saw in the former lecture, as well as the figures that I am showing you right here, are from the PhD dissertation at Yonsei University of Dr. Young Son Pak, who is my grad student. We worked together on this research for several years, and I am proudly showing you some of the results that were accomplished and are shown in his PhD dissertation. I am very proud of my grad students who are doing research with me, and who have done research with me in the past, to accomplish new technologies. So, I am using this proudly, and here we go. In the overall SIFT feature extraction process, you can see the DoG generation on the top. DoG stands for Difference of Gaussians, and we will talk about this on the following page. It is based on Gaussian techniques: we blur the image at several scales and extract the differences of these, and after that the process goes into the keypoint detection mechanisms. As you can see here, the keypoints are found across different levels of the image, produced at different scales, by finding the maxima or minima points, the extrema points. This is followed by orientation assignment techniques, which I will show you soon in further detail. Then there are descriptor generation techniques. Then the image is downsampled, and we move on to the next octave. You will see the details of this on the following page. Difference of Gaussian generation, DoG generation: build a scale space using an approximation based on DoG techniques.
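To make the pipeline concrete, here is a minimal NumPy sketch of the first stage, the DoG pyramid: blur the image at a few increasing scales, take differences of adjacent blurred images, then downsample by two and repeat for the next octave. This is an illustration, not the reference SIFT implementation; the function names (`dog_pyramid`, `blur`) and the parameter values (`sigma0 = 1.6`, three scales per octave) are my own choices for the sketch, and it assumes the image stays larger than the blur kernel at every octave.

```python
import numpy as np

def gaussian_kernel1d(sigma):
    # Sampled 1-D Gaussian, truncated at ~3 sigma and normalized to sum to 1.
    radius = int(3 * sigma + 0.5)
    x = np.arange(-radius, radius + 1)
    k = np.exp(-x**2 / (2.0 * sigma**2))
    return k / k.sum()

def blur(img, sigma):
    # Separable Gaussian blur: convolve every row, then every column.
    k = gaussian_kernel1d(sigma)
    out = np.apply_along_axis(lambda r: np.convolve(r, k, mode="same"), 1, img)
    out = np.apply_along_axis(lambda c: np.convolve(c, k, mode="same"), 0, out)
    return out

def dog_pyramid(img, n_octaves=3, scales_per_octave=3, sigma0=1.6):
    # For each octave: blur at geometrically increasing sigmas, subtract
    # adjacent blurred images (the DoG), then downsample by 2 for the
    # next octave.
    k = 2.0 ** (1.0 / scales_per_octave)
    pyramid = []
    current = img.astype(float)
    for _ in range(n_octaves):
        blurred = [blur(current, sigma0 * k**i)
                   for i in range(scales_per_octave + 1)]
        dogs = [b2 - b1 for b1, b2 in zip(blurred, blurred[1:])]
        pyramid.append(dogs)
        current = current[::2, ::2]  # halve resolution for the next octave
    return pyramid
```

Each entry of `pyramid` holds the DoG images of one octave, at half the resolution of the previous octave; these stacks are what the keypoint detection stage searches for extrema.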
So, if you look at SIFT and the following SURF, scale space is used as the core technology to start the augmented reality process. Local extrema of the DoG images at varying scales are selected as interest points. These local extrema: what is an extremum? Within an image there are local ranges that you can divide it into, based on characteristics of the image. Then, in each local range there are extreme points, the extrema, which are either a maximum or a minimum point in that local range. That is what we mean by local extrema. These will be the keypoints that we are looking for, using the DoG images at various scales, and these will become our interest points. DoG images are produced by convolving, or blurring, the image with Gaussians at each octave of the scale space in the Gaussian pyramid. So, we are going to use the scale space and the Gaussian pyramid, in which we have the abstract scale space and come down to the full scale space. Stacking them up, it will look like a pyramid of images, based upon Gaussian processing. The processing involves convolution, or blurring, techniques. These Gaussian images are downsampled after each iteration, based on the number of octaves. Why use DoG? What is it doing? Well, a DoG is a LoG, Laplacian of Gaussian, approximation method. So, what we are really saying is that we would want to use a Laplacian of Gaussian, but instead of that we are using a simpler, less computationally burdening DoG technique. A Laplacian of Gaussian: why do you want to use a Laplacian? What is the benefit of it? Hang in there, I will explain it soon. DoG has low computational complexity compared to LoG. In addition, DoG does not need partial derivative computation like LoG does. So, automatically from this statement you see that LoG is very computationally burdening, and it is burdening because it uses partial derivatives. DoG obtains local extrema of images with difference of Gaussian techniques.
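You can check the DoG-approximates-LoG claim numerically. Lowe's observation is that the difference of two Gaussians at nearby scales is approximately proportional to the scale-normalized Laplacian of Gaussian, G(kσ) − G(σ) ≈ (k − 1)σ²∇²G. The sketch below, which is my own illustration in one dimension rather than anything from the lecture, samples both curves and compares them; the mismatch shrinks as k approaches 1.

```python
import numpy as np

def gauss(x, sigma):
    # Normalized 1-D Gaussian.
    return np.exp(-x**2 / (2 * sigma**2)) / (sigma * np.sqrt(2 * np.pi))

def log_kernel(x, sigma):
    # Second derivative (1-D Laplacian) of the Gaussian, computed
    # analytically -- this is the partial-derivative work DoG avoids.
    return (x**2 / sigma**4 - 1 / sigma**2) * gauss(x, sigma)

sigma, k = 1.6, 2 ** (1 / 3)   # base scale and scale step per interval
x = np.linspace(-8, 8, 1001)

dog = gauss(x, k * sigma) - gauss(x, sigma)             # Difference of Gaussians
scaled_log = (k - 1) * sigma**2 * log_kernel(x, sigma)  # (k-1) * sigma^2 * LoG

# Relative worst-case mismatch between the two curves.
err = np.max(np.abs(dog - scaled_log)) / np.max(np.abs(scaled_log))
print(f"relative error: {err:.3f}")  # small; the two shapes nearly coincide
```

The DoG only needs subtractions of already-blurred images, while the LoG needs the second-derivative kernel above, which is why DoG is the cheaper stand-in.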
Classical interest point detection methods commonly used LoG technology. LoG is a very popular approach because, for interest point detection, LoG is scale invariant when it is applied to multiple image scales. The characteristics of a Gaussian scale-space pyramid and kernel techniques are frequently used. This is just a conceptual example of a pyramid of images at different levels; it is not exactly what you will see as a result, but it gives you an idea of what we mean by a pyramid of images at different scales. Approximation of the Laplacian of Gaussian: what is the Laplacian process doing? It is a differential operator of a function on Euclidean space. Once again, remember that SIFT and SURF are based upon Euclidean space, the Euclidean distance, whereas the other algorithms that were faster and more appropriate for real-time use were based on the Hamming distance, because they are based on binary operations. Here, this scheme has a process based on the second-order derivative of the Gaussian scale space. This is sensitive to noise and requires a lot of computation. This is one of the reasons that we want to avoid the Laplacian process, and that is why we want to have approximation methods. That is why DoG technology is used instead of LoG technology in SIFT. SIFT processing steps include keypoint detection, where the keypoint localization process is conducted. Each pixel in the DoG image is compared to its neighboring pixels; that is what is meant by a localization process. What are you trying to do? You are trying to do keypoint detection. The comparison is processed on the current scale and on the two adjacent scales, the one above and the one below. That is where the comparison process is conducted. Why? To find keypoints. Candidate keypoints are pixels that are local maxima or minima; the extrema points are the candidate keypoints. A final set of keypoints will exclude low-contrast points.
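The localization step above can be sketched directly: each interior pixel of a DoG image is compared against its 26 neighbors, 8 in its own scale and 9 in each of the two adjacent scales, and low-contrast candidates are discarded. This is a simplified NumPy illustration; the function name `local_extrema` and the threshold value `0.03` are my own choices (the full algorithm also does sub-pixel refinement and edge-response rejection, which are omitted here).

```python
import numpy as np

def local_extrema(dog_below, dog, dog_above, contrast_thresh=0.03):
    # Scan the middle DoG image; a candidate keypoint must be strictly
    # greater (or strictly smaller) than all 26 neighbors in the 3x3x3
    # cube spanning the current scale and the scales above and below.
    h, w = dog.shape
    stack = np.stack([dog_below, dog, dog_above])
    keypoints = []
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            v = dog[y, x]
            if abs(v) < contrast_thresh:   # exclude low-contrast points
                continue
            cube = stack[:, y-1:y+2, x-1:x+2].ravel()
            neighbors = np.delete(cube, 13)  # drop the center pixel itself
            if v > neighbors.max() or v < neighbors.min():
                keypoints.append((y, x))
    return keypoints
```

In a full pipeline this runs over every triple of adjacent DoG images in every octave, so the same keypoint can be detected at the scale where its response is strongest.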
Because if it is a low-contrast point, it may have error inside; it may not be accurate. So, even though the value is a maximum or minimum in its local range, if the contrast is low, you may not want to use it because it may be faulty. Then we have the orientation assignment procedure, where keypoint orientation determination is what we are trying to do. The orientation of a keypoint comes from the local image gradient histogram in the neighborhood of the keypoint. What is the gradient doing? The gradient is showing you the rate of change. The histogram is a graphical structure that shows you where the amount of change is largest and smallest. Therefore, looking into this, we can find the orientation of a keypoint based upon gradient histogram techniques, in which the peaks in the gradient histogram are selected as the dominant orientations, because that is where the most change is detected. How is it detected? Using the gradient technique. SIFT descriptor generation is based on the computation of a feature descriptor for each keypoint. The feature descriptor is a 128-dimensional vector, built from a 4 by 4 grid of orientation histograms with 8 bins each, which gives 4 × 4 × 8 = 128 values. In the figure below, you can see the image gradients, the keypoint descriptors, and the 128 features for one keypoint, as a sequence of this operation that is used in SIFT. These are the references that I used, and I recommend them to you. Thank you.
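The orientation assignment idea can be sketched as follows: compute gradients in a patch around the keypoint, build a magnitude-weighted histogram of gradient directions, and take the peak bin as the dominant orientation. This is my own simplified illustration, not the lecturer's code; SIFT proper uses a 36-bin histogram over a Gaussian-weighted circular neighborhood, and the descriptor then repeats the same idea with 8-bin histograms over a 4 by 4 grid of subregions to produce the 128 values.

```python
import numpy as np

def dominant_orientation(patch, n_bins=36):
    # Image gradients via finite differences; gy is the change along
    # rows (vertical), gx the change along columns (horizontal).
    gy, gx = np.gradient(patch.astype(float))
    mag = np.sqrt(gx**2 + gy**2)                  # rate of change (magnitude)
    ang = np.degrees(np.arctan2(gy, gx)) % 360.0  # direction in [0, 360)
    # Magnitude-weighted orientation histogram; the tallest peak is the
    # dominant orientation assigned to the keypoint.
    hist, edges = np.histogram(ang, bins=n_bins, range=(0, 360), weights=mag)
    peak = np.argmax(hist)
    return (edges[peak] + edges[peak + 1]) / 2.0, hist
```

For example, a patch whose intensity increases steadily from left to right has gradients pointing only along the horizontal axis, so all of the histogram weight lands in the first 10-degree bin and the dominant orientation comes out near 0 degrees.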