In this lecture we're going to talk about the ILSVRC, the ImageNet Large Scale Visual Recognition Challenge. This is significant because there are very few venues where these machine learning systems, these deep learning technologies, can compare and compete, so that we can see which one is better, and by how much. This is the representative challenge, the representative competition, and it is held annually. It's very interesting and very rare, so let's take a look into it. We look into the ImageNet Large Scale Visual Recognition Challenge; for short, it's called the ImageNet Challenge, or even shorter, just ImageNet.

The ImageNet Challenge is an annual contest that started in 2010. Its subject is object category classification and detection, and there are three main challenges used to benchmark large-scale object recognition capability. One competition is object localization, where each program picks out its top five labels, which is why it's called top-5. Then there are two challenges for object detection: one for still images and one for motion video.

Number one, object localization, also called top-5, is the original competition of the ImageNet Challenge; the other two were added on more recently. The training data set is 1.2 million images, the labeled object categories number 1,000, and the test image set is 150,000 photographs. Each competing program lists its five most confident labels for each test image, in decreasing order of confidence, together with a bounding box for each class label. What are these bounding boxes? When a program identifies a certain object in an image, it puts a box around that region, claiming: this is what I say it is. That's what the box represents.
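To make this concrete, here is a minimal sketch of what one top-5 submission entry might look like. The field order and the box format (x, y, width, height) are assumptions for illustration, not the official ILSVRC submission format.

```python
# A sketch of one top-5 entry: candidate (label, confidence, bbox) triples
# are sorted by confidence, descending, and only the five most confident
# survive. The bbox tuple (x, y, width, height) is an assumed format.

def make_top5(predictions):
    """Rank candidate (label, confidence, bbox) triples and keep five."""
    ranked = sorted(predictions, key=lambda p: p[1], reverse=True)
    return ranked[:5]

candidates = [
    ("cat",   0.62, (10, 20, 80, 80)),
    ("dog",   0.21, (12, 18, 85, 75)),
    ("fox",   0.09, (11, 22, 78, 82)),
    ("wolf",  0.05, (13, 19, 80, 79)),
    ("lynx",  0.02, ( 9, 21, 81, 80)),
    ("tiger", 0.01, (10, 20, 80, 80)),
]
top5 = make_top5(candidates)
print([label for label, _, _ in top5])  # five labels, most confident first
```

Note that the sixth candidate is simply dropped: only the five guesses the program is most confident about are submitted and scored.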
In the top-5 competition, each program is evaluated on the accuracy of its localization and labeling results against the test image's ground-truth labels and object bounding boxes. The program with the minimum average error rate is the winner. In this example of top-5 output, the top-5 selections for each image are listed with their probabilities, shown here as a histogram.

The second competition category is object detection on still images. Each program attempts identification of 200 basic-level categories, and the test image data set has fully annotated labels on each image. The winner is the program with the highest accuracy in annotated class labels, confidence scores, and bounding boxes.

The third category is object detection on motion video. The contending program attempts identification of 30 basic-level categories, a subset of the 200 basic-level categories used in the second, still-image competition. Each program produces a set of annotations, consisting of frame number, class labels, confidence scores, and bounding boxes, on the video clip in real time as the video runs. The winner is the program with the highest accuracy on the most object categories.

Let's look at some of the participant program requirements. Each team is allowed 2 submissions per week, and there is no regulation on the number of neural network layers, so you can go as deep a neural network as you need or want; however, the learning scheme and parameters have to be based only on the training set. One thing to notice about deep neural network technology: just adding on more parameters, more and more layers, does not necessarily make the network more intelligent or more accurate.
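The top-5 scoring rule described above can be sketched in a few lines: an image counts as an error when its ground-truth label is missing from the program's five guesses. This is a simplified illustration; the real challenge also checks bounding-box overlap against the ground truth, which is omitted here.

```python
# Simplified top-5 error rate: fraction of test images whose true label
# does not appear among the program's five submitted labels.
# (The real ILSVRC evaluation also requires sufficient bounding-box
# overlap; that part is left out of this sketch.)

def top5_error_rate(predictions, ground_truth):
    """predictions: a list of 5-label lists, one per test image.
    ground_truth: the true label for each test image."""
    errors = sum(1 for top5, truth in zip(predictions, ground_truth)
                 if truth not in top5)
    return errors / len(ground_truth)

preds = [["cat", "dog", "fox", "wolf", "lynx"],
         ["car", "truck", "bus", "van", "train"]]
truth = ["cat", "bicycle"]            # the second image is missed
print(top5_error_rate(preds, truth))  # 0.5
```

The program with the minimum average error rate over all 150,000 test photographs wins.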
It may just take more time to train, and performance may actually worsen beyond a certain number of parameters and layers. It therefore takes a lot of knowledge and experience to figure out the best, the optimal number of layers and parameters to use for the target problem. That is the design challenge faced by the experts, companies, and universities that submit their programs to this challenge, and it's truly significant. The competition results are also very interesting; we'll take a look at those now.

Among the annual results, let's look at some of the significant winners. Significant improvements in performance have been seen since deep learning was first used in 2012, when AlexNet showed a major performance improvement; this was the first time deep learning was used in the challenge. AlexNet, the 2012 winner, was named after its lead inventor, Alex Krizhevsky, and was developed in Geoffrey Hinton's research group at the University of Toronto. The results show that from the 2011 winner, Xerox, to the 2012 AlexNet system, a 9.4 percentage point improvement in performance was obtained in the top-5 category, object localization.

Now, some of the system characteristics: what made this work so well? Deep learning was used for the first time, a deep neural network with 8 layers: 5 convolutional layers followed by 3 fully connected layers. Its 60 million parameters were trained for 6 days on two Nvidia GTX 580 GPUs with 3 GB of memory each.

Another notable result is the 2014 winner, GoogLeNet, which was Inception-v1. It achieved 93.3% accuracy in the top-5 object localization results. This is significant because the 90% accuracy level was exceeded for the first time.
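To get a feel for where AlexNet's 60 million parameters come from, here is a back-of-the-envelope count for its three fully connected layers, using the standard published dimensions (the final convolutional feature map is 6x6x256, and there are 1,000 output classes). The point is that the fully connected layers alone account for the bulk of the parameters.

```python
# Rough parameter count for AlexNet's three fully connected layers,
# assuming the standard published dimensions: fc6 takes the flattened
# 6x6x256 conv output, fc7 is 4096-to-4096, fc8 maps to 1,000 classes.

fc_dims = [(6 * 6 * 256, 4096),  # fc6: flattened conv features -> 4096
           (4096, 4096),         # fc7
           (4096, 1000)]         # fc8: 1,000 ImageNet categories

# Each layer has n_in * n_out weights plus n_out biases.
fc_params = sum(n_in * n_out + n_out for n_in, n_out in fc_dims)
print(f"{fc_params:,}")  # 58,631,144
```

So roughly 58.6 million of the approximately 60 million parameters sit in the fully connected layers, which is one reason later architectures such as GoogLeNet, with only 5 million parameters, moved away from large fully connected heads.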
As for its system characteristics, GoogLeNet is a 22-layer deep neural network with 5 million parameters, trained for 1 week on the Google DistBelief cluster. The 2015 winner was Microsoft's ResNet, which achieved 96.5% accuracy in the top-5 object localization competition. This is significant because the 94.9% human accuracy level was exceeded for the first time. ResNet is a 152-layer deep neural network, trained for approximately 3 weeks on 4 NVIDIA Tesla K80 GPUs with a combined processing capability of 11.3 GFLOPs. These are the references that I used, and I recommend them to you. Thank you.