In this section, we'll explore neural networks. The topics are the perceptron, neural network architecture, convolutional neural networks, and recurrent neural networks. Neural networks have become a buzzword recently, especially with the deep learning revolution of the past few years, but the concept first emerged in the 1950s and began to show up in commercial applications as early as 1962.

The simplest neural network is the perceptron. A perceptron is a single-layer neural network that takes a list of input features, X1 through Xn. For example, X1 might be the cost and Xn the rating; together these make up the feature vector. In this case, the goal might be to determine whether a customer will or will not buy something based on the inputs. In addition to the input features, there is also a bias term, which is like the intercept in a linear regression model. All of these features are combined just as in linear regression, so we have a linear combination of the input features plus the intercept. Once we have the linear combination, we apply an activation function. This activation function is usually non-linear, and the choice depends on the problem you're trying to solve. In this particular case, the response variable is binary: one for buying a particular product, zero for not buying it. The natural activation function to use here is the sigmoid function we saw in the last section. The perceptron is really a very simple network: it has only one layer, composed of the linear combination of the inputs and the activation function, which connects directly to the output.

The idea of the perceptron was introduced a long time ago, and over time people have added more layers. For example, in addition to the input layer and the output layer, we can have some hidden layers. Once we have multiple layers, it becomes a neural network. Neural networks usually contain multiple layers, and within each layer there are many nodes. Because there are so many parameters in the model, neural networks are usually very difficult to interpret. They are also expensive to train: the structure can be complicated, so the number of parameters to be estimated is very large.

Scikit-learn does provide some neural network models for us to use. But since the deep learning revolution, people have been developing frameworks to design, train, and deploy neural networks in a brand new fashion. These frameworks support networks with many layers and many nodes per layer, so the input data can be much larger than before, and when the training data is large and of high quality, such networks can be very useful. The deep learning frameworks that are popular today include MXNet, TensorFlow, Caffe, and PyTorch. They were developed by different groups, but all of them can be accessed from Python. Using Python, we can design, train, fit, and deploy a neural network very easily with any of these frameworks.
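To make this concrete, here is a minimal sketch of the perceptron described earlier: a linear combination of the features plus a bias, passed through a sigmoid activation. The feature values, weights, and bias below are made up purely for illustration.

```python
import numpy as np

def sigmoid(z):
    # Squash the linear combination into (0, 1).
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical feature vector: x1 = cost, x2 = rating (values made up).
x = np.array([19.99, 4.5])
w = np.array([-0.3, 1.2])   # illustrative weights, not learned values
b = 0.5                     # bias term, like the intercept in linear regression

# Linear combination of the inputs plus the bias, then the activation.
z = np.dot(w, x) + b
p = sigmoid(z)              # probability that the customer buys
print("P(buy) =", p, "-> predict", int(p >= 0.5))
```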
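The scikit-learn models mentioned above live in the `sklearn.neural_network` module. A minimal sketch, using synthetic data in place of real customer features, might look like this; the hidden-layer size and other settings are arbitrary choices for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Synthetic binary "buy / not buy" data standing in for real features.
X, y = make_classification(n_samples=500, n_features=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# A small multilayer perceptron: one hidden layer of 10 nodes with a
# logistic (sigmoid) activation, as discussed above.
clf = MLPClassifier(hidden_layer_sizes=(10,), activation="logistic",
                    max_iter=1000, random_state=0)
clf.fit(X_train, y_train)
print("test accuracy:", clf.score(X_test, y_test))
```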
One specific type of neural network that's very useful for image analysis is the convolutional neural network. In a convolutional neural network, the input is typically an image or a sequence of images, and we use kernels as filters to extract local features. In the example shown here, we have an input image, and we convolve filters with the image to create the next layer. Depending on how many filters we use, we'll get a different number of channels in the output of the convolutional layer: one, in this particular case.

Another concept in convolutional neural networks is the pooling layer. Once you have a particular output, you may want to reduce its size a bit. To do this, we can use max pooling or average pooling: each two-by-two patch is reduced to a single scalar by taking its maximum or its average. The pooling layer is essentially a dimension-reduction step. Depending on the application, a convolutional neural network can have a lot of layers and the number of dimensions can be quite high, so we need to reduce the size of the data for better convergence.

We can stack several different layers in a convolutional neural network, but at the end of the day we convert the tensor into a vector and feed it into a fully connected layer. The fully connected layer links to the output, which is usually the category of the object that the image contains. For example, the output for this image could be the digit zero. For the training process, we need a lot of good labeled data. Using convolutional neural networks, we can search for the best number of filters and the best filter weights, which can give us near human-level accuracy for image recognition. In this particular case, handwritten digit recognition, we can achieve near human-level accuracy; a small code sketch of such a network appears below.

Another type of neural network is the recurrent neural network. For the feed-forward neural network and the convolutional neural network, the input observations are relatively independent: the network cannot model a dependence structure among them. But often, for time series data or for natural language processing and translation applications, the sequence of the input data really means something. When the data involves sequential or time series features, the recurrent neural network is the right way to go. For example, in this high-level conceptual illustration of a recurrent neural network, we have the input layer, the output layer, and the hidden layers. The input to this recurrent neural network is a set of characters, and they have meaning as a sequence: each individual word or character doesn't mean much until we have the sequential relationships among them. During the training process, information does not flow in just one direction; it is reused and propagated through different nodes at different steps in the sequence, so in the final result the input layer and output layer are connected through this recurrent structure. A sketch of a recurrent layer also follows below. I'm [inaudible] and thank you for watching.
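As a concrete illustration of the convolutional architecture described above, here is a minimal PyTorch sketch; the number of filters, the kernel size, and the layer sizes are arbitrary choices for illustration, not a recipe for state-of-the-art accuracy.

```python
import torch
import torch.nn as nn

# A minimal convolutional network for 28x28 grayscale digit images,
# following the layers described above: convolution -> pooling ->
# flatten -> fully connected output over the ten digit classes.
class SmallCNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(1, 8, kernel_size=3, padding=1)  # 8 filters -> 8 channels
        self.pool = nn.MaxPool2d(2)   # 2x2 max pooling halves each spatial dimension
        self.fc = nn.Linear(8 * 14 * 14, 10)

    def forward(self, x):
        x = torch.relu(self.conv(x))  # local features extracted by the kernels
        x = self.pool(x)              # the dimension-reduction step described above
        x = x.flatten(1)              # convert the tensor into a vector
        return self.fc(x)             # fully connected layer: scores for digits 0-9

model = SmallCNN()
scores = model(torch.randn(1, 1, 28, 28))  # a fake one-image batch
print(scores.shape)  # torch.Size([1, 10])
```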
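Similarly, here is a minimal sketch of a character-level recurrent layer in PyTorch, showing how a hidden state carries information from one step of the sequence to the next; the vocabulary size and dimensions are made up for illustration.

```python
import torch
import torch.nn as nn

# Each step consumes one character embedding plus the hidden state from
# the previous step, which is how the sequential dependence is modeled.
vocab_size, embed_dim, hidden_dim = 50, 16, 32
embed = nn.Embedding(vocab_size, embed_dim)
rnn = nn.RNN(embed_dim, hidden_dim, batch_first=True)
readout = nn.Linear(hidden_dim, vocab_size)

chars = torch.randint(0, vocab_size, (1, 12))  # a fake 12-character sequence
outputs, h_n = rnn(embed(chars))               # hidden state flows step to step
next_char_scores = readout(outputs[:, -1, :])  # predict the next character
print(next_char_scores.shape)  # torch.Size([1, 50])
```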