(Music) In this video, we will provide an overview of neural networks using an example with one hidden layer. We will cover: an introduction to neural networks with one hidden layer featuring two neurons; creating a neural network with one hidden layer using nn.Module; creating a neural network with one hidden layer using nn.Sequential; and finally, how to train the neural network model. A neural network is a function that can be used to approximate most functions using a set of parameters. Let's look at an example of classification where we overlay the class color over the feature. In the context of neural networks, it's helpful to think of the classification problem as a decision function. Just like a function, when y equals one, the value is mapped to one on the vertical axis. We can represent the function as a box; this box function is an example of a decision function. Any value of x in the following region is mapped to one, and any value of x in this region is mapped to zero. Let's build a neural network with a few linear classifiers. In this example, we cannot use a straight line to separate the data. We can also view the problem as trying to approximate the box function using logistic regression. This line can be used to linearly separate some of the data, but some of the data is on the wrong side of the line. We can use the following node to represent the line, and the edges to represent the input x and output z. We then apply the logistic function; in the context of neural networks, this is called the activation function. The values of the function are incorrect, and we get an incorrect result in this region. We can represent the sigmoid function with the following node, taking the input z from the linear function and producing an output; technically, "A" is a function of z and x. We will call the function "A" the activation function, and the output of "A" is called the activation.
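The single neuron just described can be sketched in a few lines: a linear function followed by the sigmoid activation. The weight and bias values below are illustrative, not taken from the video.

```python
import torch
from torch import sigmoid

# A single "neuron": a linear function z = w*x + b followed by the
# sigmoid activation a = sigmoid(z). The values of w and b are illustrative.
w = torch.tensor(2.0)
b = torch.tensor(-1.0)

x = torch.tensor(1.5)
z = w * x + b   # linear part: 2.0 * 1.5 - 1.0 = 2.0
a = sigmoid(z)  # activation: squashes z into the range (0, 1)
```

Here "A" is the activation, a value between zero and one that can be interpreted as a probability for classification.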
This line can also be used to linearly separate some of the data, but again some of the data is on the wrong side of the line. This line looks like it can be used to separate the data, but let's see what happens when we apply the sigmoid function. After applying the sigmoid, or activation function, we get an incorrect result for most of the samples. Consider the following sigmoid functions; we call them "A subscript one" and "A subscript two". If we subtract the second sigmoid function from the first sigmoid function, we get something similar to the decision function. We can also apply the following operation with a linear function, i.e., just subtract the second activation from the first activation. We will use the following graph to represent the operation. "A subscript one" and "A subscript two" represent the two sigmoid functions; the superscript represents which layer of the neural network we are on. If we apply a threshold, setting every value less than 0.5 to zero and every value greater than 0.5 to one, we get the exact function we are trying to approximate. We can use the graph to represent the process: we apply two linear functions to x and we get two outputs; to each linear function we apply a sigmoid; we then apply a second linear function to the outputs of the sigmoids. We usually apply another function to the output of this linear function, then apply a threshold. This diagram represents a two-layer neural network: we have the hidden layer, and the second layer is called the output layer. Each linear function and activation is known as an artificial neuron. In this case, the hidden layer has two artificial neurons, and the output layer has one artificial neuron. As models get more complicated, we will use the following representation, the nonlinear term overlapping the linear term; sometimes we will leave out the bias term. As models get more complicated, we will sometimes only show the number of layers and artificial neurons and drop the parameters.
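The idea above, that subtracting one shifted sigmoid from another approximates the box function, can be checked numerically. The slopes and shifts below are illustrative choices, not the learned parameters from the video.

```python
import torch

# Sketch: a box-like decision function approximated by subtracting two
# shifted sigmoids. Slope 10 and shifts at -1 and +1 are illustrative.
x = torch.linspace(-4.0, 4.0, steps=9)

a1 = torch.sigmoid(10.0 * (x + 1.0))  # switches on near x = -1
a2 = torch.sigmoid(10.0 * (x - 1.0))  # switches on near x = +1

box_approx = a1 - a2  # close to 1 inside (-1, 1), near 0 outside

# Apply the 0.5 threshold to get a discrete decision
yhat = (box_approx > 0.5).float()
```

The result is near one only in the middle region, which is exactly the box-shaped decision function we set out to approximate.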
It's helpful to look at the outputs of each component of the neural network. The output of the linear component is a 2D plane that looks like this. These red data points get mapped to these points in the 2D plane; these blue data points get mapped to these points in the 2D plane, and so on. This output is not very insightful. Let's look at the activation: the output of the activation function is a 2D plane that looks like this. These red data points get mapped to these points in the 2D plane; these blue data points get mapped to these points in the 2D plane, and so on. It turns out that we can split the points using the following plane. This is what the linear function on the second layer does. Let's see how to build a neural network in PyTorch. We will need to import the following libraries; in this case, we import sigmoid directly. Here is the class we created for a neural network, where we repeatedly apply linear functions and activation functions. We will call the class Net. Let's review the different components of the network we discussed earlier; we use the Linear class to represent each layer. The hidden layer is given by the first linear constructor, and the output layer is given by the second linear constructor. In the constructor, we start with the size of the input to the network, in this case one; this is used in the constructor of the first linear function in the class or module. In this class, H is the number of neurons; it is the size of the output of the first linear function. In this case, H is two. H is also the number of inputs to the second linear layer. D_out is the size of the output layer. After we create a neural network object, we can apply it to a tensor as follows. Just like the other methods, forward makes the prediction. Let's see what happens: the linear operation is applied to x, and the function sigmoid applies a sigmoid function to the two outputs of the first linear function.
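The class just described can be sketched as follows. The attribute names linear1 and linear2 are illustrative; what matters is the structure: a hidden linear layer of size H with a sigmoid activation, followed by an output linear layer with a final sigmoid.

```python
import torch
import torch.nn as nn
from torch import sigmoid

# Sketch of the Net class: one hidden layer with H neurons, one output layer.
class Net(nn.Module):
    def __init__(self, D_in, H, D_out):
        super(Net, self).__init__()
        self.linear1 = nn.Linear(D_in, H)   # hidden layer
        self.linear2 = nn.Linear(H, D_out)  # output layer

    def forward(self, x):
        x = sigmoid(self.linear1(x))  # activation of the hidden layer
        x = sigmoid(self.linear2(x))  # final sigmoid for classification
        return x

# One input feature, two hidden neurons, one output
model = Net(1, 2, 1)
x = torch.tensor([[0.0]])
yhat = model(x)  # a 1x1 tensor with a value between 0 and 1
```

Calling model(x) invokes forward under the hood, which is how the prediction is made.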
The second linear function is applied to the outputs of the two sigmoid functions. The second sigmoid function applies a sigmoid to the single output of the second linear function. The output is then returned. In summary: we apply two linear functions to x and we get two outputs; to each linear function we apply a sigmoid; we then apply a second linear function to the outputs of the sigmoids; we usually apply another function to scale the output. Just a note: we can also use neural networks for regression by simply removing the last sigmoid function and changing the loss function; we will show you how in some of the labs. A neural network is essentially performing matrix multiplication, so let's look at the matrix interpretation. As you recall, Linear is essentially a matrix. Let's represent the sample x as follows with an orange box. We can interpret the operations as follows: in this case, our matrix W has one row and two columns, and similarly for the bias. The number of columns in the matrix W is two, representing the number of neurons; the number of rows is one, representing the input size. We apply a sigmoid to each element of the output of the first linear function; the result is still a one by two tensor. The second linear layer applies the following matrix operation. Its parameter matrix has two rows, representing the size of the inputs, and one column, representing the neuron. We then apply a sigmoid function to the output. For multiple samples, the process is the same; we can represent it as a matrix operation. The operation is applied to every row in X. As a result, each row in Z represents a sample, and each column is the output of an artificial neuron for that particular sample. We then apply the sigmoid to the output. We then apply the second linear function to each row, and we get an output for each row. We then apply a sigmoid function to each output; the result is a tensor. To get a discrete value, you will need to apply a threshold; here is one way to do it.
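The shapes described above, and one way to apply the threshold, can be sketched like this. Note that PyTorch stores a layer's weight as (out_features, in_features), the transpose of the xW convention used in the narration; the sample values in X are illustrative.

```python
import torch
import torch.nn as nn
from torch import sigmoid

# Sketch of the matrix interpretation for multiple samples.
linear1 = nn.Linear(1, 2)  # weight stored as 2x1 in PyTorch's (out, in) layout
linear2 = nn.Linear(2, 1)  # weight stored as 1x2

X = torch.tensor([[-2.0], [0.0], [2.0]])  # three samples, one feature each

Z = linear1(X)             # 3x2: each column is one hidden neuron's output
A = sigmoid(Z)             # 3x2: sigmoid applied elementwise
out = sigmoid(linear2(A))  # 3x1: one value in (0, 1) per sample

# One way to apply a threshold and get discrete labels
yhat = (out > 0.5).float()
```

Each row of Z is a sample and each column an artificial neuron, exactly as the matrix picture in the narration describes.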
The method state_dict contains the model parameters; we have the parameters for the first linear term as well as the second linear term. Let's review nn.Sequential. The process for Sequential is the same: we input the linear constructor with the input and output dimensions, we add the sigmoid function, we add the second linear function, and we add the final sigmoid function. We can then apply the model to a tensor. Let's see how to train the model. The training procedure is similar to the other methods. We create the data, and we create a training function. In this case, we accumulate the loss iteratively to obtain the cost; we will try different metrics in different labs. The process for training is identical to logistic regression: we create a BCE loss, then we create the X and Y values for our dataset. We create our model, specifying two neurons in the hidden layer, we create an optimizer, and then we train the model. See the lab for more. (Music)
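The Sequential construction and the training loop described above can be sketched together. The learning rate, epoch count, and the box-shaped toy data are illustrative choices, not the exact values from the lab.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Sequential version of the same network: Linear -> Sigmoid -> Linear -> Sigmoid
model = nn.Sequential(
    nn.Linear(1, 2),  # hidden layer with two neurons
    nn.Sigmoid(),
    nn.Linear(2, 1),  # output layer
    nn.Sigmoid(),
)

# Toy data resembling the box function: y = 1 for x in (-1, 1), else 0
X = torch.arange(-2.0, 2.0, 0.1).view(-1, 1)
Y = ((X > -1.0) & (X < 1.0)).float()

criterion = nn.BCELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

# A minimal training loop, accumulating nothing fancy: one loss per epoch
for epoch in range(500):
    optimizer.zero_grad()
    yhat = model(X)
    loss = criterion(yhat, Y)
    loss.backward()
    optimizer.step()

# state_dict holds the parameters of both linear layers, keyed by module index
keys = list(model.state_dict().keys())
```

For a Sequential model, the parameter keys are indexed by position, so both linear layers' weights and biases appear as '0.weight', '0.bias', '2.weight', and '2.bias'.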