In this video, you're going to learn to implement

feed-forward networks with Keras and

build a little application to predict handwritten digits.

In the introduction to deep learning in this course,

you've learned about multi-layer perceptrons or MLPs for short.

These kinds of networks are also sometimes called densely-connected networks.

And to build them, we essentially have to stack

so-called dense layers on top of each other with activations.
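
To make the idea of stacking concrete, here is a minimal sketch in plain NumPy, not Keras, of what a dense layer computes: a matrix product plus a bias, followed by an optional activation. The toy layer sizes, the random weights, and the choice of ReLU as the activation are assumptions made purely for illustration.

```python
import numpy as np

def relu(z):
    """Rectified linear unit: keeps positive values, zeros out the rest."""
    return np.maximum(z, 0.0)

def dense_forward(x, weights, bias, activation=None):
    """Compute activation(x @ weights + bias) for a batch of inputs x."""
    z = x @ weights + bias
    return z if activation is None else activation(z)

# Hypothetical toy sizes: 4 inputs -> 3 hidden units -> 2 outputs.
rng = np.random.default_rng(0)
w1, b1 = rng.normal(size=(4, 3)), np.zeros(3)
w2, b2 = rng.normal(size=(3, 2)), np.zeros(2)

x = rng.normal(size=(5, 4))               # a batch of 5 samples
hidden = dense_forward(x, w1, b1, relu)   # first dense layer, with activation
output = dense_forward(hidden, w2, b2)    # second dense layer, no activation
print(output.shape)
```

Each layer's output simply becomes the next layer's input, which is all that stacking means here.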

A technique that we're going to use for regularization is called Dropout,

and we build Keras Dropout layers into our network to achieve that.

We then build a sequential model from both Dense and Dropout layers

to arrive at an application.

Okay, let's start with Dense layers.

To initialize the Dense layer,

we'll have to do a few things.

First off, you always have to specify the number of output neurons or

units that the layer is going to have.

Secondly, usually, you want

to provide an activation function. So, if you don't, there won't be any.

Okay.

So, None is the default value for that keyword.

And if you want to have a Sigmoid,

then simply put the name Sigmoid there,

or [inaudible], whatever you like.

The third argument is use_bias, which is set to True by default,

indicating that we are using a bias term.

And you probably shouldn't touch that unless you know what you're doing.

And the last two keywords in this signature are the kernel_initializer and

the bias_initializer, which are set to specific initialization schemes:

the kernel, that is, the weights of this dense layer, is set to

glorot_uniform initialization and the biases are simply set to zero.

So, unless you know a lot about

initialization which we don't really cover in this lecture here,

you probably shouldn't touch many of the keywords that are provided in Keras for you.
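
If you're curious what those two defaults amount to, here is a rough NumPy sketch. Glorot uniform initialization draws each weight from a uniform distribution on [-limit, limit] with limit = sqrt(6 / (fan_in + fan_out)). The helper name glorot_uniform below is my own, and the 784 and 512 sizes are borrowed from the model we build later in this video.

```python
import numpy as np

def glorot_uniform(fan_in, fan_out, rng):
    """Sample a (fan_in, fan_out) weight matrix from U(-limit, limit)."""
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-limit, limit, size=(fan_in, fan_out))

rng = np.random.default_rng(0)
kernel = glorot_uniform(784, 512, rng)   # weights of a 784 -> 512 dense layer
bias = np.zeros(512)                     # biases simply start at zero

limit = np.sqrt(6.0 / (784 + 512))
print(kernel.shape, bias.shape, float(limit))
```

The point of the scaling is that wider layers get smaller initial weights, which is one more reason to leave the default alone.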

So Dropout layers are much easier to specify.

Actually, you just have to specify the rate,

meaning a value between zero and one,

which indicates the fraction of units to drop in each forward pass.

If you want, you can also specify a random seed for reproducibility.
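
Here is a small NumPy sketch of what dropout does during training, assuming the usual "inverted dropout" formulation: a random fraction of units is set to zero and the surviving units are scaled up by 1 / (1 - rate), which is also how the Keras Dropout layer is documented to behave. The function name and the array sizes are illustrative only.

```python
import numpy as np

def dropout(x, rate, rng, training=True):
    """Inverted dropout: zero a random fraction `rate` of units, rescale the rest."""
    if not training or rate == 0.0:
        return x                                # dropout is inactive at prediction time
    keep_mask = rng.random(x.shape) >= rate     # True for the units that are kept
    return x * keep_mask / (1.0 - rate)         # rescale so the expected sum is unchanged

rng = np.random.default_rng(42)                 # fixed seed for reproducibility
x = np.ones((1000, 512))
dropped = dropout(x, rate=0.2, rng=rng)

print("fraction dropped:", float((dropped == 0).mean()))
print("value of kept units:", float(dropped.max()))
```

With a rate of 0.2, roughly one unit in five is zeroed on each pass, and a different random set is chosen every time.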

All right, let's move on to building an actual application.

We're going to use the MNIST dataset of handwritten digits.

The MNIST dataset consists of 60,000 training samples and 10,000 test samples.

Each individual sample is a 28 by 28 image showing a handwritten digit.

The labels are simply encoded as the actual digits, 0-9.

So to build this application, you first

import the MNIST dataset from Keras and also import the [inaudible] function, which we'll use later on,

as well as our Sequential model

and the two layers that we're going to use, Dense and Dropout.

Okay. The first thing we specify here is the batch_size.

And we set it to 128.

This batch_size will be used in the forward pass and also for the predictions.

The number of classes is the number of digits there are [inaudible].

And we are going to train on that for 20 epochs in total.
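
To get a feeling for these numbers, here is a quick back-of-the-envelope calculation of how many weight updates they imply. It assumes the 60,000 training samples mentioned earlier and that the last, smaller batch of each epoch is still used.

```python
import math

batch_size = 128
epochs = 20
num_train_samples = 60_000   # size of the MNIST training set

# One update per batch; the last batch of an epoch may be smaller than 128.
steps_per_epoch = math.ceil(num_train_samples / batch_size)
total_steps = steps_per_epoch * epochs

print(steps_per_epoch)   # 469
print(total_steps)       # 9380
```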

All we have to do to load the data is call

mnist.load_data to retrieve training and test features and [inaudible].

Okay. Next step is data pre-processing.

So I mentioned before that

the MNIST samples are 28 by 28 images,

and we need to flatten them into vectors of length 784 to feed them into a dense layer.

So first, we're going to reshape both train and test data.

Then, we convert them to a float type,

and divide them by 255 to arrive at values that lie between zero and one.
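
These three steps can be sketched in NumPy as follows. To keep the example self-contained, it uses random pixel values as a stand-in for the real MNIST images; the variable name x_train and the choice of float32 are assumptions.

```python
import numpy as np

# Stand-in for real MNIST images: random uint8 pixels in the same 28 x 28 layout.
rng = np.random.default_rng(0)
x_train = rng.integers(0, 256, size=(100, 28, 28), dtype=np.uint8)

x_train = x_train.reshape(len(x_train), 28 * 28)   # flatten each image to length 784
x_train = x_train.astype("float32")                # integers -> floats
x_train /= 255                                     # scale pixel values into [0, 1]

print(x_train.shape, x_train.dtype, x_train.min(), x_train.max())
```

The same three lines would be applied to the test data as well.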

As a last step in pre-processing,

we're going to [inaudible] the labels that we have

with the to_categorical function.

So that means, for instance,

if we have a label for the digit zero,

that is, the value 0,

it is going to be transformed into a vector of length 10 that is all zeros

except for a one in the first position.
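
The transformation just described is often called one-hot encoding. Here is a minimal NumPy sketch of it; the helper one_hot is a hypothetical stand-in written for illustration, not the Keras function itself.

```python
import numpy as np

def one_hot(labels, num_classes):
    """Turn integer class labels into one-hot row vectors."""
    labels = np.asarray(labels)
    encoded = np.zeros((labels.size, num_classes), dtype="float32")
    encoded[np.arange(labels.size), labels] = 1.0   # a single one per row
    return encoded

y = one_hot([0, 3, 9], num_classes=10)
print(y[0])   # label 0: a one in the first position, zeros elsewhere
```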

Next, we can proceed to defining and running our model.

So we start by initializing a sequential model and then,

adding Dense and Dropout layers one by one. All right.

In the first layer, you see that we also specify the input shape,

which is essentially 784,

the length of our vectors.

This input shape has to be provided only in the first layer;

the shapes of the succeeding layers are then inferred by Keras for you.

So as you can see, we have three Dense layers in total.

One with an output length of 512,

another one with 512,

and then the final layer, which has 10 outputs, one for each class.

And we also add two Dropout layers with a drop rate of 20 percent. All right.
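
A model along these lines could be sketched in Keras as shown below. The layer sizes, the drop rate, and the input shape follow the description above, but the activation functions (relu for the hidden layers, softmax for the output) are my assumptions, since they aren't named here. Recent Keras versions may also warn about input_shape and suggest an explicit Input object instead.

```python
from keras.models import Sequential
from keras.layers import Dense, Dropout

num_classes = 10

model = Sequential()
# Only the first layer is told the input shape: flattened 28 x 28 images.
model.add(Dense(512, activation="relu", input_shape=(784,)))  # activation assumed
model.add(Dropout(0.2))
model.add(Dense(512, activation="relu"))                      # activation assumed
model.add(Dropout(0.2))
# One output per digit class; softmax (assumed) turns outputs into probabilities.
model.add(Dense(num_classes, activation="softmax"))

print(model.count_params())
```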

Once we have specified our model,

we can print a summary to the command line by invoking model.summary().
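
The parameter counts that such a summary reports can be checked by hand. Each dense layer has one weight per input-output pair plus one bias per output, and Dropout layers add no parameters. A short sketch, using the layer widths from this model:

```python
def dense_params(n_inputs, n_units):
    """Number of weights plus biases in one dense layer."""
    return n_inputs * n_units + n_units

# Layer widths from the lecture: 784 inputs -> 512 -> 512 -> 10 outputs.
widths = [784, 512, 512, 10]
per_layer = [dense_params(a, b) for a, b in zip(widths, widths[1:])]

print(per_layer)        # [401920, 262656, 5130]
print(sum(per_layer))   # 669706
```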

Next, we compile our model with categorical_crossentropy as the loss,

specify the optimizer as [inaudible], and also evaluate

the accuracy metric. Okay.
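
For intuition about what the loss and the metric measure, here is a small NumPy sketch. It assumes one-hot targets and predicted class probabilities; the two functions are illustrative re-implementations, not the Keras internals, and the toy numbers are made up.

```python
import numpy as np

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean cross-entropy between one-hot targets and predicted probabilities."""
    y_pred = np.clip(y_pred, eps, 1.0 - eps)   # avoid log(0)
    return float(-np.mean(np.sum(y_true * np.log(y_pred), axis=1)))

def accuracy(y_true, y_pred):
    """Fraction of samples whose most probable class matches the target."""
    return float(np.mean(y_true.argmax(axis=1) == y_pred.argmax(axis=1)))

# Hypothetical 3-class toy example with two samples.
y_true = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
y_pred = np.array([[0.7, 0.2, 0.1], [0.6, 0.3, 0.1]])

print(categorical_crossentropy(y_true, y_pred))
print(accuracy(y_true, y_pred))
```

The loss rewards putting high probability on the correct class, while accuracy only checks whether the most probable class is the right one.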

So we can then fit our model with the train data that we have.

We set the batch size and the number of epochs as defined previously.

And we can also specify validation data, namely the test data that we've prepared.

So the last step we do in this model is to compute a score by evaluating the model.

In this case, we get back a pair consisting of the

test loss and the accuracy, which we print to the command line as well.

If you do so, you should achieve about 98 percent accuracy with this model. All right.
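
The compile, fit, and evaluate steps can be sketched end to end as follows. To keep it quick and self-contained, this sketch uses random stand-in data and a deliberately tiny model, so its accuracy will be around chance level rather than 98 percent. The optimizer ("rmsprop"), the activations, the layer size of 32, and the reduced number of epochs are all assumptions for illustration.

```python
import numpy as np
from keras.models import Sequential
from keras.layers import Dense, Dropout

batch_size = 128
epochs = 2          # kept small for this sketch; the lecture trains for 20
num_classes = 10

# Random stand-in data with the same shapes as preprocessed MNIST.
rng = np.random.default_rng(0)
x_train = rng.random((256, 784)).astype("float32")
x_test = rng.random((64, 784)).astype("float32")
y_train = np.eye(num_classes, dtype="float32")[rng.integers(0, num_classes, 256)]
y_test = np.eye(num_classes, dtype="float32")[rng.integers(0, num_classes, 64)]

# A deliberately small model; these layer sizes are not the lecture's.
model = Sequential()
model.add(Dense(32, activation="relu", input_shape=(784,)))
model.add(Dropout(0.2))
model.add(Dense(num_classes, activation="softmax"))

model.compile(loss="categorical_crossentropy",
              optimizer="rmsprop",          # assumed choice of optimizer
              metrics=["accuracy"])

model.fit(x_train, y_train,
          batch_size=batch_size,
          epochs=epochs,
          verbose=0,
          validation_data=(x_test, y_test))

# evaluate returns the loss followed by each metric requested in compile.
score = model.evaluate(x_test, y_test, verbose=0)
print("Test loss:", score[0])
print("Test accuracy:", score[1])
```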

That's it for multi-layer perceptrons.

In the next lecture, we'll learn about [inaudible] with Keras and [inaudible] in particular.