The Triplet Loss is one good way to learn

the parameters of a ConvNet for face recognition.

There's another way to learn these parameters.

Let me show you how face recognition can also be posed

as a straight binary classification problem.

Another way to train a neural network

is to take this pair of neural networks, to take

this Siamese Network, and have them both compute these embeddings,

maybe 128 dimensional embeddings,

maybe even higher dimensional,

and then have these be input to

a logistic regression unit to then just make a prediction.

The target output will be one if both of these are the same person,

and zero if these are of different persons.

So, this is a way to treat face recognition just as a binary classification problem.

And this is an alternative to the triplet loss for training a system like this.

Now, what does this final logistic regression unit actually do?

The output y hat will be a sigmoid function

applied to some set of features, but rather than just feeding in

these encodings, what you can do is take the differences between the encodings.

So, let me show you what I mean.

Let's say I write a sum over k equals 1 to 128 of the absolute value,

taken element-wise, of the difference between the two encodings.

Let me just finish writing this out and then we'll see what this means.

In this notation, f of x i is the encoding of the image x i

and the subscript k means to just select out the k-th component of this vector.

This is taking the element-wise difference in absolute values between these two encodings.

And what you might do is think of these 128 numbers

as features that you then feed into logistic regression.

And you'll find that logistic regression can have additional parameters w

and b, similar to a normal logistic regression unit.
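Written out, the prediction described here could be expressed as below. This is a reconstruction from the spoken description: the superscripts (i) and (j) label the two images, and indexing the weights by k, to match the summation index, is a notational choice made here.

```latex
\hat{y} = \sigma\!\left( \sum_{k=1}^{128} w_k \left| f(x^{(i)})_k - f(x^{(j)})_k \right| + b \right)
```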

And you would train appropriate weighting on these 128 features in

order to predict whether or not

these two images are of the same person or of different persons.

So, this will be one pretty useful way to

learn to predict zero or one whether these are the same person or different persons.
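As a rough illustration of this logistic regression unit, here is a minimal sketch in plain Python. The function names are made up for this example, the encodings are assumed to be plain lists of floats already produced by the network, and the weights w and b are assumed to have been learned elsewhere.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def abs_diff_features(enc_i, enc_j):
    """Element-wise absolute difference between two encodings."""
    if len(enc_i) != len(enc_j):
        raise ValueError("encodings must have the same length")
    return [abs(a - b) for a, b in zip(enc_i, enc_j)]


def predict_same_person(enc_i, enc_j, w, b):
    """Return y hat: the estimated probability that both encodings are the same person."""
    features = abs_diff_features(enc_i, enc_j)
    z = sum(wk * fk for wk, fk in zip(w, features)) + b
    return sigmoid(z)
```

With negative weights, larger differences between the encodings push y hat toward zero, which matches the intuition that very different encodings suggest different persons.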

And there are a few other variations on how you can

compute this formula that I had underlined in green.

For example, another formula could be f of x i k minus f of x j k,

squared, divided by f of x i k

plus f of x j k. This is sometimes called the chi square form.

This is the Greek letter chi.

But this is sometimes called a chi square similarity.

And this and other variations are explored in the DeepFace paper,

which I referenced earlier as well.
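The chi square form mentioned above can be sketched per component like this. The function name is hypothetical, and the guard for a denominator near zero (returning 0.0 for that component) is an assumption added here so the example runs safely; the lecture does not say how that case is handled.

```python
def chi_square_features(enc_i, enc_j, eps=1e-12):
    """Per-component chi square term: (a - b)^2 / (a + b).

    Components whose denominator is (nearly) zero are set to 0.0.
    That guard is an assumption of this sketch, not part of the formula.
    """
    if len(enc_i) != len(enc_j):
        raise ValueError("encodings must have the same length")
    features = []
    for a, b in zip(enc_i, enc_j):
        denom = a + b
        if abs(denom) < eps:
            features.append(0.0)
        else:
            features.append((a - b) ** 2 / denom)
    return features
```

These values could then be fed to the logistic regression unit in place of the absolute differences.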

So in this learning formulation,

the input is a pair of images,

so this is really your training input x and the output y

is either zero or one depending on whether you're inputting

a pair of similar or dissimilar images.

And same as before,

you're training a Siamese Network, so that means that

this neural network up here has parameters that are

tied to the parameters in this lower neural network.

And this system can work pretty well as well.

Lastly, just to mention,

one computational trick that can help your deployment significantly is that,

if this is the new image,

so this is an employee walking in hoping that the turnstile

doorway will open for them, and this is from your database image,

then instead of having to compute

this embedding every single time,

what you can do is actually pre-compute it.

So, when the new employee walks in,

what you can do is use this upper ConvNet to compute that encoding,

then compare it to

your pre-computed encoding and then use that to make a prediction y hat.

This way, you don't need to store the raw images, and

also, if you have a very large database of employees,

you don't need to compute these encodings every single time for every employee in the database.

This idea of pre-computing

some of these encodings can save significant computation.

And this type of pre-computation works both for this type of

Siamese Network architecture where you

treat face recognition as a binary classification problem,

as well as when you are learning encodings, maybe using

the Triplet Loss function as described in the last couple of videos.
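The pre-computation idea can be sketched as follows. Everything named here is hypothetical: `encode` stands in for the trained network's forward pass, the database is a plain dictionary from employee name to stored encoding, and the 0.5 threshold is just an illustrative choice.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def precompute_encodings(images_by_name, encode):
    """Run the encoder once per database image and keep only the encodings."""
    return {name: encode(image) for name, image in images_by_name.items()}


def verify(new_image, claimed_name, encodings, encode, w, b, threshold=0.5):
    """Compare a freshly encoded image against one stored encoding."""
    new_encoding = encode(new_image)          # the only forward pass at query time
    stored_encoding = encodings[claimed_name]  # looked up, not recomputed
    z = b + sum(
        wk * abs(p - q) for wk, p, q in zip(w, new_encoding, stored_encoding)
    )
    y_hat = sigmoid(z)
    return y_hat >= threshold, y_hat
```

At query time only the new image goes through the network, and the raw database images are never needed again once their encodings are stored.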

And so just to wrap up,

to treat face verification as supervised learning,

you create a training set of just pairs of images now, instead

of triplets, where the target label is one

when these are a pair of pictures of the same person and where the target label is zero

when these are pictures of different persons, and you use

different pairs to train

the neural network, to train the Siamese network, using backpropagation.
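One simple way to build such a training set of labeled pairs is sketched below. The function name and the input format, a list of (image, person) tuples, are assumptions for illustration. Enumerating every pair is only practical for small datasets, and real pipelines usually sample pairs instead.

```python
from itertools import combinations


def make_pairs(labeled_images):
    """Turn (image, person) tuples into ((image_a, image_b), label) examples.

    The label is 1 when both images show the same person and 0 otherwise.
    """
    pairs = []
    for (img_a, person_a), (img_b, person_b) in combinations(labeled_images, 2):
        label = 1 if person_a == person_b else 0
        pairs.append(((img_a, img_b), label))
    return pairs
```

Each resulting example has a pair of images as the input x and zero or one as the output y.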

So, this version that you just saw of treating face verification

and by extension face recognition as a binary classification problem,

this works quite well as well.

And so with that, I hope that you now know,

what it would take to train

your own face verification or your own face recognition system, one that can do one shot learning.