0:00

Being effective in developing your deep neural nets requires that you organize not only your parameters well but also your hyperparameters. So what are hyperparameters? Let's take a look.

The parameters of your model are W and b, and there are other things you need to tell your learning algorithm, such as the learning rate alpha, because we need to set alpha and that in turn will determine how your parameters evolve, or maybe the number of iterations of gradient descent you carry out. Your learning algorithm has other numbers that you need to set, such as the number of hidden layers, which we call capital L, or the number of hidden units, n[1], n[2], and so on. Then you also have the choice of activation function: do you want to use a ReLU, or tanh, or a sigmoid or something, especially in the hidden layers? All of these are things that you need to tell your learning algorithm, and so these are parameters that control the ultimate parameters W and b. So we call all of these things below hyperparameters.

Because these things, like alpha the learning rate, the number of iterations, the number of hidden layers, and so on, are all parameters that control W and b, we call them hyperparameters: it is the hyperparameters that somehow determine the final value of the parameters W and b that you end up with. In fact, deep learning has a lot of different hyperparameters. In a later course we'll see other hyperparameters as well, such as the momentum term, the mini-batch size,

2:05

various forms of regularization parameters, and so on. If none of these terms at the bottom make sense yet, don't worry about it; we'll talk about them in the second course.

Because deep learning has so many hyperparameters, in contrast to earlier eras of machine learning, I'm going to try to be very consistent in calling the learning rate alpha a hyperparameter rather than calling it a parameter. I think in earlier eras of machine learning, when we didn't have so many hyperparameters, most of us used to be a bit sloppier and just call alpha a parameter. Technically alpha is a parameter, but it is a parameter that determines the real parameters. I'll try to be consistent in calling things like alpha, the number of iterations, and so on hyperparameters.
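To make the distinction concrete, here is a minimal Python sketch that is not taken from the lecture. The dictionary keys, the layer sizes, and the helper `initialize_parameters` are all hypothetical names and values chosen for illustration. The only point is that the hyperparameters are set by hand, while W and b are created, and later learned, under their control.

```python
import random

# Hyperparameters: values you choose before training (illustrative only).
hyperparameters = {
    "learning_rate": 0.01,        # alpha
    "num_iterations": 1000,       # number of gradient descent steps
    "layer_dims": [4, 5, 3, 1],   # input size, two hidden layers, output size
    "activation": "relu",         # activation used in the hidden layers
}

def initialize_parameters(layer_dims, seed=0):
    """Create W and b for every layer; these are what training would learn."""
    rng = random.Random(seed)
    parameters = {}
    for l in range(1, len(layer_dims)):
        n_out, n_in = layer_dims[l], layer_dims[l - 1]
        # W has one row per unit in layer l and one column per unit in layer l-1.
        parameters[f"W{l}"] = [
            [rng.gauss(0, 0.01) for _ in range(n_in)] for _ in range(n_out)
        ]
        parameters[f"b{l}"] = [0.0] * n_out
    return parameters

parameters = initialize_parameters(hyperparameters["layer_dims"])
for name, value in parameters.items():
    if name.startswith("W"):
        print(name, "shape:", (len(value), len(value[0])))
    else:
        print(name, "length:", len(value))
```

Changing `layer_dims` changes which W and b exist at all, which is one sense in which the hyperparameters control the parameters.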

When you're training a deep net for your own application, you'll find that there may be a lot of possible settings for the hyperparameters that you just need to try out. So applied deep learning today is a very empirical process, where often you might have an idea. For example, you might have an idea for the best value for the learning rate. You might say, well, maybe alpha equals 0.01, I want to try that. Then you implement it, try it out, and see how that works. Based on that outcome you might say, you know what, I've changed my mind, I want to increase the learning rate to 0.05.

So if you're not sure what's the best value for the learning rate to use, you might try one value of the learning rate alpha and see the cost function J go down like this. Then you might try a larger value for the learning rate alpha and see the cost function blow up and diverge. Then you might try another version and see it go down really fast but converge to a higher value. You might try another version and see the cost function J do that. Then, after trying a set of values, you might say, okay, it looks like this value of alpha gives me pretty fast learning and allows me to converge to a lower cost function J, so I'm going to use this value of alpha.

You saw in a previous slide that there are a lot of different hyperparameters, and it turns out that when you're starting on a new application, I find it very difficult to know in advance exactly what's the best value of the hyperparameters. So what often happens is you just have to try out many different values and go around this cycle: you try out some values, say five hidden layers with this many hidden units, implement that, see if it works, and then iterate. The title of this slide is that applied deep learning is a very empirical process, and empirical process is maybe a fancy way of saying you just have to try a lot of things and see what works.

Another effect I've seen is that deep learning today is applied to so many problems, ranging from computer vision to speech recognition to natural language processing to a lot of structured data applications such as online advertising, web search, product recommendations, and so on. What I've seen is, first, researchers from one of these disciplines try to go to a different one, and sometimes the intuitions about hyperparameters carry over and sometimes they don't. So I often advise people, especially when starting on a new problem, to just try out a range of values and see what works. In the next course we'll see some systematic ways for trying out a range of values. Second, even if you're working on one application for a long time, maybe you're working on online advertising, as you make progress on the problem it's quite possible that the best value for the learning rate, the number of hidden units, and so on might change.

Even if you tune your system to the best value of hyperparameters today, it's possible you'll find that the best value might change a year from now, maybe because the computer infrastructure, be it CPUs or the type of GPU you're running on or something, has changed. So maybe one rule of thumb is: every now and then, maybe every few months, if you're working on a problem for an extended period of time, for many years, just try a few values for the hyperparameters and double-check if there's a better value for the hyperparameters. As you do so, you'll slowly gain intuition as well about the hyperparameters that work best for your problems.
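The try-it-and-see cycle described above can be sketched for the learning rate alone. This is a rough illustration under stated assumptions, not the lecture's code: a toy cost J(w) = w**2 stands in for a real network's cost function, and the candidate values of alpha, the starting point, and the function name `run_gradient_descent` are all made up for the example.

```python
def run_gradient_descent(alpha, num_iterations=50, w_init=5.0):
    """Minimise the toy cost J(w) = w**2 and return the cost after every step."""
    w = w_init
    costs = [w ** 2]
    for _ in range(num_iterations):
        grad = 2.0 * w          # dJ/dw for J(w) = w**2
        w = w - alpha * grad    # gradient descent update
        costs.append(w ** 2)
    return costs

# Hypothetical candidate learning rates to try out.
candidate_alphas = [0.001, 0.01, 0.1, 1.5]
results = {alpha: run_gradient_descent(alpha) for alpha in candidate_alphas}

for alpha, costs in results.items():
    status = "diverged" if costs[-1] > costs[0] else "decreased"
    print(f"alpha={alpha}: J went from {costs[0]:.3g} to {costs[-1]:.3g} ({status})")

# Keep the learning rate that ends with the lowest cost.
best_alpha = min(results, key=lambda a: results[a][-1])
print("lowest final cost with alpha =", best_alpha)
```

On this toy problem the smallest values learn slowly, the largest one makes the cost blow up, and an intermediate value wins. On a real network you would run the same loop with your actual training code and compare the resulting cost curves.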

I know that this might seem like an unsatisfying part of deep learning, that you just have to try out all the values for these hyperparameters. But maybe this is one area where deep learning research is still advancing, and maybe over time we'll be able to give better guidance for the best hyperparameters to use. But it's also possible that, because CPUs and GPUs and networks and datasets are all changing, the guidance won't converge for some time, and you'll just need to keep trying out different values, evaluate them on a hold-out cross-validation set or something, and pick the value that works for your problems.

So that was a brief discussion of hyperparameters. In the second course we'll also give some suggestions for how to systematically explore the space of hyperparameters. But by now you actually have pretty much all the tools you need to do the programming exercise. Before you do that, I just want to share with you one more set of ideas, which is that I'm often asked: what does deep learning have to do with the human brain?
