0:00

Welcome to the last video for this week.

There are many great deep learning programming frameworks.

One of them is TensorFlow.

I'm excited to help you start to learn to use TensorFlow.

What I want to do in this video is show you the basic structure

of a TensorFlow program, and then leave you to practice, learn more details, and

practice them yourself in this week's programming exercise.

This week's programming exercise will take some time to do, so

please be sure to leave some extra time to do it.

As a motivating problem,

let's say that you have some cost function J that you want to minimize.

And for this example, I'm going to use this highly simple

cost function J(w) = w squared - 10w + 25.

So that's the cost function.

You might notice that this function is actually (w - 5) squared.

If you expand out this quadratic, you get the expression above, and so

the value of w that minimizes this is w = 5.
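As a quick sanity check, before bringing in TensorFlow at all, you can verify in plain Python that the two forms of the cost agree (the function name J here is just for illustration):

```python
def J(w):
    # The cost function from the video: J(w) = w^2 - 10w + 25
    return w**2 - 10*w + 25

# J(w) is the same polynomial as (w - 5)^2, so its minimum is at w = 5.
for w in [-2.0, 0.0, 3.0, 5.0, 10.0]:
    assert J(w) == (w - 5)**2

print(J(5))  # 0: the minimum value, attained at w = 5
```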

But let's say we didn't know that, and you just have this function.

Let's see how you can implement something in TensorFlow to minimize this.

Because a very similar structure of program can be used to train neural

networks, where you can have some complicated cost function J(w,

b) depending on all the parameters of your neural network.

And similarly, you'll be able to use TensorFlow to

automatically try to find values of w and b that minimize this cost function.

But let's start with the simpler example on the left.

So, I'm running Python in my Jupyter notebook, and to start up TensorFlow,

you import numpy as np, and it's idiomatic to use import tensorflow as tf.

Next, let me define the parameter w.

So in TensorFlow, you're going to use tf.Variable to define a parameter.

Â 2:08

And then let's define the cost function.

So remember, the cost function was w squared - 10w + 25.

So let me use tf.add.

So I'm going to have w squared + tf.multiply.

So the second term was -10*w.

And then I'm going to add 25 to that.

So let me put another tf.add over there.

So that defines the cost J that we had.

And then, I'm going to write train =

tf.train.GradientDescentOptimizer.

Let's use a learning rate of 0.01, and the goal is to minimize the cost.

And finally, the following few lines are quite idiomatic.

init = tf.global_variables_initializer(), and

then session = tf.Session().

So that starts a TensorFlow session.

session.run(init) to initialize the global variables.

And then, for TensorFlow to evaluate a variable, we're going to use session.run(w).

We haven't done anything yet.

So the lines above initialize w to zero and define a cost function.

We define train to be our learning algorithm, which uses

a GradientDescentOptimizer to minimize the cost function.

But we haven't actually run the learning algorithm yet, so

with session.run, we evaluate w, and let me print(session.run(w)).

So if we run that,

it evaluates w to be equal to 0, because we haven't run anything yet.

Now, let's do session.run(train).

So what this will do is run one step of gradient descent.

And then let's evaluate the value of w after one

step of gradient descent and print that.

So after that one step of gradient descent, w is now 0.1.

Let's now run 1,000 iterations of gradient descent, so session.run(train).

Â 4:35

And let's then print(session.run(w)).

So this runs 1,000 iterations of gradient descent,

and at the end, w ends up being 4.9999.

Remember, we said that we're minimizing (w - 5) squared, so

the optimal value of w is 5, and it got very close to this.
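TensorFlow derives the gradient for you, but it may help to see what the optimizer is doing under the hood. Here's a plain-Python sketch of the same loop, with the derivative dJ/dw = 2w - 10 worked out by hand; it reproduces both the 0.1 after one step and the value near 5 after 1,000 steps:

```python
def grad(w):
    # Hand-computed derivative of J(w) = w^2 - 10w + 25
    return 2*w - 10

w = 0.0                  # same initialization as the tf.Variable above
learning_rate = 0.01

w -= learning_rate * grad(w)
print(w)                 # 0.1 after one step, matching the video

for _ in range(999):     # 999 more steps, 1,000 in total
    w -= learning_rate * grad(w)
print(w)                 # very close to 5, like the 4.9999 in the video
```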

So I hope this gives you a sense of the broad structure of a TensorFlow program.

And as you do the programming exercise and play with more TensorFlow code yourself,

some of these functions that I'm using here will become more familiar.

Some things to notice about this: w is the parameter which we're optimizing, so

we're going to declare that as a variable.

And notice that all we had to do was define a cost function using these add and

multiply and so on functions.

And TensorFlow knows automatically how to take derivatives with respect to the add

and multiply, as well as other functions.

Which is why you only had to implement basically forward prop, and

it can figure out how to do the backprop or the gradient computation.

Because that's already built in to the add and

multiply, as well as the squaring functions.

By the way, in case this notation seems really ugly,

TensorFlow has actually overloaded the computation for

the usual plus, minus, and so on.

So you could also just write this nicer format for the cost, and

comment that out and rerun this, and get the same result.

So once w is declared to be a TensorFlow variable, the squaring, multiplication,

adding, and subtraction operations are overloaded.

So you don't need to use the uglier syntax that I had above.

Now, there's just one more feature of TensorFlow that I want to show you.

In this example, we minimized a fixed function of w.

But what if the function you want to minimize is a function of your training set?

So you might have some training data x, and

when you're training a neural network, the training data x can change.

So how do you get training data into a TensorFlow program?

So I'm going to define x;

think of this as playing the role of the training data, or

really the training data with both x and y, but we only have x in this example.

So I'm just going to define x with tf.placeholder, and

it's going to be of type float32, and let's make this a [3,1] array.

And what I'm going to do is this: whereas the cost before had fixed coefficients in

front of the three terms of this quadratic, it was 1 times w squared - 10*w + 25,

we can turn these numbers 1, -10, and 25 into data.

So what I'm going to do is replace the cost

with cost = x[0][0]*w squared

+ x[1][0]*w + x[2][0].

Well, x[2][0] times 1.

So now x becomes sort of like data that controls

the coefficients of this quadratic function.

And this placeholder function tells TensorFlow that

x is something that you provide the values for later.

Â 9:14

And now, if you want to change the coefficients of this quadratic function,

let's say you take this [-10.] and change it to [-20.].

And let's change this to [100.].

So this is now the function (w - 10) squared.

And if I re-run this, hopefully,

I find that the value that minimizes (w - 10) squared is w = 10.

Let's see, cool, great, and

we get w very close to 10 after running 1,000 iterations of gradient descent.
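What the placeholder mechanism achieves can be sketched in plain Python (this minimize helper is hypothetical, not a TensorFlow API): the coefficients live in a data array x, so changing the data changes the function being minimized without touching the training loop. The hand-computed gradient of x[0]*w^2 + x[1]*w + x[2] with respect to w is 2*x[0]*w + x[1]:

```python
def minimize(x, steps=1000, learning_rate=0.01):
    # Gradient descent on cost = x[0]*w**2 + x[1]*w + x[2],
    # where the list x plays the role of the placeholder data.
    w = 0.0
    for _ in range(steps):
        w -= learning_rate * (2*x[0]*w + x[1])  # d(cost)/dw
    return w

print(minimize([1.0, -10.0, 25.0]))   # close to 5, as before
print(minimize([1.0, -20.0, 100.0]))  # close to 10: (w - 10)^2 is minimized at w = 10
```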

So what you'll see more of when you do the programming exercise is that a placeholder

in TensorFlow is a variable whose value you assign later.

And this is a convenient way to get your training data into the cost function.

And the way you get your data into the cost function is with this syntax:

when you're running a training iteration, you

use the feed_dict to set x to be equal to the coefficients here.

And if you are doing mini-batch gradient descent, where on each iteration

you need to plug in a different mini-batch, then on different iterations you

use the feed_dict to feed in different subsets of your training set,

different mini-batches, into where your cost function is expecting to see data.

So hopefully this gives you a sense of what TensorFlow can do.

And the thing that makes it so

powerful is that all you need to do is specify how to compute the cost function.

And then, it takes derivatives, and

it can apply a gradient descent optimizer or an Adam optimizer or

some other optimizer with pretty much just one or two lines of code.

So here's the code again.

I've cleaned this up just a little bit.

And in case some of these functions or

variables seem a little bit mysterious to you, they will become more familiar after

you've practiced with them a couple of times by working through the programming exercise.

Just one last thing I want to mention.

These three lines of code are quite idiomatic in TensorFlow, and

what some programmers will do instead is use this alternative format,

which basically does the same thing.

Set session to tf.Session() to start the session,

and then use the session to run init, and

then use the session to evaluate, say, w, and then print the result.

But this with construction is used in a number of TensorFlow programs as well.

It more or less means the same thing as the thing on the left.

But the with command in Python is a little bit better at cleaning up in

case there's an error or exception while executing this inner loop.

So you'll see this in the programming exercise as well.
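The cleanup behavior of with is plain Python, not anything TensorFlow-specific. Here's a small sketch with a made-up Resource class standing in for something like a session: its __exit__ method runs even when the body raises an exception, which is exactly why the with form is a bit safer:

```python
class Resource:
    # Stand-in for something like a TensorFlow session.
    def __init__(self):
        self.closed = False

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.closed = True   # cleanup runs on normal exit AND on exceptions
        return False         # don't suppress the exception

r = Resource()
try:
    with r:
        raise RuntimeError("error inside the with block")
except RuntimeError:
    pass

print(r.closed)  # True: cleanup happened despite the exception
```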

So what is this code really doing?

Let's focus on this equation.

The heart of a TensorFlow program is something to compute a cost, and then

TensorFlow automatically figures out the derivatives and how to minimize that cost.

So what this equation, or what this line of code, is doing

is allowing TensorFlow to construct a computation graph.

And the computation graph does the following: it takes x[0][0],

it takes w, and then w gets squared.

Â 12:33

And then x[0][0] gets multiplied with w squared,

so you have x[0][0]*w squared, and so on, right?

And eventually, you know, this gets built up to compute

x[0][0]*w squared + x[1][0]*w, and so on.

And so eventually, you get the cost function.

And so the last term to be added would be x[2][0], where

it gets added to give the cost.

I won't write out the full formula for the cost here.

And the nice thing about TensorFlow is that, by implementing basically forward

prop through this computation graph to compute the cost,

TensorFlow already has built in

all the necessary backward functions.

So remember how training a deep neural network has a set of forward functions

and a set of backward functions.

Programming frameworks like TensorFlow have already built in the necessary

backward functions.

Which is why, by using the built-in functions to compute the forward function,

it can automatically do the backward functions as well, to implement back

propagation through even very complicated functions and compute derivatives for you.

So that's why you don't need to explicitly implement backprop.

Â This is one of the things that makes the programming frameworks

Â help you become really efficient.
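TensorFlow's real autodiff is far more general, but the core idea — every built-in op records its own local derivative, and backprop chains them together — can be sketched in a few lines of plain Python. This Node class is purely illustrative, not TensorFlow's API; it overloads + and * the way described above, building a small graph during forward prop and then walking it backward (the recursion revisits shared paths, which is fine for a tiny example):

```python
class Node:
    # Tiny scalar autodiff node: records how to backpropagate through + and *.
    def __init__(self, value, parents=()):
        self.value = value
        self.parents = parents   # list of (parent_node, local_gradient) pairs
        self.grad = 0.0

    def __add__(self, other):
        other = other if isinstance(other, Node) else Node(other)
        # d(a+b)/da = 1, d(a+b)/db = 1
        return Node(self.value + other.value, [(self, 1.0), (other, 1.0)])
    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Node) else Node(other)
        # d(a*b)/da = b, d(a*b)/db = a
        return Node(self.value * other.value,
                    [(self, other.value), (other, self.value)])
    __rmul__ = __mul__

    def backward(self, grad=1.0):
        self.grad += grad
        for parent, local in self.parents:
            parent.backward(grad * local)   # chain rule

w = Node(0.0)
cost = w * w + (-10.0) * w + 25.0   # forward prop builds the graph
cost.backward()                     # backprop: dJ/dw = 2w - 10 = -10 at w = 0
print(cost.value, w.grad)           # 25.0 -10.0
```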

If you look at the TensorFlow documentation,

I just want to point out that the TensorFlow documentation

uses a slightly different notation than I did for drawing the computation graph.

So it uses x[0][0] and w.

And then, rather than writing the value, like w squared,

the TensorFlow documentation tends to just write the operation.

So this would be a square operation,

and then these two get combined in a multiplication operation, and so on.

And then, the final node, I guess, would be

an addition operation where you add x[2][0] to find the final value.

So for the purposes of this class, I thought that this notation for

the computation graph would be easier for you to understand.

But if you look at the TensorFlow documentation,

if you look at the computation graphs in the documentation,

you see this alternative convention where the nodes are labeled

with the operations rather than with the values.

But both of these representations

represent basically the same computation graph.

And there are a lot of things that you can do with just one line of code in programming

frameworks.

For example, if you don't want to use gradient descent, but

instead you want to use the Adam optimizer, then by changing this one line of code,

you can very quickly swap in a better optimization algorithm.

So all the modern deep learning programming frameworks support

things like this, and

that makes it really easy for you to code up even pretty complex neural networks.

So I hope this is helpful for

giving you a sense of the typical structure of a TensorFlow program.

To recap the material from this week:

you saw how to systematically organize the hyperparameter search process.

We also talked about batch normalization and

how you can use that to speed up the training of your neural networks.

And finally, we talked about programming frameworks for deep learning.

There are many great programming frameworks.

And we had this last video focusing on TensorFlow.

With that, I hope you enjoy this week's programming exercise, and

that it helps you gain even more familiarity with these ideas.
