1:53

We're given a board, the current state of the game, and which player we are.

Â And from that we then do the following.

Â We run a certain number of trials.

Â This is a Monte Carlo simulation.

Â So, we want to run a large number of trials.

Â Those are going to be random.

Â So, where's the randomness.

Â Well, we start with that current board, and then we just randomly play out a game.

Â So, if we're told we're player x we start by putting an x on the board.

Â Then we put an o and an x and so on.

Â Right, until somebody wins or there's a draw

Â 2:19

then we have a final position of the board.

Â We take that board and we score it, and

Â we throw it away, and we start all over again.

Â We go back to the initial board and we play another random game.

Â When we're done we score it, we throw it away.

Â Start again with the initial board, play another random game, score it and so on.

Â And once we've done the number of trials that we desire.

Â We then add up all the scores across the trials right.

Â Now we have an aggregate score for all of the, the entire Monte Carlo simulation.

Â Take those scores, and we're going to end up with a score for each grid square.

Â We look at the empty squares on the original board,

Â and we find the empty square that has the highest score.

Â And that's the move were going to pick.

Â Right?

Â If there's only one it's easy, if there's

Â more than one well we just randomly pick, you

Â know, one of the, the squares that has

Â the highest score, right, if there's a tie, okay.

Â And that's it.

Â The questions now is simply how do we assign these scores?

Â 3:12

So, how to I actually score the board.

Â Well, all right, there are a lot of

Â different ways you can score tic-tac-toe board, but

Â I'm going to show you one that works well

Â with the Monte Carlo methods we're using here.

Â All right, so I, I start with a board here,

Â and this board is the result of a random trial.

Â It doesn't even matter where we started.

Â Okay.

Â This is just the result.

Â Right.

Â What does matter is who's who.

Â So X is the machine, player, and o is the human player, 'kay?

Â And we're going to give each of these constants.

Â We're going to say okay the machine player,

Â I'm going to use the constant MCMATCH to score.

Â And the human player, we're going to use the constant MCOTHER.

Â Okay.

Â These are going to match the costs that are

Â in the template that we've given you, all right?

Â And, what are we going to do with these costs?

Â Well, we're going to add or subtract them to each

Â square on the board, the question is how, all right?

Â Well, first we going to figure out who won the game, all right?

Â Well, it looks like o won the game, okay?

Â I can see three in a row, for o right here, all right.

Â And, so what that means is that I'm going to subtract

Â this MCMATCH value every time there's, I see and x, for y.

Â Well the X plays weren't very good.

Â I didn't win the game.

Â In fact, I lost the game.

Â 'Kay.

Â All right.

Â On the flip side though, o did win the game, so I'm going to add MCOTHER.

Â Every time I see a play, or a human player play, or a y because I actually

Â want to play there, so I can block

Â the human from actually winning this game, all right.

Â Now you have to actually pick these constants all right, to be sensible, but

Â for the point of this demonstration, I'm just going to set them to be one.

Â And to make them different, so that you can tell when I'm adding one or the other.

Â All right?

Â Okay, so then all we do is go through

Â each screed square on the board and we score it.

Â All right, so here's an empty square in the upper left,

Â well that one gets a score of zero right, because I don't

Â know, there was no x or o there, I don't know

Â if it was a good score, good place to play or not.

Â Right here's an x.

Â So I'm going to say, hey that was a bad spot, so I subtract 1.

Â Hee's no, that was a good spot, I'll add 2.

Â Okay, here's an x, subtract 1, subtract 1.

Â Here's an o add 2.

Â Here's an x subtract 1, o add 2.

Â O, add 2.

Â All right, now, I've scored this board.

Â So, basically I have these nine values or however many

Â scores there are if I was playing a different sized game.

Â Okay, and I will now just.

Â Say this is the score for this particular board.

Â I will sum them up, all the scores for all the boards,

Â across all the trials, and then I will get a result, okay.

Â Now, obviously every game's going to be different.

Â And, so let's imagine I have another game here,

Â I'm going to draw it smaller where, x actually wins.

Â So, let's imagine the final state looks something like this, okay.

Â Now, x has won the game.

Â 6:21

To each play location where o is played.

Â And intuitively again this makes sense, right, I want to try

Â and play in these squares where I actually won the game.

Â So this board you know, I'd get a plus one for each location on this, on this row.

Â And over here, I would subtract two on these locations with o where

Â o'd played, and then all these empty squares would get a value of zero.

Â Okay.

Â All right.

Â So now, again, you have to figure out who won the game,

Â which player is the machine player and then you score the board.

Â We could also have a draw.

Â When you have a draw there's no useful information

Â there, all the squares should just get zero, all

Â right, because it doesn't help me figure out how

Â to win or how to not lose the game.

Â 7:03

All right.

Â So, how do we implement this?

Â Well it comes down to four functions.

Â We have MC Trial or Monte Carlo Trial.

Â It takes a board and player.

Â So, this is the current board position.

Â And which player the machine is, all right.

Â And you play one random game.

Â This is a Monte Carlo Trial.

Â All right, and we have MC updates scores.

Â This takes the current.

Â You know, running total of the scores.

Â It takes a board, which is a completed game, so

Â this is one of these trials, the outcome of one

Â of these trials, and it takes which player the machine

Â player is, so that you know how to score the board.

Â You score the board, and then you add

Â that board's scores into the total scores, right.

Â And we have get_best_move, which takes the current board, and the scores

Â and tells you which move is best for the machine player to make next, okay?

Â And then finally, we have the mc_move

Â function, which takes the current board, a player,

Â and the number of trials, that you want to use for your machine player, okay?

Â And this is going to use the other functions.

Â To basically make a move.

Â All right.

Â 8:09

Now, we are giving you a template.

Â But I want to point out the template is missing a lot of stuff this time around.

Â We may be pushing you a little bit outside of your comfort zone.

Â You're going to have to write more things.

Â And you're going to have to write them

Â correctly, because the tester is assuming that you

Â did write them, you know, the, these functions the way that we've asked you to.

Â Okay.

Â All right.

Â It shouldn't be too difficult, all right.

Â But we want you to get used to the fact that you're

Â not always starting with a whole bunch of code in the screen, right.

Â Sometimes you're starting with almost an empty screen.

Â 'Kay.

Â Don't let that deter you.

Â I'm sure that you can do it.

Â 8:41

So I played tic-tac-toe with my kids.

Â Perhaps you do to.

Â It's not and overly exciting game though, right.

Â But, I want to point out that the

Â techniques that we're teaching you here transcend tic-tac-toe.

Â Monte Carlo methods can be used for all kinds of useful things.

Â Even if we're just talking about games, there are far more

Â complicated games out there as I'm sure you're well aware, right?

Â And If I have a game that has so many possible

Â outcomes, or so many possible ways that the game could play out.

Â From the current position it might be impossible to exhaustively sort

Â of enumerate those possibilities and figure out what the best one is.

Â In that case these kind of Money Carlo

Â methods maybe actually the best that you can do.

Â Now, actually, playing out random games and seeing what happens will give you

Â some good insights into which moves are good and which moves are bad.

Â And there are some good machine players for other types of games.

Â Which do exactly this, so while tic-tac-toe may seem silly,

Â the concepts and techniques you're learning here are extremely valuable.

Â