1:53

We're given a board, the current state of the game, and which player we are.

And from that we then do the following.

We run a certain number of trials.

This is a Monte Carlo simulation.

So, we want to run a large number of trials.

Those are going to be random.

So, where's the randomness.

Well, we start with that current board, and then we just randomly play out a game.

So, if we're told we're player x we start by putting an x on the board.

Then we put an o and an x and so on.

Right, until somebody wins or there's a draw

2:19

then we have a final position of the board.

We take that board and we score it, and

we throw it away, and we start all over again.

We go back to the initial board and we play another random game.

When we're done we score it, we throw it away.

Start again with the initial board, play another random game, score it and so on.

And once we've done the number of trials that we desire.

We then add up all the scores across the trials right.

Now we have an aggregate score for all of the, the entire Monte Carlo simulation.

Take those scores, and we're going to end up with a score for each grid square.

We look at the empty squares on the original board,

and we find the empty square that has the highest score.

And that's the move were going to pick.

Right?

If there's only one it's easy, if there's

more than one well we just randomly pick, you

know, one of the, the squares that has

the highest score, right, if there's a tie, okay.

And that's it.

The questions now is simply how do we assign these scores?

3:12

So, how to I actually score the board.

Well, all right, there are a lot of

different ways you can score tic-tac-toe board, but

I'm going to show you one that works well

with the Monte Carlo methods we're using here.

All right, so I, I start with a board here,

and this board is the result of a random trial.

It doesn't even matter where we started.

Okay.

This is just the result.

Right.

What does matter is who's who.

So X is the machine, player, and o is the human player, 'kay?

And we're going to give each of these constants.

We're going to say okay the machine player,

I'm going to use the constant MCMATCH to score.

And the human player, we're going to use the constant MCOTHER.

Okay.

These are going to match the costs that are

in the template that we've given you, all right?

And, what are we going to do with these costs?

Well, we're going to add or subtract them to each

square on the board, the question is how, all right?

Well, first we going to figure out who won the game, all right?

Well, it looks like o won the game, okay?

I can see three in a row, for o right here, all right.

And, so what that means is that I'm going to subtract

this MCMATCH value every time there's, I see and x, for y.

Well the X plays weren't very good.

I didn't win the game.

In fact, I lost the game.

'Kay.

All right.

On the flip side though, o did win the game, so I'm going to add MCOTHER.

Every time I see a play, or a human player play, or a y because I actually

want to play there, so I can block

the human from actually winning this game, all right.

Now you have to actually pick these constants all right, to be sensible, but

for the point of this demonstration, I'm just going to set them to be one.

And to make them different, so that you can tell when I'm adding one or the other.

All right?

Okay, so then all we do is go through

each screed square on the board and we score it.

All right, so here's an empty square in the upper left,

well that one gets a score of zero right, because I don't

know, there was no x or o there, I don't know

if it was a good score, good place to play or not.

Right here's an x.

So I'm going to say, hey that was a bad spot, so I subtract 1.

Hee's no, that was a good spot, I'll add 2.

Okay, here's an x, subtract 1, subtract 1.

Here's an o add 2.

Here's an x subtract 1, o add 2.

O, add 2.

All right, now, I've scored this board.

So, basically I have these nine values or however many

scores there are if I was playing a different sized game.

Okay, and I will now just.

Say this is the score for this particular board.

I will sum them up, all the scores for all the boards,

across all the trials, and then I will get a result, okay.

Now, obviously every game's going to be different.

And, so let's imagine I have another game here,

I'm going to draw it smaller where, x actually wins.

So, let's imagine the final state looks something like this, okay.

Now, x has won the game.

6:21

To each play location where o is played.

And intuitively again this makes sense, right, I want to try

and play in these squares where I actually won the game.

So this board you know, I'd get a plus one for each location on this, on this row.

And over here, I would subtract two on these locations with o where

o'd played, and then all these empty squares would get a value of zero.

Okay.

All right.

So now, again, you have to figure out who won the game,

which player is the machine player and then you score the board.

We could also have a draw.

When you have a draw there's no useful information

there, all the squares should just get zero, all

right, because it doesn't help me figure out how

to win or how to not lose the game.

7:03

All right.

So, how do we implement this?

Well it comes down to four functions.

We have MC Trial or Monte Carlo Trial.

It takes a board and player.

So, this is the current board position.

And which player the machine is, all right.

And you play one random game.

This is a Monte Carlo Trial.

All right, and we have MC updates scores.

This takes the current.

You know, running total of the scores.

It takes a board, which is a completed game, so

this is one of these trials, the outcome of one

of these trials, and it takes which player the machine

player is, so that you know how to score the board.

You score the board, and then you add

that board's scores into the total scores, right.

And we have get_best_move, which takes the current board, and the scores

and tells you which move is best for the machine player to make next, okay?

And then finally, we have the mc_move

function, which takes the current board, a player,

and the number of trials, that you want to use for your machine player, okay?

And this is going to use the other functions.

To basically make a move.

All right.

8:09

Now, we are giving you a template.

But I want to point out the template is missing a lot of stuff this time around.

We may be pushing you a little bit outside of your comfort zone.

You're going to have to write more things.

And you're going to have to write them

correctly, because the tester is assuming that you

did write them, you know, the, these functions the way that we've asked you to.

Okay.

All right.

It shouldn't be too difficult, all right.

But we want you to get used to the fact that you're

not always starting with a whole bunch of code in the screen, right.

Sometimes you're starting with almost an empty screen.

'Kay.

Don't let that deter you.

I'm sure that you can do it.

8:41

So I played tic-tac-toe with my kids.

Perhaps you do to.

It's not and overly exciting game though, right.

But, I want to point out that the

techniques that we're teaching you here transcend tic-tac-toe.

Monte Carlo methods can be used for all kinds of useful things.

Even if we're just talking about games, there are far more

complicated games out there as I'm sure you're well aware, right?

And If I have a game that has so many possible

outcomes, or so many possible ways that the game could play out.

From the current position it might be impossible to exhaustively sort

of enumerate those possibilities and figure out what the best one is.

In that case these kind of Money Carlo

methods maybe actually the best that you can do.

Now, actually, playing out random games and seeing what happens will give you

some good insights into which moves are good and which moves are bad.

And there are some good machine players for other types of games.

Which do exactly this, so while tic-tac-toe may seem silly,

the concepts and techniques you're learning here are extremely valuable.