
Let us now speak about Backward Induction. Backward Induction can be viewed as a way of computing the subgame perfect equilibrium of a game. It is a procedure that is used widely, or variants of it are, in game-playing programs, be it chess or others. So how does this work? This is a busy slide, but don't be daunted; we'll explain it leisurely. Also, some of you may not have seen algorithms before, but we'll explain the algorithm in very plain terms. Before we do, let's first give the intuition. The intuition is very straightforward.

What we are trying to do is associate a value of the game not only with the leaf nodes. With leaf nodes we know what the value is: it is simply defined by the payoff vector associated with the leaf node, which is part of the game definition. But suppose we wanted to go to the root of the tree, or any other internal node, and ask: what really is the payoff associated with this node, assuming that agents will play a subgame perfect equilibrium? That is the goal of this procedure called Backward Induction. The idea is very straightforward. We start at the leaves and back up slowly, assuming at every step of the way that each agent will take the action that maximizes their payoff at that node. So that's the intuition. Now let's see how it's done formally.

The procedure is called Backward Induction, and it takes one argument: a node in the tree, any node. It could be the root, it could be a leaf, or anything in between. And, of course, every node has some player associated with it; to anticipate what we'll encounter shortly, rho of h will be the player associated with that node h. When the procedure returns, it gives us the payoffs to all of the agents, associated with that node. So how does it work? Again, remembering our intuition, we say the following.

If h is a leaf node (Z is the set of leaf nodes), then we simply return the payoff vector as defined by the game. That is where the recursion bottoms out. Most of the work, of course, happens in the recursive step, when we are not at a leaf node.

So for that we do the following. We will keep a variable called best_util, and best_util will be a vector of payoffs, one associated with each agent. We will be updating that vector as we go along. To start out with, we'll assume the payoffs are all terrible; let's call that minus infinity, a payoff smaller than all possible payoffs in the game. Then we do the following. We will look at all the actions available at that node. So chi of h is the set of all actions available at that node, and a would be an example of such an action. We'll take each action in turn, one at a time, and do the following. We will look at the child you reach by taking action a at that node h; that's called sigma of h and a. So sigma of h and a is simply another node, one of the children of node h, the one you arrive at by taking that particular action a.

And we will recursively compute the payoff vector associated with that child, and we will keep it in a variable called util_at_child. So notice we have two vectors: best_util and util_at_child. best_util is the best we have found so far, best for the particular agent moving at h, and util_at_child is what we found at a particular child. We'll be going over all the children one at a time, and if ever util_at_child is better for the agent than the best_util so far, we'll update best_util. That's what's going on here. So this is what this line says: util_at_child is a vector, so look at the element of that vector corresponding to the agent we care about, the agent moving at node h. If util_at_child is better for that agent than the best_util found so far, update best_util to be util_at_child; otherwise leave it unchanged. In this way we're cycling through all the children and finding, from the point of view of the agent to whom node h belongs, which of all the vectors is best. And again, the intuition is that the agent will take the action leading to that child, so we update the vector accordingly. That's why, when we're done, we return the best we found so far, called best_util.

That is the Backward Induction procedure. And notice that we don't have to return a strategy; we just return the list of payoffs. In some sense, you can think of it as simply extending the payoffs from the leaf nodes to all the internal nodes. But even though we don't explicitly return the equilibrium strategy, the one that will be subgame perfect, it's easy to read it off those numbers: at every node, the agent will take the action that leads to the child node with the best utility from their point of view. So that is a procedure for computing subgame perfect equilibria: Backward Induction.
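To make the procedure concrete, here is a minimal sketch in Python. The Node class and its field names are hypothetical stand-ins for the lecture's notation (node.player for rho of h, the children mapping for sigma, node.payoffs for the leaf payoff vectors); it is not the slide's exact pseudocode.

```python
import math

class Node:
    """A hypothetical game-tree node standing in for the lecture's notation."""
    def __init__(self, player=None, children=None, payoffs=None):
        self.player = player            # rho(h): index of the agent who moves at h
        self.children = children or {}  # sigma: maps each action a to a child node
        self.payoffs = payoffs          # leaf payoff vector; None at internal nodes

def backward_induction(h, n_agents):
    """Return the payoff vector at h when all agents play a subgame
    perfect equilibrium in the subtree below h."""
    if h.payoffs is not None:           # h is a leaf: the recursion bottoms out
        return h.payoffs
    # start with a payoff vector worse than anything in the game
    best_util = [-math.inf] * n_agents
    for action, child in h.children.items():
        util_at_child = backward_induction(child, n_agents)
        # keep the child's vector if it is better for the agent moving at h
        if util_at_child[h.player] > best_util[h.player]:
            best_util = util_at_child
    return best_util
```

For instance, with player 0 at the root choosing between a leaf worth [3, 1] and handing the move to player 1, who would steer toward [2, 2], the call at the root returns [3, 1].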

And if we look at the special case of zero-sum games, it's simplified a little bit. Because then there are only two players, and the payoff to one is minus the payoff to the other. So really, we only need to keep track of one number associated with each node, and so there's less bookkeeping to be done. And furthermore, all win-lose games, for example chess, are zero-sum.
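In this zero-sum case a sketch needs only one number per node: maximize it at one player's nodes and minimize it at the other's. Here is a minimal illustration with a hypothetical tree encoding, not the lecture's notation; a leaf is just the payoff to the maximizing player, and the other player's payoff is its negation.

```python
def minimax(node):
    """Value of a zero-sum game tree for the maximizing player.

    A leaf is a single number; an internal node is ("max", children)
    or ("min", children), depending on whose turn it is.
    """
    if isinstance(node, (int, float)):          # leaf: one number suffices
        return node
    kind, children = node
    values = [minimax(child) for child in children]
    return max(values) if kind == "max" else min(values)
```

For example, minimax(("max", [("min", [3, 5]), ("min", [2, 9])])) evaluates to 3: the minimizer would hold the right branch down to 2, so the maximizer prefers the left branch, where the worst case is 3.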

In such zero-sum games, there is a way to speed up the Backward Induction procedure. By the way, in zero-sum games we simply call it the minimax procedure, because we alternate between minimizing and maximizing the

One player wants to minimize it, the other to maximize it and in fact there's

a way to speed up the procedure and we won't go into it here, but the intuition

is that as you're visiting a certain child, children of a given node, you may

find out that at that point there's no, no need.

To explore the remaining children of that node as we did in the Backward Induction

procedure, because intuitively it won't matter.

You've already found a value that means that this node that you've examining, the

current node will never be visited. And it's called the alphabetic al,

alpha-beta pruning procedure an optimization of the Backward Induction or

the minimax procedure for zero-sum games, and you are invited to explore it

elsewhere. There's one more thing I want to say in

connection with Backward Induction and with subgame perfection; in fact, two different things, both keyed off the same example, the famous centipede game. In this well-known example you have two players; they alternate turns.

We have player one moving, and then player two, and then player one again, then player two, and so on and so forth. But the payoffs are constructed in a contrived way, so that they're gradually increasing. It's called the centipede game because you can imagine that rather than having only five legs, as here, you have a hundred. The payoffs slowly rise, so that the payoffs here at the start are much smaller than the payoffs further along, and very much so if you keep going. But even though they rise, they are contrived in a way that leads to only one subgame perfect equilibrium.
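To see this concretely, here is a self-contained sketch with hypothetical rising payoffs (not the numbers on the slide): four decision nodes, players alternating, where at each node the first child is "down" (end the game) and the second is "across" (continue).

```python
import math

def solve(node):
    """Backward induction on a small tree.

    A leaf is a payoff tuple like (1, 0); an internal node is a pair
    (player, children), where children is a list of nodes.
    """
    if not isinstance(node[1], list):     # leaf: a plain payoff tuple
        return node
    player, children = node
    best = (-math.inf, -math.inf)
    for child in children:
        util = solve(child)
        if util[player] > best[player]:   # better for the agent moving here
            best = util
    return best

# Hypothetical payoffs with the centipede structure: both players would
# prefer the deep payoffs, e.g. (4, 3), to the early ones, e.g. (1, 0).
centipede = (0, [(1, 0),
            (1, [(0, 2),
            (0, [(3, 1),
            (1, [(2, 4),
                 (4, 3)])])])])
```

Running solve(centipede) backs up from the last node and yields (1, 0): the first player goes down immediately, even though continuing would reach payoffs that are better for both.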

Players will defect in every place; they will go down at every place here. So the only outcome, the subgame perfect outcome, is this one, where the first player defects immediately, going down right away. This is of course similar to the prisoner's dilemma, and it's a little counter-intuitive, because had they only had the good sense to keep going, they would have got to something in the ballpark of this payoff or this one, both of which are much better for both players than here.

But nonetheless, when you examine it, you see that there's only one subgame perfect equilibrium here, and in fact only one equilibrium outcome, namely this one. And you can see it by doing the Backward Induction procedure again. If the game progressed and in fact reached this last node, what would player 1 do? Well, they would go down, getting a 4 rather than a 3. Player 2 knows this.

So, knowing that player 1 would go down, player 2 would rather go down as well, because he would get a 4 rather than a 3. Similarly here: player 1, knowing that player 2 would go down, elects to go down, because they will get three rather than two, and so on. This is really the Backward Induction argument. So on the one hand we have a clear account of what will happen in this game, but there are two problems.

One of them is simply experimental, common-sense; the other is more theoretical. On the pragmatic level, common sense simply tells you this is not what's going to happen. The players will cooperate for a while, until at some point somebody does go down in the game. They know there's so much to gain by going forward; they would, if you wish, take the chance. And this intuition is borne out repeatedly in experiments.

People do cooperate for a while, until eventually they defect. So that's a problem for the theory. But there's also another problem, one that's theoretical in nature.

So we know that the only subgame perfect equilibrium is one where agents defect, going down at every point in time. Now, imagine that the game starts, and player one goes across; they do not go down. What does player two do? Well, on the one hand, you could say: the only subgame perfect equilibrium is to go down, so they should go down, because the Backward Induction argument will tell them that the best thing for them is to go down. But that same argument told player two that player one would have gone down right off the bat, and they didn't.

So maybe they won't go down this time either. But then what will they do? Fundamentally, what happens here is that you observe an event, going across, that the standard theory tells you will happen with zero probability. How do you condition on an event that had zero prior probability? There's a big literature on this; it's a very interesting, deep issue in game theory. We will not delve into it any deeper, but it's interesting to know.