Let us now speak about Backward Induction. Backward Induction can be viewed as a way of computing the subgame perfect equilibrium of a game. It's a procedure that is used widely, or variants of it are, in game-playing programs, be it chess or other games. And how does this work? This is a busy slide, but don't be daunted; we'll explain it leisurely. Also, some of you may not have seen algorithms before, but we'll explain the algorithm in very plain terms. Before we do, let's first give the intuition, which is very straightforward.

What we are trying to do is associate a value of the game not only with the leaf nodes. With leaf nodes we know what the value is: it is simply defined by the payoff vector associated with the leaf node, which is part of the game definition. But suppose we wanted to go to the root of the tree, or to any other internal node, and ask: what really is the payoff associated with this node, assuming that agents will play a subgame perfect equilibrium? That is the goal of this procedure called Backward Induction. And the idea is very straightforward: we go to the leaves and back up slowly, assuming at every step of the way that the agent taking an action at a node will maximize their payoff at that node. So that's the intuition. Now let's see how it's done formally.

The procedure is called Backward Induction, and it takes one argument: a node in the tree, any node. It could be the root, it could be a leaf, or anything in between. And of course every node has some player associated with it; to anticipate what we'll encounter shortly, ρ(h), "rho of h", will be the player associated with the node h. When the procedure returns, it will give us the payoffs to all of the agents, associated with that node. So how does it work? Remembering our intuition, we say the following.

If h is a leaf node (Z is the set of leaf nodes), then we simply return the payoff vector as defined by the game. That is where the recursion bottoms out. Most of the work, of course, happens in the recursive step, when we're not at a leaf node.

For that we do the following. We keep a variable called best_util, and best_util will be a vector of payoffs, one associated with each agent. We will be updating that vector as we go along. To start out, we'll assume the payoffs are all terrible; let's call that minus infinity, a payoff smaller than all possible payoffs in the game. And then we do the following. We look at all the actions available at that node: χ(h) is the set of all actions available at the node h, and a would be an example of such an action. We take each action in turn, one at a time, and do the following. We look at the child you reach by taking action a at the node h; that's called σ(h, a). So σ(h, a) is simply another node, one of the children of the node h, the one you arrive at by taking that particular action a.

We recursively compute the payoff vector associated with that child, and we keep it in a variable called util_at_child. So notice we have two vectors: best_util and util_at_child. best_util is the best we have found so far, best for a particular agent, and util_at_child is what we found at a particular child. We go over all the children one at a time, and if ever util_at_child is better for the agent than best_util so far, we update best_util. That's what's going on here. So this is what this line says: util_at_child is a vector, so look at the element of that vector corresponding to the agent we care about, the agent moving at the node h. If util_at_child is better for that agent than the best_util found so far, simply update best_util to be util_at_child; otherwise leave it unchanged. In this way we cycle through all the children and find which of all the vectors is best from the point of view of the agent to whom the node corresponds. And again, the intuition is that this agent will take the action leading to that best child, and we update the vector accordingly. That's why, when we're done, we return the best we found so far, namely best_util.

That is the Backward Induction procedure. And notice that we don't have to return a strategy, just the list of payoffs. In some sense you can think of it as simply extending the payoffs from the leaf nodes to all the internal nodes. But even though we don't explicitly return the equilibrium strategy, the one that will be subgame perfect, it is easy to read it off those numbers: at every node, the agent will take the action that leads to the child node with the best utility from that agent's point of view. So that is Backward Induction, a procedure for computing subgame perfect equilibria.
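As a sketch of what this procedure can look like in code, here is a minimal Python version. The Node class and its field names are illustrative assumptions, not the lecture's own representation:

```python
# A sketch of the Backward Induction procedure described above, under an
# assumed minimal tree representation (not the lecture's own code).

class Node:
    def __init__(self, player=None, children=None, payoffs=None):
        self.player = player            # rho(h): the agent who moves at h
        self.children = children or {}  # action a -> child node sigma(h, a)
        self.payoffs = payoffs          # payoff vector if h is a leaf, else None

def backward_induction(h):
    if h.payoffs is not None:           # h is a leaf: the recursion bottoms out
        return h.payoffs
    best_util = None                    # stands in for the minus-infinity vector
    for a, child in h.children.items():             # actions available at h
        util_at_child = backward_induction(child)   # value of the subgame
        # Keep whichever vector is best for the player moving at h.
        if best_util is None or util_at_child[h.player] > best_util[h.player]:
            best_util = util_at_child
    return best_util
```

For example, at a root where player 0 chooses between a leaf worth (3, 1) and a subtree whose subgame perfect value is (2, 5), the procedure returns (3, 1); as the lecture notes, the equilibrium strategy itself is not returned, but it can be read off by recording which child supplied best_util at each node.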

If we look at the special case of zero-sum games, the procedure is simplified a little bit, because then there are only two players, and the payoff to one is minus the payoff to the other. So really, we only need to keep track of one number associated with each node, and so there's less bookkeeping to be done. (And all win-lose games, chess for example, are zero-sum games.)
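As a hedged sketch of this zero-sum specialization, assuming leaf values are given from the maximizing player's point of view and trees are encoded as nested lists (an illustrative encoding, not the lecture's):

```python
# A sketch of the zero-sum specialization: one number per node, with the
# two players alternately maximizing and minimizing it.

def minimax(node, maximizing):
    """node is either a number (a leaf's value for the maximizing player)
    or a list of child nodes; `maximizing` says whose turn it is."""
    if isinstance(node, (int, float)):   # leaf: just return its value
        return node
    values = [minimax(child, not maximizing) for child in node]
    return max(values) if maximizing else min(values)

# The maximizer picks the subtree with the largest guaranteed value:
# the opponent holds the left subtree to 3 and the right one to 2.
print(minimax([[3, 12], [2, 8]], maximizing=True))  # -> 3
```

A single number per node replaces the whole payoff vector, which is exactly the bookkeeping saving mentioned above.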

Furthermore, in such zero-sum games there is a way to speed up the Backward Induction procedure. By the way, in zero-sum games we simply call it the minimax procedure, because we alternate between minimizing and maximizing the

Â One player wants to minimize it, the other to maximize it and in fact there's

a way to speed up the procedure. We won't go into it here, but the intuition is that as you're visiting the children of a given node, you may find out at some point that there is no need to explore the remaining children of that node, as the Backward Induction procedure does, because intuitively it won't matter: you've already found a value which means that the current node, the one you are examining, will never be visited. This is called the alpha-beta pruning procedure, an optimization of the Backward Induction, or minimax, procedure for zero-sum games, and you are invited to explore it elsewhere. There's one more thing I want to say in

connection with Backward Induction, and in fact with subgame perfection; there are really two different things to say here, and they are both keyed off the same example, the famous centipede game. In this well-known example you have two players, and they alternate turns.

We have player one moving, then player two moving, then player one again, then player two, and so on and so forth. But the payoffs are constructed in a contrived way, so that they are gradually increasing. And you can imagine why it's called the centipede game: rather than having only five legs, as here, you can have a hundred. The payoffs rise slowly, so that the payoffs early on are much smaller than the payoffs later, and very much so if you keep going. But even though they rise, they are contrived in a way that leads to only one subgame perfect equilibrium.

Players will defect, that is, go down, at every place here. So the only subgame perfect outcome is this one, where the first player defects immediately.
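One way to check this claim concretely is to run backward induction directly on a small four-move centipede game. The payoff numbers below are illustrative, chosen in the spirit of the lecture's figure, not taken from it:

```python
# A sketch: backward induction on a 4-move centipede game. Each stage is
# (player to move, payoff vector if that player goes "down"); final_payoff
# is reached only if everyone goes "across" at every stage.

def solve_centipede(stages, final_payoff):
    """Return the backward-induction payoff vector of a centipede game."""
    value = final_payoff
    # Walk from the last decision node back to the first.
    for player, down in reversed(stages):
        # The mover compares going down now with the value of continuing.
        if down[player] > value[player]:
            value = down
    return value

stages = [(0, (1, 0)), (1, (0, 2)), (0, (3, 1)), (1, (2, 4))]
print(solve_centipede(stages, final_payoff=(4, 3)))  # -> (1, 0)
```

At every stage the mover's "down" payoff beats the backed-up value of continuing, so the computed outcome is the first player defecting immediately, even though continuing to the end would give both players more.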

It goes down immediately. This is of course similar to the prisoner's dilemma, and it is a little counter-intuitive, because had the players only had the good sense to keep going, they would have got to something in the ballpark of these later payoffs, both of which are much better for both players than the immediate one. But nonetheless, when you examine the game you see that there is only one subgame perfect equilibrium here, and in fact only one equilibrium outcome,

namely this one. And you can see it by doing the Backward Induction procedure again. If the game progressed and in fact reached this last node, what would player 1 do? Well, they would go down, getting a 4 rather than a 3. Player 2 knows this. So, knowing that player 1 would go down, player 2 would rather go down too, because they would get a 4 rather than a 3. Similarly here: player 1, knowing that player 2 would go down, elects to go down, because they will get a 3 rather than a 2, and so on. This is really the Backward Induction argument. So on the one hand we have a clear

Â One of them is sort of simply experimental and common sense and the

Â other is, more theoretical. On the pragmatic level, if common sense

Â simply tells you, this is not what's, what's going to happen.

Â The players will cooperate for a while. Until, at some point, in fact, somebody

Â would go down in the game. They know there's so much to gain by

Â going forward. They would, if you wish, take the chance.

Â And this intuition is for now repeatedly in experiments.

Â People do co, cooperate for a while until in fact eventually they, they defect.

Â So that's a problem for theory. But there's also a, another problem

Â that's theoretical in nature. [INAUDIBLE].

So we know that the only subgame perfect equilibrium is one where agents defect, that is, go down, at every point in time. Now imagine that the game starts, and player one goes across, does not go down. What does player two do? Well, on the one hand, you could say: the only subgame perfect equilibrium is to go down, so they should go down, because the Backward Induction argument tells them that the best thing for them is to go down. But that same argument told player two that player one would have gone down right off the bat, and they didn't. So maybe they won't go down this time either. But then what will they do? Fundamentally, what happens here is that you have observed an event, going across, that the standard theory tells you will happen with zero probability. How do you condition on an event that had zero prior probability? There is a big literature on this; it is a very interesting, deep issue in game theory. We will not delve into it any deeper, but it's interesting to know.

Â