0:00

In this video, we're going to look at how to define utility in infinitely repeated

games. So, remember that the way that an infinitely repeated game works is that we

have some stage game, which is a normal form game, and the players repeatedly play

the same game over and over again. And what that means is, each player gets a

sequence of payoffs: let's say player i gets payoff r1 in the first repetition, r2 in the second, r3 in the third, and so on, infinitely. So, we have

this infinite sequence of real values which are the payoffs that this player has

gotten. But if we want to reason about this game, we can't really reason using

utility theory about an infinite sequence. We, instead, have to take this sequence

and turn it into a single number that talks about the utility that the player

has for having played this sequence. And so, how do we do that? What's the right

way of thinking about that? So, the first thing to notice is that previous things

that we've learned about Game Theory aren't going to be sufficient to answer

this question. So, you might wonder if we can take this infinitely repeated game and

just write it in extensive form. And, of course, we can't. And the reason is that

the extensive form would be infinitely deep. We would never get to a leaf node

where we could write a payoff. And so, that won't help us. You might also just

wonder, can we sum up the sequence of payoffs and just say that my utility is

the sum of these values. And the problem is that that sum can be unbounded. If, for example, every payoff I get is positive, then I'm going to have an unbounded amount of utility at the end. So that's not going to work for me; I want to have finite utilities.
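To make the unboundedness concrete, here is a minimal Python sketch (an illustration of the point above, not something from the lecture): with a constant positive payoff of 1 per repetition, the partial sums just keep growing.

```python
from itertools import accumulate, islice, repeat

# Constant positive payoff stream: r1, r2, r3, ... = 1, 1, 1, ...
stream = repeat(1)

# Partial sums 1, 2, 3, ... grow without bound, so "utility = sum of
# all payoffs" would assign this player infinite utility.
partial_sums = accumulate(stream)
print(list(islice(partial_sums, 5)))  # [1, 2, 3, 4, 5]
```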

So, instead there are two canonical ways that this gets defined. And I'll tell you

about both of them in this video. So, here's the first one. The first one says: let me look, intuitively, at my average payoff over the sequence. Now, the

average payoff of a sequence is also not well defined because, of course, the way I

take an average is I sum everything up, and then I divide it by the number of

things. And we've already seen that the sum could be unbounded, the number of

things is unbounded as well, so I would have infinity divided by infinity, which

wouldn't help me out here. But, what I can do instead is to look at the limit of

finite averages as the averages get longer and longer. So, what I can say is: let me look at the average over the first k payoffs in my sequence, that is, (r1 + r2 + ... + rk)/k, and then take the limit of this average as k goes to infinity. And, it turns out, actually

technically, this isn't always well defined. It's almost always well defined

and there's an easy fix we can do to this definition that I've left out here to keep

it from getting too technical. But, in cases where this is well defined this is

the right thing to do. And everything we'll talk about in this course will be well defined. So, this is defined as the average reward that the player gets over the infinite sequence, and it gives us one number. The

reason we have a second definition is that there's something kind of counterintuitive about the average reward. Let me put it back up. The reason it's counterintuitive is that, if I get some bad payoff for a finite amount of

time, let's say, for the first 100,000 iterations, I get a payoff of negative a

million. And then, for the rest of time after that, I get some good payoff. Let's

say, one unit of utility. Then the limit of the means would be one, because the negative payoff that I got at the beginning lasts only a finite amount of

time and it washes out in the average if I go long enough out into the future. And,

well, that's what the math says. But that doesn't always model what we want to reason about, because we have an intuition that payoffs you get early on are more important than payoffs you get really far into the future. So, if we want to have a model of utility that has that property, we need

to say that different payoffs matter differently. So, it's more important to me

to get a good payoff in the first iteration than to get one in the millionth

iteration. And the way that I can model that is by saying, my payoffs are

multiplied by some discount factor. So, my discount factor talks about my value for

payoffs at different times. So, my discount factor, beta, is some value

strictly between 0 and 1. And, you can sort of think of it like an interest rate.

It's saying, you know, with money: if I wanted to tell somebody that I'm going to pay them $100 in a year, they would value that at less than $100 today. And the amount by which they would value it less today corresponds to the interest rate. And that's essentially what's going on in the math here. So, what

I'm saying here is that my utility for this stream of payoffs, this stream of r's, is the sum in which each payoff is weighted by the discount factor raised to the power of its position in the sequence. So, I'm going to discount each payoff

successively. So, the first one is going to have the discount factor applied once.

The second one is going to have the discount factor applied twice, so I'm

going to get the discount factor squared and so on, all the way through the

sequence. So, each of them is going to be diminished, but each of them is still

going to matter. And there are two ways we can think about what the discount factor

means. So, the first is kind of the interpretation that I've been telling you

so far. That the agent just cares more about the near term than the long term.
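As a rough Python sketch of this discounting scheme (the function name discounted_utility is mine, not the lecture's; I follow the lecture's convention of discounting even the first payoff once, i.e. weighting r_t by beta**t, whereas other treatments weight r_t by beta**(t-1)):

```python
def discounted_utility(payoffs, beta):
    """Discounted sum of a (truncated) payoff stream.

    Each payoff r_t is weighted by beta**t, matching the lecture's
    convention that the first payoff is discounted once; many texts
    instead start the exponent at 0.
    """
    return sum(beta ** t * r for t, r in enumerate(payoffs, start=1))

# A constant payoff of 1 forever sums to beta / (1 - beta) in the limit;
# with beta = 0.9 that limit is 9.0, and a long finite prefix gets close.
print(round(discounted_utility([1] * 500, 0.9), 6))  # 9.0
```

Unlike the raw sum, this stays finite for any bounded payoff stream, because the geometric weights beta**t sum to beta / (1 - beta).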

There's another interpretation, which is different but mathematically the same, so it's interesting to think about. And that is that the agent really is like the agent we just talked about in the average reward case: it cares just as much about every payoff. But with some probability, in fact 1 minus beta, the

game will end in every given round. So, our game is not necessarily infinitely

repeated, it's sort of potentially infinitely repeated. But every time we

play the game, we're going to flip a coin. And with probability 1 minus beta, the

game is going to just end. And with probability beta, the game is going to

continue. And what that means is that here we'd be talking about my expected reward

in the game, because there's a beta chance that I'll go to the next round. There's a

beta squared chance that I'll go two rounds forward, a beta cubed chance that I'll go three rounds forward, and so on. So, that means my expected utility in this game would just be the same formula.
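That equivalence can be checked with a small simulation, sketched here in Python (the helper name simulated_total is hypothetical; before every round, including the first so as to match the beta**t weighting above, we continue with probability beta and stop with probability 1 minus beta):

```python
import random

def simulated_total(payoff, beta, rng):
    """One play of the 'potentially infinite' game: keep flipping a
    beta-weighted coin; collect the payoff each time it comes up
    'continue', and stop the first time it comes up 'end'."""
    total = 0
    while rng.random() < beta:  # continue with probability beta
        total += payoff
    return total

rng = random.Random(0)  # seeded for reproducibility
beta = 0.9
runs = [simulated_total(1, beta, rng) for _ in range(100_000)]
mean = sum(runs) / len(runs)
print(mean)  # close to beta / (1 - beta) = 9.0, the discounted sum
```

The empirical average matches the discounted utility of the constant payoff stream, which is the sense in which the two interpretations are mathematically the same.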

And that's it for defining utility in these games. Thanks very much.