Now, you might wonder why I would bother decomposing the utility function instead of having just a single big monolithic utility function that is affected by all of these different factors. Well, that is exactly the reason: because it is, in fact, affected by all of these different factors.

So whereas here I have one utility function that has one argument, another that has one argument, and one that has two arguments.

If I had put them all together, I would have a single monolithic utility function that has four arguments. And that would be much harder to elicit, because of the exponential combinatorial growth in the number of possible combinations that I need to consider. So this is a decomposition of a function, in this case the utility function, into factors, in exactly the same way that we like, for example, to decompose a probability distribution as a product of factors.

Another interesting extension that the influence diagram representation allows us to capture is the notion of the information available to the agent when they make their decision.
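To see the savings from the decomposition concretely, here's a quick count, assuming (hypothetically) that all four variables are binary:

```python
# Counting how many utility values must be elicited, assuming all
# variables are binary (a hypothetical illustration, not the slide's model).
# Decomposed: three sub-utilities with scopes of size 1, 1, and 2.
decomposed_entries = 2**1 + 2**1 + 2**2   # 8 values to elicit
# Monolithic: one utility over all four variables together.
monolithic_entries = 2**4                 # 16 values, and the gap grows exponentially
print(decomposed_entries, monolithic_entries)  # 8 16
```

With more variables or larger domains, the gap widens quickly: ten binary variables would require 1,024 entries for a monolithic utility.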

So, let's look at this example over here, which elaborates our entrepreneur

example. And here, we come in with the assumption

that the entrepreneur has the opportunity to conduct a survey regarding the

overall market demand for widgets before deciding whether to found a company.

Now, the survey is not entirely reliable. And so, over here, we have the CPD that tells us how likely different survey results are to come up, given different values of the true market demand.

Now the point is that, having conducted the survey, the agent can base his

decision on the value of the survey. We denote that with the presence of an edge such as this one, which indicates that the entrepreneur can make different decisions depending on how the value of the survey turns out.

So, from a formal perspective, that means that what the agent is allowed to do is to choose a decision rule, which we're going to denote delta, at the action node. And what a decision rule is, is effectively a CPD: it tells the agent, given an assignment of values to the parents of A, which are the variables that the agent can observe prior to making the decision, the probability distribution over the action that it takes in that particular situation.

So here, for example, we would have a CPD which tells us the probability of founding a company, which in this case is a binary-valued variable, given the three possible values of the survey variable. So you might say, why would I need a CPD? Why would the agent ever want to make a random choice between two different actions?

And it turns out, in fact, that in a single-agent decision-making situation, there's no value to allowing that extra expressive power, because a deterministic CPD would work just as well. Nevertheless, for reasons that will become clear on the next slide, it's actually useful to think about this as a CPD, even though in most cases it will actually be deterministic in the examples that we're talking about.

So now that we've given the agent that expressive power, how do we formulate the decision problem that they need to solve? That is, what do they get to choose, and how do they choose it?
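A decision rule like the one just described can be sketched as a small table. Here is a minimal sketch in Python; the survey values and the particular choices are hypothetical, not taken from the slide:

```python
# A decision rule delta for the action F is a CPD: for each value of the
# observed parent S (the survey), a distribution over the action.
# A deterministic rule is simply a CPD whose entries are all 0 or 1.
delta_F = {
    "s_bad":      {"found": 0.0, "dont_found": 1.0},
    "s_moderate": {"found": 0.0, "dont_found": 1.0},
    "s_good":     {"found": 1.0, "dont_found": 0.0},
}

# Sanity check: each row must be a valid probability distribution.
for s, dist in delta_F.items():
    assert abs(sum(dist.values()) - 1.0) < 1e-9
```

The point of the CPD view is that this table slots into the network exactly like any other CPD, which is what the next step exploits.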

So, given a decision rule delta for an action variable A, if we inject that into the decision network, now all of the variables in our network, both A and the remaining variables X, have a CPD associated with them. So now we have effectively defined a joint probability distribution over all the variables, X union A, because each of them has a CPD; and so this is that probability distribution.

And the agent's expected utility, once they've selected the decision rule delta A, is simply this expectation over here. We average out over all possible scenarios, where a scenario is an assignment to the chance variables X as well as to the action A. And what we're averaging is the agent's utility in that scenario. So that's the overall expectation that the agent, prior to anything happening, can expect to gain, given a decision rule delta A. So obviously, the agent

wants to choose the decision rule delta A.

that is going to give him the maximum value for this expression, which denotes his expected utility. And so this is the optimization problem

that the agent is trying to solve. So how do we go about finding the

decision rule that maximizes the agent's expected utility?
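Before the worked example, the expectation above can be computed by brute force on a tiny hypothetical model, one observed chance variable and one binary action; all names and numbers below are illustrative, not from the lecture:

```python
from itertools import product

# Brute-force expected utility of a decision rule on a tiny hypothetical
# model: one chance variable X (also observed by the agent), one action A.
P_X = {0: 0.6, 1: 0.4}                       # P(X)
delta = {0: {0: 1.0, 1: 0.0},                # delta(A | X): act only if X = 1
         1: {0: 0.0, 1: 1.0}}
U = {(0, 0): 0, (0, 1): -5, (1, 0): 0, (1, 1): 10}   # U(X, A)

# EU(delta) = sum over scenarios (x, a) of P(x) * delta(a | x) * U(x, a)
eu = sum(P_X[x] * delta[x][a] * U[(x, a)]
         for x, a in product(P_X, (0, 1)))
print(eu)  # 0.4 * 10 = 4.0
```

Every term where delta assigns probability zero drops out, which is why a deterministic rule reduces the sum to just the scenarios the rule actually selects.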

Well, let's look at this first in the context of the simple example, and then

we'll do the general case. So over here, we see, once more, the

expected utility equation for a given decision rule.

And now we're trying to optimize this delta A.

So let's write down the expected utility in this particular example.

So the Bayesian network in this case has two original CPDs. There is P of M, which comes from the M variable. There's P of S given M, which comes from the survey variable. And we have the CPD that is as yet to be selected, which specifies the decision rule at the action F. And then, multiplied with all that, is the utility, which depends on the decision to found the company and on the true market value. Well, this should look awfully familiar.

It is simply a product of factors. Some of these factors are probabilistic

factors. And one of them, the U, is a different

numerical factor. But it's still a factor which depends on

a scope of two variables, in this case F and M.

So now we can go ahead and apply the same kinds of analysis that we've done in

previous computations. Specifically, since we're trying to optimize over this expression over here, the delta F of F given S, we're going to push in everything that we don't currently need to manipulate. What we keep out is just F and S, which are the two things that appear in the factor for the decision rule. And so, specifically, we're going to push in, within the summation over M, all of the factors that depend on M. And so, that gives us a sum over M, P of

M, P of S given M, U of F, comma M. And if we multiply these three factors and marginalize out M, what we end up with is a factor that we're going to call mu of F, S. And the reason that it's called mu is to suggest that it has a utility component, U, in it: mu and U.
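The marginalization just described, multiplying P(M), P(S given M), and U(F, M) and summing out M, can be sketched in Python. The numbers below are illustrative, since the slide's actual tables aren't reproduced in this text:

```python
# Computing mu(F, S) = sum over M of P(M) * P(S|M) * U(F, M)
# by marginalizing out M. Illustrative numbers, not the slide's tables.
P_M = {"m0": 0.5, "m1": 0.3, "m2": 0.2}
P_S_given_M = {  # P(S = s | M = m); each row sums to 1
    "m0": {"s0": 0.6, "s1": 0.3, "s2": 0.1},
    "m1": {"s0": 0.3, "s1": 0.4, "s2": 0.3},
    "m2": {"s0": 0.1, "s1": 0.4, "s2": 0.5},
}
U = {("f0", m): 0 for m in P_M}  # don't found: utility 0 regardless of M
U.update({("f1", "m0"): -7, ("f1", "m1"): 5, ("f1", "m2"): 20})

mu = {
    (f, s): sum(P_M[m] * P_S_given_M[m][s] * U[(f, m)] for m in P_M)
    for f in ("f0", "f1")
    for s in ("s0", "s1", "s2")
}
print(round(mu[("f1", "s0")], 2))  # -1.25: founding after a bad survey loses on average
```

Note that mu is no longer a probability or a utility; it is just a factor over the scope F, S, which is all the optimization step needs.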

And now we have actually a fairly simple expression that the agent is trying to optimize. It's a summation, over all possible values of S and F, of the delta, the decision rule that the agent adopts for F given the observation S, multiplied by this factor mu that I just computed. And now, since our goal is to optimize this expression, what the agent ought to do is to pick, for each circumstance S, the value of F that gives the factor mu its highest value. So this is a little bit abstract, so let's look at this in the context of an actual numerical example.

So here are the three factors that we have in this network to begin with.

This is the CPD for M over here. The CPD for S given M in the middle, and

here is the utility factor U, which comes from here. And now, if we go through this computation and compute this expression over here, this is what we get. It's a factor that gives us a value for each value of the action variable F and for each value of the survey variable S. And if we want to maximize the summation.
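The final maximization step, choosing the best action for each survey value, can be sketched as follows; the mu values here are illustrative, not necessarily the slide's exact numbers:

```python
# Given the factor mu(F, S), the optimal decision rule picks, for each
# survey value s, the action f that maximizes mu(f, s). A deterministic
# rule suffices, as noted earlier. Illustrative mu values below.
mu = {
    ("f0", "s0"): 0.0, ("f1", "s0"): -1.25,
    ("f0", "s1"): 0.0, ("f1", "s1"): 1.15,
    ("f0", "s2"): 0.0, ("f1", "s2"): 2.10,
}

delta = {}
for s in ("s0", "s1", "s2"):
    delta[s] = max(("f0", "f1"), key=lambda f: mu[(f, s)])

print(delta)  # {'s0': 'f0', 's1': 'f1', 's2': 'f1'}

# Expected utility of the optimal rule: sum over s of max over f of mu(f, s).
eu = sum(max(mu[(f, s)] for f in ("f0", "f1")) for s in ("s0", "s1", "s2"))
print(round(eu, 2))  # 3.25
```

With these numbers, the agent founds the company only after a moderate or good survey result, and the ability to observe the survey before acting is exactly what makes this per-observation choice possible.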