0:00

Let's look at an example of an actual network and see what the CPDs look like,

what behavior we get, and how we might augment the network to include additional things.

Now, let me warn you right up front that this is a baby network;

it's not a real network,

but it's compact enough to look at

and still interesting enough to exhibit some non-trivial behaviors.

So to explore the network,

we're going to use a system called SAMIAM,

which was produced by Adnan Darwiche and his group at UCLA.

And it's nice because it works on all sorts of different platforms,

so it's usable by pretty much everyone.

So let's look at a particular problem.

Imagine that we're an insurance company and we're trying to

decide, for a person who comes in the door, whether to give them insurance or not.

The operative aspect of making that decision is how much the policy is going to cost us;

that is, how much we're going to have to pay over

the course of a year to insure this person.

So there is a variable called cost.

Let's click on that to see what properties that variable has.

And we can see that in this case,

we've decided to give the cost variable only two values: low and high.

This is clearly a very coarse-grained approximation

and not one that we would use in practice.

In reality, we would probably make this a continuous

variable whose mean depends on various other aspects of the model.

But for the purposes of illustration,

we're going to use this discrete distribution that only has two values,

low and high. Okay.

So now let's build up this network

using the technique of extending the conversation that we've discussed before.

What is the most important determining factor

in the cost that the insurance company has to pay?

Well, probably whether the person has accidents and how severe they are.

So here we have a network with two variables:

one is accident and one is cost.

And in this case,

we've selected three possible values for the accident variable:

none, mild, and severe,

with the probabilities that you see listed.

What you see down below is the cost variable.

Let's open the CPD of the cost variable given the accident variable.

And we can see that in this case,

we have a conditional probability table of cost given accident.

Note that this is actually transposed relative to

the notation that we've used in the class before, because here

the conditioning cases are columns,

whereas in the examples that we've given, they have been rows.

But that's okay; it's the same thing, just transposed.

And so we see, for example,

that if the person has no accident,

the cost is very likely to be low;

mild accidents induce a different distribution over cost;

and severe accidents have a probability of 0.9

of incurring high cost and 0.1 of incurring low cost.
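The marginalization this table supports can be sketched in a few lines of Python. Only the severe column (0.1 low, 0.9 high) is taken from the CPD described above; the other columns and the prior over accident are hypothetical placeholders.

```python
# P(cost | accident): each conditioning case (column) sums to 1.
cost_given_accident = {
    "none":   {"low": 0.95, "high": 0.05},  # assumed values
    "mild":   {"low": 0.60, "high": 0.40},  # assumed values
    "severe": {"low": 0.10, "high": 0.90},  # from the CPD in the lecture
}

# An assumed prior over accident, just to have something to marginalize with.
p_accident = {"none": 0.80, "mild": 0.15, "severe": 0.05}

# Marginalize out accident: P(cost) = sum_a P(cost | a) * P(a).
p_cost = {
    c: sum(cost_given_accident[a][c] * p_accident[a] for a in p_accident)
    for c in ("low", "high")
}
print(p_cost)  # P(low) ≈ 0.855, P(high) ≈ 0.145
```

This is exactly the computation a query engine like SAMIAM performs behind the scenes, just written out by hand for a two-node network.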

So now let's continue extending the conversation and ask what accident depends on.

It seems that one of the obvious factors

is whether the person is a good driver or not,

so we would expect driver quality to be a parent of accident.

But there are other things that affect not just the presence of an accident,

but also its severity.

So for example, vehicle size would affect

the severity of an accident, because if you're driving a large SUV,

then chances are an accident is not likely to be as severe;

but it might also perhaps increase the chance of having an accident overall,

because maybe a large car is harder to handle.

And the vehicle's year might affect the chances of

an accident because of the presence or absence of certain safety features,

like anti-lock brakes and airbags.

So let's open the CPD of accident and see what

it looks like now that we have all these parents for it.

And we can see that we now have

eight conditioning cases, which correspond to the three parent variables with two values each.

And here, let's look at just one example distribution.

If this is a fairly new vehicle, made after 2000,

and it's an SUV,

the probability of having a severe accident is quite low,

the probability of having a mild accident is moderate, and

the probability of having no accident is 0.85;

whereas if you compare that to the corresponding entry

when we keep everything fixed except that now it's a compact car,

we see that the probability of having a mild accident is lower,

but the probability of having no accident is higher,

representing different driving patterns, for example.
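One convenient way to represent a CPD with several parents in code is a table keyed by tuples of parent values, one entry per conditioning case. Only the 0.85 no-accident figure for the post-2000 SUV case is quoted in the lecture; every other number here is a made-up placeholder, and only two of the eight conditioning cases are filled in.

```python
# Sketch of P(accident | driver_quality, vehicle_size, vehicle_year).
# Keys are conditioning cases; values are distributions over accident.
accident_cpt = {
    ("good", "suv",     "post2000"): {"none": 0.85, "mild": 0.12, "severe": 0.03},
    ("good", "compact", "post2000"): {"none": 0.90, "mild": 0.08, "severe": 0.02},
    # ...six more rows for the remaining conditioning cases
}

# Sanity check: every column of the CPT must sum to 1.
for parents, dist in accident_cpt.items():
    assert abs(sum(dist.values()) - 1.0) < 1e-9
```

Note that the compact-car row puts less mass on mild accidents but more on no accident than the SUV row, matching the comparison made above.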

Okay. So with this network,

we can now start asking simple questions.

Here is an example of causal inference.

Let's instantiate driving quality,

first to bad and then to good.

And we can see that for a bad driver,

the probability of low cost is 81 percent, and for a good driver,

the probability of low cost is 87 percent.

If we look at accidents,

we can see that for a good driver,

there is a probability of 87-and-a-half percent

of no accident and 10 percent of a mild accident.

For a bad driver, the probability of no accident goes down,

the probability of a mild accident goes up, and the probability of a severe accident also goes way up.
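This kind of causal query can be reproduced by hand on a reduced chain driver quality → accident → cost: condition on quality and sum out accident. All the numbers below are illustrative stand-ins, chosen only so that the good-driver column comes out more favorable, as in the SAMIAM run.

```python
p_acc_given_q = {  # P(accident | driver_quality), assumed values
    "good": {"none": 0.875, "mild": 0.100, "severe": 0.025},
    "bad":  {"none": 0.750, "mild": 0.170, "severe": 0.080},
}
p_cost_given_acc = {  # P(cost | accident), assumed values
    "none":   {"low": 0.95, "high": 0.05},
    "mild":   {"low": 0.60, "high": 0.40},
    "severe": {"low": 0.10, "high": 0.90},
}

def cost_given_quality(q):
    # P(cost | q) = sum_a P(cost | a) * P(a | q): sum out accident.
    return {
        c: sum(p_cost_given_acc[a][c] * p_acc_given_q[q][a]
               for a in p_acc_given_q[q])
        for c in ("low", "high")
    }

print(cost_given_quality("good"))  # higher P(low) ...
print(cost_given_quality("bad"))   # ... than for a bad driver
```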

Now note that many of these differences are quite subtle;

there is a difference of a couple of percentage points one way or the other.

And you might think, if you were designing a network, that you'd like to see

really extreme probability changes when you instantiate values;

but in many cases,

that's not actually what happens, and these subtle differences are actually quite

significant for an insurance company that insures hundreds of thousands of people.

A couple of percentage points in the probability of

an accident can make a very big difference to one's profitability.

So now let's think about how we would extend this network even further.

Vehicle size and vehicle year are

things that we're likely to observe on the insurance form.

Driver quality, however, is something that's very difficult to observe.

You can't ask somebody "are you a good driver?" because everyone's going to say "sure,

I'm the best driver ever."

So that's not going to be a very useful question.

So what evidence do we have that we can observe

that might indicate to us the value of driver quality?

Well, one obvious one is the person's driving record;

that is, whether they have had previous accidents or previous moving violations.

So let's think about adding a variable that represents driving history.

Let's go ahead and create that variable.

We can click on this button that allows us to create a node.

The node is initially called "variable 1",

so we have to give it a name;

for example, we're going to call it driving history.

That's its identifier,

and we also have the name of the variable, which is usually the same as the identifier.

And let's give it two values:

say, previous accident and no previous accident.

Now, where would we place this variable in the network?

One might initially think that the right thing to do

is to place driving history as

a parent of driver quality, because

observing driving history can influence our beliefs about driver quality.

Now, it's true that observing driving history changes our probability in driver quality,

but if you think about the actual causal structure of the scenario,

what we actually have is that driver quality is a causal factor

in both a previous accident and a subsequent accident.

And so if we want to maintain the intuitive causal structure of the domain,

a more appropriate thing is

to add driving history as a child rather than a parent of driver quality.

You might ask why it matters.

In this very simple example,

the two models are in some sense equivalent, and we could have placed it either way, except

that the CPD for driver quality

given driving history might be a little bit less intuitive.

But if we had other indicators of driver quality,

for example, previous moving violations,

then it actually makes a lot more sense to have all of these be

children of driver quality as opposed to parents of driver quality.
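The claim that the two placements are equivalent in this simple case can be checked numerically: a joint built as P(quality)P(history | quality) can always be re-expressed, via Bayes' rule, as P(history)P(quality | history). The numbers below are arbitrary placeholders.

```python
# Causal direction: quality -> history.
p_q = {"good": 0.7, "bad": 0.3}
p_h_given_q = {
    "good": {"acc": 0.1, "no_acc": 0.9},
    "bad":  {"acc": 0.4, "no_acc": 0.6},
}

# Joint distribution: P(Q, H) = P(Q) * P(H | Q).
joint = {(q, h): p_q[q] * p_h_given_q[q][h]
         for q in p_q for h in ("acc", "no_acc")}

# Reverse the edge with Bayes' rule: compute P(H), then P(Q | H).
p_h = {h: sum(joint[(q, h)] for q in p_q) for h in ("acc", "no_acc")}
p_q_given_h = {h: {q: joint[(q, h)] / p_h[h] for q in p_q}
               for h in ("acc", "no_acc")}

# The reversed parameterization reproduces the same joint exactly.
for q in p_q:
    for h in ("acc", "no_acc"):
        assert abs(p_h[h] * p_q_given_h[h][q] - joint[(q, h)]) < 1e-12
```

As the lecture notes, though, the reversed CPD P(quality | history) is typically less intuitive to elicit, which is one reason to prefer the causal direction.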

Okay.

So that shows us how we would add a variable to the network.

Now let's open up

a much larger network that includes these variables as well as others.

Looking at this larger network,

we can see that we've added several different variables.

We've added attributes of the vehicle; for example,

whether the vehicle has anti-lock brakes and an airbag,

which is going to allow us to give more informative probabilities regarding the accident.

We've also introduced aspects of the driver;

for example, whether they've had extra driver training,

which is going to increase driver quality.

Whether they're young or old,

where the presumption is that younger people tend to be more reckless drivers.

And whether the driver is focused or more easily distracted,

which, again, is going to affect driver quality.

Now, since personality type is hard to observe,

we added another variable, good student,

which might indicate one's personality type.

So let's open the CPD for that one.

And we can see, for example,

that if you are a focused person who is young,

you're much more likely to be a good student

than if you are an unfocused person who is young.

If you're old, this probability basically says that

you're just not very likely to be a student,

and therefore not likely to be a good student.

So now that we've added all these variables to the network,

let's go ahead and run a few queries to see what happens.

Let's start by looking at the prior probability

of accidents before we observe anything.

We can see that the probability of no accident is about 79-and-a-half percent,

and the probability of a severe accident is about three percent.

Now let's go ahead and tell the system that we have a good student in hand.

So we're going to observe

that the student is a good student, and let's see what happens.

Surprisingly, even though we observed that somebody is a good student,

the probability of no accident went down from 79-and-a-half to

78 percent, and the probability of

a severe accident went up to 3.67 percent.

You might say, "but I told you it's a good student;

shouldn't the probability of accidents go down?"

So let's look at some active trails in this graph.

One active trail goes from good student to focused to driver quality to accident.

And sure enough,

if we consider that trail in isolation,

it is probably going to make the probability of no accident higher.

But we have another active trail:

the trail that goes from good student up to

age and then back down through driver quality.

To see that, let's unclick good student and see what happens.

Note that the probability that the driver is young was initially 25 percent;

but when I observed good student,

it went up to close to 95 percent.

And that was enough to counteract the influence

along the more obvious active trail.

To demonstrate that this is indeed what's going on,

let's instantiate the fact that the driver is young.

We can see that the probability of a severe accident went up to 3.7

percent and no accident went down to a little bit shy of 77 percent.

Now let's observe good student and see what happens.

This time, observing good student makes the probability of

no accident go back up to 78 percent,

compared to 77 percent before.

And the reason for that is that we've now blocked the trail

that goes from good student through age to

driver quality, by observing age, which blocks the trail.
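This explaining-away pattern can be reproduced by brute-force enumeration on the v-structure age → good student ← focused. The parameters below are illustrative, chosen only so that old people are rarely students; they are not the SAMIAM values.

```python
from itertools import product

p_age = {"young": 0.25, "old": 0.75}
p_focused = {"yes": 0.50, "no": 0.50}
p_gs_yes = {  # P(good_student = yes | age, focused), assumed values
    ("young", "yes"): 0.85,
    ("young", "no"):  0.40,
    ("old",   "yes"): 0.02,  # old people are rarely students at all
    ("old",   "no"):  0.01,
}

def joint(age, foc, gs):
    # Chain rule for this v-structure: P(age) P(focused) P(gs | age, focused).
    p = p_gs_yes[(age, foc)]
    return p_age[age] * p_focused[foc] * (p if gs == "yes" else 1 - p)

# P(young | good_student = yes) by enumeration over the hidden variable.
num = sum(joint("young", f, "yes") for f in p_focused)
den = sum(joint(a, f, "yes") for a, f in product(p_age, p_focused))
posterior_young = num / den
print(posterior_young)  # jumps from the 0.25 prior to above 0.9
```

The same mechanism is what drove P(young) from 25 percent toward 95 percent in the demo: being a good student is strong evidence of being a student at all, and hence of being young.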

So we can see that the reasoning patterns in a Bayesian network are sometimes subtle,

and there are different trails that can

interact with each other in different ways.

And so it's useful to take a model and play around with

different queries and different combinations of

evidence to understand the behavior of the network.

Especially if you're designing such a network for a particular application,

it's useful to try out these different queries and

see if the behavior that you get is the behavior that you want.

And if not, then you need to think about how to modify

the network to get behavior closer to what you desire.

This network is available for you to play with, so

you can try out different things and see what behaviors you get.
