0:00

Today's topic is an important extension of the language of graphical models. It's intended to deal with a very large class of cases where what we'd like to do is not just write down one graphical model for a particular application, but rather come up with a general-purpose representation that allows us to solve multiple problems using the same exact model. So, to understand what that means, let's look at a somewhat more concrete setting.

Let's go back to genetic inheritance, which we've discussed previously; it's arguably the earliest example of Bayesian network reasoning, dating from before Bayesian networks were even invented. Here we have as input a pedigree, which is a family tree, and we're interested in reasoning about a particular trait. For each pedigree and each trait we can construct a Bayesian network, and that network might look like this. But clearly, if you had a somewhat different family tree, say three more cousins and a great-grandfather joined the family, or if you had a different family altogether, you would still want to use the same sort of ideas, the same pieces that we used in the first network, to construct this other network, because there is clearly a lot of commonality between them. We have what you might call sharing between models.

But in addition to that, in this example you also have fairly obvious sharing within the model. For example, the CPD that tells us how Selma's genotype affects Selma's blood type is presumably the same process by which Marge's genotype affects Marge's blood type, and the same for Maggie and Lisa and Bart and Homer and everybody else. So we have a tremendous amount of sharing of both dependency models and parameters.
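This kind of within-model sharing can be sketched in code: one conditional probability table is defined once and applied to every person's genotype variable. This is a minimal illustration; the genotype names and the (deterministic, noise-free) table entries are my own toy values, not anything from the lecture's actual model.

```python
# Shared CPD: one table P(BloodType | Genotype), reused for every
# person in the pedigree. Entries are toy deterministic values.
BLOODTYPE_CPD = {
    "AA": {"A": 1.0},
    "AO": {"A": 1.0},
    "BB": {"B": 1.0},
    "BO": {"B": 1.0},
    "AB": {"AB": 1.0},
    "OO": {"O": 1.0},
}

def bloodtype_dist(genotype):
    """Apply the single shared CPD to any person's genotype variable."""
    return BLOODTYPE_CPD[genotype]

# The same function (same parameters) is used for every family member:
pedigree_genotypes = {"Selma": "AO", "Marge": "OO", "Bart": "AB"}
dists = {person: bloodtype_dist(g) for person, g in pedigree_genotypes.items()}
print(dists["Selma"])  # {'A': 1.0}
```

A real genetic model would make the CPD noisy rather than deterministic, but the sharing pattern, one table, many instantiations, is the same.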

Similarly, looking at this, you might realize that the genetic inheritance model by which Bart's genotype is determined by the genotypes of his two parents is the same inheritance model that applies to Lisa, to Maggie, to Marge, to Selma, and so on. So once again we have a lot of parameters that are shared, not just between models but also within a model. We'd therefore like some way of constructing models that have this large amount of shared structure, which allows us both to construct very large models from a sparse parameterization, and to construct entire families of models from a single concise representation.

This is not the only such application. In fact, the most commonly used types of graphical models are those that have shared structure and shared parameters. Here is another example that we've seen previously, from natural language processing: a sequence model, in this case for named entity recognition, a very common task for which graphical models have been used. Here, too, we have shared pieces, for example the parameters that relate the latent variable, in this case what type of entity it is (is it a person? is it a location? and so on), to the observed word. There is a set of parameters here, and they are going to be independent of the place in the sequence in which we find the word. That's not because it's necessarily an exactly correct model.

One could clearly imagine cases where the position in the sequence might make a difference, but this is often a very useful simplifying assumption, specifically because it allows us, first, to reuse parameters, and second, to apply the same model to sequences of varying length, without having to worry about what my fifteen-word model is versus my eight-word model. So this is another case where we have a tremendous amount of parameter reuse.

Similarly, we've already seen the example

of image segmentation. Clearly, we don't want to have a separate model for every superpixel in the image; there are hundreds of superpixels, so we have sharing across superpixels (or pixels). For example, the model that relates the class label of a superpixel to the image features of that superpixel is generally going to be shared, as are the parameters that involve adjacent superpixels: these edge potentials are also going to be shared across pairs of adjacent superpixels. And once again we have sharing across models as well, because we are going to have one such model for each image, and obviously we don't want to construct a separate model for every image. So, once again, we have sharing both between and within a model.
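The segmentation sharing can be sketched as follows: a single node potential and a single edge potential are reused for every superpixel and every adjacent pair. The labels, feature values, and potential numbers below are illustrative toy choices of mine, not anything from an actual segmentation model.

```python
# Toy pairwise MRF over superpixels with shared potentials.
LABELS = ["cow", "grass"]

def node_potential(label, feature):
    """One shared node potential: how well a label fits a (toy) feature."""
    return 3.0 if (feature > 0.5) == (label == "cow") else 1.0

def edge_potential(l1, l2):
    """One shared edge potential, favoring equal labels on adjacent superpixels."""
    return 2.0 if l1 == l2 else 1.0

def score(labeling, features, edges):
    """Unnormalized score of a joint labeling of all superpixels."""
    s = 1.0
    for i, f in enumerate(features):
        s *= node_potential(labeling[i], f)          # same parameters at every node
    for i, j in edges:
        s *= edge_potential(labeling[i], labeling[j])  # and on every edge
    return s

features = [0.9, 0.8, 0.1]   # three superpixels, one toy feature each
edges = [(0, 1), (1, 2)]     # adjacency structure
print(score(["cow", "cow", "grass"], features, edges))  # 54.0
```

However many superpixels the image has, only the two potential functions carry parameters; the model size grows with the image, the parameterization does not.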

Â 5:45

So let's look at one example a little more concretely, because it's an example that we're going to use in some of the later analysis. Let's return to our university example, where we have a student who takes a class and gets a grade, and that grade depends on the difficulty of the course and on the intelligence of the student. Now, this is all fine if we're interested in reasoning just about an individual student, but let's imagine that we want to think about an entire university. So now we have a set of difficulty variables, one per course; in this case C1 up to Cn are the different courses that exist in our university. Conversely, on the other side, we have multiple students, and a set of intelligence variables indexed by the different students: the intelligence of student 1 up to the intelligence of student m.

Now note that these are different random variables; they can, and generally will, take different values from each other, but they all share a probabilistic model. That's exactly the kind of sharing we have in mind. What we see here is that the grade of a student in a course (these variables down here) depends on the difficulty of the relevant course and on the intelligence of the relevant student. So, for example, the grade of student 1 in course 1 depends on the difficulty of course 1 and on the intelligence of student 1.
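As a sketch, the shared CPD for the grade variables can be written once as a template and instantiated for every (student, course) pair. The table entries, value names, and variable names below are invented for illustration only.

```python
# One template CPD: P(Grade | Intelligence, Difficulty), defined once.
# Entries are made-up toy probabilities.
GRADE_CPD = {
    ("high", "easy"): {"A": 0.9, "B": 0.1},
    ("high", "hard"): {"A": 0.5, "B": 0.5},
    ("low",  "easy"): {"A": 0.5, "B": 0.5},
    ("low",  "hard"): {"A": 0.1, "B": 0.9},
}

def ground_network(intelligence, difficulty):
    """Instantiate one Grade(s, c) variable per (student, course) pair,
    all sharing the single template CPD above."""
    return {
        (s, c): GRADE_CPD[(i, d)]
        for s, i in intelligence.items()
        for c, d in difficulty.items()
    }

intelligence = {"s1": "high", "s2": "low"}   # Intelligence(s), one per student
difficulty = {"c1": "hard", "c2": "easy"}    # Difficulty(c), one per course
grades = ground_network(intelligence, difficulty)
print(grades[("s1", "c1")])  # {'A': 0.5, 'B': 0.5}
```

Adding another student or another course just adds more ground variables; the parameterization, a single four-row table, is unchanged.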

And, once again, we have sharing of both the structure and the parameters across these different grade variables, so that they all have the same kind of dependency structure and the same CPD.

Another example is that of robot localization, which is, in this case, a time series.

Â 7:49

The robot moves through time from one position to another, and although the position at time t changes over time, we expect the dynamics of the robot to be fixed. We will talk more about that later. That gives us a graphical model that looks a little like this; this is one example of such a graphical model, where the robot pose, these X variables, depends on, for example, the previous pose and on whatever control action the robot took. And we're assuming, once again, that these parameters are shared across each instantiation of this variable.

What this gives rise to is a class of models that are represented in terms of template variables, where a template variable is something that we end up replicating again and again, both within a single model and across models. The replication is indexed by the fact that the variable, which you can think of as a function, takes arguments; the arguments might, for example, be time points, as in this example over here, where we have a location variable that's indexed by time point, and a sonar reading that's indexed by time point.

In the other examples we had a genotype variable and a phenotype variable indexed by a particular person; a class label indexed by a pixel; and, similarly, difficulty, intelligence, and grade variables indexed by different combinations of indices: course, student, or (course, student) pairs. A template model is a language that tells us how to specify dependency models for template variables, and how concrete instantiations of those variables, what are called ground variables, like the ones that are actually indexed by a particular time point or person, inherit the dependency model from the template.

There is a whole range of such languages that have been developed in a special-purpose way for different applications. Dynamic Bayesian networks, for example, are intended for dealing with temporal processes, where we have replication over time. And we have a whole range of languages for object-relational models, both directed and undirected, where we have multiple objects, such as people, students, courses, or pixels, that can be related to each other in different ways. How do you represent the dependency model over that ensemble in a coherent way?
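As a minimal illustration of what such a language buys us, here is a toy two-state temporal model in which one shared transition CPD is reused at every time step, so the same template yields a ground model for any horizon T, with no separate "fifteen-step" versus "eight-step" design. All names and numbers are my own, not from the lecture.

```python
# Template transition CPD: P(X_t | X_{t-1}), defined once and reused
# at every time step. Toy two-state "pose" chain with made-up numbers.
TRANSITION = {
    "left":  {"left": 0.7, "right": 0.3},
    "right": {"left": 0.3, "right": 0.7},
}

def unroll(initial, T):
    """Forward pass of the chain: marginals over X_t for t = 0..T,
    applying the same shared TRANSITION table at every step."""
    dists = [dict(initial)]
    for _ in range(T):
        prev = dists[-1]
        nxt = {state: 0.0 for state in TRANSITION}
        for s, p in prev.items():
            for s2, q in TRANSITION[s].items():
                nxt[s2] += p * q
        dists.append(nxt)
    return dists

# Unrolling to T=3 or T=300 uses the identical template parameters:
dists = unroll({"left": 1.0, "right": 0.0}, T=3)
print(round(dists[1]["left"], 3))  # 0.7
```

The ground variables X_0, ..., X_T are all distinct random variables, but every edge in the unrolled chain carries the same CPD, which is exactly the template-model pattern described above.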
