One very common kind of repeated structure occurs when we have multiple objects of the same type. These aren't copies of one object, but objects of the same type that all share a similar, or in fact the same, probabilistic model. For reasons that we'll talk about momentarily, one of the most common types of such models is called a plate model.

Let's start by modeling repetition. So in this case, imagine that we're tossing the same coin again and again. We have an outcome variable, and what we'd like to model is the repetition of multiple tosses. So we're going to put a little box around that outcome variable, and this box, which is called a plate, is a way of denoting that the outcome variable is indexed by the different tosses of the coin, t, an index we usually don't write out explicitly. The reason for calling it a plate is the intuition that this is a stack of identical plates. That's where the name plate model comes from. And what that model denotes is this: if we have a set of coin tosses t1 up to tk, then we have a set of random variables, Outcome(t1) up to Outcome(tk). So we've just reproduced the outcome variable in multiple copies.

Now, what does that explicitly correspond to? I'm going to do something that we'll do a lot later on in the course when we talk about learning, which is to put the parameters of the CPD explicitly into the model. So this random variable theta is the actual CPD parameterization, and I'm putting it in explicitly so that I can show how the different variables depend on it. With the parameters drawn here, we can see that theta is outside of the plate, which means that it's not indexed by t, which means it's the same for all values of t. So what that means is that we have this parameter theta over here.
And all of these outcomes depend on the exact same parameterization: the CPD of each outcome, Outcome(t1) and so on, is copied from this parameterization theta.

Let's look at a slightly more interesting example. Going back to our university with multiple students, we now have a two-variable model with intelligence and grade. And we now index that by different students s, which again indicates that we have a repetition, a copying, of this template model. In this case, I only made two copies, one for student 1 and the other for student 2. And once again, we might want to encode the dependence on the parameters. So we might have theta I, which represents the CPD for I, and theta G, which represents the CPD for G. And we'd have exactly the same idea: theta I influences the two I variables and theta G influences the two G variables, and again, they're outside of the plate. Sometimes, in many models, we will include those parameters explicitly within the model. But often, when a parameter is outside of all plates, we won't denote it explicitly. We just omit it, as we did in this original diagram before I annotated it.

Now, just repeating the exact same model multiple times is not particularly interesting. So let's talk about how you can overlap different plates, or in other words, how different types of objects in the model overlap with each other. In this case, we have two kinds of objects in our universe of discourse: we have courses and we have students. Courses we're going to call little c, and students we're going to call little s. And now let's think about how you might replicate variables that correspond to properties of courses and variables that correspond to properties of students. So the difficulty variable belongs in the course plate, because it's a property of the course.
So it's going to be the difficulty of course c. Now let's think about how we're going to put students in. One possibility is that we nest the student plate inside the course plate. What that means is that both of these variables here are indexed by both s and c, because when a variable is nested in a plate, it carries the indices of all the plates it's nested in. So if the intelligence variable is in both the s plate and the c plate, it's going to be indexed by both.

So let's build that model and see what it looks like when we unravel the courses and unravel the students. It looks like this. Let's say this is a two-course, two-student model. So we have the difficulty of course 1 and the difficulty of course 2. And now we have the variables in the nested plate, I and G, and we can see that they're both indexed by both student and course.

Let's think about the implications of this model. Intelligence is now a property of both the student and the course: it's the intelligence of a student in a course. And the intelligence of the student in a particular course influences the grade of the student in that course. You can see that from the dependency model over here. This tells us that there is a course-specific intelligence for every student in every course, and that may or may not be what we want. If you're taking radically different courses, say one is an art class and one is a math class, then you could say that there is an art intelligence, representing skill, if you will, in art, and there is a math skill or math intelligence. You might actually want to have two different kinds of intelligence and not assume that they're necessarily the same thing.
Of course, that kind of complicates the model. If you have a bunch of courses that are in some ways similar to each other and draw on a similar set of skills, you might not want a bunch of independent random variables here. In this model, we're assuming that the student has two independent intelligences, representing their intelligence in two different courses. In that case, you don't want the intelligence variable to be part of the course plate.

And that gives us an alternative representation, which is plates that are not nested but overlap with each other. In this case, we have the course plate, which is this plate over here, and we have the student plate, which is this one over here. The assumption is that the difficulty is a property only of the course, the intelligence is a property only of the student, and only the grade is a property that depends on both. When we unravel this one, we end up with a model that looks like this. We have a difficulty for each course and an intelligence for each student. And over here, where I've drawn the things in the intersection in green, we have the grade of the student in the course, which depends on the difficulty of the course and on the intelligence of the student. So now we only have a single intelligence per student. That is an alternative model. It's not that one of these is right and the other is wrong. They're just different.

And once again, I wanted to highlight that this notion of explicit parameter sharing also applies to models such as this. Here we have a parameter theta D, a parameter theta I, and a parameter theta G, which influences the grade and is shared among all of the different grade variables.

So why are these kinds of plate models useful?
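To make the contrast between nested and overlapping plates concrete, here is a minimal Python sketch that unrolls both layouts for two students and two courses. The function name `unroll`, the dictionary layout, and the object names `s1`, `s2`, `c1`, `c2` are assumptions made for illustration; they are not part of the lecture.

```python
from itertools import product

def unroll(template, objects):
    """List the ground variables of a plate model.

    template: maps a variable name to the tuple of plate indices it sits in.
    objects:  maps a plate index to the concrete objects in that plate.
    (Both layouts are hypothetical conventions for this sketch.)
    """
    ground = []
    for var, plates in template.items():
        # One ground variable per combination of objects from the enclosing plates.
        for combo in product(*(objects[p] for p in plates)):
            ground.append(f"{var}({', '.join(combo)})")
    return ground

objects = {"s": ["s1", "s2"], "c": ["c1", "c2"]}

# Nested: the student plate sits inside the course plate, so I and G carry both indices.
nested = {"D": ("c",), "I": ("s", "c"), "G": ("s", "c")}
# Overlapping: I is in the student plate only; G sits in the intersection.
overlapping = {"D": ("c",), "I": ("s",), "G": ("s", "c")}

nested_vars = unroll(nested, objects)
overlapping_vars = unroll(overlapping, objects)
print(nested_vars)
print(overlapping_vars)
```

In the nested layout each student gets one intelligence variable per course, while the overlapping layout keeps a single intelligence variable per student.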
So let's look at an example to convince ourselves that by building these richly structured models that involve multiple entities, you can actually get much more interesting conclusions. Imagine that we have this first-quarter freshman who came into our university, and we'd like to figure out what we can determine about him. Let's say that in this particular university, our prior belief is that most students have high intelligence, so this is the intelligence distribution, with 80% high. Now, this student, whom we'll call George, took two classes. He took Geo101 and got an A, so the probability that he's intelligent goes up. He took CS101 and didn't do so well, he got a C. Now the probability goes down, but it doesn't go down to a very low number. That's because we know, from the CPD for grade that we've seen previously, that there may be multiple other reasons why a student might not do well in a class. For example, it was a really hard class, so everybody did badly, or he didn't take it seriously. If these are the only two courses that George took, we're kind of stuck.

But now let's think about this in a more holistic context, collective inference, where we think about a number of students taking a number of classes, and let's imagine that we have a bunch of grades for all of those students. In what we see here, the green ones are As, the yellow ones are Bs, and the red ones are Cs. What you see is a shorthand for a bunch of observed grade variables: I didn't put in all the little dots that represent the grade variables, I just put in these lines that indicate them. So you can think of this as a network, if you will. Now let's think about what kind of conclusions we can reach from this network. Even just looking at this by eye, we can see that a bunch of people took CS101.
And they all did well, except for our friend George. Furthermore, if we look at this guy over here, who got a C in every other class that he took, even he still managed to ace CS101. So if we do probabilistic inference over this holistic model, what we're going to get is that we're pretty sure that CS101 is an easy class. And if we're pretty sure about that, we're also pretty sure, in this case, about George's intelligence as well. So we can reach much more informed conclusions in this setting than we can by reasoning about individuals in isolation.

Now, this is a toy example, but we'll see later on examples of collective inference where we have multiple interrelated entities. They could be related pixels in an image, or related webpages in a website, where the webpages point to each other. If we try to label each entity in isolation, we just don't get very informed conclusions. But by thinking about how they all relate to each other, we get much stronger, much more informed results.

So just to summarize the plate dependency model, it has the following characteristics. It defines the dependency model for a template variable A that is indexed by a bunch of object types U1 up to Uk, for example students and courses, or anything else. And for each of those template variables, we have a set of template parents, each with its own set of indices. What we require is that each parent's index set be a subset of the child's index set, U1 up to Uk. So what does that mean? Take, for example, the template variable G(s, c). G corresponds to the variable A, and s and c correspond to the indices, in this case U1 and U2. And we have two template parents: I(s) and D(c). The stipulation that each parent's index set is a subset of the child's index set {U1, ..., Uk} tells us, for example, that we cannot have an index in the parent that doesn't appear in the child.
So, for example, we cannot have in this model, for reasons that I'll describe in a minute, the notion of, say, Honors for student s depending on the grade of the student in multiple courses. The reason is that this is not a CPD: you have the Honors variable depending on a potentially unbounded number of parents, which are all of the grades in the courses in which the student participated. Now, that's not to say that we can't define such a dependency model. In fact, there are richer languages than plates for which people have defined this notion of an aggregator CPD. But that is not within the standard of what are traditionally called plate models.

By preventing that, we effectively have a traditional model where you have a random variable with a finite, fixed set of parents. And so we can define a template CPD, which we can then reuse for any copy of this template variable, where a copy is obtained from a different instantiation of the indices U. Specifically, if we have this variable A(U1, ..., Uk), then for any instantiation little u1 up to little uk, which are concrete instantiations of the indices, we would have the following model: the variable A(u1, ..., uk) depends on each of its template parents instantiated with the corresponding values from u1 up to uk. That is potentially confusing notation, because the sets are a little hard to follow. But really, just think concretely of the example. It says exactly that the grade of a particular student in a particular course depends on the difficulty of that course and on the intelligence of that student. That's all it says, okay? So it's just a general way of saying [INAUDIBLE]. And this is just the formal version of the statement that I made earlier, which requires the parents not to have indices that are not explicitly instantiated in the child, so that we don't have a free-floating index that can be instantiated in arbitrarily many ways.
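The grounding rule and the subset restriction described above can be sketched in a few lines of Python. This is only an illustration under assumed conventions: the function name `ground_template`, the `(name, indices)` tuples, and the object names are hypothetical and do not come from the lecture.

```python
from itertools import product

def ground_template(child, parents, objects):
    """Instantiate a template dependency for every binding of the child's indices.

    child:   (name, indices), e.g. ("G", ("s", "c"))
    parents: list of (name, indices) pairs
    objects: maps each index to its concrete objects
    Returns a dict from each ground child to the list of its ground parents.
    """
    child_name, child_idx = child
    for p_name, p_idx in parents:
        # Plate-model restriction: a parent may only use indices of the child.
        if not set(p_idx) <= set(child_idx):
            raise ValueError(
                f"{p_name}{p_idx} uses an index that {child_name}{child_idx} lacks"
            )
    grounded = {}
    for combo in product(*(objects[i] for i in child_idx)):
        binding = dict(zip(child_idx, combo))
        key = f"{child_name}({', '.join(combo)})"
        # Each parent is instantiated with the child's values for its own indices.
        grounded[key] = [
            f"{p_name}({', '.join(binding[i] for i in p_idx)})"
            for p_name, p_idx in parents
        ]
    return grounded

objects = {"s": ["s1", "s2"], "c": ["c1", "c2"]}
grade_edges = ground_template(
    ("G", ("s", "c")), [("I", ("s",)), ("D", ("c",))], objects
)
print(grade_edges["G(s1, c2)"])

# Honors(s) depending on G(s, c) breaks the restriction: c is free in the parent.
try:
    ground_template(("Honors", ("s",)), [("G", ("s", "c"))], objects)
    honors_allowed = True
except ValueError:
    honors_allowed = False
print(honors_allowed)
```

Every ground grade variable ends up with exactly two parents regardless of how many students and courses exist, which is what lets a single template CPD be reused for all of them.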
So, to summarize, plate models are a language that allows us to define a template for an infinite set of Bayesian networks. Why infinite? Because you can have 3 students, 10 students, 1,000 students, a million students, an unbounded number of students. So there's an infinite set of Bayesian networks that we can use this language to encode, and each of them uses a different combination of domain objects, in our example, students and courses. The parameters and the structure are reused both within a Bayes net and across the different Bayes nets. So for example, within our university example, we use the same parameters for every student and every course, and if we have a different university with a different set of students and courses, we would still use the same parameters. These models, by allowing us to represent an intricate network of dependencies, let us capture very richly correlated structures in a concise way, which allows us to do this kind of collective inference, which is potentially a very powerful source of informed conclusions.

Now, I've presented plate models, which are perhaps the earliest and one of the simplest of these languages that allow us to represent template structures. It's a simple one. For example, it has this restriction on the parents not having indices that are not instantiated in the child. And so, for example, you can't represent temporal models here, because t-1 is not instantiated in the variable X(t), so you can't have X(t-1) as a parent of X(t). Not in the plate model, I mean. Obviously, we have languages that can do that, but not this one. Similarly, you can't have the genotype of the mother and the genotype of the father affect the genotype of the child, because, once again, the child doesn't instantiate the mother and the father. These are separate indices. So this is a limited language, but there are many other languages that expand on it in different ways.
And they each have different tradeoffs in terms of what they express easily and what they don't. There's an entire literature on this that we're not going to go into, but it has provided a number of very useful languages for representing these kinds of richly structured models.
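Going back to the George example, the effect of collective inference can be checked by brute-force enumeration on a tiny unrolled plate model. This is a rough sketch, not the lecture's actual network: apart from the 80% prior on high intelligence, every number here is hypothetical. The grade CPD, the 50/50 difficulty prior, the restriction to the single course CS101, and the five classmates who all got As are assumptions made for illustration.

```python
from itertools import product

P_I = {"high": 0.8, "low": 0.2}    # prior on intelligence (80% high, as in the lecture)
P_D = {"easy": 0.5, "hard": 0.5}   # prior on course difficulty (assumed)
P_G = {                            # P(grade | intelligence, difficulty), all assumed
    ("high", "easy"): {"A": 0.90, "B": 0.08, "C": 0.02},
    ("high", "hard"): {"A": 0.50, "B": 0.30, "C": 0.20},
    ("low", "easy"):  {"A": 0.50, "B": 0.30, "C": 0.20},
    ("low", "hard"):  {"A": 0.05, "B": 0.25, "C": 0.70},
}

def infer(grades, student):
    """Return (P(student is high | grades), P(course is easy | grades)).

    grades maps each student to their observed grade in one shared course.
    The same P_I and P_G are reused for every student (parameter sharing).
    """
    students = list(grades)
    pos = students.index(student)
    p_high = p_easy = total = 0.0
    for d in P_D:
        # Enumerate every joint assignment of intelligence to the students.
        for intel in product(P_I, repeat=len(students)):
            w = P_D[d]
            for s, i in zip(students, intel):
                w *= P_I[i] * P_G[(i, d)][grades[s]]
            total += w
            if intel[pos] == "high":
                p_high += w
            if d == "easy":
                p_easy += w
    return p_high / total, p_easy / total

# George alone versus George plus five classmates who all got an A.
isolated_high, isolated_easy = infer({"george": "C"}, "george")
evidence = {"george": "C", **{f"s{k}": "A" for k in range(1, 6)}}
collective_high, collective_easy = infer(evidence, "george")
print(isolated_high, isolated_easy)
print(collective_high, collective_easy)
```

Under these assumed numbers, the classmates' grades push the model toward believing the course is easy, which in turn lowers the posterior on George's intelligence compared with looking at his grade alone.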