[MUSIC] So we talked about algebraic optimization and then we talked about decorative languages on top of the algebra in order to simplify expression and in order to avoid specifying to the computer exactly how to do it. Right we want to leave that open and let the database figure that out. But we stopped at what I'll call logical optimization. And I want to talk a little bit about the physical level optimization. And what I mean by this is that, we hinted at this last time, but even after you specify the order of operations, we haven't yet specified every detail needed in order to actually evaluate the query, okay. And let me give you an example of that. So here's a simplified version of a query we looked at last time. When we say for every order, we wanna find all the corresponding items that were part of that order. And last time we had an extra condition so for every order, find the corresponding items that match. And the last time we had another predicate down here and this time I've taken that out. And so the algebraic plan that this translates into is very simple. It's just a join of the two tables, and that's it. So you think we're done, right, we're gonna join, order an item, and we're finished. Well we gotta specify how we're gonna do that join, and so let me tell you about a couple of the options here. So one, in sort of very high level pseudo code, looks like this. We can say for each record i in item, and for each record o in order, check to see if those two records agree on the order field, on the order attributes. And if so, return it, and that's a joined result, they match. Another option is for each record i in Item, insert that record into some sort of data structure and here I'm gonna call it a hash table. Now I'm not too concerned about what exactly that is. And then, second, for each record o in order, go look up the corresponding records in that data structure that we found, rr that we built. And return all the matching pairs okay. And if it actually is a hash table that we're talking about, then this lookup could be pretty efficient, right? It could be constant time, amertized constant time right? And so now this one says, well, for every record in Item, go scan every single record in order. And so we have kind of an in squared, complexity going on here. And here we say, well for every record and item, put into a data structure. And then after that, for every record in order, go look up those records in the hash table and if indeed this is constant time amortized, then this is a sort of linear time algorithm. So I argue that there's two different ways to implement this join, and both of these are valid. So which one is faster? Well, I've sort of hinted that perhaps option two is faster, but in practice it may or may not be. And so, I would pause here and ask the class to answer the question but since it's over video I can't do that I'll give you a moment to think about that. But I want you to think about why this one in particular might be faster in some cases than this one even though it seems like it shouldn't ever be okay? And let's see an example, maybe, in a second. So, leaving that question hanging open I wanna make the point that you have access to this underlying algebra. This isn't something that's all, that's purely sort of theoretical, right? This is something that you can use tomorrow if you work with databases at your job. For example in this particular product, Microsoft SQL Server, and in fact in all the products you're going to use the same sort of mechanism, but you can explain a query. And that will give you access to some form of this algebra that I've been talking about. Okay, so if you take a query and here I've changed the query, I've changed the scheme get again. This table, this order table is the one you will be working within the homework. I've read in the query here and I've explained it and what shows, what sequel management studio gives back to me is a little algebraic tree kinda like the ones I've been drawing here, just you know in PowerPoint okay. And so this one says a hash match is gonna be used to implement to this drawing condition. This one's kind of a complicated join condition, for a reason I'm not gonna explain right now. But it has two leaves and then they get joined with this thing called a hash match interjoin. This is very much like the hash table example I gave on the previous slide. But I want you to take a look at something. So here, I've taken the exact same query but I've added an extra condition where I'm only looking for words equal to parliament. I probably should explain this scheme a little bit. So the Reuters dataset gives you term frequencies. You have three columns, doc_id or let's just say doc, term and frequency. And the frequency is how often that term appears in that document, okay? And so this is the table you'll be looking at and so here what I've said is I'm looking for pairs of terms that co-occur in a single document is the previous query I was looking at. And now I've said, well, look, I don't want all pairs of documents, or all pairs of terms, I only want terms that co-occur with the term parliament, right? So, perhaps lawyer co-occurs with parliament frequently. So I'm looking for all the terms that co-occur in some document with parliament, is what this query is expressing. [MUSIC]