[MUSIC] All right, so how do we use these views? Well, an example of why you might want to do this, a high end, let's define a high end store as a store that sold some product for a price over 1000. All right, so a high end boutique store. Then for each customer, find all the high end stores that they visit and return a set of customer name high end store pairs. In this case, this is a fairly complicated query. But now we've got a definition of high end store that others can use in their own queries if they wanna access. Actually, I didn't do a very good job of explaining this. This is actually exercising the view we just defined in the previous slide. So this is a new query that exercises this store price view. Okay. I was actually going one step further. And I wanted to define a new view that used, where this query was the definition because I thought the notion of a high end store might be something useful. And of course you can do that. You can stack views on top of views on top of views on top of views. All right, so how do we use a view? Well, as I said all you have to do is reference a view in the query just like it's a table. And so here you wanna define the notion of a high-end store and we say well that's any store that has sold some product over a $1000. And for each customer we may wanna find all the high-end stores that they visited. Well, being able to directly reference the store price relation that we defined in the previous slide, this view helped simplify this query. So you can just write a query that directly accesses that view as if it was a table. And okay. So, how is this actually evaluated? Well, that's actually, oops. That's actually what's really fantastic about databases, is that this query will just be folded together with the view definition and passed to the database, where the whole thing is optimized in one go. So you don't need to worry about the difference between having a stack of five Us and all being folded together in one query. The database doesn't care. It's gonna translate the whole thing into one big query, exchange that for an algebraic expression, and that do the normal optimization procedure to come up with the best possible plan. So it's basically like free abstraction. It simplifies things for the programmer without any kind of performance cost. Now, you can actually get better performance than writing the whole thing by hand. It's equivalent to writing the whole thing by hand, you don't pay a penalty. But you can actually do better than that with views and in some cases buy materializing views. And we're not going to talk too much about that because they're sort of is very specific to databases and we don't see it quite as often in this broader context of Data Science that we're trying to talk about. But, it's a good trick and once you have the mechanism to store views, you can essentially cache the results and that's what we call materialization, okay? And so one final point I wanna make. Another key idea of databases is indexes. And while indexes are certainly not unique to databases, what I think databases have, perhaps uniquely, is a fantastic platform for building them and applying them and using them. Right? So if you are in a general purpose programming language, if you're working in Python or you're working in R, and you come up with a pipeline to process some data, and you decide boy it'd be a lot better if I sorted my data by customer last name before accessing it, because that would make things more efficient. That's like building an index. But you have to rewire all your code in order to take advantage of that index. The key thing about databases is that indexes are easily built and automatically used by the optimizer when appropriate. So, all you have to do is write in an expression like this, create index on some column of some table. And here I've switched the schemas yet again on you. And the database now understands that this index exists and will use it whenever it's appropriate. And you can see this in those plans. I talked about using the explain to figure out the algebraic expression in the plan and actually look at it to reason about the performance. You'll see it selecting to use an index scan as opposed to a regular scan when these indexes exist. Okay. And sort of going in reverse order here, this top bullet is just saying that one of the areas where databases shine is sort of these needle in haystack problems, for which indexes are critical. You can ignore this point about old style scalability, I skipped that point. It's not quite as relevant for data science. So the last key idea I want to convey about databases is that of indexes. So while indexes are certainly not unique to databases, databases are perhaps unique as a platform that can make them very easy to apply and deploy and automatically take advantage of. And so databases are especially, but not exclusively, effective at needle in the haystack problems. Looking up individual records or small amounts of records from large data sets. They do other things very well too but this is one thing that they're quite good at. And the reason is that you can apply indexes. This second bullet needs a little bit of context, but what I mean hear is that if you're trying to write code to do this yourself in some programming language like python or C or R, you're going to be a slave to what sizes of data fit into main memory as we've said before. Now you can absolutely be clever and start bringing in one chunk of data at a time in memory, processing it, putting it out to disk and bringing in the next set and so on. But the code will very quickly become very very complex. This is something that databases already know how to do. And so your query will always finish regardless of database size, as long as it fits on disk, right? It doesn't matter how much memory you have available, it'll eventually finish. It may not be that fast, but it'll finish. They already know how to take advantage of main memory in this optimal way. And you know, it's not easy, right it's a pain in the butt to try to code that yourself. Okay. So effective use of the memory hierarchy, effective use of indexes. These are things the databases can do well. It's a great platform for applying these tricks. And so finally this what I mean here is that the indexes are easily built and automatically used by the optimizer. So to create an index you can write a statement like this. Here I've changed the schema on you once again. But here we're sort of filtering on genetic sequences, and if I create this index then this query will- You know, this very simple query is looking for all sequences that match a particular value. It will automatically take advantage of that index if it's there. You don't have to tell it to do anything. You write the exact same query. [MUSIC]