0:00

Hi, welcome back.

In this section, we will continue talking about distributed query processing.

To give you an idea of how the general process works,

let's look at this diagram.

First, again, we receive a query written in

a declarative query language, for example, SQL.

The first step is called decomposition.

We're going to see what decomposition means in a moment, but what it does,

basically, is similar to what we do in a centralized database.

You take a query and you check it for sanity,

including its syntax,

and all the kinds of issues that can come with

a query written by a human user.

That's query decomposition.

Out of that, you create an algebraic query on distributed relations.

After decomposition and normalization,

you do something called data localization.

You localize the algebraic query onto the fragments.

Instead of the global schema,

you express the query in terms of the fragments and where they are stored.

That gives you fragment queries,

and then you do a global optimization.

While all of this is happening,

nothing is executed yet.

This is just optimization, and all of it is done at the control site.

After the global optimization,

you have an optimized fragment query with communication operations,

and then you do local optimization at each site.

You send these plans out to the local sites, which do their own local optimization,

and you end up with optimized local queries.

This part is done at the local sites.

So as you can see, you go through a lot of optimization steps until

you reach the final optimized plan,

which is then executed across the local sites.

So, decomposition, again, checks the sanity of the query, as we mentioned,

and from the declarative language

it generates an algebraic form of the query.

Then comes localization, and then optimization.

For optimization, we will take a look at how you can do global optimization.

But first, let's see how decomposition works.

In decomposition, again, you do normalization,

you eliminate redundancies, and you do algebraic rewriting.

These are exactly the same steps that we apply in a centralized database.

2:40

So, the idea of normalization is that you

convert from a general language to a standard form,

which is relational algebra.

We always transform the query into

relational algebra because relational algebra is the operational language.

It actually tells you the sequence of operations that run on the database.

Again, this is done in decomposition,

and it is done on the global schema,

not on the fragment schema.

So, when you have a query like this,

you do the sanity checks,

you do everything on the query,

and then you generate the algebraic form of the query,

the one down there.

You join R with S,

and then you do a selection on top, with the condition in conjunctive normal form like this.

What you do is check whether the query is correct or not,

and you check whether there are any redundancies and remove them.

If you look here, for example,

if R does not have an attribute A,

this is an issue.

The same goes for conditions like this one,

A equals one

and A larger than five.

This will always return false,

so it is not a correct condition,

and you simplify it away.

Here, there is also a redundancy,

so you just eliminate it.

This is all in the decomposition step.
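
As a rough illustration of the "A equals one and A larger than five" check, here is a minimal Python sketch. It is not taken from the lecture. The names `Cmp` and `is_satisfiable` are hypothetical, and it assumes the condition is a pure conjunction of comparisons between an attribute and a numeric constant over a continuous domain.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Cmp:
    """One comparison 'attr op value' (hypothetical helper type)."""
    attr: str
    op: str       # one of "=", "<", "<=", ">", ">="
    value: float


def is_satisfiable(conjuncts):
    """Return False if the AND of all comparisons can never be true."""
    # Per attribute, keep the tightest interval seen so far:
    # (low, low_is_open, high, high_is_open).
    bounds = {}
    for c in conjuncts:
        lo, lo_open, hi, hi_open = bounds.get(
            c.attr, (-math.inf, True, math.inf, True))
        if c.op in ("=", ">", ">="):
            strict = c.op == ">"
            if c.value > lo or (c.value == lo and strict):
                lo, lo_open = c.value, strict
        if c.op in ("=", "<", "<="):
            strict = c.op == "<"
            if c.value < hi or (c.value == hi and strict):
                hi, hi_open = c.value, strict
        bounds[c.attr] = (lo, lo_open, hi, hi_open)
    # An empty interval means the conjunction is a contradiction.
    for lo, lo_open, hi, hi_open in bounds.values():
        if lo > hi or (lo == hi and (lo_open or hi_open)):
            return False
    return True


# The condition from the example: A = 1 AND A > 5.
print(is_satisfiable([Cmp("A", "=", 1), Cmp("A", ">", 5)]))  # False
```

A real decomposition step would also have to handle disjunctions, comparisons between attributes, and integer domains. This sketch only covers the single pattern discussed here.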

There are also common sub-expressions.

If we apply a selection on R here and the same selection on R there,

why don't we just apply the condition on

R once and push the conditions down?

All of this kind of algebraic rewriting is done in the decomposition step.

After that, you go to localization, which is

a very important step, and now it's different.

It does not exist in a centralized database;

it exists only in a distributed database.

Here you want to replace the global schema, the relations,

with the fragments, because

you have different fragments of the relations distributed across sites.

A very simple strategy is to start with the query,

the algebraic form that you have,

and replace the relations by their fragments.

To do that, you use the union operation,

because if you union all the fragments together,

you get back the base relation.

Then you push the union up

and push selection and projection down. That's the next step.

After that, you simplify and eliminate unnecessary operations.

Here is an example. We refer to a fragment

by R together with the condition that defines the fragment.

Say we have a query that

is a selection over R, here with the condition E equals three.

Suppose we have two fragments of R

for this query, R1 and R2.

R1 has the condition E less than 10,

and R2 has the condition E larger than or equal to 10.

In that case, you put a union between R1 and R2.

This is how we replace R:

you create the union.

On top of the union, you do the selection.

So, that's the first step: replace R with its fragments.

After that, you push the union up

and you push the selection down.

In that case, the union is pushed up to here,

and the selection goes down here.

So, what we did is push the union up

and push the selection down onto the fragments.

Now if you look, is there anything wrong with this expression?

Yes. Here you have the selection E equals three,

while the fragment R2 only has E larger than or equal to 10.

So, this means that this whole side

needs to be eliminated, which eliminates the union as well.

What remains is the selection E equals three on R1,

the fragment with E less than 10.

That will be the final result of the localization.
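
Written out in relational algebra, the example can be followed step by step. This is a reconstruction from the spoken description, not a copy of the slide, and it assumes the `amsmath` package for the `align*` environment.

```latex
\begin{align*}
R_1 &= \sigma_{E < 10}(R), \qquad R_2 = \sigma_{E \ge 10}(R), \qquad R = R_1 \cup R_2 \\
\sigma_{E = 3}(R) &= \sigma_{E = 3}(R_1 \cup R_2) \\
  &= \sigma_{E = 3}(R_1) \cup \sigma_{E = 3}(R_2) \\
  &= \sigma_{E = 3}(R_1) \cup \emptyset
     \qquad \text{since } E = 3 \text{ contradicts } E \ge 10 \\
  &= \sigma_{E = 3}(R_1)
\end{align*}
```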

So, these are very simple steps.

You can just follow them;

the localization algorithm follows them for any kind of query written in algebraic form.
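
To make the three steps concrete, here is a small Python sketch of the same example. It is only an illustration under simplifying assumptions. The names `Fragment` and `localize_selection` are hypothetical, each fragment is defined by a range on the single attribute E, and the query is an equality selection on E.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Fragment:
    """Horizontal fragment of R holding tuples with lo <= E < hi (assumed form)."""
    name: str
    lo: float   # inclusive lower bound on E
    hi: float   # exclusive upper bound on E


def localize_selection(value, fragments):
    """Localize 'select E = value from R' onto the fragments of R."""
    # Step 1: replace R by the union of its fragments.
    # Step 2: push the selection below the union, which yields one
    #         selection branch per fragment.
    branches = [(value, frag) for frag in fragments]
    # Step 3: simplify by dropping every branch whose fragment condition
    #         contradicts the selection; such a branch is always empty.
    return [frag.name for v, frag in branches if frag.lo <= v < frag.hi]


r1 = Fragment("R1", -math.inf, 10)   # E < 10
r2 = Fragment("R2", 10, math.inf)    # E >= 10

print(localize_selection(3, [r1, r2]))  # ['R1']
```

Only R1 survives, so the selection needs to be sent only to the site that stores R1. A general localizer would work on full algebra trees and arbitrary fragment predicates, which this sketch does not attempt.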
