Hi folks. Welcome to our Module 3, Normalization. At this step, we already know that what's Entity-relationship models are including what are entities, what are relationships and you know, how to draw the attributes, identifiers, and the participation and the cardinality. Then you know how to convert a entity relationship diagram to a relational model and you know how to represent the model using a set of relational schemas. In the schemas, you know how to underline a primary key and have the FK after the foreign keys even know how to build out the connections between relations. In this module, we are going to work further on the relational models, which is a set of relational schemas. We're going to introduce, first of all, the data redundancy problems and the problem caused by the data redundancy as well. We're going to learn the functional dependencies, which is the key point of how we can do Normalization. Finally, we're going to learn several normal form, with different restrictive requirements, and also the lastly part is Normalization process which is we move our relations from a less restrictive way or less restrictive requirement to a more restrictive requirement so that we can avoid data redundancy and some other problems. Our learning objectives for this module will be, you are going to explain the problems associated with data redundancy. Then you are going to explain and identify functional dependencies, either by examining the data you have in mind or using your common sense to differentiate or to find out what are the possible relations within a relation and among the attributes in the relation. You are going to explain what are the different normal forms and you are going to normalize all the relations you got in the relational model to at least a third normal form so that we can implement your relational model into a physical database. That will be our final goal for this course. In the next course, you're going to use either a SQL or use a software management system to do the implementational parts. We talked about Normalization is about data redundancy or Normalization is the approach we use to deal with data redundancy. What is exactly data redundancy, means, and how it looks like in a real-life database? Let's look at an example that first of all, our relational database design from the entity-relationship model, from the converting as you can see, it is not that's quite hard. Actually, we just do following, the routine, and do the requirements and we eventually work it into several relational schemas as a relational model and there's nothing very difficult. However, there is a difficult apart and that is the normalization and it is about how to minimize data redundancy. We can, of course, distribute that data and I think about whether maybe this is a redundant, but it is not systematic. Normalization is a systematic way for us to detect possible data redundancy and remove them by moving some of the attributes or separate some of the relations so that we can keep the data in a place without redundancy. Let's look at this data and that we have seen that many times is about employee information. Now we can see that we have one extra column at the end, which is the address. Of course, this table looks fine. We just have nine as the cardinality and looks like there is not too much redundancy. First of all, we know that the name and date of birth, the position, they're must-haves and there's focus in the last column we added to illustrate what is data redundancy. It is the address of the store IP be. Think about we have 1506, 1546, etcetera, for each of the ID, we will have one specific address in our store information. That is another table. With these arrangements, if we add the address into the employee information table, the consequence is. For every store ID we enter as a record for an employee. We have to enter the address as well, because the address is one column here, we have to fill the value for this attribute. For example, we can see for the store added 1506, we have to enter the address twice. For store ID 1546, you have to enter the address three times as well. Think about actually, well, we know the store ID, we can always check, restore information, and get to the address, because there is a clear one-to-one relationship in a store information. Having the adjuncts in this employee information is not unnecessary because first of all, where we enter the store ID, we have the inter the address, even though the address is not making any difference, 1506 will always give us 1200 and W, Dillon Road. That is the time consumed in the interim parts into our database. It is quite time-consuming, is also not only about time-consuming, but also think about as a human, we will make mistakes, we will have capitals. When we enter a store ID as 1506, we may sometimes enter a wrong address like is not 1200, we maybe just entered 120, W Dillon Road, which will make our data has some error in there, and this error doesn't be necessary because we can avoid this error easily, but not including address in the employee information. Those are the problems or associated with dependency. We data redundancy where we enter our data. Think about if we want to update our store information. Let's say our store 1506, will be relocate to another building to 1400 W, Dillon Road, for example. If we have to do the updates, think about in these employee information. If we have 500 employees working at store at the 1506, they would have 500 addresses, and we have to do update them one by one. Updating will also be time-consuming process and again, easy to make mistakes as well. If we want to close one store, let's say we will not need 1506, this store anymore. We are going to change all the employees from this store to another store or to somewhere else. We're going to delete all the store ID 1506, and also delete are the address 1200 W Dillon Road. The consequence is, if we want to know where were our 1506 as a store, where with the location of the store? Then it is done basically because there is no employee working in that store, there is no ideal employee information table, then the address basically well-being deleted as well. We can never retrieve what was the location of the store IP 1506 as 1200. That information basically won't be done. What I just talked about are the problems, they insert problem when we have to spend more time to enter the same code, same information again and again. We may make mistakes, we have to take more time. The other problem is deleting. That is when we did each the information, we can never go back, and also for the update, if we have to change some piece of information, we have to change a lot of duplicates. Data redundancy, will bring a lot of problems into practice. Of course there is another problem well-being, we have to have more data stored in our database, because those data are, for example, 1629th street basically appear three times just in this nine cardinality sample data. If we have a reorder database for more than 1,000 employees, then you can expect you have many more lines of the same objects appear again and again. That will be something you'll want to take care of when you see your database has a table and you can see the same value again and again appears there, and you actually can find a way to avoid that. A better design is actually the one we have seen before. Rather than put the address of the storeID in the employee table, we can actually have a separate table called store information, and then in that store information, we can start the streets, city and a zip for one particular store without having those address in the employee table. Now think about our three problems by data redundancy, when we enter the information, now we only need to enter the associated storeID, and then that's it. We don't have to enter the address, we can only update the address or enter the address once in the store information, in a second table and that's it. When we do the deleting, even though currently there is no employee working in our 1506, we still can keep that information in our store information table. We can just say it is closed or temporarily closed. We will not reduce this information, where have the information in a separate table, and for that's problem, which is the updating. Now if we have to relocate 1506 to another location, then we just have go to store information, change the address once. We don't even have to go to the employee information, and modify those address multiple times because in the employee information we only have the storeID and all the other informations can be used in the storeID, and then retrieve. That is the magic of joining on a magic of connection, and that is the magic of foreign key. That is the how relational databases using tables and their relationship or the connections to create very powerful queries, so this will be a better design. Now we know that we can have the information into a separate table, and I know yes, this is a good solution. But how we can reach this solution, of course, it will be very experienced, design learn to do so. But for us as a beginner, how we can achieve this better design systematically. Not just to look into the data, but it should look at the structure and the find a way to reach to this stage and be confident that our design is good enough for implementation, that will be normalization. Normalization, as I said, is a systematic way, is a technique for producing a set of suitable relations that support the data requirements of the enterprise. We have the suitable emphasized because it consists of the minimum number of attributes necessary to support the data requirement of the enterprise, the minimal just means that we are going to remove all unnecessary information, and also we are going to remove all the redundancy of data. We're going to keep the minimal redundancy only where it is needed, such as we have to have foreign key and a primary key in both table because we need them to do the connection. Otherwise, we are not going to have the duplicate information in one table and another table and in the table for multiple times. Then lastly, we are going to group attributes with a close logical relationship, that have them in the same relation, so it will be easier for us to insert, update and delete and manage them basically. That will be the benefits of our normalization. We removed the problem of data redundancy and we remove the problems caused by data redundancy. It is easier for users to access and maintain data, because normalization results will be a good way for us to think about what a good database should be, and it may not be the database we got from the relational model, converted from the ERD. But eventually you will find out the normalized relational model, has a much better design as we have seen before. It will take a minimal storage space because we removed the data redundancy, of course.