[MUSIC] Welcome to the fourth week of my course! For the last three weeks, we have discussed on the following topics. In the first week, my lecture covered the definition and examples of spatial data science, and why spatial is special system, and analysis perspectives. In the second week, I proposed four disciplines for spatial data science, which are GIS, Spatial DBMS, Big data systems, and data analytics, and related open source softwares were introduced. In addition, we discussed spatial data and spatial big data with examples, and the value of spatial big data. In the third week, the first disciplines of the proposed framework, GIS was a topic and the five layers of GIS were introduced and discussed in detail. This week, I will introduce database Management System and big data systems. Additionally, you will learn how to use the systems for spatial data and spatial big data. Let us start with the Database Management System. [SOUND] What is Database Management System? By definition, database is a collection of interrelated data. In other word, a set of data which are related to each other. For example, in a university, there are a group of students, courses, professors, classrooms, and many others. They are all related. Let's say, students take courses, courses are offered by professors, courses are located in classrooms, and so on, which are an example of database. Database Management System - DBMS, is a computer software application to define, create, query, update, and administer databases. In order to effectively conduct such functionalities, it provides fascinating features. First, removal of data redundancy and inconsistency, which often occurs to unstructured file system. Second, persistency across failure, when a computer clash or some disastrous event occurs to DBMS, it can roll back to the condition just before the event with backup and recovery. Third, it can allow concurrent access to database, in other words, multiple users can access database simultaneously. Think about a banking systems. Thousands and thousands users simultaneously access bank account Database, through Internet banking, smartphone banking, and many others. Any anomalies never happen due to such concurrent access to database. The second and third functionality are collectively called Transaction management, which can present key functionality of DBMS, ACID, which stands for Atomicity, Consistency, Isolation, and Durability of transaction. Fourth, DBMS presents scalable solutions to access large database using disk space management, platform management, indexing and Structured Query Language, and query optimization. Fifth, DBMS provides control over access level to database. For example, I, as a professor, can see my student's GPA. However, my student cannot see their colleagues' GPA. These are all powerful features with which DBMS can overcome limits of flat file systems. In order to build a database, you have to understand data modeling. It is a process of building data model, which is an abstract representation that includes only the important aspects of a given problem. For example, it's me, Joon Heo I'm a human being and a reality that can be characterized by a lot of attributes. However, when I am modeled for university database system, as a professor, I can be modeled with such a given list of attributes. On the other hand, for a medical DBMS where I'm just a patient, then I can be modeled differently, though I'm the same reality. Data modeling takes three steps for DBMS. Conceptual modeling is to identify highest-level relationship among different entities. Logical modeling is to describe as much detail as possible, including primary key and all attributes for each entity. And the resolution of many relationship between entities. Physical modelling is the actual design of a database And it shows all table structures, column names, column data types, constraint indexing, and so on. And at the same time, it is DBMS specific. In other words, different DBMS software has different physical model for the same logical model. Now, you are looking at an example of three steps of data modeling process which has two entities, student and course, and one relationship of "student complete course". The conceptual model illustrate the relationship between entities and the relationship of each entity in ER-diagram. Logical model presents more details, such as primary keys, foreign keys, and resolution of many-to-many relationship between student and course, with a cross-referencing table named Completes. Physical modelling is to describe the way how database is physically stored in computer hardware. It adds column name, column data types. For example, the data type of student ID is a string of 10 characters, student name is string of 30 characters, “Age” is integer, constraints such as credit range from 1 to 4. Decision on column for indexing of each table, and so on. [SOUND] Relational DBMS organizes database into one or more tables, also known as "relations", which are related by shared attributes. Table has columns and rows, and each row can be identified by primary key. Rows are also called as records or tuples. Columns are also called attributes or fields. The relational DBMS is the de facto standard in DBMS. In the past, conventional relational database can only handle number and string data types. But now most of DBMS such as Oracle, DB-2, and many others can deal with user-defined data types. And now, they are called object-relational DBMS. [SOUND] Relational DBMS provides another powerful tool, Structured Query Language, also known as SQL, which can create and delete tables and submit queries to database. The basic syntax of SQL is composed of Select, From, and Where. Query processing starts with FROM, which gives a list of target tables and then goes to WHERE which present conditions of attributes of the given tables. Then goes to SELECT which specifies the final outcome of the query. Let's take a look at the query example with the given three tables of "student complete courses". The query is "List all the courses completed by student named John Oh. We have to navigate all the tables for FROM clause, we should list all three tables because we have to navigate all, where character S,C.O are the variables to point to the three tables. For WHERE clause we list all the conditions to find tuple from student table, of which student name is John Oh, then navigate student table to complete table and then to Course table by matching primary keys. Then finally, retrieve the result of the query. [SOUND] Database queries should be conducted as quickly as possible, as fast as possible. For that, indexing methods are used to quickly locate data without having to search every row in a database table. Database index is a data structure that improves the speed of data retrieval operation on a database table. However, it should be noted that it requires additional time and storage to space to build and maintain index data structure. B+ tree, which has many good characteristics, is one of the most popular indexing methods in conventional relational DBMS. Let's take a look at B+ tree. B+ tree is a set balancing tree data structure, as you can see. Self-balancing is an important feature because search performance become inefficient when the shape of trees is skewed. B+ tree keeps data sorted and saved only on leaf nodes, and it allows searches, sequential access, insertions, and deletions in logarithmic time. When it is used in file system, it has primary benefit to reduce the number of disk I/O operation, which is the major bottleneck of DBMS performance. For that reason, most relational DBMS uses B+ tree as the primary indexing method. However, B+ tree is only applicable to conventional data types such as number or string. In other words, it cannot support more complex data types such as Spatial data which means that in order to deal with spatial data in DBMS, we need a different index method. So far, we have discussed the features of DBMS which can support data management better than any other framework, including transaction management of ACID, scalable resolution to large data set, security, and SQL as a standard language for data management. Thanks to such wonderful features, a framework of application development could be changed. All the issues of data management can be handled in DBMS, and SQL provides data access for applications, which allow to remove data redundancy and inconsistency which often occurs in flat file systems. Additionally, development time can be significantly reduced because application developers have no concern about data management issue. In the figure, the upper represent application development without DBMS. Each application has its own file system and datasets with duplication. In the example, data 'B' is the case of duplication. On the other hand, the lower is application development with DBMS, where query system presents a window of data read and write. The application framework with DBMS is definitely applicable to spatial data, which I mentioned it as Server GIS in the second week of this course. If someone ask me, what is the most important layer of modern IT world? I would like to say DBMS. Certainly, DBMS cannot function by itself. For example, Internet connection, high performance computers, data storage, operating system, and many others systems are all important to shape the IT world. However, DBMS is still, I believe a backbone of modern IT world. Think about Financial systems, everyday, 100 billion dollars cross the borders by wire-transfers. Billions of credit card transactions are being made every day. Stock market, regular banking, many, many others. Faultless and flawless DBMS with all the features that we discussed enables such gigantic of transactions with full credibility. You believe or not, you are connected to DBMS everyday, very often. Whenever you use your smartphone, even when you just carry on your smartphone, when you use your computer, and email, and so on. [SOUND] Now, it's time to summarize this lecture. In this lecture, we have discussed on what DBMS is and its wonderful functionalities. For using DBMS, data modelling is important so that we overviewed three steps of data modelling. Query language and indexing methods for data management were also discussed. However, it should be noted that conventional relational database can only deal with number and strings, just conventional data types. What do you need to do in order to handle spatial data in DBMS? That's the topic for the next lecture. Alright, hopefully, I can see you all in the next lecture.