Hi. Last week, you identified the requirements of a typical analytical system, and you proposed a NoSQL database, possibly a graph database, to provide security, concurrency, excellent performance, and analytical queries. Are you ready for the next use case? Remember that your proposal must conform to what is requested, no more, no less. Remember what you learned in course three? How do you ensure that the data remains correct and complete even when things go wrong internally? How do you provide consistently good performance to clients even when parts of your system are degraded? How do you scale to handle an increase in load? What does a good API for the service look like? There are many factors that may influence the design of a data system, such as the skills and experience of the people involved, legacy system dependencies, the timescale for delivery, your organization's tolerance of different kinds of risks, regulatory constraints, etc. Those factors depend very much on the situation.

Let us start with this week's use case. A bookstore chain called Your Best Readings has realized that book sales have not increased as expected. The company wants to answer the following queries quickly in order to make decisions. What is the most popular book? Which books are sold together? What is the least sold book? How can we increase book sales? Can we raise the average ticket by selling products in combos? Remember that all these queries let us know what happened. In the case of data mining, there are plenty of tasks and techniques that can help establish descriptive and predictive analyses, based on patterns or models in the data, that let us know why things happened and what will happen next.

HDFS stands for Hadoop Distributed File System, and it is designed to reliably store very large files across the machines in a large cluster. It is inspired by the Google File System. Now I will show the basic architecture of the Hadoop Distributed File System. Each Data Node contains information that can be duplicated to implement fault tolerance. The Name Node coordinates access to the Data Nodes.

The bookstores have massive sales transactions, so they need to manage a high volume of data from the sales tickets. If they store all this information, they can answer these important questions. As we learned in course two, in the case of OLAP applications, the useful data was stored in a database manager because the data was structured. Nowadays, analytical applications need to integrate many terabytes of data, including social network data. Let's recall the OLAP characteristics and the OLAP architecture. However, the bookstores are distributed across the country, and the CEO has decided to create a distributed repository. Check the concepts of course two. Volume: very large amounts of data, on the order of petabytes or exabytes (one exabyte is one billion billion, or one quintillion, bytes). Variety: heterogeneous, semi-structured, or unstructured data. Velocity: dynamic data; think of the Web and Facebook. Veracity: trust in its quality; real-life data is typically dirty, which means it is not accurate. The CEO is planning to store tweets, blogs, or WhatsApp messages to recommend the most appropriate books to customers and increase sales.

Remember that Apache Hadoop has become a core component of enterprise data architecture as a complement to existing data management systems. Accordingly, Hadoop is designed to interoperate easily, so you can extend your existing investments in applications, tools, and processes with Hadoop. Here I will show the main components of the Hadoop ecosystem.
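To make the most popular book query concrete, here is a minimal MapReduce sketch in the style of the classic word count job. It tallies the copies sold per title from tickets stored in HDFS. The input layout (one comma-separated line per sold book, such as ticket-42,The Hobbit) and the paths /sales/tickets and /sales/counts are assumptions made only for illustration.

```java
// Minimal Hadoop MapReduce sketch: count how many times each book appears
// in the sales tickets stored on HDFS. Assumes (hypothetically) one line
// per sold book, with the title in the second comma-separated field,
// e.g. "ticket-42,The Hobbit".
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class BookSalesCount {

    public static class TicketMapper
            extends Mapper<Object, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text book = new Text();

        @Override
        protected void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            String[] fields = value.toString().split(",");
            if (fields.length >= 2) {
                book.set(fields[1].trim());   // book title field
                context.write(book, ONE);     // emit (title, 1)
            }
        }
    }

    public static class SumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values,
                              Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) {
                sum += v.get();
            }
            // Total copies sold per title.
            context.write(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "book sales count");
        job.setJarByClass(BookSalesCount.class);
        job.setMapperClass(TicketMapper.class);
        job.setCombinerClass(SumReducer.class);
        job.setReducerClass(SumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));   // e.g. /sales/tickets
        FileOutputFormat.setOutputPath(job, new Path(args[1])); // e.g. /sales/counts
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

Submitted with a command like hadoop jar books.jar BookSalesCount /sales/tickets /sales/counts, the job writes one (title, total) line per book; sorting that output by the total answers both the most popular book and the least sold book queries.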
Suppose you have two data sources with information like this. Let me remind you that there are a number of data mining tasks, such as classification, prediction, time-series analysis, association, clustering, summarization, etc. All these tasks are either predictive or descriptive data mining tasks. A data mining system can execute one or more of these specific tasks as part of data mining; a small sketch of the association task follows at the end of this lesson. What solution would you bring to the company? The student should submit a proposal with justifications for the following elements. One, the typical architecture according to the type of information system. Two, the kind of database or repository to implement and its design, the ETL process if any, etc. Three, the answers to the required queries. Four, how does it support scalability? Five, how does it support maintainability? Six, how does it support security and reliability?

With the database systems specialization course, you have developed the skills that will allow you to analyze requirements and to propose, design, justify, and develop the following information systems: an OLTP system using a relational database manager; an OLAP system using a relational database manager, and also a columnar database manager for better performance; a business intelligence system using SQL and data mining techniques; and Big Data management using the Hadoop framework and NoSQL information systems. All of these are data-intensive applications supporting reliability, scalability, security, and maintainability. I hope all these skills will help you to improve your professional performance.
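As mentioned above, here is a minimal, self-contained sketch of the association task: counting how often each pair of books appears on the same sales ticket. The ticket data is invented for illustration; in a real deployment, the tickets would be read from the distributed repository, and at scale you would use a proper association rule algorithm such as Apriori or FP-growth.

```java
// Minimal sketch of the association task: count how often each pair of
// books appears on the same sales ticket. The tickets below are
// hypothetical illustration data.
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

public class BooksSoldTogether {

    public static void main(String[] args) {
        // Each inner list is one ticket: the books bought together.
        List<List<String>> tickets = List.of(
            List.of("Dune", "The Hobbit"),
            List.of("Dune", "The Hobbit", "Neuromancer"),
            List.of("The Hobbit", "Neuromancer"),
            List.of("Dune", "The Hobbit"));

        Map<String, Integer> pairCounts = new HashMap<>();
        for (List<String> ticket : tickets) {
            // Deduplicate and sort so each pair gets one canonical key.
            List<String> books = new ArrayList<>(new TreeSet<>(ticket));
            for (int i = 0; i < books.size(); i++) {
                for (int j = i + 1; j < books.size(); j++) {
                    String pair = books.get(i) + " & " + books.get(j);
                    pairCounts.merge(pair, 1, Integer::sum);
                }
            }
        }

        // Print pairs from most to least frequent.
        pairCounts.entrySet().stream()
            .sorted((a, b) -> b.getValue() - a.getValue())
            .forEach(e -> System.out.println(e.getValue() + "x " + e.getKey()));
    }
}
```

The most frequent pairs are the natural candidates for the combos the CEO wants, and dividing a pair's count by the total number of tickets gives the support measure used in association rule mining.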