So, hi. In the previous course, we saw examples of different data models and talked about a few current data management systems. In this module, we'll focus on data retrieval. Data retrieval refers to the way in which data desired by a user is specified and retrieved from a data store. Note that in this course, we are using the term data retrieval in two ways. Assume that your data is stored in a data store that follows a specific data model, like for example the relational data model. By data retrieval, we will refer to, one, the way you specify how to get the desired data out of the system, this is called the query specification method, and two, the internal processing that occurs within the data management system to compute or evaluate that specified retrieval request. While query specification can apply to small data stores or large data stores, we'll keep an eye on the nature of query evaluation when the data is big. Further, we'll consider how the query specification changes when we deal with faster streaming data. A query language is a language in which a retrieval request is specified. A query language is often called declarative, which means it lets you specify what you want to retrieve without having to tell the system how to retrieve. For example, you can say, find all data from relation employee where the salary is more than 50k. Now, you don't have to write a program which will tell the system to open a file, skip the first 250 bytes, then in a loop pick the next 1024 bytes, probe into the 600th byte and read an integer, and so forth. Instead of writing such a complicated procedure, you just specify the data items that you need and the system does the rest. For example, SQL, structured query language, is the most used query language for relational data. Now, in contrast to a query language, a database programming language like Oracle's PL/SQL or Postgres's PgSQL are high-level procedural programming languages that embed query operations. We will look at some query languages in detail and show examples of database programming languages. The first query language we'll look at is SQL, which is the ubiquitous query language when the data is structured, but has been extended in many ways to accommodate other types of data. For this course, we'll stick to the structured aspect of the language. Now, you should know that SQL is used for classical database management systems like Oracle as well as modern Hadoop style distributed systems such as Spark. Now, we will work with an illustrative example. First, we need to define the schema of the database. Now, think of a business called the Beer Drinkers Club that owns many bars, and each bar sells beer. Our schema for this business has six relations of tables. The first table lists these bars, the names, addresses, and the license number of the bar. Notice that the attribute name is underlined because it is the primary key of the bars relation. Recall that the primary key refers to a set of attributes, in this case just the name, that makes a record unique. Note that the relation bars with the attribute name within parenthesis is the same as the table shown on the right. We will use both representations as we go forward. The second table called Beers, this is the names and manufacturers of beer. Now, not every bar sells the same brands of beer, and even when they do, they may have different prices for the same product because of differences in the establishment costs. So the Sells table records which bar sells which beer at what price. Now, our business is special. It also keeps information about the regular member customers. So the Drinkers relation has the name, address, and phone of these customers. Well, not only that, it knows which member visits which bars and which beer each member likes. Clearly, the Beer Drinkers Club knows its customers. The most basic structure of an SQL query is a SELECT-FROM-WHERE clause. In this example, we're looking for beer names that are made by Heineken. So we need to specify our output attribute, in this case the name of the beer. The logical table which will be used to answer the query, in this case, Beers. And the condition that all the desired data items should satisfy, namely, the value of the attribute called manf is Heineken. Now, there are few things to notice here. First, the literal Heineken is put within quotes, because it's a single string literal. Remember that in this case, the string is supposed to match exactly, including the case. Secondly, if you go back to the data operations discussed in course two, you will recognize that this form of query can also be represented as a selection operation on the relation Beers with a condition on the manf attribute, followed by a projection operation that outputs the name attribute from the result of the selection operation. So the selection operation finds all tuples of beer for which the manufacturer is Heineken, and from those tuples it projects only the name column. The result of the query is a table with one single attribute called name. We illustrate some more features of SQL, using two example queries. The first looks for expensive beer and its price. Let's say we consider a beer to be expensive if it costs more than $15 per bottle. From the schema, we know that the price information is available in the table called Sells. So the FROM clause should use Sells. The WHERE clause is intuitive and specifies the price of the beer to be greater than 15. Now notice that the Sells relation also has a column called bar. Now, if two different bars sell the same beer at the same price, we'll get both entries in the result. But that's not what we want. Now regardless of the multiplicity of bars that have the same price for the same beer, we want the result just once. So this is achieved through the SELECT DISTINCT statement, which ensures that the result relation will have no duplicate. The second example shows the case where more than one condition must be specified by the result. In this query, the business must be in San Diego and at the same time it must be a temporary license holder, which means the license number should start with 32. As we see here, these conditions are put together by the AND operator. Thus, the query will pick the third record in the table because the first record satisfy the first condition and not the second condition. In a few slides, we'll come back to the evaluation of this type of queries in the context of big data. Now, remember, one can also place a limit on the number of results to return. If our database is large, and we need only five results, for example, for a sample to display, we can say LIMIT 5. Now, the exact syntax of this LIMIT clause may vary between DBMS vendors.