In a prior course we looked at JSON as an example of semi-structured data, and we demonstrated that JSON data can be thought of as a tree. In this course, we'll focus on querying JSON data. Before we start, lets review the details of the JSON structure, and get an initial sense of how to query this form of data. Let's consider a simple JSON collection and look at the structures, substructures actually, it's composed of. The atomic element in the structure is a key value pair, for example, name is the key and sue is the value, in this case, an atomic string value. To query a key value pair, we should be able perform one basic operation given the key, return the value. Now, the value can also be an array, an array is a list. So the query operations on it can either be on its position in the list or on the value. Thus, we should be able to ask for the second element of the array called badges. Or we should be able to seek objects of which the key called badges has a value of blue. Notice here that the document collection here is itself an array, within square brackets and it's just two elements in it. The top level array does not have a key, by default it's called db. These key value peers are structured as tuples, often with a name In the snippet shown, favorites has a tuple of two key value pairs. Now, tuples can be thought of as relational records, as the operations would include, projection of an attribute, and selection over a set of tables. On the other hand, the area called 'points', has two tuples, these two tuples named. As you will see, we'll address these tuples by their positions. Finally, this one has nesting, that means a mini structure can be embedded within another structure. So we need operations that will let us navigate from one structure to any of it's embedded structures. Now just like a basic SQL query states, which parts of which records from one or more people should be reported, a MongoDB query states which parts of which documents from a document collection should be returned. The primary query is expressed as a find function, which contains two arguments and an optional qualifier, there are four things to notice in this function. The first is the term collection, this tells the system which document collection to use, and therefore is roughly similar to the From clause when restricted to one table. So if the name of the collection is beers, the first part would say db.beers.find. The second item is a query filter which lists all conditions that the retrieved documents should satisfy, so it's like a Where clause. Now, if we want to return everything, then this filter is left blank. Otherwise, we'll fill it in a couple of ways shown in the next few slides. The third term is a projection class which is essentially a list of variables that we want to see in the output. The fourth and last item sits after the find function ends and is separated by a dot, it's called a cursor modifier. The word cursor relates back to SQL where cursor is defined as a block of results that is returned to the user in one chunk. This becomes important when the set of results is too large to be returned all together, and the user may need to specify how much, or what portion of results they actually want. So, we'll start out with a few queries, where we show how the same query can be expressed in SQL, and in MongoDB The first query wants everything from Beers. The SQL query is structured on the table Beers, and the SELECT * asks to return all rows. In MongoDB the same query is more succincted, since the name of the collection is already specified in calling the find function, the body of the find function is empty. That means there are no query conditions and no projection clauses in it. The second query needs to return the variables beer and price for all records. So the find function here needs an empty query condition denoted by the open and closed brace symbols, but the projection clauses are specifically identified. There is a 1 if an attribute is output and a 0 if it is not. As a shortcut, only variables with 1 are required. Okay, so when do you use 0? A common situation is the following. Every MongoDB document has an identifier named _id. By default every query will return the id of the document. If you don't want it to return this designated attribute, you should explicitly say, _id:0. Next we will add query conditions. That is the equivalent of the Where clause in SQL. Our query number three has the query condition where name is equal to a value. In MongoDB, that equal to translate to a variable colon value form. Notice the symbol used for a string is quotes. Query four is more interesting for two reasons. First, we see a way in which the distinct operation is specified. Notice here that the primary query function is not find any more but a new function called distinct. As we'll see later again in our slides, MongoDB uses a few special query functions for some of the operations. So, you need to know, which function should be used in what context, when you write MogoDB queries. Secondly, in this query, we have a non-equality condition, namely, the price is greater than 15. This example shows MongoDB style of using operators in a query. It's always variable: followed by MongoDB's name for the operator, and then the comparison value. So where would you find MongoDB's operators? Here are some of the operators supported in MongoDB. These operators and others are listed in the URL shown at the bottom. The operators shown here are color coded. The top blue set are the comparison operators. We see the $gt, greater than, operation that we used in the last slide. The green colored operations are array operations which we'll see shortly. And the yellow operators at the bottom are logical operations that combine two conditions in different ways like the AND operation we saw in SQL. Now, the last operator $nor, is interesting, because it is used to specify queries when neither of two conditions must hold. For example, find all beer whose name is neither bad nor is the price less than $6 per bottle. Now I would strongly encourage you to play with these operators in your hands on session. Now I'm sure you remember the like query in SQL. MongoDB uses regular expressions to specify partial string matches. Now some of you may not know what a regular expression is. Let's first use some examples. The first example is the same query we saw before when we're asking for beer manufacturers, whose name has a sub string A-M in it, so A-M can appear anywhere within the name. To do this, the query condition first states that it is going to use a $regex operation. And then we have to give the partial string as /am/. Then it gives the directive that this match should be case insensitive by placing an i after the partial string. And if we just wanted to do names we would stop right after the find function. But here we also want to do a count, which is a post operation after the find, so we use .count at the end of the find function. Now, what if we have the same query, but we want the partial string A-m? To appear at the beginning of the name and you'd like the A to really be a capital letter. In this case we use the caret sign to indicate that the partial string is at the beginning of the name. Naturally we also drop the i at the end because the match is no longer case insensitive A more complex partial string pattern will be a case where our name starts with capital A-m, then has a number of characters in the middle and ends with corp. So for the first part, the string pattern is ^Am. For the second part, that is, any character in the middle, we use dot to represent any character, and star to represent zero or more occurrences. For the third part, we say corp but put a dollar at the end to say that it must appear at the end of the string. The regular expression pattern is a sub-language, in itself, and is supported by most programming languages today. We will refer you to the following URL to learn more about it. Also, note an example that, instead of saying, find.count, we can directly use the count function, natively defined in MongoDB. One important feature of JSON is that everything contain arrays, as a type of collection objects, this enables us to query arrays in multiple ways. One of them, is to consider an array as a list and perform intersection operations, the first query shows this. The data item is shown on the right. It has the area value attribute called tags with three entries. The first query asks if two specific strings belong to the array. In other words, it wants to get the document whose tagged attribute intersects with the query supplied array. In this case, there is an intersection and the document is returned. In the second case, it is asking for a documents who's tags attribute has no intersection. Now notice the $nin operator so there is no intersection with this list. So in this document there exists and intersection so nothing will be returned. A different kind of array query uses the positions of the list elements and wants to extract a portion of the array. This is illustrated in the third query which asks for the second and third items of the array. To encode this in MongoDB, we use the $slice operator which needs two parameters, the number of variable limits to skip, and the number of variable limits to extract after skipping. In this case, we need to extract items two and three, so the skip value is one and the number of items is two, thus returning summer and Japanese. Now, we could get the same result if we pose the query using the last statement. In this case, the minus says that the system should count from the end and the true says that it should extract two elements. Now if we omitted the minus sign, it will come from the beginning and fetch the first two elements. Finally, we can also ask for a document who's second element in tags is summer. In this case we use an array index tags.1 to denote the second element. Compound statements are queries with multiple query conditions that are combined using logical operations. The query shown here has one condition, which is the and,are in terms of MongoDB, the $and of three different clauses. The last clause is the most straight forward, it states that the desired item should not be Coors. The first clause is an or, that is, a $or, between two sub-conditions, A the prices either 3.99, or B it is 4.99. The second clause is also an or of two sub conditions, A the rating is good, and B the quantity is less than 20. This query shows that the $and and the $or operators need a list that is an array of arguments. To draw a quick comparison, here's the example of the same query imposed with SQL. Now, an important feature of semi-structured data is that it allows nesting. We showed three documents here, where there is an area named points, which in turn has two tuples with the elements points and bonus. Let's assume that these three documents are part of a collection, so they form three items in an area called users. Our goal is to show how we can write queries to extract data from these documents with nesting. The first query wants to find documents for which, the value of points should be less than or equal to 80. Now, which ones? Now, points.0, refers to the first tuple under the outer points, and points.0.points, refers to the first element of that tuple. Clearly, only the second documents satisfies this query. Now, what happens if we have the same query but we drop the zero? Now, we are looking for points.points without specifying the array index. This means that the points element in any of the tuples should have a value of 80 or less. So now, the first and the second document will satisfy the query. We can put multiple conditions as seen in the third query. It looks for a document where the points element of a tuple, should be utmost 81, and the bonus should be exactly 20, and clearly the second document qualifies. But does the third document qualify? In this case, the first tuple satisfies points greater than 81 and the second tuple satisfies bonus equal to 20. The answer is no, because the comma is treated as an implicit and condition within the same double, as shown in the yellow braces. Now remember that we said in course two that all semi-structured data can be viewed as a tree. Now, what if I pick a node of a tree and ask for all the descendents of that node? That would require the system to recursively get chideNodes, over increasing depth from the given node. Unfortunately, at this time, MongoDB does not support recursive search.