For expressions look somewhat like an embedded query language. This session is going to explore the details of the relationship between for expressions and database queries. The for notation that you have seen in the last session is essentially equivalent to common operations of query languages for databases, languages such as SQL or XQuery, say.

To see this, suppose we have a database of books, which is represented as a list of books. Each book we model as a case class with a title, which is a string, and the authors, which is a list of strings. Here is a mini database that will fit comfortably in memory. It's just a list of five books, and each book has a title and authors. What I've done here is use named parameters, which you can do in Scala. Because Book has title and authors, I could have written just Book and then a string and a list. But you can also write Book where title equals the string and authors equals the list, and often that's clearer. So I've picked that here to make it clear what is the title and what are the authors.

Now that we have this database, let's run some queries over it. The first query would be: find the titles of books whose author's name is Bird. A query like that could be written as you see here. We let b range over books. We let a range over the authors of the book b. And we demand that a starts with Bird and a comma, which means that the author's last name is Bird. For each of these books we want to produce the title of the book.

Or, to find all books which have the word Program in the title, one way we could write that is, again, we let b range over the books, and we ask whether b.title has the word Program in it. A way to achieve that would be to use Java's indexOf method, which produces the index of the substring if it appears, and minus 1 if it doesn't. For all the books where this condition is true, we again yield the title.
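The database and the two queries just described could be sketched roughly as follows. The transcript does not list the five books, so the entries below are assumed for illustration (only Effective Java and Java Puzzlers are named later in the session), and the object name BookQueries and the value names birdTitles and programTitles are hypothetical.

```scala
object BookQueries {
  // A book has a title (a string) and authors (a list of strings).
  case class Book(title: String, authors: List[String])

  // Mini database of five books, built with named parameters.
  // The concrete entries are assumptions made for this sketch.
  val books: List[Book] = List(
    Book(title = "Structure and Interpretation of Computer Programs",
         authors = List("Abelson, Harald", "Sussman, Gerald J.")),
    Book(title = "Introduction to Functional Programming",
         authors = List("Bird, Richard", "Wadler, Phil")),
    Book(title = "Effective Java",
         authors = List("Bloch, Joshua")),
    Book(title = "Java Puzzlers",
         authors = List("Bloch, Joshua", "Gafter, Neal")),
    Book(title = "Programming in Scala",
         authors = List("Odersky, Martin", "Spoon, Lex", "Venners, Bill"))
  )

  // Titles of books that have an author whose last name is Bird.
  val birdTitles: List[String] =
    for {
      b <- books
      a <- b.authors
      if a.startsWith("Bird,")
    } yield b.title

  // Titles of books that have the word "Program" in the title.
  // indexOf returns -1 when the substring does not occur.
  val programTitles: List[String] =
    for (b <- books if b.title.indexOf("Program") >= 0) yield b.title
}
```

With the assumed data, birdTitles contains one title and programTitles contains three. Positional construction, as in Book("Effective Java", List("Bloch, Joshua")), would build the same values; the named form only makes the roles of the arguments easier to read.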
So let's do one of these queries in the worksheet. I have here the first kind of query. I now ask for all the books whose author's name starts with Bloch, and if I run that, then what I get back here is the two books in my list that Joshua Bloch has written, Effective Java and Java Puzzlers.

A slightly more involved query is this one here. We want to find the names of all authors who have written at least two books that are present in the database. A way to do this would be to have two iterators ranging over the books database. So we let b1 range over books and b2 range over books, and we demand that b1 and b2 are different. Now we have pairs of different books. And we let a1 and a2 range over the authors of these pairs. If we find a match, that is, if we find an author that appears in the authors list of both b1 and b2, then we have found an author that has published at least two books, so we give it back.

I have put the query that you see on the slide up here. Let's see what the result is. Well, we get Joshua Bloch, that's fine, but we actually get him twice. So why do solutions show up twice? The reason is that we have two generators that both go over books, so each pair of books will show up twice, once with the arguments swapped. For instance, with the books here, we would have one pair that reads Effective Java, which was one of the books just shown, and then Java Puzzlers, and we would have another pair where the two are swapped. That's why we get the same couple of books in two pairs, and why the solutions show up twice.

How can we avoid this? Well, one easy way to avoid it would be to say, instead of just demanding that the two books are different, we demand that the title of the first book must be lexicographically smaller than the title of the second book.
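The duplicate result described above can be reproduced with a sketch like the following. The book entries are again assumed for illustration, and the names AuthorPairs, blochTitles, and authorsTwice are hypothetical.

```scala
object AuthorPairs {
  case class Book(title: String, authors: List[String])

  // Assumed sample data; only the two Bloch titles are named in the session.
  val books: List[Book] = List(
    Book(title = "Introduction to Functional Programming",
         authors = List("Bird, Richard", "Wadler, Phil")),
    Book(title = "Effective Java",
         authors = List("Bloch, Joshua")),
    Book(title = "Java Puzzlers",
         authors = List("Bloch, Joshua", "Gafter, Neal")),
    Book(title = "Programming in Scala",
         authors = List("Odersky, Martin", "Spoon, Lex", "Venners, Bill"))
  )

  // Titles of books with an author whose name starts with "Bloch".
  val blochTitles: List[String] =
    for {
      b <- books
      a <- b.authors
      if a.startsWith("Bloch")
    } yield b.title

  // Authors appearing in two different books.
  // Each unordered pair of books is visited twice, as (b1, b2) and (b2, b1),
  // so an author of two books is reported twice.
  val authorsTwice: List[String] =
    for {
      b1 <- books
      b2 <- books
      if b1 != b2
      a1 <- b1.authors
      a2 <- b2.authors
      if a1 == a2
    } yield a1
}
```

Here authorsTwice holds "Bloch, Joshua" two times, once for the pair (Effective Java, Java Puzzlers) and once for the swapped pair.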
So, that would mean that in our previous example, we would get Effective Java and Java Puzzlers, as before, but we wouldn't get the pair in reversed order, because in lexicographical order, Effective Java comes before Java Puzzlers. Let's see what happens if we make that change. So, we say b1.title less than b2.title. And what we get is indeed just a single author.

But are we done yet? A question for you. What happens if an author has published three books? Is the author still printed once, as we desire? Or is it printed twice, or maybe three times? Or maybe the author is not printed at all? Make your choice.

To find out what the solution is, let's add a third book that's also published by the same author, Effective Java 2, and see what happens. Well, we see that the same author is printed three times. Why is that? The problem now is that even with this added condition, we have three possible pairs of books. If you have books b1, b2, b3, all published by the same author, then you have three possible pairs of two books out of these three, and for each of the three possibilities the same author will be printed. So the author is printed three times.

What can we do about this? How can we avoid printing the author several times? One solution would be to remove duplicate authors when they appear in the result twice or several times. There's a function for this. It's called distinct, and it works on all sequences. It will simply remove duplicate elements from the sequence: keep the first one, remove the other ones. So one thing we could do is take the query that we've seen here, put it in braces or parentheses, and call distinct on the result. And that would do the trick.

On the other hand, maybe these problems are a sign that we started off with the wrong data structure. Remember that we have written the database as a list of books. In actual databases, the order in which the rows, here the books, appear shouldn't matter.
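The ordered-pair condition, the three-book case, and the distinct fix can be sketched as below. The helper name authorsOfTwoBooks and the object name DistinctAuthors are hypothetical, the sample entries are assumed, and "Effective Java 2" stands in for the third book added in the worksheet.

```scala
object DistinctAuthors {
  case class Book(title: String, authors: List[String])

  // Assumed sample data with two books by the same author.
  val twoBlochBooks: List[Book] = List(
    Book(title = "Effective Java", authors = List("Bloch, Joshua")),
    Book(title = "Java Puzzlers", authors = List("Bloch, Joshua", "Gafter, Neal")),
    Book(title = "Programming in Scala",
         authors = List("Odersky, Martin", "Spoon, Lex", "Venners, Bill"))
  )

  // The same database with a third book by that author added.
  val threeBlochBooks: List[Book] =
    Book(title = "Effective Java 2", authors = List("Bloch, Joshua")) :: twoBlochBooks

  // Hypothetical helper: authors shared by a pair of books, where each
  // unordered pair is visited only once because titles must be in
  // strictly increasing lexicographic order.
  def authorsOfTwoBooks(db: List[Book]): List[String] =
    for {
      b1 <- db
      b2 <- db
      if b1.title < b2.title
      a1 <- b1.authors
      a2 <- b2.authors
      if a1 == a2
    } yield a1

  // Two books by the author: one pair, so the author appears once.
  val once: List[String] = authorsOfTwoBooks(twoBlochBooks)

  // Three books by the author: three pairs, so the author appears three times.
  val thrice: List[String] = authorsOfTwoBooks(threeBlochBooks)

  // distinct keeps the first occurrence of each element and drops the rest.
  val deduplicated: List[String] = authorsOfTwoBooks(threeBlochBooks).distinct
}
```

When the for expression is written inline rather than through a helper, it needs to be wrapped in parentheses or braces before calling distinct, since otherwise distinct would apply only to the yield expression.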
So databases are much more sets of rows than lists of rows, and sets have the advantage that duplicates are eliminated by design. So let's try this. Let's make books a set of rows, and then, yes, indeed, you will see that the result again consists of a set of just a single author, which is what we wanted. Okay, good.
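A minimal sketch of the set-based version, assuming the same illustrative entries as before (the names BookSet and authorsOfTwoBooks are hypothetical). Because the generators range over a Set, the for expression itself produces a Set, so no call to distinct is needed.

```scala
object BookSet {
  case class Book(title: String, authors: List[String])

  // The database as a set of rows; the entries are assumed sample data,
  // including three books by the same author.
  val books: Set[Book] = Set(
    Book(title = "Effective Java", authors = List("Bloch, Joshua")),
    Book(title = "Effective Java 2", authors = List("Bloch, Joshua")),
    Book(title = "Java Puzzlers", authors = List("Bloch, Joshua", "Gafter, Neal")),
    Book(title = "Programming in Scala",
         authors = List("Odersky, Martin", "Spoon, Lex", "Venners, Bill"))
  )

  // Same query as on the list; the result type follows the first generator,
  // so it is a Set[String] and repeated authors collapse into one element.
  val authorsOfTwoBooks: Set[String] =
    for {
      b1 <- books
      b2 <- books
      if b1.title < b2.title
      a1 <- b1.authors
      a2 <- b2.authors
      if a1 == a2
    } yield a1
}
```

With this data the result is a set containing only "Bloch, Joshua", even though three pairs of books match.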