Welcome to unit five, in which we're going to talk about the parsing logic. In the previous unit, we talked about parse trees, and we saw some example of English parsing and Geck parsing. And I pointed out that, I can draw a parse tree using either a diagram like this, or I can also use a XML format, some agreed upon way to describe the structure of a given input. The syntactic or the grammatic structure of the input. And the question of course, as I pointed out is, how do we do it? How do we actually start with some obscure input, and end up with such a nice and well structured description, of what is going on in this input, of course from a linguistic standpoint. Well, I'd like to kind of fast forward, and show you the actual code that we're going to use in order to carry out the parsing. We're going to develop, at a later point in this module, a program for the Parser. Actually the name of this module is going to be CompilationEngine. And this CompilationEngine is going to consist of a set of methods, one method for almost for every non-terminal rule, in the corresponding grammar. So, for example, if you look at the grammar, you will see that we have a rule called statements, plural. And indeed, we also have a method called compileStatements. Then we have a rule called ifStatement. And indeed we have a method called compileifStatement, and so on and so forth. For most of the non-terminal rules in the grammar, we have corresponding methods that are responsible for parsing the part of the input that corresponds to this particular rule. There are some rules for which we don't have specific methods, but we also take care of them implicitly in other rules. We'll discuss this little detail, later on in the module. So to illustrate, let's focus on one of these methods, compileWhileStatement, which corresponds to the whileStatement rule. And let us see how, you know this sub-routine actually works in practice. Now, in general, the method or, I call it sub-routine, is going to be written according to a set in logic. And the logic is taken directly from the rule. So, here is how we do it, we follow the right-hand side of the rule, and we do exactly what the pattern that is documented in the right-hand side dictates. So, for example, if we look at the whileStatement, it says you should expect to see the word while, or token while. And if we find it in the input, we record the fact that indeed we found the token while. And then the rule will say, the next thing that you should expect to see, is the token left parenthesis. So, we look at the input and we check, do we really have a left parenthesis? And if we do, we can record it in the output file, and so on and so forth. At some point, we will get something like expression. So if the right-hand side specifies a non-terminal rule rather than a token, then we call the corresponding method which is designed specifically to deal with this particular non-terminal rule. So as you see, this whole thing is highly recursive. So we continue to recurse, until we exhaust the input, and run out of tokens, and then we know that we have completed the passing of the given input. So how do we get started? Well, we have to advance the tokenizer, and remember, I should have told you that before, I wish to remind you that the input is tokenized. So, I have a tokenizer to my disposal that I can exploit. I can tell the tokenizer, advance, I can ask questions about what is the current token, and so on and so forth. So I advance the tokenizer, and since the first token is while, I know that I can call the compilewhileStatement, and let the parsing begin. So, what we'll do from now on is, I will walk you through an elaborate simulation of how we parse this particular input. So let's see. We call the compilewhileStatement and sub-routine. So this sub-routine starts running, and it records the fact that it starts running by writing the tag whileStatement. And then we look at the rule, when the rule says that the next thing we should expect is the token while. Well, let's see, what is the current token? It is while. So happily, we record the fact that we have a while. We look at the rule again. The next thing we should expect to see, and I'm sorry, after we record the token, we advance the input. I should have said this before. So, then we look at the next part of the rule. And the rule says you should expect to see a left parentheses. Well, we look at the current token, it is a left parenthesis, so we record it, and advance the input. Back to the rule, the rule says now you should expect to see an expression. Well with that in mind, we call the compile expression sub-routine, which starts running. And let's look at the right-hand side of expression. The right-hand side says you should now expect to see a term. Well, a term is a non-terminal rule, so we call compile term, which starts running. And the right-hand side of term says you should expect to see either varname, or a constant. And we look at the current token, indeed, we have a varname. So we record the fact that we have a varname or an identifier called count. And we are done with, and we advance the input. And then we are done dealing with this term. And then we go back to the rule, and we see that we are done with the term, so we are back to the expression. And the expression says, the right-hand side, you should now expect to see optionally, an op followed by a term. Well, we look at the input or be kind token, and see that indeed we have an operator, or an operation there, right? So let's record it. We record the fact that we have an operator, or a symbol. And we advance the input. Going back to the expression definition, we see that we should now expect to see a term. And let's see, so we call the compile term sub-routine which starts running and the logic of the term says we should now expect to see either a var name or a constant. We'll look at the kind of token and indeed we have a constant. So, we call this constant, we advanced the input, we're done with the term, we go back to the expression. We are done with the expression and we go back to the while. So now, we look at the right hand side of the while and it says you should expect to see a right parenthesis. We look at the current token and, indeed, we have a right parenthesis. So we record it, advance the input, go back to the rule, and the rule says you should now expect to see a left curly bracket. We look at the current token, indeed, it is a left curly bracket, so we record it, back to the rule. And the rule says you should now expect to see statements, which is, zero or more occurrences of statement. So, we call the compileStatements method and the compileStatements method starts running and the right hand side says you should look for a statement. And we look at the current token, the current token is a let right so we know right away that we have a let statement. So we call, compile let statement and the right side of the rules says, you should expect to see a let token, indeed we have a let token, so we recalled it. Then you should expect to see a varName, I'm sorry we recalled it in advance. You should expect to see a varName, we look at the current token indeed, we have varName here so we call the fact that we have the identifier count. Then we should expect to see and we advance then we should expect to see an equal sign. Indeed we have an equal sign so we code it in advance and so on and so forth. Now we have an expression that we are processing in a similar way and at some point we'll be done with the let statement. And let's see will be decked to the statements. There will be no more statements to process, so we will go back to the while and we see that once we finish parsing the statements. The next thing which we should expect to see is a right curly bracket, indeed we have a right curly bracket, this is the kind tokens. So we record it, we advance the input, and we're done with the tokens. There are no more tokens to process and therefore, we've finished parsing the input, and we got this lovely XML file here which describes the parsing of this input. So, that's the end of the story, or that's the end of the simulation. And I want to kind of recap and take another look at the structure of this compilation engine module which implements our parsing logic. Once again it consists of a set of sub-routines. One for each almost for every non-terminal rule in the grammar, and once again I said that the handling of some rules. Which are the shallow or less interesting rules are handled in the context of other rules as we'll see later on. Now let's take a closer look at the compileWhileStatement and I would like to actually show you the code of this particular parsing sub routine, so here it is. I'm going to use a private method that I would describe later on called It, so. I look at the grammar, well, let me retract a little bit. What I'm doing now is I'm putting myself in the shoes of the software developer who has to develop this syntax analyzer. And this software developer is going to use the grammar of the language as the kind of dominating recipe for writing the code of the parser. So the parser or the parser's code is going to be informed by the grammar. And in particular if I look at the grammar, I see that the grammar said you should expect to see a while. So I'm going to call a method called eat('while') which means you should expect now to eat the while token. And then I'm going to write some code to handle the while. In particular if it's an XML code, I will simply write the while out with the proper tags and then according to the logic of the rule, I should expect to eat left parenthesis and I will handle it. And then I'm going to compile expression, right, because that's what the rule specifies, or mandates, to do. And then I should eat the right parentheses and then I should eat the left curly bracket, and so on and so forth. Okay, so that's how I'm going to write the compile while statement subroutine. Now what about this eat method? Now, this is a private method, which I'm not going to expose to users of this syntax analyzer, and in particular, the method says you should expect to see the given strength. If you don't see it, you should throw an error and otherwise, simply advance the input. So, by using this logic here, I encapsulate the advancing of input, and I don't have to worry about it. This eat method is going to do it forming. So as you see, the code of each compilexxx method follows the right-hand side of the corresponding xxx-rule. And each of these methods is responsible for advancing and handling all the tokens which correspond to this particular rule. And by having all these methods call it each other recursively I'm going to accomplish the parsing of the given input. Now I'd like to end this unit with some theoretical remarks about the character of these grammars that we are using all the time. And first of all, I'd like to introduce the notion of an LL grammar. Now quite frankly, I'm not sure what LL stands for but I'm quite sure that one of these Ls stands for leftmost. And an LL grammar is a grammar that can be parsed by a recursive descent algorithm parsing algorithm without backtracking. Now what I mean by this is that this grammar is quite friendly. It's friendly in the sense, that once you start to parse something, you never have to go back. When you make a decision that what you have is a while statement or an if statement, you don't have to kind of retract your progress and find out that you made a mistake and you had to pass it In a different way. So this is a very nice property of parses that make the implementation of the parser rather simple. The next terminology that I would like to present is LLK. Now an LLK parser is an LL parser In which we have to look ahead at most k tokens before you know which rule you are dealing with. And luckily, the grammar that we saw so far was LL1. And it's LL1 in the sense of once I have a particular token in my hand, like Y or let or do I immediately know which rule I have to invoke. And I don't have to back track later on. So if a grammar is LL1 then once again it lends it's self to a parcer which are relatively easy to write. Now, when you deal with a language like English, the situation is much more challenging because English is not LL1. For example, suppose I tell you something like lift, well we have a token right lift and yet we still have no idea what we have to do with this token because the next word may well be something like me. Well once we have lift me well then I know I have a verb followed by a noun. And therefore I have a statement which means lift me, verb, noun, everything is now clear. However, instead of lift me, I could also say something like lift operator and if I say lift operator then it's not It's not a sentence. It's just the beginning of a sentence because what I have is an adjective, lift is now serving as an adjective and now operator is a noun and it's not a sentence you know I need to do something with this information further on. So once again you see that in this particular example I had to look ahead two tokens before I knew which rule in the English language I have to invoke. And in general I may have to look more tokens downstream before I know which rule I have and you know maybe I have to backtrack and reconsider some of the rules that I thought, I had to use and so on and so forth. Our brain is trained to do this parsing in a blinding speed and it would have been extremely difficult to write a computer program to do the same. Because English is probably something like LL5 or 6 or 7, you have to look ahead several tokens before you know which rule of the English language you have to use. Whereas, programming languages are mostly LL1 which makes the passing of these languages a far easier task than parsing in natural language like English. All right, so with that, we discussed the logic of the parsing process. And the next thing which we are going to look at is the Jack grammar.