Now, first we're going to look at the process of simulating the operation of an NFA. And in order to do that, we have to look at the representation. so for state names we're just going to use the integers from zero to N and those are just indexes into the regular expression strain with one extra one for the except state. So, if RE has M characters, we've got M plus one states. and then, we forget the match-transitions. so for with this, we'll just keep the regular expression in an array. and then for characters that are not metacharacters that are just characters from the alphabet there's match-transition just to the next element just to add one, so that's an implicit representation of the match-transitions. and then the next thing is the epsilon transitions. So, since there might be multiple edges leaving from any given state and it is directed, it always says, go from this state to another one we'll use directed graph. So we just have a bunch of edges with given vertex names. And that's convenient. That's what we use for our graph processing algorithms, was indexes for vertex names. so, this is totally well-suited to building a digraph using the basic digraph processing methods that we talked about when doing graph processing. so, that's our representation and then given that representation it's very, you know, straightforward to find out for any state what are the possible next states and if it's got a character in a possible next state is to match a character in angle up one otherwise it's the vertices that they are pointed to by the edges and its adjacency list does have epsilon transitions. So, let's take a look at just given that basic idea, that we can always get to the transition from any given state how we're going to do a simulation? and so, the idea is summarized by this diagram here. what, what we're going to do is at every step each text character that the machine does read, we're going to keep track of the, the set of all possible states where the NFA could be after reading in the i, I text ch aracter. So, say, we've done that and, you know, there's this is just a schematic. So, there's a bunch of states that we could get to after reading i characters. Now in some of those states they, they are labeled with characters. and if that character matches our text character, they can take us to another state. So, say, in this case, there is three of them that matches and the other one has some other character or different character and it couldn't take us to another state. So, that gives us a all the possible states that you could be in just after reading the i plus first symbol. And now, what we do is take a look at the possible null transitions that we could go to from those states. And that might take us to lots of other states. but that's all the machine could do. It could read a character. And then it can take a bunch of null transitions. and but that's all it knows how to do. And that will give us all the possible states that it could be in after reading i1 plus one symbols. And then, we just continue in that from every one of those states. some of them match the character match the character, and then, look at all the epsilon transitions. So, that's the basic idea is to simply keep track of all possible states the NFA could be in after reading the first i text characters. so the only thing that's complicated is how do we get all the states that we could reach by epsilon transition. and we'll look at that in just a second. It's actually very straightforward. But let's look at a demo first. so this is our machine for A<i>B or A, or AC followed by D. and let's suppose</i> it has this input. So, let's do this demo. Okay. So, we're going to, to check if the input is matching the pattern. we're going to start the machine in state zero and so the zero is one of the states, you could reach from the start. and then now we want to get to all the states that you could reach by epsilon transitions from the start. and so there's a bunch of them so we could go to one and from one, we could go to either two or six. from two, we could go to three. From three, we could go to two or four. so we could get to all those places from the start without scanning a single character. So, zero, one, two, three, four, and six the machine could be in any one of those states before it scans a single character. so, that's where it could be after zero characters. But what about now after the first character? Well, out of those states only two and six involve matching an A. So the only thing that could happen next to read, the machine could do next, is to read the A in either state two or state six. I know there's going to be match-transitions. So in the first case, it goes to three and, and the second case, it goes to seven. So, the stead of states, it can be reachable just after matching the A, is just three and seven. But now from those states it might get somewhere with epsilon transitions. We have to look at the epsilon transition graphs and what states could it go to from epsilon transitions from these two. well, from three, seven has nowhere to go, and from three it could go either to two or four. from two, it could go back to three so the total number states that it could be after matching A is two, three, four, and seven. and so that's one step. We've matched one character, and we have kept track of all possible states the machine could be in after matching that character. Now, what about the next character? You can see, these four states take us all different ways. You could have an A, a B, or a C next, and it could get somewhere. But it happens, we have an A, and the only one of these states that matches an A is two. So, after matching the second A the only place it could get to is three. and so that now, we have only one state to work with. But where could we get with via epsilon transitions, from three? and so, well, you can go from three to four or two. Well, we did this before. And then, from two back to three. So, we could be in two, three, or four after matching two As. so that's all the possible states we could be in after mat ch, after matching the two As. now, what's next is B, only state four matches to B, so the only place we could be right after matching the B is in state five. And that's after matching the B, now what about epsilon transitions? Well, state five has an epsilon transition to state eight. so we could take that one and eight has got one to nine. So it could be an either five, eight, or nine after matching AAB and then following all of the epsilon transitions. but its really important to keep in mind, there's no other state the machine could be and it doesn't have any other way to get to after matching AAB it couldn't get state seven or six or two. those are the only possible states it could be in. since we're, each time, progressing through the input we're making progress to the end for sure. So now to finish up this example the only state out of those three that matches D is state nine. and so, that's a match-transition that reads the D. and the only place you could be after matching AABD is state ten. and then now, we follow epsilon transitions. and that Epsilon transition could take us to eleven. so, the only place that machine could be after matching ABD is ten or eleven. Now, our condition on whether we accept the string or not is whether we could get to the accept state, and in this case, we could. it is possible for the machine to get from zero to the accept state and read all the input characters so that simulation is approved, that the machine accepts the input AABD when simulated its operation, all possible ways and we managed to find the accept state. Of course, if we had tried some input that the machine doesn't recognize, we'd get stuck somewhere and either not get through the input or have no possible states it could be in. and that would be a proof that it does not accept since we try all possibilities. So, the only thing that's complicated in this computation is reachability. but actually from our study of digraphs this is the, one of the simplest problems that we discussed. what we really discussed was single source reachability. That is given the source, can you find all vertices that are reachable from that source? And that was Depth Firts Search. so we have very simple depth first search but also in that API we put vertices reachable from a set of sources, so an iterable of sources and so can we get the, the, what we really need is all vertices reachable from a given source from its source set of vertices. and it's easy using DFS. You just run it from each of the sources and you don't unmark the ones that you get to and that stops DFS from revisiting any vertices and it marks all the vertices that you could get to from that set. So, its just a simple extension of our DFS implementation. it's going to run in time proportional to the number of edges plus number of vertices in the digraph and it's a very simple computation digraph reachability. so given that capability which we discussed in graph processing the implementation of in a, in the NFA stimulation is very straightforward. so this is a, a data type that implements the NFA. so, we have a constructor that takes a regular expression as its argument. and it's going to build the NFA and its got a, a method to build the digraph that we'll talk about later but, but its also got a, a client method recognizes where the client can after the NFA is constructed, it can that they could text string and return true or false by simulating the operation. so we'll take a look at the building the digraph in a second but the one that we're talking about now is simulating the operation of the NFA once it's built for a given text string. and this is the complete code and it's expresses in code what we talked about in English during the demo. it's amazingly compact implementation of this idea of simulating the operation of an NFA. So we keep a, a bag of integers or set of integers called PC, that's kind of like program counter. So, that's the set of all possible states that the NFA could be in. so and we build a DFS from our epsilon transition graph. so the first thing that we do is do a, a, a for we put on to the, the PC all the states that you could get to from state zero. so that's build a DFS that marks all the states you could get to from state zero and then go through and put all of the states on, on to the PC. So, that's our starting point, is all the places that you could get to via epsilon transitions from state zero. So now during the execution of this for loop PC is the thing that we have all the states that you could reach after scanning past the ith character, so we initialize for the 0th character. And then all we do is for everyone one of those states well, first thing we do is test if we reach the accept state. If we reach the accept state we're going to have nothing, we're, we have nothing left to do. if we have a match, that is, if the a character in the RE if that state position matches our ith text character, then we're going to keep another set of possible states called match. So, those are the ones you could get to just after matching a text character. And so, we'll just add V. plus one. So, if we find a matching character at V, we just go to V plus one, that's the implicit match-transition. and here, we'll throw in, don't care, cuz it's so easy to do if the regular, regular expressions says, I don't care what character matches then throw that one to. so this just adds don't care to the implementation. and so we do that for all states in our PC in the states we can, can be in just before, just while looking at the ith character. If we have a match then we put the next states into a match. So now, what we have to do is follow the, epsilon transitions from match. So now, we build another DFS which gives us marks all the states that you could reach by starting with any of the states in match. and now when we build a new PC, we'll just set PC to a new, new bag, and put all the marked states for that search. So, those are all the ones that you could get to via epsilon transitions right after a match, and put that in the PC. Now. we have a new PC. and we've r ead the ith, character and we go to the i plus first character, and simply continue. so, when we're done with all of the text then we can test if we made it to the accept state. that is, if any of the states in our current set of states is the accept state, we return true. and if we didn't get to the accept state after all of that, we return false. A very compact code that takes advantage of our implementation of reachability for digraphs in a, a reasonable way to simulate the operation of an NFA by keeping track of all reachable states. so, what about the N analysis? Well it's easy to see that the not difficult to see that if our text is N-character and we have an M-character pattern. the worst that can happen is that we take time proportional to M times N. and that's just because for everyone of the characters we have a set of states, a set of all possible states. And we iterate through that set of possible states a few times. we even run DFS on it, but that's linear. and there's the extra point that the number of edges that we have is less than 3M. No node has more than three edges leaving it. So the total amount of time for each character is proportional to M and there's N-characters, so there are any proportional to M times N. So, that's the, the cost of the simulation. Of course a lot of times, the size of the set of states is much smaller than that. so in many real world problems, it's closer to linear. that's NFA simulation.