theorem, but it can take exponential time. You can try it yourself in Java or Perl: if you look for a match for this regular expression in a string like that, then adding just a couple of characters to the string will double the running time. Again, going back to our algorithmic-performance lectures: if that's what happens, if I add one character and my running time doubles, that means I have an exponential-time algorithm.

So if I'm using a facility that has one of these exponential-time implementations, then a pattern like this one, which is an actual SpamAssassin regular expression, is going to take exponential time on certain e-mail addresses. Somebody can create trouble by sending such addresses to a mail server: the server will take exponential time just trying to decide whether the message is spam or not. By running an inefficient algorithm, a server like that is definitely subject to such attacks. Generally, you don't want exponential algorithms, and you particularly don't want them when arbitrary clients out there can cause trouble for you.
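The doubling behavior is easy to reproduce. Here is a small sketch using an illustrative pattern of my own, `(a|aa)*b`, not the actual SpamAssassin pattern: each `a` can be consumed by either alternative, so a backtracking engine such as Java's explores an exponential number of paths before concluding there is no match.

```java
import java.util.regex.Pattern;

public class ExponentialMatch {
    public static void main(String[] args) {
        // Each 'a' can be matched by either alternative, so a backtracking
        // engine tries exponentially many ways to carve up the a's before
        // concluding that no match exists (there is no 'b' to finish on).
        Pattern pattern = Pattern.compile("(a|aa)*b");
        for (int n = 20; n <= 28; n += 2) {
            String input = "a".repeat(n);
            long start = System.nanoTime();
            boolean found = pattern.matcher(input).matches();
            long elapsedMs = (System.nanoTime() - start) / 1_000_000;
            // The elapsed time grows by a constant factor with each added 'a'.
            System.out.println(n + " a's: match=" + found + ", " + elapsedMs + " ms");
        }
    }
}
```

A grep-style implementation that simulates an NFA, by contrast, handles the same pattern in time proportional to the product of the pattern and text lengths, with no blow-up.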

Another pitfall is things that look like regular expressions but aren't actually regular expressions. It's common to have the backslash-one notation, where \1 is supposed to match the same text that the first parenthesized subexpression matched earlier. So this pattern, dot-plus backslash-one, matches any string that is two copies of something with at least one character, like couscous. And this one is similar: it matches the strings whose number of 1's is not prime. So you can express pretty sophisticated computations just with these little back references.
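Both of these can be tried directly with Java's java.util.regex, which supports back references. For the non-prime example I'm assuming the classic rendering `(11+?)\1+`: the group guesses a candidate divisor of at least 2, and `\1+` checks that further copies of it exactly fill the rest of the string.

```java
import java.util.regex.Pattern;

public class BackRefs {
    public static void main(String[] args) {
        // \1 re-matches whatever group 1 matched, so (.+)\1 describes
        // exactly the strings of the form ww: two copies of something
        // with at least one character.
        Pattern doubled = Pattern.compile("(.+)\\1");
        System.out.println(doubled.matcher("couscous").matches()); // true
        System.out.println(doubled.matcher("cousin").matches());   // false

        // The "number of 1's is not prime" pattern: (11+?) lazily guesses
        // a divisor >= 2, and \1+ requires one or more further copies of
        // it to consume the remaining 1's exactly.
        Pattern notPrime = Pattern.compile("(11+?)\\1+");
        for (int n = 2; n <= 12; n++) {
            String ones = "1".repeat(n);
            System.out.println(n + " is prime: " + !notPrime.matcher(ones).matches());
        }
    }
}
```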

But there's a fundamental problem: the languages, the sets of strings, that you specify with such notation are not regular. That is, as I said, there's no regular expression that specifies them. For example, for strings of the form ww for some string w, you can't write a regular expression that describes all such strings. There are even scientific applications, like complemented palindromes in genomics: that's like a palindrome, but one that also complements a's to t's and c's to g's. That's another example. And if these languages aren't regular, then Kleene's theorem doesn't hold: there's no NFA that corresponds to them. In fact (and we'll talk in a couple of lectures about intractability), nobody knows an efficient method for guaranteeing performance when you have back references. So if you allow back references, you're pretty much subject to performance attacks like the one we just mentioned.
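One practical consequence, not from the lecture but a common defensive idea: if you must accept back references, you can at least bound the damage. Java's engine has no built-in timeout, but since it reads input through the CharSequence interface, wrapping the input in a sequence that aborts after a deadline is a well-known workaround. The class and method names below are my own sketch.

```java
import java.util.regex.Pattern;

public class TimedRegex {
    // A CharSequence that throws once a deadline passes. The backtracking
    // engine calls charAt constantly, so matching aborts soon after the
    // time budget is exhausted.
    static class TimedSeq implements CharSequence {
        private final CharSequence src;
        private final long deadlineNanos;
        TimedSeq(CharSequence src, long deadlineNanos) {
            this.src = src;
            this.deadlineNanos = deadlineNanos;
        }
        public char charAt(int i) {
            if (System.nanoTime() > deadlineNanos)
                throw new RuntimeException("regex match timed out");
            return src.charAt(i);
        }
        public int length() { return src.length(); }
        public CharSequence subSequence(int from, int to) {
            return new TimedSeq(src.subSequence(from, to), deadlineNanos);
        }
    }

    static boolean matchWithBudget(Pattern p, String s, long budgetMillis) {
        long deadline = System.nanoTime() + budgetMillis * 1_000_000;
        return p.matcher(new TimedSeq(s, deadline)).matches();
    }

    public static void main(String[] args) {
        // A pathological pattern and input: without the budget, this match
        // attempt would effectively never finish.
        Pattern evil = Pattern.compile("(a|aa)*b");
        try {
            matchWithBudget(evil, "a".repeat(60), 50);
            System.out.println("finished");
        } catch (RuntimeException e) {
            System.out.println("aborted: " + e.getMessage());
        }
    }
}
```

This doesn't make the matching efficient; it just converts an unbounded denial of service into a bounded failure.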

And that's where we'll finish up this lecture. This is an amazingly elegant, efficient, and ingenious algorithm, based on theory of computation that has been studied since the 1930s, and it's the basis for much of the programming that we do. It's really worth understanding this theory, and Grep is a wonderful example of why. It takes us to the next level of algorithms: writing programs that translate a program to machine code. Actually, what we've looked at, both for KMP and for Grep, are simple examples of compilers. For KMP, we took a string and built a DFA. For Grep, we took an RE and built an NFA.
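As a reminder of the first of those two little compilers, here is a sketch of KMP along the lines of what we saw earlier in the course: build a DFA from the pattern, then simulate it over the text (variable names are mine).

```java
public class KMP {
    // Build the KMP DFA from the pattern: dfa[c][j] is the next state when
    // the machine is in state j (j pattern characters matched) and reads c.
    static int[][] buildDFA(String pat) {
        int R = 256, m = pat.length();
        int[][] dfa = new int[R][m];
        dfa[pat.charAt(0)][0] = 1;
        for (int x = 0, j = 1; j < m; j++) {
            for (int c = 0; c < R; c++)
                dfa[c][j] = dfa[c][x];       // copy mismatch transitions
            dfa[pat.charAt(j)][j] = j + 1;   // set match transition
            x = dfa[pat.charAt(j)][x];       // update restart state
        }
        return dfa;
    }

    // Simulate the DFA over the text: one state transition per character,
    // no backup. Returns the index of the first occurrence, or -1.
    static int search(String pat, String txt) {
        int[][] dfa = buildDFA(pat);
        int m = pat.length(), n = txt.length();
        int i, j;
        for (i = 0, j = 0; i < n && j < m; i++)
            j = dfa[txt.charAt(i)][j];
        return (j == m) ? i - m : -1;
    }

    public static void main(String[] args) {
        System.out.println(KMP.search("ABABAC", "AABACAABABACAA")); // prints 6
    }
}
```

Grep is the same idea one level up: the pattern is an RE rather than a single string, so the machine we build is an NFA rather than a DFA, and simulating it means tracking a set of reachable states instead of one state.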

And all javac does is take Java-language code and translate it to bytecode. Now, it uses more complicated theorems and machinery from the theory of computation to get the job done, because the Java language is also not regular. But it really is worthwhile to think about the progression from KMP to a Java compiler. With a Java compiler, you have a program, you want to check that it's legal and compile it to bytecode, and then you've got a Java simulator (the virtual machine) to run it. That's not so substantially different from what we just did for Grep, with a stage along the way. It's quite interesting to see this kind of context behind such a practically useful and important algorithm.

So, to summarize: from the standpoint of the programmer, we implemented substring search by simulating a DFA, and we implemented regular-expression pattern matching by simulating an NFA. From the standpoint of the theoretician, what's interesting about these abstractions is that DFAs, NFAs, and REs are equivalent in power: if you can describe a set of strings with one of them, you can describe it with any of them. That's interesting. Also interesting is that there are limitations: there are sets of strings, like palindromes and so forth, that you can't specify with a DFA or an NFA. But from the standpoint of a student in an algorithms class, it really is worthwhile to see that these principles are not just theory; they're of practical use, and this is an essential paradigm throughout computer science. We want to find an intermediate abstraction that we can understand, make sure we pick the right one, and then we have clients that can use it and real implementations that realize it. By building the right intermediate abstraction, we can solve some important practical problems. Those are the lessons that Grep teaches us.