To begin, we're going to start with rigorous definitions of our cast of characters, which are classes of problems. We want to classify problems in terms of efficiency. So, to start out, we'll have the concept of a search problem. We're going to call a search problem any problem for which there's an efficient algorithm to certify a solution. The point is that if you have a problem and somebody gives you a solution, you want to be able to know that it really is a solution. So, for example, take the traveling salesperson problem. We have a set of cities, the distances between them, and a threshold M. If somebody gives you a solution, which is a permutation of the cities, the order in which you're supposed to visit them, then you can easily check it by just adding up the distances to make sure that the total is less than M. That's just a linear-time algorithm. As long as the algorithm that certifies a solution is guaranteed polynomial time, it's a search problem. We've got a problem, and if we have a solution, we know that we have a solution, because we have an algorithm that can check it. I'll show a little sketch of such a check in a minute.

So that brings us to the class NP. We're going to take the class of all such problems, all search problems, and give it the name NP. I'll give you the reason for that name in a minute; for now, let's just take that to be the name. So here are some examples. We just showed that TSP, with a set of cities and a threshold value, is a search problem, because there's an algorithm that checks that anything that's supposed to be a solution actually is a solution. Integer linear programming is a search problem: you're given a set of inequalities, and you're supposed to find a binary vector that satisfies them. Again, if you're given the values of x that are supposed to be a solution, you can plug those values into the inequalities and check each one to make sure that it in fact holds. And the same for satisfiability: as I did when I was explaining it, given a set of values for a solution, you can plug them into the equations and check that it's in fact a solution. And there are many, many other search problems. For example, here's an important one: factoring. You're supposed to find a factor of the integer M; well, M and 1 are always factors, so you want a nontrivial factor besides those two. You're given a number, and somebody says, well, here's a factor. You can just use long division to check whether that's a factor or not. So it's definitely a search problem: there's a polynomial-time algorithm to check that a purported solution is actually a solution.

All problems of that nature are called search problems, and the way to think of them is that they're really the problems that we would like to solve. There are many, many problems like that, where we can precisely formulate what the problem is, and if we have a solution, we know that it's a solution, or we can find out that it's a solution. It's a very, very significant class of problems, really all the problems that people would want to solve with a computer.

Now, with a search problem, you always can find a solution by checking all possibilities. We call that brute-force search. So for TSP, you can go through and try all possible permutations of the cities in order to find one that has total length less than M, and that's guaranteed to solve the problem, for sure. For integer linear programming, you're supposed to find values for the N variables that satisfy the inequalities that you have.
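Before going further with brute force, let me make the certification idea concrete. Here's a minimal sketch, in Java, of what a certifier for TSP might look like. This code isn't from the lecture, and the names (TSPCheck, checkTour, dist, tour, M) are just illustrative: it checks that the purported solution really is a permutation of the cities and that the total length of the tour is less than M.

// A minimal sketch (not from the lecture) of a polynomial-time certifier for TSP.
// dist[i][j] is the distance from city i to city j; tour is the purported
// solution, an ordering of the cities; M is the threshold.
public class TSPCheck {
    static boolean checkTour(double[][] dist, int[] tour, double M) {
        int n = dist.length;
        if (tour.length != n) return false;
        boolean[] seen = new boolean[n];
        for (int city : tour) {
            if (city < 0 || city >= n || seen[city]) return false;  // not a permutation
            seen[city] = true;
        }
        double total = 0.0;
        for (int i = 0; i < n; i++)
            total += dist[tour[i]][tour[(i + 1) % n]];  // includes the return to the start
        return total < M;
    }
}

The point is just that the check runs in time polynomial (here, essentially linear) in the size of the input, no matter how hard it was to come up with the tour in the first place.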
Back to brute-force search: for integer linear programming, if you have N binary variables, each one of them can take on the value 0 or 1, so there are 2^N different possibilities, and you may have to try them all to find a brute-force solution. It's similar for satisfiability. For factoring, the size of the problem is the number of digits in the number, so the number of possibilities is exponential in that size. You only have to go up to the square root of the number: you try dividing by everything up to the square root to see if you find a factor. That's still lots of possibilities, but it is a brute-force solution, found by checking all possibilities. That's brute-force search.

Now the thing is that we can easily implement brute-force search; even, as the blogger said, a beginning programmer can code one of these things up. The problem is that it's not efficient. It's too slow when N is large, even when N is only moderately large, and if N is huge, like the sizes that might appear in practical situations, there's absolutely no hope: even the ubercomputer can't do it.

So now let's look at the class P. That's the class of all tractable search problems. Those are the ones that we can solve, guaranteed, in polynomial time, no matter what the input is. So sorting, as we mentioned, is fine; we have efficient algorithms like mergesort. Or the 3-sum problem that we looked at, where we have to find a triple of the numbers that sums to 0: you just try all triples, and it's only a polynomial number of them, so there's an efficient algorithm for solving it. And again, for solving simultaneous equations, an appropriate version of Gaussian elimination works. And since the 80s, we've known that even for linear programming there's an efficient algorithm. So that's the significance of P: that's the class of problems that we can actually solve. We have guaranteed polynomial-time algorithms, we can make them better, we can solve bigger instances, and we're not blocked by this exponential barrier that even the ubercomputer can't address. That's the class P, the class of search problems for which we know efficient algorithms.

Now, of course, all of these are also in NP. If we can find a solution in polynomial time, we can certainly check that it's a solution in polynomial time. So that's the first thing to remember. These definitions of sets of problems can get a little bit confusing, but think back to the basic definitions: NP is the problems where, if you have a solution, you can efficiently check that it's a good solution; P is the problems where you can find a solution in polynomial time with an efficient algorithm.

Okay, so before going on, I just want to mention that there are a number of different ways to characterize these sets of problems. We've been talking about them in terms of search problems, where the goal is to find a solution. Another way to formulate the problems is as so-called decision problems: does there exist a solution? And then there's optimization: find the best solution. Some of the problems, and we're going to look at a lot of problems, are more naturally formulated in one regime than another, and some authors and researchers work with the problems in different regimes. For example, traveling salesperson is usually formulated as an optimization problem: find the shortest tour connecting all the cities. I just mention this to avoid confusion when we consider all of these later on.
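To see the three regimes side by side, here's a minimal sketch, not from the lecture, of a brute-force TSP solver with the optimization, search, and decision formulations layered on top of it. All the names (BruteForceTSP, shortestTour, and so on) are illustrative; the point is the structure of the three formulations, not the (exponential) running time.

// A minimal sketch (not from the lecture): brute-force TSP, trying all
// permutations of the cities, with the optimization, search, and decision
// regimes built on top. Exponential time; for illustration only.
public class BruteForceTSP {
    private static double best;      // length of the shortest tour found so far
    private static int[] bestTour;

    private static void tryAll(double[][] dist, int[] tour, int k) {
        int n = tour.length;
        if (k == n) {                 // a complete permutation: compute its length
            double total = 0.0;
            for (int i = 0; i < n; i++) total += dist[tour[i]][tour[(i + 1) % n]];
            if (total < best) { best = total; bestTour = tour.clone(); }
            return;
        }
        for (int i = k; i < n; i++) { // try each remaining city in position k
            int t = tour[k]; tour[k] = tour[i]; tour[i] = t;
            tryAll(dist, tour, k + 1);
            t = tour[k]; tour[k] = tour[i]; tour[i] = t;
        }
    }

    // Optimization regime: find the shortest tour.
    static int[] shortestTour(double[][] dist) {
        int n = dist.length;
        int[] tour = new int[n];
        for (int i = 0; i < n; i++) tour[i] = i;
        best = Double.POSITIVE_INFINITY;
        bestTour = null;
        tryAll(dist, tour, 0);
        return bestTour;
    }

    // Search regime: find a tour of length less than M, or null if there is none.
    static int[] tourShorterThan(double[][] dist, double M) {
        int[] tour = shortestTour(dist);
        return best < M ? tour : null;
    }

    // Decision regime: does a tour of length less than M exist?
    static boolean existsTourShorterThan(double[][] dist, double M) {
        return tourShorterThan(dist, M) != null;
    }
}

In this brute-force sketch, solving the optimization version immediately answers the other two; but, as noted next, the regimes aren't technically equivalent in general.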
Technically, these regimes are not equivalent, but the main conclusions apply to all three, and it's kind of like picking a programming language: you're going to have to pick one and then develop everything associated with that choice. I just mention that because the definitions that we give for P and NP are a little bit different from the classic definitions, which usually talk about decision problems. We find the search-problem regime a little more natural to explain, because we only have a single lecture to talk about these issues. So this is just a sanity check if you look at other literature, other treatments of this material. In this lecture, we're always going to be talking about search problems.

So here is the basic question. We have these two classes: NP is the problems where, if we have a solution, we can efficiently check that it's a solution, but for some of them the only method that we know for finding a solution seems to be some brute-force algorithm. And then there's P, which is the problems that we can solve in polynomial time. And the question is: are these classes of problems the same or not?

It seems like there should be an answer to this, if you just look at the ramifications. If P is not equal to NP, that means there's a problem that's intractable: at least one, and maybe many, problems that, we can prove, you cannot guarantee to solve in polynomial time. On the other hand, if P = NP, all problems that are in NP, even the ones that seem to be difficult like the traveling salesperson problem, actually are tractable. There actually exists a polynomial-time algorithm; we just haven't discovered it yet. Like linear programming, where eventually somebody did discover one, maybe we could discover a solution. P not equal to NP would mean that for those problems, the best we can do really is brute-force search, if we want a guarantee of solving the problem for any input. But if P equals NP, then there are efficient algorithms out there for these problems for which we don't yet know efficient algorithms. It's a big difference: is brute force the best we can do, or are there efficient algorithms out there? Those are the two possible situations. And the frustrating situation is that most people believe that P is not equal to NP, that those problems actually are intractable, but nobody's been able to prove it. That's a very frustrating situation that's existed for many decades now. That's the central question that we want to address.

This problem is so important that we're going to look at it from a couple of different points of view. The first one is called nondeterminism. We're going to think about the concept of a nondeterministic machine, which can choose among multiple options at each step. This is in opposition to all the abstract machines we've considered, and all the real machines, any computer that we've talked about: if you know its state, you know what's going to happen next; it's deterministic. A nondeterministic machine is different. In particular, since it can choose among multiple options, a nondeterministic machine can just guess the one that leads to the solution. This seems like a fanciful idea, but it's actually not so complicated mathematically. So, for example, let's imagine that Java has an either-or statement. If you say either x[0] = 0 or x[0] = 1, it'll pick one of those two. And with that, this is going to give a solution to integer linear programming.
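Of course, real Java doesn't have an either-or statement. Here's a minimal sketch, not from the lecture, of what simulating that idea deterministically might look like for 0-1 integer linear programming. The names (GuessILP, satisfies, A, b, x) are illustrative, and the constraints are taken to be A*x <= b with each x[i] either 0 or 1: where the nondeterministic machine would guess the right value for each variable in one step, the simulation has to try both branches, which can take time proportional to 2^N in the worst case.

// A minimal sketch (not from the lecture) of simulating the either-or idea
// for 0-1 integer linear programming with constraints A*x <= b.
public class GuessILP {
    // Check a purported solution: plug x into each inequality (the certifier).
    static boolean satisfies(int[][] A, int[] b, int[] x) {
        for (int i = 0; i < A.length; i++) {
            int sum = 0;
            for (int j = 0; j < x.length; j++) sum += A[i][j] * x[j];
            if (sum > b[i]) return false;
        }
        return true;
    }

    // "either x[i] = 0 or x[i] = 1": the deterministic simulation explores both choices.
    static boolean solve(int[][] A, int[] b, int[] x, int i) {
        if (i == x.length) return satisfies(A, b, x);   // all variables assigned: check
        x[i] = 0;
        if (solve(A, b, x, i + 1)) return true;
        x[i] = 1;
        return solve(A, b, x, i + 1);
    }
}

The satisfies() part is the polynomial-time check that makes this a search problem; the solve() part is where the exponential blowup hides when you can't actually guess.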
Whatever the instance is, the nondeterministic machine will pick the proper option in each either-or statement to do the assignments that we care about, and it'll get the right solution. And really the question is: is there a way for the machine to make these choices to get to the answer? Not all problems are quite so simple as this one, but this one illustrates the point really well. So that's a fine solution to integer linear programming: just do that for all N variables. And actually, the way the theoreticians look at it is in terms of a Turing machine where, given a state and a particular character under the tape head, you have two possible choices for the next move, and again, the machine can guess the option that leads to the solution. With this one, maybe you can see how nondeterminism might work: just imagine a machine of states that are all connected together, and really what you're asking is whether there's a way to get through those states to the halt state, doing the right computation, or not. You can certainly formulate a nondeterministic machine and imagine that it computes. Sometimes there'll be a way to get there; other times there won't. So it's a fine mathematical formalism for describing a computation. And of course, even nondeterminism isn't going to change the problems that we can compute; it has to do with efficiency. We saw an example of this when we talked about regular expressions and converting regular expressions into automata. Kleene's theorem takes a nondeterministic automaton and turns it into a deterministic one, and the difference is that the deterministic one can have an exponential number of states. That exponential blowup in such a conversion is exactly the kind of efficiency question that we're talking about.

So anyway, whatever context you pick, nondeterminism seems like a complete fantasy. We couldn't possibly imagine a machine that can always guess the right answer, could we? But the fact is this: if P is not equal to NP, then intractable search problems exist; if P = NP, no intractable search problems exist. If P is not equal to NP, then nondeterministic machines would give us efficient algorithms that deterministic ones can't match. If P = NP, nondeterministic machines would be of no help; they wouldn't make anything more efficient at all. So the P = NP question comes down to whether nondeterminism would help, if we could build it. It seems like a theoretical question that we should easily be able to resolve, because no one believes that we could build nondeterministic machines that could actually guess the answer. But no one's been able to prove that, for many years, many decades.

Here's another way to look at the situation, around creativity. It's the idea of creating something versus being able to appreciate it or understand it. Mozart composes a piece of music; we think that's difficult, but the audience can appreciate it, and that's easier. Or a mathematician proves a deep theorem, like Wiles' proof of Fermat's last theorem; however he proved it, a colleague can check his proof, and we think of that as being easier. Or Boeing designs an airfoil; the design is one thing, but checking it, verifying it, is easier. Or a scientist proposes a theory, and then it gets validated by an experimentalist. It's the difference between Mozart and an American Idol judge. And that's the analog of P vs NP: getting to the solution is one thing, and we think that's difficult.
Checking the solution's another thing, but no one's been able to prove that creating a solution to a problem is more difficult than checking that it is correct. We're going to look into this a little bit more deeply next.