0:00

[MUSIC] Next, let me mention, embarrassingly briefly, linear programming. So linear programming is a problem that, on the one hand, is efficiently solvable, both in theory and in practice. And, on the other hand, it's very general. It includes as special cases bipartite matching, maximum flow, and tons of other stuff.

The general problem solved by linear programming is to optimize a linear function (maximize or minimize it, it doesn't matter) over a set of linear constraints. Geometrically, that corresponds to finding the best point, the optimal point, in the feasible region, which is representable as the intersection of half-spaces.

So let me just draw a cartoon picture, which is going to be in the plane. Although, let me emphasize, the power of linear programming is that you can efficiently solve problems in massively higher dimensions. Not 2 dimensions, but think millions of dimensions. So in the plane, the intersection of half-planes, assuming it's bounded, is just going to be a polygon, and you can think about optimizing a linear function as just trying to push as far as possible in some direction, subject to being inside the feasible region. So, for example, maybe you want to push as far to the northeast as possible, subject to being one of these blue points, somewhere in this polygon.
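As a toy illustration of that cartoon (the polygon corners below are made up), here's a sketch in Python. It relies on a fact that solvers also exploit: a bounded linear program always has an optimum at a vertex of the feasible region, so with the corners in hand we can just scan them.

```python
# Toy 2D LP: push "northeast", i.e. maximize x + y, over a polygon.
# A bounded LP attains its optimum at a vertex of the feasible region,
# so scanning these (hypothetical) corner points suffices.
corners = [(0, 0), (4, 0), (4, 2), (2, 4), (0, 3)]
best = max(corners, key=lambda p: p[0] + p[1])
print(best)  # (4, 2); the tie with (2, 4) is broken by scan order
```

Real solvers never enumerate vertices like this, of course; the point is only that "push northeast" means "maximize a linear function of the coordinates."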

For example, the maximum flow problem is easily encoded as a special case of linear programming. You use one dimension, that is, one decision variable, for each edge. The decision variable describes how much flow you route on a given edge. The linear objective function would be to maximize the sum of the flows on the edges going out of the source vertex. And then you would have linear constraints just insisting that the amount of flow coming into a vertex is the same as the amount of flow coming out of it. And linear programming encodes not just maximum flow, but tons and tons of other problems as well.
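As a concrete sketch of that encoding, here's the max-flow linear program for a tiny invented graph (source s, sink t, internal vertices a and b), solved with SciPy's `linprog`, assuming SciPy is available:

```python
from scipy.optimize import linprog

# Edges: s->a, s->b, a->b, a->t, b->t; one decision variable per edge.
caps = [3, 2, 1, 2, 3]
# Objective: maximize flow out of s, i.e. x0 + x1 (linprog minimizes, so negate).
c = [-1, -1, 0, 0, 0]
# Conservation constraints at the internal vertices: flow in equals flow out.
A_eq = [
    [1, 0, -1, -1, 0],   # at a: (s->a) - (a->b) - (a->t) = 0
    [0, 1, 1, 0, -1],    # at b: (s->b) + (a->b) - (b->t) = 0
]
res = linprog(c, A_eq=A_eq, b_eq=[0, 0], bounds=[(0, u) for u in caps])
print(-res.fun)  # maximum flow value: 5.0
```

The capacity constraints become variable bounds, conservation becomes equalities, and the flow value becomes the objective; that's the whole translation.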

Despite its generality, linear programming problems can be solved efficiently, both in theory and in practice. The systematic formalization and algorithmic solution of linear programs was pioneered by George Dantzig back in the 1940s. In particular, Dantzig invented what's called the simplex method, which remains to date one of the most important

2:32

practical methods for solving linear programs.

Have you ever heard that story about the student who walks in late to a class, sees what he thinks are homework problems written up on the chalkboard, and solves them, not knowing that they are in fact major open research questions? Well, maybe you thought that story was apocryphal, as I did for many years. But it's not; it turns out it's about George Dantzig. In 1939, he walked into his graduate statistics class late. The two major open questions in the field were written on the chalkboard, and he solved them within a few days. That wound up being his PhD dissertation, before he got interested in linear programming. Now, the algorithms used to efficiently solve linear programs are pretty complicated, both the variants of the simplex method and the so-called interior-point algorithms.

So, it may or may not be a good use of your time to learn more about how the guts of these algorithms work. But what most surely is a good use of your time is learning how to be an educated client of these algorithms, that is, to get some practice formulating problems as linear programs and solving linear programs using either open-source or commercial linear programming solvers. Those solvers are a very powerful black-box tool to have in your algorithmist's toolbox.

In fact, there are black-box subroutines available to you that are strictly more powerful than linear programming. One generalization is convex programming: linear functions are a special case of convex functions, and if you want to minimize a convex function, or equivalently maximize a concave function, subject to convex constraints, that again is a problem you can solve efficiently, both in theory, in the sense of polynomial time, and in practice. Maybe the problem sizes you can reach aren't quite as big as with linear programming, but they're pretty close.
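To give a small taste of why convexity is the operative property, here's a sketch of minimizing a one-dimensional convex function by ternary search; convexity is exactly what licenses throwing away a third of the interval in each step. The particular function below is just an example.

```python
def minimize_convex(f, lo, hi, iters=200):
    """Ternary search: shrink [lo, hi] toward the minimizer of a convex f.
    Convexity guarantees the discarded third cannot contain the minimum."""
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if f(m1) < f(m2):
            hi = m2   # minimum cannot lie in (m2, hi]
        else:
            lo = m1   # minimum cannot lie in [lo, m1)
    return (lo + hi) / 2

x = minimize_convex(lambda x: (x - 3) ** 2 + 1, -10.0, 10.0)
print(round(x, 6))  # 3.0
```

Real convex solvers use far more sophisticated machinery (gradients, interior-point methods), but the same structural property is what makes them work.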

A different generalization is integer linear programming. These are like linear programs, but where you add the extra constraint that certain decision variables have to take on integer values. So something like 1/2 as a value for a decision variable would be disallowed in an integer program. Now, these in general are no longer solvable efficiently in theory; it's easy to encode NP-complete problems, like, let's say, vertex cover, as special cases of integer programming. But there is quite a bit of advanced technology out there for solving integer programs, at least in special domains. So again, if you have an NP-complete problem and you really need to solve it, integer programming is a technique you're going to want to learn more about.
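To illustrate the encoding, here's vertex cover on a small made-up graph written as a 0/1 integer program, with the feasible integer solutions enumerated by brute force; a real integer-programming solver (branch and bound, cutting planes) searches far more cleverly, but the formulation is the same.

```python
from itertools import product

# Minimize sum(x_v) subject to x_u + x_v >= 1 for every edge (u, v),
# with each x_v constrained to {0, 1}: exactly minimum vertex cover.
edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
n = 4

best = None
for x in product([0, 1], repeat=n):                  # all integer assignments
    if all(x[u] + x[v] >= 1 for u, v in edges):      # the linear constraints
        if best is None or sum(x) < sum(best):
            best = x
print(best, sum(best))  # vertices {0, 2} cover every edge
```

Dropping the integrality constraint (allowing x_v = 1/2, say) turns this back into an efficiently solvable linear program, which is precisely where the hardness hides.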

What are some other topics we haven't had time to discuss? Well, first of all, for many of the topics we have discussed, we've only scratched the surface. There's beautiful and useful stuff beyond what we've discussed in these classes, on topics ranging from data structures (for example, hashing and search trees), to graph algorithms (we already mentioned matchings and flows), to approximation algorithms (like better heuristics for the traveling salesman problem).

A topic we've barely discussed at all is geometric algorithms, one exception being a divide-and-conquer algorithm for the closest pair problem that we discussed in Part 1. Roughly speaking, the study of geometric algorithms has two flavors: low-dimensional problems and high-dimensional problems. When I say low-dimensional, I mean probably 2 or 3 dimensions, so problems in the plane or problems in 3-space. A canonical computational problem here would be the convex hull problem, which means I give you a bunch of points and you want to know which points are on the convex hull of the point set.

So, here's a way to think about the convex hull problem in the plane. Take a flat wooden board; that represents the plane. Now, pound a bunch of nails into the wooden board; those represent the points in the plane. Now, to compute the convex hull, what you can do is take a rubber band, stretch it out really far so that it bounds all of the nails you pounded into the wooden board, and now let the rubber band go, and it's going to snap around the boundary nails. That's a way to compute the convex hull in 2 dimensions. And so the question is, how do you do that efficiently, given just a bunch of coordinates of points? The state of the art here is divide-and-conquer algorithms, very much in the spirit of the closest pair algorithm back in Part 1. So why would you care about computing convex hulls? Well, there's lots of reasons.
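First, though, here's what the rubber-band picture looks like in code: a sketch of Andrew's monotone chain algorithm, an O(n log n) sorting-based alternative to the divide-and-conquer approach just mentioned.

```python
def cross(o, a, b):
    """Cross product of vectors OA and OB; positive means a left turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def convex_hull(points):
    """Andrew's monotone chain: sort, then build lower and upper hulls."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    def half_hull(seq):
        hull = []
        for p in seq:
            while len(hull) >= 2 and cross(hull[-2], hull[-1], p) <= 0:
                hull.pop()   # drop points that are not on the boundary
            hull.append(p)
        return hull[:-1]
    return half_hull(pts) + half_hull(reversed(pts))

nails = [(0, 0), (2, 0), (2, 2), (0, 2), (1, 1)]
print(convex_hull(nails))  # the interior nail (1, 1) is not on the hull
```

The inner `while` loop is the rubber band snapping: any nail that would make the band turn the wrong way gets popped off.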

But, you know, as one example, suppose you were doing some 3D graphics, and you had moving objects in your scene, and you wanted to know when two objects collide, because then you have to respond appropriately in rendering the scene. Well, to know when two objects collide, you don't have to remember all of the details of the objects; you just have to keep track of their convex hulls. They collide exactly when their convex hulls intersect, and so that's one reason you might want to do that computation.

A very different flavor of geometric algorithms is the high-dimensional case. And here, I'm thinking of the number of dimensions as being in the thousands, say, or perhaps even quite a bit more than that. Now, why would you ever have points in thousands of dimensions? Well, imagine you were doing, say, information retrieval; imagine you had a bunch of documents. Now, documents don't really have geometry intrinsically, but it can be useful to imbue them with geometry. How do you do that? Well, imagine you make a list of all of the words in the world that you care about, so maybe, say, 10,000 words that you think are interesting. And for a given document, you just count the number of occurrences of each of these 10,000 interesting words in the document. Maybe lots of them are zero, the words don't occur at all, but maybe there's some word that occurs seven times in the document. Then you'd give it a seven in that coordinate, and boom, you've now represented each document as a point in 10,000-dimensional space: one coordinate for each of the interesting words, the value of the coordinate being the frequency of that word in that document.
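A toy sketch of that representation, with a made-up four-word vocabulary standing in for the 10,000 interesting words:

```python
from collections import Counter

vocab = ["algorithm", "graph", "flow", "cut"]   # hypothetical tiny vocabulary

def to_vector(document):
    """Represent a document as its word counts over the fixed vocabulary."""
    counts = Counter(document.lower().split())
    return [counts[w] for w in vocab]

doc = "flow networks are a graph problem and this algorithm computes a flow"
print(to_vector(doc))  # [1, 1, 2, 0]
```

Each document becomes one point whose dimension equals the vocabulary size, which is how geometry in thousands of dimensions shows up in practice.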

Now you can start asking questions like, given a new document, is it similar to any of the documents you've already been storing? Geometrically, that just corresponds to asking something called a nearest-neighbor query: given a bunch of points in possibly high-dimensional space, and given a new point, which of the previous points is closest, say in Euclidean distance, to the new point you were just given? That would be a canonical question you would ask in high-dimensional computational geometry.

In these Design and Analysis of Algorithms courses, we've been focusing on the most classical style of computation, where you're given an input up front, you do a computation, you produce some output, and you take a bow and leave the stage.

But if you think about computation in the real world, there's plenty of algorithms whose work is never finished, algorithms that in effect run forever. Two applications that we've mentioned in passing in this class are caching and routing. For example, if you're writing an algorithm to manage a cache in some system, that algorithm's work is never really done. It's going to have to manage the cache indefinitely.
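For instance, the cache manager must decide, on every single request and with no knowledge of future requests, what to keep and what to evict. A minimal sketch of the classic least-recently-used (LRU) eviction policy:

```python
from collections import OrderedDict

class LRUCache:
    """Online cache manager: evicts the least-recently-used entry when full.
    Every decision is made per request, with no knowledge of future requests."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.store = OrderedDict()

    def get(self, key):
        if key not in self.store:
            return None
        self.store.move_to_end(key)        # mark as most recently used
        return self.store[key]

    def put(self, key, value):
        if key in self.store:
            self.store.move_to_end(key)
        self.store[key] = value
        if len(self.store) > self.capacity:
            self.store.popitem(last=False)  # evict least recently used
```

This is the "run forever" style of algorithm: there is no final output, only an unending stream of decisions whose quality we can still analyze.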

Similarly, you can think about routing in a network, say Internet routing. Again, that algorithm's work is never done. There are always going to be link failures, there are always going to be new nodes, there's always going to be new data to route. So it has to keep making routing decisions indefinitely, as long as the Internet exists. As you would hope, both the theory and the practice of understanding algorithms that run indefinitely, making decisions in real time, is based quite squarely on the lessons that we've learned in the classical computational paradigm here. But it is an interesting topic, worthy of study in its own right.

A major concern of algorithms research in the 21st century is massive data sets, meaning data sets that are way too big to fit in the main memory of a single machine.

One hot topic has been so-called streaming algorithms, that is, algorithms which have to, in effect, take a firehose of data being generated at a torrential pace and boil it down, using small space, into just a few accurate statistics.
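A classic example of such a small-space summary is the Misra-Gries sketch, which finds a stream's frequent items using only a constant number of counters, no matter how long the stream is:

```python
def misra_gries(stream, k):
    """Keep at most k-1 counters; any item occurring more than n/k times
    in a length-n stream is guaranteed to survive in the sketch."""
    counters = {}
    for x in stream:
        if x in counters:
            counters[x] += 1
        elif len(counters) < k - 1:
            counters[x] = 1
        else:                          # no room: decrement everything
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters

stream = [1] * 50 + [2] * 30 + list(range(3, 23))   # n = 100, made-up data
print(sorted(misra_gries(stream, 5)))  # items with count > 20 (here 1 and 2) survive
```

The space is O(k) regardless of stream length, which is exactly the firehose-to-statistics trade the lecture describes.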

So you might think, for example, about an algorithm running locally at a network router, with data packets flying through the router at a ridiculous pace, or an algorithm which is responsible for summarizing the data generated by a bunch of telescope observations. A final, important topic, for which the current theory is actually quite immature, is understanding how to exploit parallelism in tackling massive data sets, for example using the distributed-systems approach exemplified by the MapReduce and Hadoop systems.
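As a single-machine sketch of that programming model, here's the classic word-count demo (input invented): map each record to key-value pairs, shuffle by key, then reduce each group. The real systems do the same thing spread across thousands of machines.

```python
from collections import defaultdict
from itertools import chain

def map_phase(record):
    """Map: emit a (key, value) pair per word."""
    return [(word, 1) for word in record.split()]

def reduce_phase(key, values):
    """Reduce: combine all values that share a key."""
    return (key, sum(values))

records = ["the quick brown fox", "the lazy dog", "the fox"]
shuffled = defaultdict(list)                 # the "shuffle": group by key
for k, v in chain.from_iterable(map_phase(r) for r in records):
    shuffled[k].append(v)
counts = dict(reduce_phase(k, vs) for k, vs in shuffled.items())
print(counts["the"], counts["fox"])  # 3 2
```

Because map calls are independent and each reduce touches only one key's group, both phases parallelize trivially, which is the whole appeal of the model.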

Â