0:15

Hello and welcome to this lesson, in which we'll introduce manifold learning. Manifold learning is similar to dimensionality reduction in that it is trying to reduce the dimensionality of the dataset. However, in this case, we're trying to determine a manifold, or a high-dimensional curve, that captures the signal in the data, and then to map that high-dimensional curve or manifold to a much lower-dimensional manifold that can be visualized. Typically, these algorithms are going to be used to enable visualization of high-dimensional data, and we'll see that in the Notebook when I demonstrate this to you.

So, in this lesson, I really want you to understand what manifold learning is and why it can be an effective tool. You should be able to explain the differences between the primary manifold learning algorithms we are going to use, including LLE, Isomap, and the t-SNE algorithm. You should be able to apply manifold learning to do two-dimensional visualization of high-dimensional datasets.

And lastly, there's another related concept that I'm going to introduce called FeatureUnion, which allows us to combine the feature selection done by different algorithms into a single set of features. So, for instance, we might want to apply PCA and select only a few components, and we also might want to apply recursive feature elimination. That way, we can use different techniques to try to build a more representative set of features. And so, we'll show how that works.
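As a minimal sketch of that idea in scikit-learn (the component counts here are illustrative choices, not the Notebook's exact values): a `FeatureUnion` fits each transformer on the same input and concatenates their outputs column-wise, so PCA components and features kept by recursive feature elimination end up side by side.

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import FeatureUnion

X, y = load_digits(return_X_y=True)  # (1797, 64)

# Combine 5 PCA components with 10 pixels chosen by recursive feature elimination.
union = FeatureUnion([
    ("pca", PCA(n_components=5)),
    ("rfe", RFE(LogisticRegression(max_iter=1000), n_features_to_select=10)),
])

X_combined = union.fit_transform(X, y)
print(X_combined.shape)  # (1797, 15): 5 PCA columns + 10 selected pixels
```

The combined matrix can then be fed to any downstream estimator, which is exactly the "more representative set of features" idea.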

The readings or activities for this particular lesson include a reading, the "Manifold Learning" notebook by Jake VanderPlas. You can skip the example section. And then, of course, there's our Notebook.

So, just moving quickly through the discussion here: he does a very nice job discussing what manifolds are and how these manifold learning techniques work. He uses a special type of data, which is called the HELLO dataset, and you'll see it here: you can see points that are colored to show you the word HELLO. Obviously, this is a nonlinear representation of the data, and so you can apply MDS algorithms to try to capture this signal. He shows you this and how it can all be computed, and then goes through different algorithms and tries to recover that distribution.

In our Notebook, we're going to focus on looking at the digits dataset. We're going to show how PCA with just two components can sort of provide a representative view or visualization of this data.
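That baseline PCA projection can be sketched in a few lines (a simplified version of what the Notebook does, not its exact code):

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

X, y = load_digits(return_X_y=True)  # 1797 digit images, 64 pixel features each

# Project the 64-dimensional digits down to the first two principal components.
pca = PCA(n_components=2)
X_2d = pca.fit_transform(X)
print(X_2d.shape)  # (1797, 2)
```

A scatter plot of `X_2d` colored by `y` gives the two-dimensional view discussed below.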

And then, we're going to walk through several different algorithms, including LLE, or locally linear embedding; multidimensional scaling (MDS); isometric mapping, or Isomap; and then the t-SNE algorithm. We're going to see that all of them can generate two-dimensional representations of this high-dimensional dataset.

Remember, the handwritten digits dataset is actually 64 dimensions, since the images are eight by eight pixels, and we're going to see how well this actually separates the data out. Can we recover the original signal? Remember, this is an unsupervised learning algorithm, so we don't use labels in the actual generation of the output. We simply use the labels at the end to determine whether we did a good job of taking this high-dimensional space where these digit data were located and keeping them together, because that's the fundamental idea: you're going to generate a mapping from this high-dimensional space to a low-dimensional, say, two-dimensional space, where points that are near each other in the high-dimensional space remain near each other in the low-dimensional space.

And then, lastly, we'll look at feature unions. So, again, we start the Notebook off the same way: we read in our data. We've got code here that plots these digit data, and we'll see what this plot looks like in just a second.

Basically, we're just trying to represent the different data in a way that makes it easy to distinguish them. So, we load our data; there is the digits dataset. I've now inverted it so that it's black on white, just to give a different view. We're going to be using 25 neighbors. You can, of course, change this value and see how it impacts the rest of the Notebook.

So, this now is what that plot method does. You can see that it plots all of the digits dataset, and it colors each class differently. The algorithms themselves don't see this labeled data. But when you apply PCA, you can see that the light blue, or zero, digits stay fairly well-clustered. Six is fairly well-isolated as well, and two and three seem to be as well. But some of them, like five, which is plotted in red, are all over the place; they're kind of intermixed. If you think about that, it should make some sense.

Zero is different than the rest of the numbers; it's pretty easy to distinguish. Three and two are different as well. Five looks a lot like all of these others that it's mixed in here with. And so, that's part of the challenge: can we separate those out in a way that lets us pull that information out?

So, one last thing I want to cover here. When you see the number five, this is actually the number five displayed on the plot at the exact center of the distribution of its data. For the zero, this is the exact center of the distribution of these light blue points; same with six, four, et cetera. So, this gives you a feel for where the centers of the individual clusters of points that represent these different digits are located.
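Those label positions are just per-class centroids in the two-dimensional embedding. A minimal sketch of how they could be computed (the Notebook's own plotting helper may differ):

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

X, y = load_digits(return_X_y=True)
X_2d = PCA(n_components=2).fit_transform(X)

# Each digit's label is drawn at the mean position (centroid) of that
# class's points in the 2-D embedding.
centroids = {d: X_2d[y == d].mean(axis=0) for d in range(10)}
for d, (cx, cy) in sorted(centroids.items()):
    print(d, round(float(cx), 2), round(float(cy), 2))
```

With matplotlib, you would then call something like `plt.text(cx, cy, str(d))` for each centroid on top of the scatter plot.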

So, let's take a look at the output of the rest of these algorithms. I'm not going to talk in this video about the algorithms themselves; you can read about them. I just want to look at what each plot looks like. So, the first one here is LLE.

And if you change the parameters, it changes the mapping. And here you notice that they're not nice spherical clusters; they're all different kinds of shapes. Here you can see that seven is this long spike, six is this long spike, and the zeroes over here are a very tight little spike. So, with this particular choice of hyperparameters, you can see that six, and even eight to a degree, zero, four, and seven are well-separated, but one, two, three, five, and nine are all clumped together really tightly. So, LLE didn't do a great job of pulling those apart, but it did do a nice job, with this particular set of hyperparameters, of pulling out these others. As you tune those hyperparameters, you might see different results.
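A minimal LLE sketch, using the 25 neighbors mentioned earlier (other settings are scikit-learn defaults):

```python
from sklearn.datasets import load_digits
from sklearn.manifold import LocallyLinearEmbedding

X, y = load_digits(return_X_y=True)

# n_neighbors is the key hyperparameter; 25 matches the value used above.
lle = LocallyLinearEmbedding(n_neighbors=25, n_components=2)
X_lle = lle.fit_transform(X)
print(X_lle.shape)  # (1797, 2)
```

Changing `n_neighbors` changes which local patches the algorithm tries to keep flat, which is why the mapping shifts as you tune it.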

Let's take a look at the next algorithm, MDS. MDS does a different job. You notice that the overall distribution of points is sort of spherical or circular in this two-dimensional space. Zero, six, and four are nicely pulled out; three and two, nicely pulled out. The nine is not too bad, and the ones aren't too bad either, but the fives are all over the place again. So you see that it did a better job than PCA and the first algorithm, LLE, in terms of actually separating out the clusters, but there are still some, particularly five, that are intermixed with the rest.
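A corresponding MDS sketch (here fit on a subset of the digits purely to keep the quadratic-cost stress optimization quick; the Notebook presumably uses the full dataset):

```python
from sklearn.datasets import load_digits
from sklearn.manifold import MDS

X, y = load_digits(return_X_y=True)
X_small = X[:500]  # subset for speed; MDS scales with the square of n_samples

# Metric MDS tries to preserve the pairwise distances in the 2-D embedding.
mds = MDS(n_components=2, n_init=1, random_state=23)
X_mds = mds.fit_transform(X_small)
print(X_mds.shape)  # (500, 2)
```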

We can look at Isomap, which is actually a form of MDS, and see what it does. And here, interestingly enough, you can now start to see that four, six, and zero are pulled out and well-separated. Two and three are not too badly separated; seven, not too bad. And remember, this is a 64-dimensional space that we've now projected down to two dimensions. So, that's actually pretty impressive, what this algorithm has done.
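An Isomap sketch follows the same pattern; its connection to MDS is that it applies classical MDS to geodesic distances measured along a neighborhood graph (again using the 25 neighbors from earlier):

```python
from sklearn.datasets import load_digits
from sklearn.manifold import Isomap

X, y = load_digits(return_X_y=True)

# Isomap builds a k-neighbor graph and preserves graph (geodesic) distances.
iso = Isomap(n_neighbors=25, n_components=2)
X_iso = iso.fit_transform(X)
print(X_iso.shape)  # (1797, 2)
```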

The last one I'll look at is t-SNE. t-SNE is actually one that some people just get blown away by, because of how well it does. It's very powerful at taking a very high-dimensional space and generating this two-dimensional visualization. The challenge is that it's much more computationally complex. And so, a lot of times people will first apply, say, PCA to take a very high-dimensional space and reduce it down to, say, 50 or 100 dimensions, and then use t-SNE to visualize that, because it's going to be much faster.
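That two-stage pattern might look like this (the 50-dimension cutoff is the example value from above; with only 64 input dimensions the speedup is modest, but the pattern matters for genuinely high-dimensional data):

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

X, y = load_digits(return_X_y=True)

# Stage 1: PCA reduces the data to 50 dimensions.
X_50 = PCA(n_components=50).fit_transform(X)

# Stage 2: t-SNE produces the final 2-D map from the reduced data.
X_tsne = TSNE(n_components=2, init="pca", random_state=23).fit_transform(X_50)
print(X_tsne.shape)  # (1797, 2)
```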

So, let's take a look at t-SNE on the digits dataset. And here you understand now what I'm talking about. Look at the zeroes: the zeroes are all pulled out. The nines are pulled out; there are just these little clumps of them remaining. Six, four, et cetera. Two, nicely pulled out; three, nicely pulled out. And even where the data are not part of the main cluster, they're still clustered themselves. So, it does a nice job of maintaining that clustering of the data. And again, this is just the default hyperparameters. We haven't really tried to do a lot of tuning here, and you can still see that it's done a really amazing job of separating these clusters out.

I mean, we wouldn't expect perfect separation. Fives and sixes sometimes look alike, and you can see there's a five over here. And the ones are sort of spread out a little bit; you can see that they're not very well-bunched. But then again, a one can look like a seven, and it can maybe look like a five or something. So you can understand where some of these mix-ups are occurring.

But again, if you think about it, this algorithm separated these data with no guidance from us. We simply told it: here's a high-dimensional space; generate a two-dimensional space, and preserve the geometry of the high-dimensional space so that nearby points stay together when they're down here in this lower-dimensional space.

The last thing, of course, was feature unions. The Notebook walks through how to apply this. I'm not going to talk a lot about this in the video. It's a nice way of taking different techniques and combining them together to hopefully end up with a better set of features than you might have gotten with just one technique. So, I'm going to go ahead and stop the video with that particular mention of feature unions.

If you have any questions, let us know. And, of course, good luck.