Now that we've discussed sample preparation and imaging for single particle analysis, let's discuss reconstruction. Given an image of a bunch of purified particles, the first step is to identify the particles. So here again is an image of hemocyanin, and we need to identify that here's one particle, here's another particle, and here's a third particle. There are many ways to do this. One way is simply to use a mouse and a viewer to mark each of the particles as you see them, picking them one by one manually. Another common way is by cross-correlation. If you know at least a little bit about the particle you're looking for, for instance that it's round, or at least something about its size, you can build some kind of template and search computationally by cross-correlation for everything in the images that looks close to it. (A minimal sketch of this approach appears below.) Other methods have been tried as well: neural nets, edge detection, and other things. In fact, competitions have been held to motivate the development of new algorithms to find particles within these images.

So, by whatever method it was done, let's assume that we now have images where each of the particles has been picked, shown here by the crosshairs on each particle. Now, this image makes the task look easy, because the particles are so high contrast: it's a highly defocused image of a very large particle, and a very good image at that. And so it seems easy to pick out the individual particles. In other cases, it's not so easy. This, for instance, is a picture of a much smaller particle, GroEL, taken closer to focus. And a particle-picking algorithm has already found this particle, for instance, and this one and this one and this one. So you can see that in this case it's very difficult to distinguish the particles from the background, but nevertheless it needs to be, and can be, done.

Now, let's think about what makes each particle's image look a little different from the next. First of all, each particle froze in the ice in a different orientation. They were all tumbling randomly at the moment they were plunge-frozen, and so each one presents a different view of the particle. In addition, some of the particles may be complete and some may have started to fall apart, or have other subunits added to them, so there's stoichiometric heterogeneity in the particle set. There can also be conformational heterogeneity: if the particles have subtly different conformations, we need to be able to detect that.

So the next step in a typical single-particle-analysis image-processing workflow is to take each of these individual particle images and classify them into classes of images that all look the same. The hope is that within a single class, all of the particles are stoichiometrically homogeneous and conformationally homogeneous, and were all recorded from the same point of view. Note that if different copies of the particle froze in the ice in the same orientation, they could still have different in-plane rotations, and their images should nevertheless be classified into the same class, because the images look the same even though they have different rotations.

Now, there are lots of different algorithms to classify the particles. One of them, for instance, is called hierarchical ascendant classification. The basic idea is: what if we started with N images and tried to separate them into two classes? We'll call them class A and class B.
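Before continuing with classification, here is a minimal sketch of template-based picking by cross-correlation, as described above. It's Python with numpy; the function name, the normalization, and the greedy peak-picking loop are my own simplifications for illustration, not the procedure of any particular package, and real pickers also handle filtering, the CTF, and edge effects more carefully.

```python
# A minimal sketch of template-based particle picking by cross-correlation.
# Assumes the micrograph and template are 2D numpy arrays, with the template
# smaller than the micrograph. Names and thresholds are hypothetical.
import numpy as np

def pick_particles(micrograph, template, threshold=0.5):
    """Cross-correlate a micrograph with a template; return peak coordinates."""
    th, tw = template.shape
    # Zero-mean both so the correlation responds to shape, not brightness.
    img = micrograph - micrograph.mean()
    tmp = template - template.mean()
    # FFT-based cross-correlation (circular; fine away from the edges).
    pad = np.zeros_like(img)
    pad[:th, :tw] = tmp
    cc = np.real(np.fft.ifft2(np.fft.fft2(img) * np.conj(np.fft.fft2(pad))))
    cc /= cc.max()
    # Greedy peak picking: take the highest peak, blank a template-sized
    # neighborhood around it, and repeat until no peak exceeds the threshold.
    picks = []
    while True:
        y, x = np.unravel_index(np.argmax(cc), cc.shape)
        if cc[y, x] < threshold:
            break
        picks.append((y, x))
        cc[max(0, y - th):y + th, max(0, x - tw):x + tw] = 0
    return picks
```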
Returning to those two classes: we could ask that they be defined by maximizing the interclass variance and minimizing the intraclass variance. We could allow the computer to try different potential classifications and pick the one that created two classes with the maximum interclass variance and the minimum intraclass variance. Now let me rename this one class A2, and this will be class A1. Then we could do the same thing to those classes: we could take all the images in class A1 and ask the computer to further divide them into two sub-classes, call them B1 and B2, that would maximize the interclass variance between those two and minimize the intraclass variance. And likewise we could do that to A2, to create classes B3 and B4. Then we could take another step and classify these into C1, C2, C3, C4, et cetera. We could do this in as many steps as needed until we had lots of different classes. (A toy sketch of this recursive splitting follows below.) And like I say, there are lots of different ways one could classify the images to try to create pure classes.

But once that procedure is done, you would align all the images within a class so that they're translationally on top of each other, and check for any in-plane rotations of one image with respect to another. Once they're all aligned in that fashion, they're averaged together to produce what are called class averages, and that's what is shown here. This again was taken from a tutorial for EMAN, one of the software packages for single particle analysis, found at this website. So this, for instance, is the class average of a lot of images of the GroEL particle, and it's obviously a top view; in the class average, the signal-to-noise ratio is much higher. Here is a class average showing particles from a side view, where you can see the four different rings that form the GroEL molecule, and here again the signal-to-noise ratio is much higher than in any individual image. And so we see a number of different class averages: some are similar, some look like side views, some are top views, and some are in-between views.

Now, there are many ways to group images into classes. One of particular importance, which we'll mention now, is multivariate statistical analysis. Let's begin by thinking about an image. It's pixelated, so it has all of these pixels. Imagine that we could take the density of that pixel there and define some kind of space where this axis would be the density in pixel number one, this axis here would be the density in pixel number two, and this axis would be the density in pixel number three. Each image could then be plotted in this three-dimensional space according to its density values here, here, and here, and it would produce a particular spot somewhere. Well, now imagine in a computer that you defined an N-dimensional space, where, let's say, there are N pixels total in the image. You defined an N-dimensional space, which of course we can't draw, but you could still plot the position of each image in that space, and so you'd have points scattered throughout the space. And it could be that you had a number of images that were near each other in different regions of the space. You might then find that your original axes, the density of each pixel, were not nearly as useful in describing the diversity of the images. And there are ways to calculate the so-called principal axes through this space that best characterize the big differences that exist between all the images.
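Here is the toy sketch of that recursive two-way splitting, before we go further with principal axes. A simple 2-means split minimizes the within-class sum of squared distances, which is one way to realize the minimum-intraclass-variance criterion described above; the function names and stopping rule are my own, and the classification schemes in real packages are considerably more sophisticated.

```python
# A toy sketch of recursive two-way classification of particle images.
# Each image is treated as a flattened vector; 2-means splits a set in two,
# and the split is applied recursively to build 2**depth classes.
import numpy as np

def split_in_two(vectors, n_iter=20, seed=0):
    """Partition row vectors into two classes by a simple 2-means."""
    rng = np.random.default_rng(seed)
    centers = vectors[rng.choice(len(vectors), 2, replace=False)]
    for _ in range(n_iter):
        # Assign each vector to its nearest center...
        d0 = np.linalg.norm(vectors - centers[0], axis=1)
        d1 = np.linalg.norm(vectors - centers[1], axis=1)
        labels = (d1 < d0).astype(int)
        # ...then move each center to the mean of its class.
        for k in (0, 1):
            if np.any(labels == k):
                centers[k] = vectors[labels == k].mean(axis=0)
    return labels

def hierarchical_split(vectors, depth):
    """Recursively split a stack of flattened images into 2**depth classes."""
    vectors = np.asarray(vectors, dtype=float)
    if depth == 0 or len(vectors) < 2:
        return np.zeros(len(vectors), dtype=int)
    labels = split_in_two(vectors)
    out = np.empty(len(vectors), dtype=int)
    for k in (0, 1):
        sub = hierarchical_split(vectors[labels == k], depth - 1)
        out[labels == k] = k * 2 ** (depth - 1) + sub
    return out
```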
To continue with that picture of the space: if you have a lot of the images along this line, this axis might well define the difference between the images that cluster over here and a set of images that cluster over there. And there might be a second principal axis that best reveals that there's a cluster over here separated from the clusters over there. So these are called principal axes, and finding them is called principal component analysis. Images can be created which define the extremes of these difference axes, and these are called the eigenimages: this one, and this one, and perhaps there's one over here. There's a large number of eigenimages through this space that define the extreme differences between all the images. Then you can re-express each image as a linear combination of the eigenimages: image number one would become some weight number one times eigenimage number one, plus a different weight times eigenimage number two, and so on, out to as many eigenimages as you want to consider. So basically, you're decomposing the images with a basis set of these special eigenimages that represent the diversity within the image set. (A minimal sketch of this decomposition appears below.) At this point, you can cluster the images according to their positions along these principal axes and generate different classes. Within each class, you can then align and average the images in that particular class to give you some kind of class average image, one that, if you were to plot it in this space, would appear near the center of that cluster. And so you could get a class average there, and a class average there. Then you could take those class averages, realign each of your experimental images with the class averages, and proceed.

Once we have a set of class averages, the next step is to figure out the relative orientation of each of those class averages, so that we can merge the information we get from each one into a full three-dimensional reconstruction. To illustrate that I'll use slides, again taken from the EMAN tutorial, as I showed in the intro, but we'll go into a little more detail this time. The process begins by taking a class average. So this is the real-space average image of one particular class; in this case, you can see it's a top view of the barrel. We first calculate its Fourier transform, so here are the amplitudes and phases of each component. Then we need to insert that into reciprocal space. So here we have spatial frequency in the x direction, spatial frequency in the y direction, and spatial frequency in the z direction. The very first transform we can insert anywhere; it's arbitrary how we insert it. So for simplicity, we just insert it, let's say, on the plane of spatial frequency in x and spatial frequency in y in reciprocal space.

Then we go to the next class average. So this is a different class average. As you can see, this isn't really a top view; it's not a side view either; it's some kind of oblique view. We calculate its Fourier transform to get the amplitudes and phases here, and then we have to find its orientation relative to the first one, which is here. Okay, now one principle is that both of these transforms will share a common line of amplitudes and phases, and so the goal is to discover that line. And here, we've put that line on the axis of spatial frequency in the x direction. So here we have two central sections through the 3D Fourier transform.
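Before going further with the common line, here is the minimal sketch of that multivariate statistical analysis step. It uses an SVD of the centered image stack, which is one standard way to compute principal axes; the function name and the number of components kept are my own choices for illustration.

```python
# A minimal sketch of multivariate statistical analysis on an image stack:
# each image is a point in pixel space, an SVD of the centered stack gives
# the principal axes (eigenimages), and each image is re-expressed by its
# weights (coordinates) along those axes.
import numpy as np

def eigenimage_decomposition(stack, n_components=10):
    """stack: (n_images, h, w). Returns eigenimages and per-image weights."""
    n, h, w = stack.shape
    X = stack.reshape(n, h * w).astype(float)
    X -= X.mean(axis=0)                    # center the cloud of points
    # Rows of Vt are the principal axes through pixel space.
    U, S, Vt = np.linalg.svd(X, full_matrices=False)
    eigenimages = Vt[:n_components].reshape(n_components, h, w)
    weights = U[:, :n_components] * S[:n_components]   # image coordinates
    return eigenimages, weights
```

Clustering the rows of `weights` (for example with the 2-means sketch above) groups the images into classes, and averaging the members of each class gives the class averages.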
Returning to those two central sections: this central section comes from the second class average, and this central section came from the first class average. Both of those, of course, cut through the origin, and there's at least one line of amplitudes and phases that match; initially we'll just set that, arbitrarily, to be the spatial frequency axis in the x direction. What is still not known is the angle between these central sections.

But let's spend a moment thinking in more detail about how we would find which line in these transforms is common between the two. We'll begin by putting this picture up here in the left corner for now. Let's imagine that we had calculated the Fourier transform of the first class average, and so these are the amplitudes and phases in that Fourier transform. Well, we could construct a set of lines through that transform. I'll make the first one blue. Imagine that we took the amplitudes and phases along that line across it, and we simply mapped them and wrote them into a table right here. Then let's say we took the amplitudes and phases on this line and wrote them across in the table right there. Then we took the next set, here, across the line like that, and wrote them in the table here, and you see the pattern: these amplitudes and phases we would write in the table here. And so we can take the amplitudes and phases of lines across this transform and simply write them out, row by row, down this chart. Once we get all the way around 180 degrees, we realize that the next line, were we to map it out, would superimpose on the first one: the first line had amplitudes and phases going across the transform in one direction, and this next line, if we were to draw it, would have the same amplitudes and phases going across in the opposite direction. In other words, this one's going that way, and this one is going the other way. We could continue that all the way around the transform and replicate these values. My box wasn't quite big enough, so I'm going to squeeze the last one in, but you get the picture of setting up this table.

Now, we could take the Fourier transform of the second central section, and, let me try to draw this appropriately. If we took the Fourier transform of the second central section, again full of amplitudes and phases around there, we could follow a similar procedure and take all the amplitudes and phases across that line right there and write them in a table; then take all the amplitudes and phases in the next line and write them in the table, and all the amplitudes and phases here, and these, et cetera. And again, if we were to continue going around the circle, we'd start hitting the same amplitudes and phases, just in the opposite direction. (A minimal sketch of building such a table appears below.)

Okay, at this point we can start to compare them, and plot the correlation of, say, this blue line with this blue line here. Now, I've matched the colors here, but it's not necessarily the case that the amplitudes and phases in this blue line would match the amplitudes and phases in this other blue line. The colors aren't meant to code for anything; the different colored lines are just the amplitudes and phases going across the transform at different locations.
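Here is a minimal sketch of building that table, one row of complex amplitudes and phases per angle. It assumes a square image and uses scipy's `map_coordinates` to interpolate along each central line; the sampling density and interpolation order are my own choices.

```python
# A minimal sketch of tabulating the central lines of a 2D Fourier transform:
# for each in-plane angle from 0 to 180 degrees, read out the complex values
# of the transform along a line through the origin. Assumes a square image.
import numpy as np
from scipy.ndimage import map_coordinates

def central_line_table(image, n_angles=180):
    """Rows = angles 0..180 degrees; columns = samples along each line."""
    F = np.fft.fftshift(np.fft.fft2(image))   # put the origin at the center
    cy, cx = np.array(F.shape) // 2
    r = np.arange(-cx, cx)                    # radial samples along the line
    rows = []
    for theta in np.linspace(0, np.pi, n_angles, endpoint=False):
        ys = cy + r * np.sin(theta)
        xs = cx + r * np.cos(theta)
        # Interpolate real and imaginary parts separately along the line.
        line = (map_coordinates(F.real, [ys, xs], order=1)
                + 1j * map_coordinates(F.imag, [ys, xs], order=1))
        rows.append(line)
    # Beyond 180 degrees the lines repeat with the direction reversed,
    # exactly as described above, so we stop here.
    return np.array(rows)
```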
Anyway, we could compare these amplitudes and phases with these amplitudes and phases, and there would be a value here for the correlation. We could compare these amplitudes and phases with these, and there would be another value for the correlation, and these amplitudes and phases with the first one, and there would be a correlation. And we fill out this whole chart of all the cross-correlation values of all these lines compared to all these lines. In a successful case, there will be one particular spot, one peak, that's much higher than the rest, telling us that it was the amplitudes and phases here, say, that matched best against these amplitudes and phases, meaning, in this case, that this line across this transform matched this line across that transform very well. This kind of map has been called a sinogram, and it allows us to find the relative orientation between this transform and this transform, which allows us to find the common line and fill three-dimensional reciprocal space with amplitudes and phases appropriately. (A sketch of this comparison follows below.)

Now we move on to the third class average. Here it is; again, it doesn't really look like a top view or a side view, it's some kind of oblique view. We calculate its Fourier transform, and then we find the common line that its transform has with both the second transform and also the first transform. We already knew where the common line was between the first transform and the second; it was right here, but we didn't know what that angle was. Now, given the third class average, we can find the common line that the third class average has with the second, so this is three versus two, and we can also find the common line that the third transform has with the first transform, three versus one. And now we have enough angular constraints to reveal the correct angles, the correct orientation, of all three central sections: there's only one way that these three central sections can cross through each other with these particular common lines.

Then we move on to the fourth class average. Here it is; it's clearly a side view. We calculate its Fourier transform, amplitudes and phases, and then we find the common line that it has with the third, the second, and the first, and together this is now an overdetermined set, which reveals how to insert the amplitudes and phases of this transform into 3D Fourier space. The process is shown here with four different central sections, but you might have, say, 20 good class averages, giving you 20 different planes. And so it becomes a least-squares fit problem to determine the best relative orientations of all those central sections, to make the common lines fit as well as they can. It's heavily overdetermined at that point, and so you pick the best fit. Once you fit the transforms of all your class averages into reciprocal space, you then interpolate that data onto a regular Cartesian coordinate system in reciprocal space and calculate an inverse Fourier transform in 3D to generate a three-dimensional reconstruction of your object.

The next step is to take this three-dimensional reconstruction and reproject it in all possible directions. So if we were to take a projection image of the particle from that view, it would look like a stack of four rings, and if we took a projection of the particle from the top, it would look like a circle. And if we took a picture from this angle, it would again look like the stack of four rings, and, of course, from the top it would look like a circle.
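Going back to that chart of cross-correlation values for a moment, here is a minimal sketch of the comparison. It correlates every line of one table against every line of the other and reports the peak; it builds on `central_line_table` from the previous sketch, and for simplicity it ignores the reversed-direction (180-degree) ambiguity discussed above.

```python
# A minimal sketch of finding the common line between two central sections:
# correlate every central line of one transform against every central line
# of the other; the highest peak marks the shared line.
import numpy as np

def common_line(table_a, table_b):
    """table_a, table_b: complex arrays from central_line_table.
    Returns (i, j, chart): the best-matching pair of lines and the full
    chart of correlation values (the sinogram-comparison map)."""
    # Normalize each line so the correlation measures agreement of the
    # amplitudes and phases, not overall power.
    A = table_a / np.linalg.norm(table_a, axis=1, keepdims=True)
    B = table_b / np.linalg.norm(table_b, axis=1, keepdims=True)
    # Complex inner product of every line in A with every line in B.
    chart = np.abs(A @ B.conj().T)
    i, j = np.unravel_index(np.argmax(chart), chart.shape)
    return i, j, chart
```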
Continuing with the reprojections: we could imagine projections at all kinds of angles here, which I can't even draw properly. Just imagine that we could calculate the projections in a very large number of possible directions, and not just in the plane: imagine an image coming out towards us, and an image coming out there, and an image going out towards the back. So over the full three-dimensional range of views, we calculate projections of what a picture of this object would look like from all possible directions. And we call this set of pictures reprojections, in this case reprojections of the initial model. In the actual example of this GroEL data set in the tutorial, the reprojections of the initial model look like this: here's the top view, here are other views that are further and further away from a top view, and then here is a nice side view. So this is a set of all the different views that you could obtain from that initial model.

Now, once we have the set of reprojections, we take all of our initial images, compare them to each possible reprojection, and reclassify: which orientation, which view of the particle, does each of our original images give us? This process is beautifully depicted in this nice figure from a recent review. The idea here is that these are the initial images, individual images of the particle of interest. Then there was an initial model, and the initial model is reprojected into all different possible views. Here they're showing only five, but there might actually be 40, or even hundreds, later in the process. Then we take our initial images and we ask, usually using cross-correlation, which view of the particle does each of these images represent? We might find that this image, say, and this image match this reprojection the best. So these two are now assigned to this class and used to produce a new class average: if we average that image and that image, we get this new class average here for that view of the actual particle. Then we take our next experimental image and check it against all the reprojections of the model, and maybe it matches this one the best. And we go through and check all the experimental images, and in this case they're showing that maybe this one also looks very much like that reprojection, so we take that image and this image and average them together to produce a new class average of the particle, which is represented here. In like fashion, we go to the next experimental image, check it against all these reprojections, and find that it matches this one the best. And perhaps there are many more; there are seven example images shown here, but the process might actually involve 2,000 particles, or maybe 200,000, or even more. In this simplified case, though, there's only one image that matches that reprojection, so it becomes the new class average for that view. In like fashion, this one is assigned to this view and is used to produce this new class average. And finally, this experimental image is found to match this view of the model best, and so it's used to produce this new class average. (A minimal sketch of this matching step follows below.)

Once you have this set of new class averages, they are merged in reciprocal space again to produce a new, refined three-dimensional model of the particle. Then this refined model, here we go, the refined model, can be reprojected into another set of reprojections, and again our experimental images can be matched against them.
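Here is the minimal sketch of that projection-matching step. For brevity it uses a normalized dot product as a stand-in for a full cross-correlation search over shifts and in-plane rotations, which real packages perform; the function name is hypothetical.

```python
# A minimal sketch of projection matching: compare each experimental image
# against every reprojection, assign it to the best-matching view, and
# average the matched images per view to make new class averages.
import numpy as np

def match_and_average(images, reprojections):
    """images: (n, h, w); reprojections: (m, h, w).
    Returns the new class averages and each image's assigned view."""
    n, m = len(images), len(reprojections)
    flat_imgs = images.reshape(n, -1).astype(float)
    flat_reps = reprojections.reshape(m, -1).astype(float)
    # Normalize so the score measures similarity of shape, not intensity.
    flat_imgs /= np.linalg.norm(flat_imgs, axis=1, keepdims=True)
    flat_reps /= np.linalg.norm(flat_reps, axis=1, keepdims=True)
    scores = flat_imgs @ flat_reps.T          # (n, m) similarity table
    best = np.argmax(scores, axis=1)          # best view for each image
    averages = np.zeros_like(reprojections, dtype=float)
    for k in range(m):
        if np.any(best == k):
            averages[k] = images[best == k].mean(axis=0)
    return averages, best
```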
And so individual experimental images can be assigned to different views in different iterations of this process. The whole process is iterated until the classes converge and the final model converges, with the resolution increasing all along the way. The kind of resolution gain that can be achieved is illustrated well in this figure from Steve Ludtke and his colleagues, showing that, given that initial model, it's reprojected, the experimental images are reclassified, realigned, and used to produce a next model of the object. Then in the second iteration the resolution goes up a little bit more, and the third iteration, and the fourth iteration, and then here is a final reconstruction for this particular data set. And it's compared to an X-ray crystal structure that's been low-pass filtered to approximately the same resolution, showing that the procedure does give a correct structure.

Now, an important variation on that basic process takes advantage of what is called a maximum likelihood method. In order to explain this, let's go back to this figure and begin by erasing the arrows, which show each experimental image being assigned to a single class. Instead, an estimate is calculated of how likely it is that this experimental image is a representation of this view. And how likely is it that it represents this view of the particle? And how likely is it that it represents this one, and this one, and this one? These likelihoods are quantified with coefficients. For instance, this could be a weighting coefficient; we could call it w1a, the likelihood that experimental image 1 is a representation of view a of the particle. And this would be w1b, and this is w1c, and w1d, and w1e. In like fashion, the likelihood that experimental image number 2 represents this view of the particle is estimated, and this view, and this view, and this view, and this view, and we could represent these weights as w2a, w2b, w2c, w2d, and w2e, et cetera. The likelihoods that image 3 represents this class, and this class, and this class, and this class, and this class are calculated: w3a, w3b, w3c, w3d, and w3e, et cetera, et cetera, for all the experimental images.

Then, in order to generate a refined class average, the new class average will be the weighted sum of all of the experimental images. For instance, it will be the weight w1a times image number 1, plus the weight w2a times image number 2, plus w3a times image number 3, et cetera, out to image N, the entire data set. So each of these refined averages becomes a combination of all the experimental images, with different weights according to the probability that they represent that point of view. Then the class averages are used to generate a refined model, and the cycle is iterated. So that's how a maximum likelihood method is used in the generation of models in this iterative cycle. (A minimal sketch of this weighted averaging appears below.)

There are other ways maximum likelihood methods are used in cryo-EM image processing. For instance, another one is the generation of multiple three-dimensional reconstructions from a single data set, to help sort out heterogeneity within that data set. And we'll get to that a little bit later.
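To make that concrete, here is a minimal sketch of the soft-assignment averaging just described. The Gaussian noise model, the fixed noise width sigma, and the function name are my own simplifying assumptions; real maximum likelihood refinement also marginalizes over shifts and rotations and estimates the noise from the data.

```python
# A minimal sketch of maximum-likelihood class averaging: instead of a hard
# assignment, every image contributes to every class average with a weight
# w (the likelihood that it represents that view). Assumes Gaussian noise.
import numpy as np

def ml_class_averages(images, reprojections, sigma=1.0):
    """Soft-assignment class averages: weighted sums over ALL the images."""
    n, h, w = images.shape
    m = len(reprojections)
    flat_imgs = images.reshape(n, -1).astype(float)
    flat_reps = reprojections.reshape(m, -1).astype(float)
    # Squared distance of every image to every view; under Gaussian noise,
    # the log-likelihood of image i given view a is -d2[i, a] / (2 sigma^2).
    d2 = ((flat_imgs[:, None, :] - flat_reps[None, :, :]) ** 2).sum(axis=2)
    log_w = -d2 / (2 * sigma ** 2)
    log_w -= log_w.max(axis=1, keepdims=True)       # numerical stability
    weights = np.exp(log_w)
    weights /= weights.sum(axis=1, keepdims=True)   # likelihoods per image
    # Each new class average is the weighted sum of all the images,
    # normalized by the total weight assigned to that view.
    averages = (weights.T @ flat_imgs) / weights.sum(axis=0)[:, None]
    return averages.reshape(m, h, w), weights
```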