[MUSIC] Hi, my name is Ole Seberg. I work at the Natural History Museum of Denmark, and today most of our research centers on studies based on molecular data. But I am, in fact, trained as a classical botanist. Ever since Darwin, one of the major goals of biology has been to reconstruct the tree of life, asking simple questions like: What are our closest relatives? What are mammals related to? Are dinosaurs really related to birds? A logical follow-up to these questions concerns time: How far back can we trace humans? When did birds and mammals replace dinosaurs? Questions like these have often been answered by reference to fossils, but as molecular data started to accumulate rapidly, the focus shifted to this new type of data. By comparing differences between the amino acid sequences of haemoglobin from different species, Emile Zuckerkandl and Linus Pauling proposed the existence of a molecular clock in 1962. Zuckerkandl and Pauling had simply plotted the number of observed amino acid differences between species against the time since the two species separated, as determined from the fossil record, and they observed a nearly linear relationship between change and time. They also found that the actual rate of change differed among proteins. Some proteins, like histones, evolved exceptionally slowly; others, like cytochrome c, evolved at an intermediate rate. Each rate was thus characteristic of the protein in question. What was very surprising was that the rates were nearly constant. This is in marked contrast to morphological evolution, which was thought to proceed at very different rates over time. Later, producing DNA sequence data became much easier, and it was no surprise that changes in DNA sequences follow a similar linear pattern, as can be seen in this figure.
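The nearly linear relationship between change and time can be turned into a simple dating tool: estimate the rate of change from species pairs whose split is dated by fossils, then divide an observed difference by that rate. This is a minimal sketch, assuming a strictly constant rate and a straight line forced through the origin (fitted by least squares); the calibration numbers are invented for illustration and are not Zuckerkandl and Pauling's data.

```python
"""Minimal molecular-clock sketch: fit a rate through the origin and use it to date a split.

All numbers below are hypothetical, chosen only to illustrate the calculation.
"""


def fit_clock_rate(times_myr, differences):
    """Least-squares slope of differences vs. time for a line forced through the origin.

    Minimising sum((d - r*t)**2) over r gives r = sum(t*d) / sum(t*t).
    """
    num = sum(t * d for t, d in zip(times_myr, differences))
    den = sum(t * t for t in times_myr)
    return num / den


def estimate_divergence_time(difference, rate):
    """Invert the clock: time since the split = observed difference / rate."""
    return difference / rate


# Hypothetical calibration pairs: fossil-dated split (million years ago)
# and the number of amino acid differences observed between the two species.
calibration_times = [80.0, 120.0, 300.0, 400.0]
calibration_diffs = [16.0, 24.0, 60.0, 80.0]

rate = fit_clock_rate(calibration_times, calibration_diffs)
print(f"rate: {rate:.3f} differences per million years")
print(f"undated pair with 30 differences: ~{estimate_divergence_time(30.0, rate):.0f} Myr")
```

With these made-up points the fitted rate is 0.2 differences per million years, so a pair differing at 30 positions would be dated to about 150 million years. The sketch only works because the rate is assumed constant, which is exactly the surprising observation described above.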
There was only one illustration in On the Origin of Species: a hypothetical diagram showing the diversification of what Darwin called a large genus over time. In this sense, it's not particularly interesting. In fact, much of the vocabulary we use today to describe the tree of life, or phylogeny, stems from Charles Darwin's champion in Germany, Ernst Haeckel. Good examples are terms like phylogeny, ontogeny, monophyly and paraphyly. Haeckel produced a lot of trees, which are rather imaginary and fanciful, such as this one. Nonetheless, he did a lot to introduce tree thinking into biology. Despite the acceptance of evolution, reconstructing the tree of life lacked a rigorous methodology and was to a large extent irreproducible. That, of course, didn't mean that everything was up in the air: many relationships were agreed upon then that we still accept today. The problem arose when points of view differed, because then there was no common ground for a sensible discussion. So what is the problem? To some extent, it's a problem of scale. If we want to date biological events, here the branching events in the tree of life that took place millions of years ago, we first need to know the relative order of speciation events, and that's a huge task. This slide uses relative size to compare the number of species in different well-known groups of organisms. By far the largest group is the insects, with close to 1 million species, whereas our own group, the mammals, has only a few thousand species and is a very minute group. Perhaps surprisingly, we probably know fewer than 2 million of the estimated 10 to 40 million species out there. However, this in itself is not an insurmountable problem, as most of the unknown species belong to types that we already know. There's every reason to believe that most species will fit neatly into our current picture of the tree of life, though of course there may occasionally be exceptions.
Luckily, species generally carry an imprint of their own history, both in how they look, their morphology, and in their genes, their DNA sequences. It is these data that we use to reconstruct the tree of life. Let's start by looking at morphology. After all, we use our eyes to recognize our surroundings and to convey information about them. Something that has wings, feathers and a beak is very likely a bird. But we can easily be fooled. I trust you all know these animals: shark, salmon and lizard. But which two are each other's closest relatives? The shark and the salmon, the shark and the lizard, or the salmon and the lizard? Think about it. The correct answer is the salmon and the lizard. But why are they more closely related to each other? The answer is extremely simple: because the tree of life, the phylogeny, shows it. Now let's look at a very simple phylogeny of this relationship. This phylogeny has an extra entity, the lamprey, but it can be ignored. It's just put there as a root on the tree to give it direction. If you look at the black numbered bars, say number 1, paired fins, it's placed on the branch leading to all three species in question. It is what they all share, and that is why it is placed below them. At the same time, it defines a group consisting of the shark, the salmon and the lizard, and it sets these three animals apart from the lamprey, which does not have paired fins. The same argument applies to character number 2, jaws. The numbers correspond to the characters. Some characters can be observed by anyone, like 1 and 2, the fins and the jaws; others require specialist knowledge. But they all work in the same manner: any of you will be able to separate the lamprey from the rest, whereas characters 3 and 4 set the salmon and the lizard apart from the shark.
Then what about character 11, fin rays? It is not black, because it does not characterize a group; instead, it has developed twice in the figure, once on the line leading to the shark and once on the line leading to the salmon. If, however, a scientist believed that the shark and the salmon should be grouped together, what would happen? Character 11 would then have to be black, and characters 4 and 5 would not. That would be a more complicated hypothesis. Thus the salmon and the lizard are each other's closest relatives. Obviously, lizards do not have fins, but much evidence indicates that legs developed from fins. Would it then be possible to "turn" character 11 black? Yes, it would: you could place it together with characters 1 and 2, create a new character called "lack of fins", and place that together with characters 5 and 9 on the lizard branch. That would still amount to only two events, as in the original position. So the diagram is only a hypothesis; we make hypotheses about what the tree of life looks like. The diagram gives only the relative sequence of branching events. It gives absolutely no indication of real time, and that is very important to remember. So let's look at how we actually make a phylogeny, starting with one based on morphology. In principle this is rather simple: you usually just look at the species. Defining species is not necessarily easy, and a nearly endless number of publications have been written on this particular subject. Here we will simply accept species as something that exists, even though they are also hypotheses. You then look at an individual organism belonging to each species and extract characters, as we just did in the diagram, or cladogram as it is called. Simple as it may sound, it is very important that what you compare is the same in biological terms: you must be able to defend the hypothesis that the characters are homologous. Often that is very simple.
A bird's wing is, with extremely high probability, directly comparable across all birds, whether it belongs to a hummingbird or an ostrich. Maybe I can also convince you that bird wings are homologous with, or the same as, the forelegs of mammals. But it can become harder: are the wings of birds homologous with fins? Well, I would guess they are, but it's difficult. The list of characters is likewise just a hypothesis, and some hypotheses are easier to defend than others. Let's look at an example showing a variety of orchid flowers. Most scientists agree that this type of flower has evolved only once. These flowers are homologous, in that the orchid-type flower is known only from this family, the orchid family, and you can easily see a lot of potential characters in the figure. The arrow points at just one special feature, the modified petal found in all orchids, the labellum; variation in this petal could be a potential character. All the observations you have made can be transferred into an array of characters by species. The first column in this figure is the species designation; the name in the upper right corner is a higher entity, a genus. The other columns are the numbered characters and their codings. 1 could be labellum shape, 2 could be color, and 17 could be anything else: leaf color, root structure, etc. A row summarizes the observed characters of just one species, codified into numbers as I've just done. This is just a matter of convention. It could be letters; it could be anything, such as shapes: circles, squares or triangles. It's not important, because we are not going to do any calculations with them. They are just placeholders for differences. Such an array is called a matrix. But how does one go from a matrix to a phylogeny? When it comes to morphology, there is only one realistic way to do it, but as I will show you later, for molecular data there are several. Let's stick with the easy-to-understand method. Take a look at this matrix.
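The species-by-character matrix, with its states acting only as placeholders, can be sketched in a few lines. In this minimal sketch the species names, characters and states are hypothetical, not the data on the slide; it shows that recoding the states (digits versus shapes) leaves unchanged the only information that matters, namely which species share a state for each character.

```python
"""Sketch of a species-by-character matrix in which the state symbols are only placeholders.

Species, characters and states are hypothetical, not the data on the slide.
"""

# Rows are species; columns are characters (e.g. 1 = labellum shape, 2 = colour, 3 = leaf colour).
matrix_digits = {
    "species_1": ["0", "1", "0"],
    "species_2": ["0", "0", "1"],
    "species_3": ["1", "0", "1"],
}

# The same observations recoded with arbitrary symbols instead of digits.
to_shapes = {"0": "square", "1": "circle"}
matrix_shapes = {sp: [to_shapes[s] for s in row] for sp, row in matrix_digits.items()}


def groupings(matrix: dict[str, list[str]]) -> list[set[frozenset[str]]]:
    """For each character (column), return the groups of species that share a state."""
    n_chars = len(next(iter(matrix.values())))
    result = []
    for i in range(n_chars):
        by_state: dict[str, set[str]] = {}
        for species, row in matrix.items():
            by_state.setdefault(row[i], set()).add(species)
        result.append({frozenset(group) for group in by_state.values()})
    return result


# Only who shares what with whom matters, so both codings carry the same information.
print(groupings(matrix_digits) == groupings(matrix_shapes))  # True
```

Because no arithmetic is ever done on the symbols, any one-to-one recoding of the states gives the same groupings.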
As in the example with the shark, the salmon and the lizard, where the lamprey was added to root the tree, the letter A is here for the same reason. Thus A (the taxon A, the species A, whatever you call it) has a fixed position. Remember that A to D correspond to species, as I just said; 1 to 6 indicate characters, and the black and white squares are different manifestations, or states, of the same character. Let us ignore the characters for now. If A has a fixed position, in how many different ways can you create a bifurcating tree of B, C and D? One, two or three? Just think about it. Three is, in fact, the right answer. Either C and D are more closely related to each other than they are to B, or B and D are more closely related to each other than they are to C, or C and B are more closely related to each other than they are to D. Which one should you choose? Now the characters come into play. Let's map character 1 onto the different bifurcating trees, or cladograms. It can only be placed on the branch leading to B, C and D; in fact, it is the characters that make A the root of all the different trees. Let's do the same with character 2. It occurs only on the branch leading to B on all three trees and is easy to place. With character 3, the problems start. On the first tree, it sits on the branch leading to C and D. On the second tree, it has to occur twice, once on the branch leading to C and once on the branch leading to D, and the same applies to the third possibility. If you map all the characters in this way onto the three different bifurcating trees, you end up with three different distributions of the black squares. Which one should you choose? Number one? Number two? Or number three? In fact, the right answer is number one. And why should you choose it?
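Mapping each character onto the three candidate trees and counting how many changes each tree requires can be automated. This minimal sketch uses Fitch's algorithm, a standard way to count the minimum number of state changes for unordered characters on a given tree; it is not necessarily how the slide was produced. The 0/1 matrix (0 for a white square, 1 for a black square) is hypothetical and is not the matrix on the slide.

```python
"""Fitch parsimony sketch for the three rooted trees of B, C and D with A as outgroup.

The 0/1 matrix is hypothetical (0 = white square, 1 = black square); it is not the
matrix from the lecture slide.
"""
from typing import Union

Tree = Union[str, tuple]

# Hypothetical states for characters 1-6, one string per taxon.
MATRIX = {
    "A": "000001",
    "B": "110000",
    "C": "101010",
    "D": "101100",
}

# The three possible bifurcating trees once A is fixed as the root.
TREES = {
    "1: (C,D) together": ("A", ("B", ("C", "D"))),
    "2: (B,D) together": ("A", ("C", ("B", "D"))),
    "3: (B,C) together": ("A", ("D", ("B", "C"))),
}


def fitch(tree: Tree, char: int) -> tuple[set, int]:
    """Return (possible states at this node, number of changes below it) for one character."""
    if isinstance(tree, str):  # leaf: its observed state, no changes
        return {MATRIX[tree][char]}, 0
    left, right = tree
    left_set, left_steps = fitch(left, char)
    right_set, right_steps = fitch(right, char)
    common = left_set & right_set
    if common:  # the two children can agree: no extra change needed here
        return common, left_steps + right_steps
    # the children disagree: keep both states and count one change
    return left_set | right_set, left_steps + right_steps + 1


def tree_length(tree: Tree) -> int:
    """Total number of character changes (steps), summed over all characters."""
    n_chars = len(next(iter(MATRIX.values())))
    return sum(fitch(tree, i)[1] for i in range(n_chars))


lengths = {name: tree_length(tree) for name, tree in TREES.items()}
for name, length in lengths.items():
    print(f"tree {name}: {length} steps")
best = min(lengths, key=lengths.get)
print("most parsimonious:", best)
```

With this made-up matrix the trees need 6, 7 and 7 steps. Only character 3 discriminates between them, costing one step on the first tree and two on the others, so the first tree is the simplest explanation of the data.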
Well, you should choose it simply because it is the simplest of all the hypotheses: the two others each require one more character change than the one we have chosen. This follows a general scientific principle known as Occam's razor: you should always choose the hypothesis that requires the fewest additional assumptions. In phylogenetics this is called parsimony: you choose the simplest possible explanation of your observations, here the distribution of character states. It is very important to stress that this is not a statement about evolution. It does not imply that evolution is parsimonious; it is just a scientific principle. This picture shows a standard published phylogeny. I will admit it's pretty boring, but you do have to remember that almost everything we do when we reconstruct the tree of life is a hypothesis, from species circumscription to the final trees. This may be surprising, but every detail is subject to scrutiny and can be changed accordingly. So there are a couple of important things you have to know about phylogenies. The only really important thing about a phylogeny is that it shows a branching order. It's much like the London Tube map: it can be drawn in an endless number of ways, but the order of the stations and the lines that stop at each station are the only really important information. Likewise, if B and C are more closely related to each other than they are to A, then this also applies to C and B; the order in which they are drawn does not matter. Basically, everything on a phylogeny can be broken down into statements involving three entities, just as we have done. Unless three entities are involved, either implicitly or explicitly, relatedness has no meaning. Let's use a simple analogy. Whether you like it or not, I'm related to you. I am also related to Alice in the US, who is also watching this program.
This could go on forever, or almost, but the statement only gains empirical content when I say that I am more closely related to my brother than I am to you or to Alice. That can be tested, and that is what gives it meaning. [MUSIC]
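The idea that every statement of relatedness on a phylogeny involves three entities can be made concrete by breaking a tree into three-taxon statements of the form "(x, y) | z": x and y are more closely related to each other than either is to z. This is a minimal sketch assuming trees written as nested tuples; the taxon names are illustrative only. Using an unordered pair for x and y reflects the point that "B and C" says the same as "C and B".

```python
"""Sketch: break a rooted tree into three-taxon statements "(x, y) | z".

Each statement says x and y are more closely related to each other than either
is to z. Trees are nested tuples; the taxon names are illustrative only.
"""
from itertools import combinations


def ancestor_paths(tree, path=()):
    """Map each leaf to the tuple of clades (subtrees) that contain it, root first."""
    if isinstance(tree, str):
        return {tree: path}
    paths = {}
    for child in tree:
        paths.update(ancestor_paths(child, path + (tree,)))
    return paths


def shared_depth(p, q):
    """Count the enclosing clades two leaves have in common."""
    depth = 0
    for a, b in zip(p, q):
        if a != b:
            break
        depth += 1
    return depth


def three_taxon_statements(tree):
    """Return {(frozenset({x, y}), z)} for every triple of leaves the tree resolves."""
    paths = ancestor_paths(tree)
    statements = set()
    for a, b, c in combinations(sorted(paths), 3):
        options = [((a, b), c), ((a, c), b), ((b, c), a)]
        depths = [shared_depth(paths[x], paths[y]) for (x, y), _ in options]
        deepest = max(depths)
        if depths.count(deepest) == 1:  # exactly one pair shares a more recent ancestor
            (x, y), z = options[depths.index(deepest)]
            statements.add((frozenset((x, y)), z))
    return statements


tree = ("A", ("B", ("C", "D")))
rotated = ((("D", "C"), "B"), "A")  # the same tree, drawn differently

for pair, outsider in sorted(three_taxon_statements(tree), key=lambda s: (sorted(s[0]), s[1])):
    x, y = sorted(pair)
    print(f"({x}, {y}) | {outsider}")

# Drawing order is irrelevant, and with only two taxa there is nothing to say.
print(three_taxon_statements(tree) == three_taxon_statements(rotated))  # True
print(three_taxon_statements(("A", "B")))  # set()
```

For the four-taxon tree this prints (B, C) | A, (B, D) | A, (C, D) | A and (C, D) | B. Rotating branches, like redrawing the Tube map, leaves these statements unchanged, and a tree with only two taxa yields none, which mirrors the point that relatedness needs a third entity to have meaning.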