Welcome back to "Bioinformatics: Introduction and Methods" from Peking University. I'm Liping Wei from the Center for Bioinformatics at Peking University. We are delighted to have Dr. Michael Waterman with us here today. We also have a few students in the audience who are here to speak with us. This is Dr. Waterman as a young boy, handsome from a very young age, as always. And this is a historical photo of Dr. Waterman with Dr. Temple Smith, taken by Dr. David Lipman. The main purpose for us to be here today is really to hear from Dr. Waterman himself how he and Temple Smith created the historically important Smith-Waterman algorithm. And he'll tell us there is a great story behind this photo. This is Drs. Waterman and Smith later at Boston University, I believe. In addition to his exceptional research career, Dr. Waterman also has done a lot of teaching and a lot of interaction with students. And he'll tell us the story behind this as well. As a pioneer and leader of the field of bioinformatics, he is also one of the three founders of the RECOMB conference. Without further ado, let's welcome Dr. Michael Waterman.
Thank you very much for having me here on this beautiful campus and this amazing undertaking you are starting off on. I think you asked me to say something about my days as a student and how I got into this. I was from a rural place. And I majored in mathematics as an undergraduate because the courses at the beginning were easy for me. And they gave me a lot of room to take other science courses and lots of courses in literature and philosophy. But when I was a junior I took a graduate course in mathematics and quickly learnt that not all math courses were easy. I did my Ph.D. in a field that's called dynamical systems now. A very theoretical Ph.D., but we used computer skills at the end. And I had never taken a course in biology, and I had no intention of doing any research in biology.
But at my first faculty job, I occasionally spent summers at Los Alamos National Laboratory, and there was a great mathematician there. And he, for some reason, had the idea that there was going to be mathematics in this new biology. I didn't really know what it was.
And when was that?
This was around 1971. I think in '72 or '73 I spent a summer there, working on what turned out to be sequence alignment. The goal was to align protein sequences and build evolutionary trees, which we still do today. And we had a very large dataset. We had 27 Cytochrome C sequences. It sounds tiny today, but it was not tiny at that time. And we studied distance methods for aligning sequences and published a couple of papers. And later in the 70s, [in] the protein data bank Margaret Dayhoff used the Needleman-Wunsch algorithm to align the sequences, which was a similarity-based alignment system, a very strange algorithm. It used multiple gaps, but they were all weighted the same. So it could easily be a ready algorithm if we knew what the implication was. And as mathematicians we thought that the distance was obviously superior. And the woman who ran the protein data bank said "I don't see it." So we sat down to show that our methods were better. The result of that was our paper which showed that distance and similarity were two sides of the same thing. Or, with a similarity algorithm, you could write an equivalent distance algorithm. The sequence alignment would be the same. And this quite surprised me, I must admit. So we had this at the back of our mind. In 1977, introns were discovered, for which a Nobel Prize was later given. Well, this discovery, it was shocking. Perhaps today you aren't shocked by introns. But if all you know about biology is genes in E. coli, the fact that there were introns--the stuff got snipped up and thrown away--was just mind-boggling.
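The remark above that distance and similarity are "two sides of the same thing" can be checked numerically for global alignment. What follows is only a minimal sketch under stated assumptions, not the construction from the paper Dr. Waterman mentions: it assumes a linear gap penalty, illustrative scores (match 1, mismatch -1, gap 2), and hypothetical function names. With distance weights defined as d(a, b) = M - s(a, b) and a per-letter gap distance of gap + M/2, every alignment of sequences of lengths n and m satisfies D = M(n + m)/2 - S, so the minimum-distance alignment and the maximum-similarity alignment coincide.

```python
def global_similarity(x, y, match=1.0, mismatch=-1.0, gap=2.0):
    """Maximum global similarity (Needleman-Wunsch style, linear gap penalty)."""
    n, m = len(x), len(y)
    S = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        S[i][0] = -gap * i          # x[:i] aligned entirely against gaps
    for j in range(1, m + 1):
        S[0][j] = -gap * j          # y[:j] aligned entirely against gaps
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if x[i - 1] == y[j - 1] else mismatch
            S[i][j] = max(S[i - 1][j - 1] + s,   # align x[i-1] with y[j-1]
                          S[i - 1][j] - gap,     # gap in y
                          S[i][j - 1] - gap)     # gap in x
    return S[n][m]


def global_distance(x, y, match=1.0, mismatch=-1.0, gap=2.0):
    """Minimum global distance using weights derived from the similarity scores."""
    M = match                       # assumption: identical letters get distance 0
    d_mismatch = M - mismatch       # d(a, b) = M - s(a, b) for a != b
    d_gap = gap + M / 2             # per-letter gap distance
    n, m = len(x), len(y)
    D = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        D[i][0] = d_gap * i
    for j in range(1, m + 1):
        D[0][j] = d_gap * j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = 0.0 if x[i - 1] == y[j - 1] else d_mismatch
            D[i][j] = min(D[i - 1][j - 1] + d,
                          D[i - 1][j] + d_gap,
                          D[i][j - 1] + d_gap)
    return D[n][m]


# Check the identity D = M * (n + m) / 2 - S on one illustrative pair.
x, y = "GATTACA", "GCATGCA"
assert global_distance(x, y) == 1.0 * (len(x) + len(y)) / 2 - global_similarity(x, y)
```

The identity holds because an alignment with p letter-to-letter columns and q gap columns has n + m = 2p + q, so the constant M is paid exactly (n + m)/2 times regardless of which alignment is chosen.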
And immediately we realized that the alignment methods we were using wouldn't work if you have to snip out pieces and throw them away. We didn't know what the problem was or what to do. But immediately we wanted to address this problem. And we failed for some time. A mathematician published a distance-based method for this. But if you think about it, three A's in a row are zero distance from three A's in a row. And also if there are three million A's in a row, they are zero distance from three million A's in a row. So you had no way to distinguish between distances that were small. And in 1980 we stumbled onto what's now called the Smith-Waterman algorithm. And the algorithm was there before the definition was there. If someone had given us the statement of the problem in that little paper we wrote, we would have solved it in 30 minutes. But we had no idea what the problem was! So these were very interesting times of stumbling around trying to figure out what we should be working on. And it wasn't immediately accepted at all. I mean, if you look at the citation build-up over the years, it's linear. And the slope--there was very slow growth at the beginning. But eventually I think people appreciated it, perhaps because it's easy to code.
It's also appreciated. So in the first week of the course, I showed the paper, your paper in the Journal of Molecular Biology. It's an elegant paper. We'll teach the details in the course. Very elegant modifications that did the trick.
Yes, it's subtle. It's quite subtle.
It's subtle, but [there is] a simple elegance in the way that it was worked out. I think last time I checked in Google Scholar, it was cited around 8000 times, one of the most highly cited papers in life sciences.
Hiding in that paper when you read it, we gave scoring parameters that haunted me for years when people asked me about that. And I did the silly thing of making the expectation be zero, and did something for gaps.
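The point above about three A's versus three million A's is what a similarity-based local alignment resolves: a distance of zero says nothing about how long the matching region is, while a similarity score grows with it. The following is a minimal sketch of the local-alignment recurrence in the Smith-Waterman style. It makes several assumptions: a linear gap penalty (a simplification), illustrative scores that are not the parameters from the original paper, and a hypothetical function name. It returns only the best score, not the alignment itself.

```python
def local_alignment_score(x, y, match=1, mismatch=-1, gap=2):
    """Best local alignment score between x and y (score only, no traceback)."""
    n, m = len(x), len(y)
    # H[i][j] is the best score of a local alignment ending at x[i-1], y[j-1].
    H = [[0] * (m + 1) for _ in range(n + 1)]
    best = 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if x[i - 1] == y[j - 1] else mismatch
            H[i][j] = max(0,                      # start a fresh alignment here
                          H[i - 1][j - 1] + s,    # extend with a match/mismatch
                          H[i - 1][j] - gap,      # gap in y
                          H[i][j - 1] - gap)      # gap in x
            best = max(best, H[i][j])
    return best


# Identical runs of A's have distance zero at any length,
# but their similarity scores tell the lengths apart.
short = local_alignment_score("AAA", "AAA")
longer = local_alignment_score("A" * 30, "A" * 30)
assert short < longer
```

The `max(0, ...)` term is what makes the alignment local: a poorly scoring prefix is discarded instead of being carried along, so a short conserved region can be found inside otherwise unrelated sequences.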
But of course if you take a fixed length sequence and insert the letters at random, and make the expected score to be zero, that's not the same as the expected score when you allow that gap to vary all over. And I knew there was a problem there that eventually was Karlin-Altschul statistics. But it was a couple of years before we started working on that dimension. Always as I see that paper, I remember the trouble it caused me. People either believe we were promoting those as appropriate scores, or of course not.
So maybe you can tell us the story behind this photo. When was this taken?
It was taken... this was Los Alamos. And David Lipman has an M.D., and he was... he did his residency in Tucson, Arizona. His degree's from New York. On his way he stopped to see us. He knew about Temple and I. He hadn't published anything in this area. He stopped to see us at Los Alamos and he took this photo. So it was before--the joke is--it was before he was "David Lipman". And some things about the picture--at Los Alamos you have to have these security clearances. And the security clearances were hanging off our shirt collars. That's how you did it at Los Alamos. And Temple Smith was an Easterner. I grew up in the west, riding horses and so on. So Temple Smith was the Easterner. But he's got the cowboy hat, and he's wearing a belt buckle that has "New Mexico" on it. He's very proud of his cowboy gear. I had just spent a year in Hawaii. So that's why my Hawaiian shirt. And I think my belt had some sash that's hanging off it. So this picture... David sent us this picture after he was back in New York. And I remember really liking it, but I lost it. And at some point years later, I wrote to both of them and said, would you guys send me a copy of that picture? No one sent me one. When I finally moved out of my office into a new building, this picture popped out. So I scanned it, I think, at this point.
How did you meet and start to collaborate with Temple?
He was brought to Los Alamos by Stan ???.
What was his background?
His background was in physics. He wrote a Ph.D. [thesis] on neutron transport. He was getting out of that. He had written at least one paper in biology that looked at the entropy of the genetic code. So he had done some work. And he and I really got along. Then for years when I was working for Los Alamos, Temple would come for a month or two a year, and we would work on these problems. I thought it was just my hobby.
How did it become a career?
Well, in early 1981 I decided to leave the lab and I thought, what should I do next? And I thought I'd done some work in geology. I thought about going into geology. And I looked at the work I was doing in biology, and it looked like the most fun. And I thought there would even be data to check some of these things. I had no idea, of course. So that's how I decided. I went from Los Alamos to be a visiting professor at the University of California in San Francisco. So that's when I became full time, doing whatever this is.
And this next picture was taken at Boston University?
Yes, at Boston at Temple's retirement party.
And this one?
And this one... I was for five years a residential faculty in a residence hall at USC, so I lived in the building with the students. I had a much better bedroom to live in, for sure. But I ate in the student halls, which is, I think, where this was taken, and went to events. I spent 20 hours a week, doing one thing or another with the undergraduates. Most of them... almost none of them were in science or engineering. It was great.
And you have to tell us about RECOMB.
Yes. I believe this was 1997. And we wanted a conference that took statistics and computer science, and biology all seriously. The first conference was in Santa Fe. And Sorin Istrail really worked very hard to set this up. And it's uh... usual at the RECOMB conference to have biology talks as well as computer science talks. And I started that actually at this conference.
And the audience were shocked. They had no idea why a biologist would be talking about how he gathered the data and so on. Today when I go to the RECOMB conference almost every attendee knows more biology than I do. The change in this area is amazing. Nobody asks me why we are having technologists and biologists under one roof. The field has grown so much and really matured.
And you definitely helped as a pioneer to get us where we are today. I'd also like to share with everybody that Dr. Waterman is receiving a Friendship Award from the People's Republic of China. It's a prestigious award in China for people who have made significant contributions to this country. So congratulations!
Thank you very much.
Maybe you could tell us a little bit about this as well?
Uh, well, you probably know as much about this as I do. But I'd like to repeat a story I told you coming over here. When I was an undergraduate student, as I said, I took a lot of literature courses. And I came on to an English translation of Tang Dynasty poems. And I thought these poems were amazing. They are amazingly modern. And they remain amazingly modern. The English translations, for sure, leave out all kinds of implications that are in the original poems, but they have their own existence. But at that time, the only Chinese person that I had come in contact with, I was at Oregon State University, which was an agricultural school. The house I was living in, once for a few weeks, there was a Chinese guy. He knew no words of English. I knew as many words of Chinese. And I just wondered about this country. It was so behind a mist. I didn't know anything about it. I was so pleased that I got to spend time in China and hang out with Chinese people.
And you mentioned you came to Peking University campus in '97!
'97, yes! There were... the streets were seas of bicycles with an occasional automobile. And of course it's the opposite today.
So maybe we'll give the students an opportunity to ask you some questions.
Sure.
If that's OK with you.
Of course!
Nice to meet you, Dr. Waterman. I'm Boxun Zhao from National Institute of Biological Sciences. I admire you very much for your great work.
As we all do!
Today I have a general question about team work. You know many research milestones are made by a team, just like you and Dr. Smith, Needleman and Wunsch, Watson and Crick. So do you have any suggestions for us on helping a team function well? Thank you.
That's a very good question. You know in the small team that I worked in, I think one of my roles is to make sure other people are really informed and really smarter than you are, and have different skills. Back at the time when I was working with Temple Smith, he knew much more biology than I did at the time. And as a physicist he had a physicist's way of looking at things, and I had a mathematician's way of looking at things. But now we have these huge teams. There are 50 authors. That's a different order of business, I think. So the boss needs to know what is going on, bringing in, once again, bringing in people with different skills, who have a lot of integrity and are committed to the overall goal.
Nice meeting you. My name is Wenxiong Zhong from Peking University. My former major was biology, but now I'm working on engineering and we are studying how to build a new sequencer. I'm working on algorithms for image processing. So my question is, is it possible to apply your famous algorithm to the problem of face recognition, so that we can know if the figures in two different images are actually the same person? And can your algorithm work in higher dimensions in alignment problems, so that we can align not only DNA sequences, but also protein structures which are much more complex, or even gene regulatory networks? Thank you.
Dynamic programming has a very great limitation when it comes to looking at higher dimensions, comparing higher-dimensional objects. So we write algorithms for two-dimensional matching, but they are not particularly useful. It really has its power in one-dimensional situations. And even there, of course, the computational complexity is high.
My name is Zheng Xianing from the National Institute of Biological Sciences. We all know that there are many people transferring from other fields such as physics, mathematics, or computer science to biology. And my question is, in your view, what role has mathematics or computer science played in the past of bioinformatics, and what role will it play in the future of bioinformatics?
I mean, I have to say it's essential. What else could I say? But, but, having said that, it can't be just pure computer science or mathematics. It has to be integrated with biology, working on the right... of course the right biological problem. And very often these problems won't have polynomial time algorithms necessarily. You have to devise a simulation method or what's called a machine learning method. So... but I don't think bioinformatics can go on without them. If you made computer science and mathematics illegal to use in bioinformatics, it would die.
Nice meeting you, Dr. Waterman. I'm Liu Xiaomeng from BIOPIC. Thanks for sharing with us your experience. I have a question. When you are actively doing research, sometimes you face difficult times. My question is, did you meet some difficulty, and how did you overcome it?
In my year on sabbatical, at the University of California in San Francisco, I got interested in the occurrence of non-overlapping patterns in sequences. If you specify the pattern, one pattern going along the sequence is something called renewal theory. And William Feller worked on this after World War II, wrote a beautiful chapter in his book about this. And I was thinking about restriction enzymes. And what about a collection of restriction enzymes?
So you have a number of patterns that couldn't overlap with each other. So during that year I started trying to work on this. And I would work on this for a week, and decide that it was impossible. It couldn't possibly be done. And I would give it up for a few weeks, and it would keep bothering me, and I would come back and work on it some more. And eventually now I can describe this in a one-hour lecture, and it is understandable and I just had to look at it the right way. But it took the longest time. And I gave up several times.
What kept you going?
It just crept around in my consciousness. It would just be there. And I, you know, it can't be impossible, I would say, and tackle it. So I couldn't leave it alone. That's the answer.
I actually also have a couple of questions to ask you. With the next-generation sequencing, the amount of data is increasing exponentially at an even greater rate. The Sequence Read Archive, the SRA database, is doubling every 5 months. From 2010 to 2013, the data increased by 100 times. 100 times as much data now. 10 to the power of 15 nucleotides now. And it's going to double in another five months. So how do we keep up with this? No single laboratory can keep up with this data. Even NIH in 2011 thought about discontinuing SRA because it was running out of budget. But fortunately it changed its mind so it's still supporting SRA. But as you see, it's even a problem for NIH. How do labs like ours handle this?
I don't have a good answer for you. But we may come to the time when it would be easier, instead of trying to find something in a database, to just sequence that person again. It may become so easy to sequence that keeping the archive may not be worthwhile. I don't have a good answer for you. Everyone is struggling with this, my building, my university, my campus, the medical campus, and of course, everywhere else. It's a challenge. It really is. It's not so easy to get all of these data from one place to another either.
Put it on a floppy disk and carry it across the room.
It also has opportunities for technical development and research.
Definitely. Definitely. There are serious people in computer science who are studying data compression and data analysis.
What do you think about the future impact of next-generation sequencing? It's already making great impact in biology. What do you think the future holds?
Uh, you know, you caught me, because often I'm asked to describe the future of bioinformatics. And I always answer by saying, "Well, tell me the future of biotechnology, and I'll tell you the future of bioinformatics." But you are asking me the technology future. I'm just amazed at the sequencing machines that have come out since the human genome project. It's astonishing. And I think there's more to come!
And since you mentioned it, what do you think about the future of bioinformatics?
Well, yeah, I just said it. Well, it follows the technology, right? And the biology, but biology follows the technology too. There are some amazing single-cell imaging techniques to see different numbers of proteins moving in a cell. It would have been interesting to think about it in the 1990s, but now it's being done.
Do you feel the field is reaching a plateau, or nowhere near it?
We are probably nowhere near it. It's certainly an exciting time. I mean, just think that there are incredible discoveries in biology that are almost just as exciting as introns that are still happening. It's still a young field, even though I sometimes don't look at it that way, but the fact is that it is.
Well, thank you very much! It is really wonderful to have you here.
Thank you for having me.