Hi everyone. I'm delighted to have with us here today Oren Etzioni, who is one of the best known figures in NLP. He has been the CEO of the Allen Institute for Artificial Intelligence since its inception in 2014. He's also a professor at the University of Washington's Computer Science Department and a venture partner at Madrona Venture Group. Oren has received multiple awards, including Seattle's Geek of the Year, which I thought was cool, and has been a founder or co-founder of several companies. Really glad to have you with us, Oren. Thank you Andrew, it's a pleasure. I've known you for many years; even when I was a student at Carnegie Mellon, I remember hearing about some of your work on explanation-based learning from Tom Mitchell. Something I've never really asked you before is, today you are a well-known researcher, but how did you get started in AI? Tell us about your personal story. Well, I really became fascinated with the field in high school when I read the book Godel, Escher, Bach by Douglas Hofstadter, which many of us read at some point. More than anything, what the book gave me an appreciation for is that asking, what is the nature of intelligence? How do we build an intelligent machine, one that has human-like capabilities? That's really one of the most fundamental questions in all of science, like, what is the origin of the universe, or what is the basis of matter? So I became fascinated with it at the age of 18 or so. So what happened? You're an 18-year-old reading Godel, Escher, Bach. I remember my father trying to push that book on me as well when I was a teenager, but unlike you, I did not read it as a teenager. But then what? Well, I did two things. One is, the summer before college I started studying Lisp, the ancient list-processing programming language, which of course gave some ideas to Java and Python, and I found it just endlessly fun to hack Lisp code.
Then when I went to college, I went to Harvard, I was intent on studying computer science because that seemed like the path towards AI. I think it's inspiring to think that maybe today there's a high school student somewhere, or maybe the parents of a teenager reading this, thinking that if their son or daughter reads Godel, Escher, Bach and picks up an interest in AI as a teenager, maybe they could someday have a career a bit like yours, an arc to follow. I think it's something to think about. Well, Andrew, you're very generous. The other thing that's happened, which of course is drawing a lot of people into the field and which I had no clue about, is the fact that our explorations of these algorithms and these methods have led to a very powerful set of technologies, which of course you've been very intimately involved in. So deep learning, which is now revolutionizing the field in so many ways, was not something we anticipated back then, and we didn't understand that asking this fundamental intellectual question could lead to so much commercial success as well. I have found that a lot of the great scientists were driven by fundamental questions, and it sounds like yours was, what is the nature of intelligence? That type of question is enough to drive someone for an entire career. Yeah. I remember also back in the early days you were one of the pioneers in what was called Open Information Extraction from the Web. Tell me more about what that was like, working on information extraction back then. Sure. So information extraction basically means the mapping from a sentence to a more structured piece of information. For example, we have the sentence "Google acquired YouTube", and we can map it to a database tuple that says (Acquisition, Google, YouTube). Now, when I got into the field of information extraction, it was very narrowly focused.
They would look at M&A events or they would look at terrorist events, and they would try to extract the particular semantics around specific events, and it occurred to me that maybe we could do it in a much more open-ended way. One of our mottos was, "No sentence left behind": the idea that maybe we could extract information from any sentence on the web and thereby map the huge amount of information that's available in the web corpus, literally billions of sentences. Maybe we could map it to a very powerful and comprehensive knowledge base. So we set off on doing that, and the first thing we had to do was vastly generalize the techniques, because information extraction back then was still a machine-learning technique, but it required training examples specific to a particular relation or predicate, like acquisition or seminar location, and there are so many. In fact, that's one of the things we studied. There are potentially hundreds of thousands, if not more, different predicates being expressed in a natural language like English or Chinese. That would require millions and millions of labeled training examples. We attempted to solve that problem by creating a new technique which had much more of an unsupervised flavor, so that rather than requiring labels, the system could recognize that "acquired" is one of the words or predicates that indicates an acquisition. There are lots of ways to say "x acquired y", and to learn that from data was a huge breakthrough. Thanks, Andrew. Thanks for remembering; the field moves forward so quickly. One of the things that I found particularly inspiring at the time was the fact that we realized that there are certain linguistic invariants, certain regularities where, whatever the sentence was and whatever the topic was, there were certain regular ways in which people would express information, activities, and so on. For example, the simplest is verbs, whether it's acquired or graduated or married.
Often the verbs were a very strong indication of the predicate involved and the arguments of the verb. Joe married Betty: now we know a lot about what's going on here. Basically, what we were able to do, and I consider this a pretty fundamental observation about natural language, is realize that sentences express relationships in certain stereotyped ways. Since we did that work, the result has actually been replicated in lots of different languages. We've seen Spanish, Arabic, and Korean. In many languages, they found that these regular ways of expressing relationships are available. If something happens, there are a number of ways to refer to an acquisition or graduation or marriage, but it's quite narrow, so there's a very strong signal for the learning algorithm. Exactly. I guess both you personally and the field of NLP have come a long way since then. Today, I regularly read about exciting projects that the Allen Institute for AI is doing. One of the projects that I've heard you speak about in another context, which I thought was really cool, was the Semantic Scholar project. Can you say a bit about that? I would love to. Our mission as an institute that's a non-profit, fully funded by the late Paul Allen, is AI for the common good. We asked ourselves, how do we use AI broadly, and NLP in particular, to make the world a better place? One of the things that came up on our radar was, can we use it to help scientists, and more generally the informed public, gain access to scientific papers? There's a Moore's law of scientific publication: the number of papers seems to be doubling every few years and of course growing very rapidly. Even diligent folks like yourself can't have read everything that they want to read. In fact, I think there's a limit on the number of papers we're going to read in our lifetime. I thought to myself, okay, AI to the rescue.
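The verb-as-predicate idea Oren describes can be sketched in a few lines of code. This is only a toy illustration under strong assumptions, not the actual Open IE systems he built (those learned extraction patterns from data rather than using a fixed list); the verb lexicon and regular expression here are hypothetical stand-ins:

```python
import re

# Toy open information extraction: map simple "X <verb> Y" sentences to
# (arg1, predicate, arg2) triples. The verb list below is a made-up sample;
# real systems learn many such predicate patterns from the corpus itself.
PATTERN = re.compile(
    r"^(?P<arg1>[A-Z][\w ]*?)\s+"
    r"(?P<pred>acquired|married|graduated from)\s+"
    r"(?P<arg2>[A-Z][\w ]*?)\.?$"
)

def extract_triple(sentence):
    """Return (arg1, predicate, arg2) if the sentence matches, else None."""
    m = PATTERN.match(sentence.strip())
    if m is None:
        return None  # "no sentence left behind" needs far broader coverage
    return (m.group("arg1"), m.group("pred"), m.group("arg2"))
```

For example, `extract_triple("Google acquired YouTube.")` yields the tuple `("Google", "acquired", "YouTube")`, the structured form discussed above; sentences that express the same relation differently would need additional learned patterns.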
I remember when the ICML conference, or NeurIPS at that time, were small enough conferences that I would bring home the paper proceedings and essentially read every single paper's title and abstract, and a good fraction of the papers, and clearly we're well past that phase of AI today. It's great to have tools like Semantic Scholar. Exactly, and Semantic Scholar's motto is "cut through the clutter". Our idea is to use AI in lots of different ways to find the papers that you want to read. For example, we automatically generate what we call extreme summaries, or simply put, TLDRs. Instead of having to read all those abstracts, maybe we can give you a one-line summary, and if you like that, you might go further into it. Or we use computer vision techniques to automatically extract the figures. That may seem straightforward to a person, but remember these are PDF files, which were set up to display information, not necessarily to tell you where the figures and tables are located. We automatically extract information that, again, will hopefully tell you at a glance, maybe even on a mobile device like this one, is this a paper that I want to read? If I can save you the time scouring through the papers to find the ones you want to read, that's a savings. Even when you're reading them, we're continuing to look into ways to make that process more efficient. If someone watching this video is feeling the glut, really the wonderful glut of deep learning and AI papers, would you recommend this to them? Very much so. It's a free service. It's at semanticscholar.org, so I don't profit from it. We'd love people to use it. We think it has a lot of powerful features that would help both expert researchers and people getting started to get a sense of the field, to find the key authors, the key papers, et cetera.
Something that fits the story we're seeing is that with the rise of COVID-19, a society-wide tragedy, many researchers started to publish more and more papers on this horrible virus, and Semantic Scholar was involved in helping sort out this, maybe fortunately or unfortunately, rapidly growing literature. What was the story behind that? It's actually quite a dramatic story. Early in March 2020, when awareness was still growing about the virus, the White House, through a colleague, reached out to us because they knew we had tools for rapidly processing collections of papers. They asked us to put together the collection of all the relevant papers, both published and preprints at the time, and make it available to AI systems, make it available in a machine-readable form so that NLP systems, information retrieval, search engines, and others could make use of it. We were quickly, with the White House's help, able to form a coalition that included Chan Zuckerberg, and Microsoft, and colleagues from Georgetown, publishers, and more, and to create a collection that's now more than 200,000 papers and is being updated daily, that attempts to capture, exactly as you said, this rapidly growing literature about the virus, and then be able to answer questions about it much more rapidly than ever before. Kaggle was involved, creating a set of competitions that became their most popular competitions. A lot of clinical questions and discoveries have been made using, obviously, the literature, but accessed through our open dataset, which was called CORD-19, for COVID-19 Open Research Dataset. Again, thank you for bringing that up. It's a great example of how Semantic Scholar, and AI in general, is trying to help make the world a better place. In The Batch, we covered some of this work several times, so it's actually really exciting to see your team doing this, and really, thank you for helping fight COVID-19 for all of us. Of course.
Oren, you've been a successful serial entrepreneur, and you are at the cutting edge of NLP technologies. What advice or recommendations would you give to someone that's looking to work on, or to launch, a startup in NLP? Well, one thing that I think about, based on my background launching AI-based companies, is: where is your data going to come from? What I like to call the dirty little secret of big data isn't just that you need lots of data, but often that you need lots of labels. You need some way of taking a data item, let's say credit card fraud, and saying this credit card transaction is fraudulent, this credit card transaction is valid. Often people have great ideas for companies or for products, but they haven't thought through, where's my data going to come from and how is it going to get labeled? There's a series of questions I always like to ask people, but one relatively unusual one is: tell me about your dataset. Where's your data going to come from? Where are your labels going to come from? That intuition came from probably my most successful company, Farecast, which was a company that predicted airfare prices and their fluctuations over time, and was ultimately sold to Microsoft. What was really cool is that at the peak we had a trillion labeled data points. This is, again, a company that we formed in 2003. It was acquired in 2008, so back then, a trillion data points was really quite a lot. How do you get a trillion labels? Mechanical Turk didn't even exist back then, and that's a lot of labels. Well, it turned out that because it's temporal, sequential data about prices, if you predict, say on December 1st, 2003, that the price of a particular flight will go up in a week, all you have to do is wait a week and see whether your prediction came true or not. Just the passage of time automatically labels your data. That observation turned out to be incredibly powerful.
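The self-labeling trick Oren describes can be sketched in code. This is a minimal illustration under assumed data shapes (the flight identifiers and field layout are hypothetical), not Farecast's actual pipeline: a prediction made at time t is labeled automatically once the price at t plus the horizon is observed.

```python
from datetime import date, timedelta

def label_predictions(predictions, observed_prices, horizon_days=7):
    """Label fare-rise predictions using only the passage of time.

    predictions: list of (flight_id, day, price_then, predicted_rise)
    observed_prices: dict mapping (flight_id, day) -> observed price
    Returns ((flight_id, day, price_then), actually_rose) pairs for every
    prediction whose horizon has already passed; no human annotation needed.
    """
    labeled = []
    for flight_id, day, price_then, predicted_rise in predictions:
        later = day + timedelta(days=horizon_days)
        future_price = observed_prices.get((flight_id, later))
        if future_price is None:
            continue  # horizon not yet reached; wait and the label will arrive
        actually_rose = future_price > price_then
        labeled.append(((flight_id, day, price_then), actually_rose))
    return labeled
```

Run against each day's price feed, every past prediction whose week has elapsed becomes a fresh labeled training example, which is how a dataset of this kind can grow automatically.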
It allowed us to label a trillion data points, and with a trillion data points, we were able to generate some very strong predictions. That's very cool, and I guess the number of labels you have, automatically generated, grows quadratically in time, because at each moment you can predict a lot of future moments. A trillion is a big number, especially for time series, but it makes sense that you could collect this giant dataset automatically. That's exactly right. Another thing that I really find fascinating is that there's actually a connection between this and some of the success that we're seeing in NLP. Because even now that Mechanical Turk exists, the number of words or sentences and so on we can label is nowhere near enough compared to the appetite of models like ELMo, or BERT, or RoBERTa, or most recently GPT-3, this succession of language models. But again, what they use is the inherent sequential nature of language. Effectively, to grossly oversimplify, they're saying, "I'm going to predict the probability that the next word is this word or that word." Well, how do you tell whether that prediction was correct or not? All you have to do is look at the next word. You mask out words, and of course it doesn't always have to be literally the next word, but basically you take a sentence, you mask out some words in the sentence, you predict what those will be, and you check your prediction. The wild thing about a natural language corpus is that in a very similar sense, it's also self-labeling. These models that we're now using with great success, and have become so enamored with, have that property where the data can label itself. Some of the most exciting material in the NLP specialization is Younes and Łukasz talking about techniques like word embeddings or language models or transformer models taking advantage of this. In fact, it's been a huge boon to the whole field of NLP.
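The masking step described above can be sketched as follows. This is a simplified illustration of the data-preparation idea behind masked language models like BERT, not any library's actual implementation; the 15% masking rate is the commonly cited BERT default, and the token names are placeholders:

```python
import random

def make_masked_example(tokens, mask_prob=0.15, mask_token="[MASK]", seed=0):
    """Turn a raw sentence into a self-labeled training example.

    Returns (masked_tokens, labels) where labels maps each masked position
    back to the original word: the corpus supplies its own labels, with no
    human annotation required.
    """
    rng = random.Random(seed)
    masked, labels = [], {}
    for i, tok in enumerate(tokens):
        if rng.random() < mask_prob:
            masked.append(mask_token)
            labels[i] = tok  # the hidden word itself is the training target
        else:
            masked.append(tok)
    return masked, labels
```

Train a model to recover `labels` from `masked`, and every sentence in a billion-sentence corpus becomes free supervised data, which is the self-labeling property Oren is pointing at.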
On the theme of building these giant datasets, you had a trillion examples, which was really huge back in the day; even today, a trillion is really large. We do see this sequence of success stories in NLP where researchers are building bigger and bigger models, transformer models these days or flavors of them, and feeding them bigger and bigger datasets. What's your prediction on the future of this trend? Ever bigger and bigger, and that's the story of NLP? Or a trend towards smaller models at some point as well, or plateauing, or something else? Well, let me first start by acknowledging, which I think is important, how wrong I've been. I predicted that the growth in model size and the number of parameters, and the commensurate amount of data, would already have plateaued. I've definitely been wrong about that. We see that with the continued increase in performance. Take predictions from anybody, especially me, with a grain of salt. That said, I do think that they will continue to grow, because there is that hunger for performance and we haven't, by any means, exhausted the power of the machines or the size of the corpora. I think that they will continue to grow until we see a very significant plateau. I think at the same time it's also very natural for any computer science field to first build the largest possible model, to brute-force it, and then go back and optimize it in various ways, both in terms of data efficiency and data selection strategies, and in terms of computation, so I think we'll see both. A great example of that, this is an analogy, but I think it gives the right intuition: we started with chess, and at first only specialized chips and supercomputers could do it; now we have stronger chess-playing programs on a laptop. It's gotten cheaper and simpler, not just because of Moore's Law, but also because of better algorithms. At the same time, we've also scaled up to larger games like Go.
Yes, I feel like today there are definitely researchers and aspiring researchers wondering, boy, if I don't have a million dollars, how could I possibly do research in this field? I feel like, on one hand, there are lots of opportunities to do exciting work on smaller datasets, lots of groundbreaking research to be done there. Also, the history of computing has shown us that yesterday's supercomputer mainframe is today's smartphone or today's smartwatch. We'll see if this trend continues to hold up, but I would love it if some of these most amazing, giant, millions-of-dollars types of models that you read about in the news will someday all be running on our smartwatches. That would be an exciting future if we can get there. Yes. We've actually done some work at the Allen Institute for AI, or AI2 as we call ourselves, on a topic we call Green AI, where we say, exactly because of the point you made, Andrew, that these massive models result in a large number of people being shut out of the creation of these models. We ask people to also publish results taking cost into account, taking efficiency into account. If I can spend $4 million or $12 million and build the largest model, that's one kind of research. But what's the best model I can build for $1,000? What's the best model I can build if I only have, I don't know, 1,000 training examples? It seems like there are a lot of questions like that where, if you factor in efficiency, I might say, look, my model isn't as good as Big Brother's trillion-parameter model, but it only cost $1,000 to train, or only $100 to train. It's trainable on a laptop. In fact, we've been talking about something we call NLP in a box: what are the best NLP capabilities you can derive from simply a laptop or simply a phone? There are many situations where that's really the question. Let's just take a phone as an example. I keep flashing this as a prop.
For privacy reasons, I may want to keep the data resident on the phone, or I may have intermittent Internet connectivity, and so I can't just always upload things to the Cloud. Now I suddenly need to think about models, whether they're natural language models or vision models, that are optimized to run on a limited device without exhausting the battery. That's a really key point. That leads to an array of research that many more people can participate in. Today, many NLP teams train a model and then have to re-engineer it or compress it or something to make it run on a mobile phone. People often don't even think of just the download size. If you have a very large download for a mobile app, users are actually less likely, depending on the country and the cost of bandwidth in that country, to download and install it. And of course there's the rise of edge computing; there's a lot of exciting work on getting these things to work on edge devices as well. You have, for many years, had a foot in both the academic world and the industrial world, as a professor, and also as the CEO of a non-profit and a venture partner. Today, a lot of people are asking, how do you choose between the academic and the industry pathway in AI? What advice do you have for someone trying to build a career and looking at academia and industry? Well, let me answer it in geeky terminology that will hopefully be really familiar to the folks taking this specialization. To me, it's a question of what you're trying to optimize. If you're trying to optimize compensation, how much money you make, or even adrenaline, exciting in the way that a car race is exciting or a poker game is exciting, then the world of startups, the private sector, naturally beckons.
If, on the other hand, you are trying to maximize freedom, the ability to ask your own questions, the ability to sit back and contemplate and really think, really deeply and uninterrupted, about fundamental intellectual questions, the questions that you want to ask, not somebody else, well then there's no substitute for academia. I'm old enough that at different points in my career, which has spanned some decades, I've focused on different things. One of the biggest academic highlights for me was graduate school at CMU, where I worked with Tom Mitchell and I could spend months just delving into one particular question. Once I finished my coursework, I could just go as deep as I could in answering a question. Other times, when I did a startup, there was just that feeling of putting a team together and working so hard to succeed, that roller coaster ride of this thing that we built with sweat, blood, and tears, and we own, and we're going to make it a success. That was also an incredible feeling, but very different. I sometimes liken academia to playing bridge, and startups, the commercial sector, to playing poker. Both are fun, but they tickle different neurons, at least in my brain. From that description, I gather that you are both a bridge player and a poker player. At various points in my career. I also play a lot of bughouse, partner chess. When I was in high school, I played chess; I was actually captain of my high school's chess club. Then after Garry Kasparov lost to a computer, I gave up playing chess, but this sounds like fun, I should check it out. Well, we should play sometime. One of the things I've seen you do is engage with regulators and help think through what appropriate AI regulations are. What are your thoughts about regulating AI?
Or, to take an example closer to NLP, we've seen that, really unfortunately, language models and other NLP systems trained on natural language, trained on text on the internet, learn to exhibit some of the very undesirable biases that are exhibited by text on the internet. I think as AI technologists we do our best to diminish and squash that in our systems. But what do you think is the role of regulators in this, frankly, really knotty NLP problem, where we have wonderfully performing systems but very problematic aspects of bias because of the data they learn from? How do you think a regulator should think about that? I do think that's the toughest question you've asked me, Andrew. What should the role of regulation be for natural language processing? I would be very careful to avoid legislating our values into the technology. I would really allow a thousand flowers to bloom. What I would look at a lot more closely is specific applications. NLP is a broad technology, and that technology, as you pointed out, is prone to bias. But it's how this bias manifests in particular applications that needs to be regulated. Let me give a concrete example. If we build a resume-scanning application and it exhibits bias in favor of men over women, obviously that's highly problematic. We don't want to have that bias; it's illegal. In that context, we should block it. We should audit those sorts of applications and disallow bias there. But what I would really not want to do is regulate basic research into NLP based on these ideas. I think that's really the most important point: regulate the applications, not the research. I would love for regulators and technologists to work together to figure out what is a fair standard to hold these systems to, and then rigorously audit systems such as a resume screener against a well-articulated standard, and then hold the AI teams accountable for reaching that standard.
If we do that, hopefully we can also avoid 'gotchas', where the AI team goes along and then, many years later, there are maybe even fair, but surprising, criteria, such that if only we had all realized we should judge that AI system on these criteria, we could have avoided the problem in the first place. It's tricky. I'm glad the community is working on this. I want to just highlight at least one more point there. Again, there's so much to say on this topic, but I think the word audit that you mentioned there is really a key one, because, for example, in the European Union, they've started thinking about the right to an explanation. They say, if I have a model, then the model has to be able to tell me why it came up with its conclusion. The problem with that is, you and I know very well that deep learning models that are based on a very large number of variables, a lot of parameters, a lot of data, may really struggle to provide an explanation that anyone can understand. If that regulation is created, what we will end up with is explanations that are either incomprehensible, and thus useless, or inaccurate: clear, but actually not correct, not high-fidelity explanations. Then they're not just useless, they're misleading, which is perhaps even worse. Another option, which is encapsulated in what you said, is to say, no, I'm not going to insist on explanations, but I'm going to insist on the right to audit. If you create a model, a regulatory agency or a third party, like the ACLU or an academic, should have access to it to audit its behavior and to check whether it's exhibiting bias. Then we can rely on the marketplace of ideas and the interaction between different bodies with different incentives, like journalists, non-profits, and so on, to check on each other. I think that situation is a much more robust and interesting one. That vision of transparency and auditing, I think, will hopefully shift society to want more fairness.
Actually, just one final thing I want to ask you: you've mentored a lot of students, a lot of engineers early in their careers; you've helped a lot of people become really good at NLP over your career. What advice do you have for someone watching this video today that wants to break into NLP or grow their career in NLP? Well, I would say at the early stages, make sure you've got the fundamentals right. We're talking about statistics, computer science, an understanding of machine learning. I think that's essential, because the flavor of the month, or the flavor of the year, which is transformers as you said, changes very rapidly. You want to make sure you've got the fundamentals right. Then the second step is, I do think that we've seen online courses be extremely successful, extremely cost-efficient, and widely accessible. That's the next place I would go to, and of course, our conversation is part of an NLP specialization. I haven't studied it in depth, but knowing you, Andrew, I'm sure it's very high quality. Thank you. After that, after someone's done studying online, what's next? There is no substitute for doing it yourself. You only understand something at a certain level if you've only done it at the course level. You've got to take a real problem, take a dataset, and do it yourself. Find out how it works or doesn't work, and you could be surprised. You could find that the problem that you're excited about is easier than you expected and you can do really well. Or you'll find, maybe I didn't understand that concept so well, or maybe this problem is harder than it seems, and that might lead you to a new invention or a new idea. There's also no substitute for practice. That's good advice, Oren. Thank you. Maybe many of our learners, I hope, will follow your advice and end up someday becoming great NLP researchers, or scientists, or engineers, and build amazing systems. That was inspiring. Thank you very much, Oren.
It was really great having you, and thank you again for joining this interview series. Well, it was a real pleasure, Andrew. Thank you for all that you're doing to be a champion of the field in general, and to bring this information and these ideas to so many more people. We need that if we're going to make the kind of progress that we should be making to use AI to make the world a better place. I think so. Thank you, Oren. For more interviews with NLP thought leaders, check out the DeepLearning.AI YouTube channel or enroll in the NLP specialization on Coursera.