Joining me is Drew Conway, data scientist and entrepreneur. Thanks for joining us, Drew.

Thanks for having me. I'm happy to be here.

We've used your Venn diagram in the course. Can you talk a little bit about this vexing question of what data science is, exactly?

Sure. It's interesting; I think it's a question we're still trying to figure out the answer to, even as the discipline has become more professionalized over the last five years. Thinking about the Venn diagram as a baseline, part of what I was trying to do in making it, in attempting to define the discipline, was just to say that this is an inherently interdisciplinary discipline. The pursuit of understanding and extracting value from data for the purposes of a business or organization is something that inherently requires many different skill sets. So to me, data science is really more of a process and a teaming function than a single role. In the same way that other sciences and other endeavors have many different facets, so does the pursuit of data science. If we think about the Venn diagram, there are constituent parts that are very common in practice: understanding how to work with data, having some technical competence and expertise in working with large-scale data processing systems, and of course the math and statistics knowledge you need to actually model and make predictions, classifications, and forecasts. Then there's the part that, even as we look back at the original Venn diagram and the things people have talked about since, I think matters most: having some expertise in the business problem or the area where you want to apply the technology and the statistics. That's important both because it's where your questions come from, what you actually want to investigate, where your hypotheses for the business come from, and also because it shapes how you interpret the results.
Because getting data and applying technology to that data has become easier and easier in the almost 10 years since we first published the Venn diagram. What hasn't gotten easier is asking good questions and interpreting the results. That, to me, is still the biggest challenge.

How do you think a generalist makes themselves a strong collaborator for their data science resource?

Well, I think it's probably not that different from most other team functions in an organization. If you're someone who lacks the technical skills or the statistics training, I think it's important that you have some basic numeracy or statistical literacy, because to be on the team you have to understand what the conversations are about and why certain decisions are being made. But there may be many other things you can bring to that group that are really valuable. Some of the most valuable people I've seen in my career are those who can take the application of complex methodologies or tools to a problem and then actually articulate the results to a non-expert audience. Say you're working in an industry that is traditionally not particularly data-driven, say media or fashion, and you want to do a statistical analysis, to apply data science to that business. It can be very difficult to articulate those results to a senior leader on the management team who, over a 20-to-30-year career, has never thought about the business in that context. Having someone who is actually good at doing that, a generalist in the sense that they know enough about the tools, the techniques, and the technology that went into the work to express it to a decision-maker, can be just as valuable as, or even more valuable than, the technologists and data scientists who are doing the work.
The topic of our course is Agile and its intersection with data science and analytics. Can you talk a little bit about the significance, if any, and what's hard about doing data science and analytics in small batches, as we do in Agile: small, evidence-based batches?

Sure. Again, I think it's a part of the discipline that is very much a work in progress, certainly from what I've seen, and for good reason. Partially because of the naturally interdisciplinary nature of doing data science, you often find yourself, and I find myself in this situation today as many times before, managing a team of folks with very different backgrounds. Some portion of the team are software engineers who have a baked-in viewpoint on Agile: how you organize and size work, and how you decide on a one- or two-week basis what you want to get done. Then there are folks who are more business analysts, who may be able to operate in that viewpoint but don't have a software engineering background. And there are folks who are physics PhDs fresh out of a postdoc, used to the 6-to-12-month development cycle of a research paper, with research designs that fit that cadence. So, as is often the case, the thing that ends up working is figuring out which parts of the tools fit into a compromise practice that allows for the best outcome. One pattern I've seen be really successful in a couple of different places is using the formulation and organizational structure of Agile: you have a Kanban board with a backlog of things you want to be able to test, and a set of things you want to prioritize and articulate in more detail. Then I think the critical piece that data science really benefits from borrowing from Agile comes in when you have some big research project, some large, ambiguous thing.
Your boss has said: build a predictive model of churn within our business from all the data that we have, with no further instruction than that. A big part of the challenge there is breaking that problem down. Breaking the problem down is a fundamental challenge of doing any of this work, and it doesn't require any technical knowledge per se. But you do need to be able to say: what are the pieces of this that I want to go after first? Do I first need to identify the data sources? Once I've identified the data sources, is there any alternative or third-party data I need to bring in that will add value? What is my hypothesis, or effectively, what is my dependent variable, the thing whose change I'm tracking? Once you've gone through and done all that, you've actually broken the work into pieces that may fit pretty well into a staged development cycle, and you can organize those pieces in the Agile way. Then there are two big challenges after that. One is: how long do you work on something before you give up, before you say, I can't do this? In data science, we often find ourselves in a quasi or fully R&D capacity. We're hypothesis-driven, and hypotheses can be wrong. So the question is, do you have enough experience, have you done this enough, to know when to say either "we've done this for two weeks and we haven't gotten anywhere, and therefore we pull out," or, in the better though sometimes harder case, to decide going in what metrics I'm going to measure to know whether I've been successful? In a business case that may be easy: you know you want to identify some number of people in your dataset, and if you hit that number, good, you did it.
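The decomposition described above — identify data sources, define the dependent variable, set a success metric and a time box before starting — can be sketched in code. This is a minimal illustration, not anything from the interview: the class names, the task list, the recall threshold, and the two-week time box are all hypothetical choices made for the example.

```python
from dataclasses import dataclass, field

@dataclass
class Task:
    """One card on the Kanban board."""
    name: str
    status: str = "backlog"  # backlog -> in_progress -> done

@dataclass
class ResearchSprint:
    """A time-boxed, hypothesis-driven piece of work with an explicit exit test,
    decided before the work starts rather than after."""
    hypothesis: str
    dependent_variable: str
    success_threshold: float  # e.g. minimum recall on known churners
    time_box_weeks: int = 2
    tasks: list = field(default_factory=list)

    def succeeded(self, measured: float) -> bool:
        # The pre-agreed test: did we hit the metric we committed to up front?
        return measured >= self.success_threshold

# Hypothetical churn project, broken into the pieces named in the text.
sprint = ResearchSprint(
    hypothesis="Declining login frequency predicts churn",
    dependent_variable="churned_within_90_days",
    success_threshold=0.8,
    tasks=[
        Task("Identify internal data sources"),
        Task("Evaluate third-party data for added value"),
        Task("Define the dependent variable"),
        Task("Fit a baseline model and measure recall"),
    ],
)

# At the end of the time box: measured recall of 0.85 clears the 0.8 bar,
# so we keep going; below the bar at the time box, we pull out.
print(sprint.succeeded(0.85))  # True
```

The point of the sketch is only that the "pull out or continue" decision is mechanical once the threshold and time box were written down at the start, which is the harder discipline Drew describes.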
In other cases there may not be such a metric, again because it's an ambiguous new idea that no one has really tested before, and then it ultimately comes down to having good management. You have to be able to pull the reins in and not let people go down rabbit holes, and sometimes that requires some culture change.

That is a terrific perspective on both Agile and the practice of data science. Thanks, Drew.

You're welcome.