Joining me is Drew Conway, data scientist and entrepreneur. Thanks for joining us, Drew.
>> It's a pleasure to be here. Thanks for having me.
>> Let's talk a little bit about your data science journey. You've contributed a lot to the discussion of the discipline. Can you talk a little bit about where that came from and how you became a data scientist?
>> Yeah, absolutely. I'm glad that we're talking about the Venn diagram as part of this, because honestly, that's in some sense the culmination of a journey in which I realized that I was a data scientist. I'd been doing this work long enough that it started in a place where data science wasn't a career path, or at least wasn't named as one. For me, the thing that ties it all together is that I've always been interested in taking tools and techniques from computer science, math, and statistics and applying them to social science problems. That goes all the way back to when I was an undergraduate. I did a weird thing: I was a double major in computer science and political science, and I don't think anyone had ever thought to do that, at least at the little liberal arts college I went to. From there, I worked in national security in DC, where I was asked directly to think about understanding human behavior and human decision-making at scale, but with the tools of technology. In those days, they called this computational social science. I think the people who do the job I did back then would probably be called data scientists today. As I did that very early part of my career, I got to a place where I realized that while I had a lot of practical computational social science, or data science, experience, there were a lot of gaps in my training, particularly on the statistics side, that I wanted to fill in.
Through a somewhat roundabout process, I realized that what I actually wanted to learn was how to be a real professional researcher. Ultimately, what I was most interested in was going after these big, ambiguous problems: understanding why large numbers of people make a particular choice. What information do they have? How do you model that? I wanted to actually learn how to do that, and that's what led me to pursue a PhD. It's what brought me from DC to New York. I did my PhD at NYU, and it wasn't really until I got to New York and started meeting people in the entrepreneurial community here, particularly the startup community, that I realized they were asking a lot of the same questions. Folks who worked in finance, people who worked in media, people who worked in social media: they were looking at data on people's buying behavior and on what people were posting on various social media platforms, and they wanted to use that data to understand their customer base, or to understand the economics of a country or a market. I realized we were all thinking about the same kinds of problems and using a similar set of tools. That confluence of data, technology, and the pursuit of knowledge about large-scale human behavior, at a scale that really wasn't possible before the late 2000s and early 2010s because we didn't have that kind of scaled observation, is when the idea of data science was born. And quite literally in that time frame (I got to New York around 2008, and we published the Venn diagram in, I think, 2010 or 2011), I sat down and said, well, I guess if I'm a data scientist, what is data science? How do I actually understand it in the context of the career that brought me here? It was through that path that I came to realize that I'm actually a data scientist.
>> And what's it like being a data scientist?
And what makes a collaboration work well versus not as well?
>> To answer the first part of your question: if you're the kind of person who loves to think about hard problems and then build a solution, whether literally through software or through collaboration with software engineers, then it's a great job, and I think it will be a great job for a long time. To get to the second part of your question, what makes it go well and what makes it harder: for me, it always comes back to the articulation of the problem, the streamlining and focus of that problem, then what data we have, and then what our hypothesis is. Looking at it from a different angle, and to editorialize a little bit about the discipline: I think when a lot of people get interested in data science, they get very focused on a set of tools, right? I want to be doing machine learning, or I've downloaded TensorFlow and I really want to learn about neural nets. I think I can apply it to a lot of different things, and that's true. But then you're solving toy problems, or you have a hammer and you're looking for nails, right? What's much more valuable is if I can work with someone who maybe isn't a data scientist, or isn't technical at all, but has deep expertise in an area and is interested in improving their ability to understand that market or that area. Then I can work with them to identify where the inefficiencies of knowledge are in that particular business. Collectively, or on my own, I can go try to find a data set that I think provides some direct or approximate measure of the thing we care about. And then, honestly, the application of the tools, if you've been doing it long enough, is the easy part, right?
I mean, just think about it in the context of your students: if they sit down with a Jupyter notebook and literally look at the lines of code dedicated to getting a data set to a place where you can actually do some prediction, versus the part dedicated to the prediction itself, it's 90/10. The application of the actual modeling is pretty commoditized at this point, right? TensorFlow, which we mentioned, is a great example of that. But it works really well when I've worked with someone who articulates the problem well, and I get an answer that I can take back to them and say, help me understand why you think this is happening. What is it about this that really works? That becomes a positive feedback cycle, and those are the places where we always have the best outcomes.
>> I would characterize myself as an enthusiastic amateur at best when it comes to data science, but one of the things I've found most interesting is this idea of how we frame the dependent variable and think about its actionability. Can you talk a little bit about the challenges there? For example, for a team that's working in the agile cadences you mentioned, how do they look at the economic significance, or some other kind of significance, of their output and make that a part of their data science process? How do they do it more effectively?
>> Yeah, that's a great question, and one that I think is really a first-principles problem as you're getting started, right? Sometimes it's nonlinear, but I think it can work well as a fairly linear process. Let's take my example where we're working with a subject matter expert, or we have a team of folks with expertise in the business, the subject, the market, whatever. They say, we think there's a relationship between X and Y, right?
We've observed this in our business, so we think there's a relationship between these two things. That's the best possible case, because if you have a measure of Y and a measure of X, then it's the straightforward process of getting the data and doing the analysis. When that's not the case, you need the ability to talk with someone when there isn't an exact measure of X or Y, and that, I think, is one of the skills of a very good data scientist, right? My favorite example of this: we think there is a relationship between the weather and same-store sales at large big-box retailers. Now, we don't have access to their transaction data, so how can we actually measure the relationship? We can observe the weather, right? If we think of weather as the independent variable and we want to see its relationship to sales, how can we do that? One way people have thought about this: if we look at satellite imagery of parking lots, we can see whether people are showing up to the store. That's something we can measure directly. Having the idea to do that takes a particular mindset: trying to understand which things I can measure that give me an approximate estimate of the thing I really care about. That is very hard, and I think it does take some practice. It takes a little bit of experience, and quite frankly, some people just have a good eye for it.
>> Some great perspective on the discipline. Thank you for sharing your data science story, Drew.
>> Happy to do it. Thanks for having me.