Joining me is Casey Lichtendahl, a colleague of mine from Darden, a faculty member focused on analytics and data science, and currently a visiting researcher at Google. Casey, thanks for joining us.

>> Thanks for having me, Alex.

>> One of the things we've been talking about for years, which I just think is such an interesting and important intersection point between the practice of data science, agile, and innovation at large, is this thing that you call data at rest versus data in motion. Can you talk a little bit about what that means, why it's important, and how the discipline of data science is evolving to include it in its perspective?

>> Yeah. So, the data at rest idea. Well, maybe I'll just define these terms first. Data at rest is data that you've pulled out of the stream of data that's moving at the speed of customer interactions with your product or service. It's a batch of data you've taken at a certain point in time, and you're going to go off and do some analysis of it. That analysis might take you a few days, a few weeks, months. Sometimes it's a big study that can take a year or more. Academics who pursue research will often spend years on a project, studying a data set that was pulled out of that ongoing, flowing river of data 10 years ago, 20 years ago. Sometimes there are famous historical data sets that get studied for 50 years, 100 years. And the assumption behind that kind of study, this data at rest where you let your data get a little stale relative to where the stream of data is today, is that the world is stationary, that things aren't changing from year to year, month to month, even week to week. If that assumption doesn't hold, we need to sample more frequently from the stream of data, which is both immense and fast-moving. Ideally we sample it on an almost continuous basis, using our data science technology and tools to feed data from the stream into a model that can produce real-time results a human can look at on a fairly regular basis, to monitor the reliability of what's going on, and also to flag and highlight anomalies that somebody might go off and investigate a little further. It's this data in motion that I've really begun to appreciate more and more being at Google, watching these site reliability engineers who set up monitoring dashboards, where data is ingested into models and the performance of those models is populated onto dashboards. And this is done on a very frequent basis. Sometimes it's daily, hourly, maybe even at the minute level, so that we've got dashboards populated with an almost continuous measure of the quality of our models.

>> I mean, it sounds great, and Google is notoriously good at doing this. Can you talk a little bit about the tension and why it exists in industry in general? What's hard about transitioning from data at rest to data in motion for data scientists, and what are the rewards they'll see from doing that?

>> You need the engineering support to set up the pipeline: to get the data in motion to come into a model, have that model produce some results, and then visualize those results in some way on a dashboard. That whole pipeline needs to be engineered, and you need data scientists to engage in the engineering of that pipeline. So, those are maybe some software engineers who helped design that pipeline, but it needs the data scientists to help focus, again, on the right dependent variable and what the model is, and then on how we're going to measure the performance of that model in trying to predict where we're headed with this constant stream of data coming out of the pipeline. And so, it's software engineers, it's site reliability engineers, and then it's data scientists. And if you're feeding this data back to a dashboard, it can even involve a customer engineer, who's talking very closely with, say, a large customer who's using one of your services. That customer engineer is showing the results of that whole pipeline, that dashboard, to the customer to help them understand what is going on with their usage of the product.
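The loop Casey describes, sample a batch from the stream on a schedule, recompute a model-quality metric, and flag values that break from recent history, can be sketched in a few lines. Below is a minimal, hypothetical Python sketch of that monitoring pattern; the simulated stream, the mean-absolute-error metric, and the three-sigma alert threshold are illustrative stand-ins of the editor's choosing, not details of any actual Google pipeline.

```python
# Minimal sketch of a "data in motion" monitoring loop: batches are pulled
# off a stream on a schedule, a model metric is recomputed per batch, and
# deviations from recent history are flagged for a human to investigate.
# Everything here (stream, model, thresholds) is a hypothetical stand-in.

import random
import statistics
from collections import deque

WINDOW = 24          # recent batch-level metrics to keep (e.g., 24 hourly batches)
Z_THRESHOLD = 3.0    # flag a batch whose metric is > 3 sigma from the recent mean

def sample_batch_from_stream(hour: int) -> list[tuple[float, float]]:
    """Stand-in for pulling one batch of (prediction, actual) pairs off the stream."""
    drift = 0.0 if hour < 40 else 2.0   # simulate the world becoming non-stationary
    return [(x, x + random.gauss(drift, 1.0))
            for x in (random.uniform(0, 10) for _ in range(200))]

def batch_mae(batch: list[tuple[float, float]]) -> float:
    """Mean absolute error of the model on one batch: the dashboard metric."""
    return sum(abs(pred - actual) for pred, actual in batch) / len(batch)

history: deque[float] = deque(maxlen=WINDOW)

for hour in range(72):                  # e.g., three days of hourly batches
    mae = batch_mae(sample_batch_from_stream(hour))
    if len(history) >= 8:               # need some history before alerting
        mean = statistics.mean(history)
        std = statistics.stdev(history)
        # Check against history BEFORE appending, so an anomalous batch
        # does not dilute its own baseline.
        if std > 0 and abs(mae - mean) / std > Z_THRESHOLD:
            print(f"hour {hour:2d}: MAE={mae:.2f} ANOMALY (recent mean {mean:.2f})")
        else:
            print(f"hour {hour:2d}: MAE={mae:.2f} ok")
    history.append(mae)
```

In production this loop would presumably be a scheduled job feeding a real dashboard rather than print statements, but the division of labor is the one described above: engineers own the pipeline, and data scientists choose the dependent variable, the metric, and the alerting logic.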
>> Do you have any favorite examples you'd like to share to illustrate how this works in practice?

>> Well, that loop is what goes on at Google Cloud with the large Cloud customers that use virtual machines on Google Cloud. That loop I've just described, with data scientists, software engineers, site reliability engineers, and customer engineers, is a loop that's essentially been closed at Google. It connects data science all the way back to the customers, so that they can see what's going on with their own usage of Google Cloud. And it's closing this loop, first getting data science in the loop and then closing it all the way back to the customer, that I think can really lead to some powerful lock-in, frankly, right? It's going to be difficult for that customer to get the same kind of service from another provider and end up deciding to switch from a product or service like Google Cloud.

>> Some great thoughts on the practice of data science in an agile environment. Thanks for joining us, Casey.

>> Thank you, Alex. It was a pleasure.