In this module, you'll learn how to use Python to perform some basic tasks. Specifically, you'll (one) be introduced to Python, (two) learn how to write Python functions, and (three) create conditional statements. As you know, this is a course for accounting students who want to learn how to analyze data using Python. We know that you want to get to analyzing data as quickly as possible, and we also think that you should get to that point as quickly as possible. >> For that reason, we're not planning on teaching you everything there is to know about Python and programming. Instead, we plan on teaching you some of the most relevant foundational concepts that will help you with some of the most common data preparation tasks. >> Let's chat with Dan Lemaire, a lead data scientist at Overstock.com, and see what he has to say about learning a programming language. >> My name is Daniel Mayer. I work at Overstock. I'm a lead data scientist there, and I work with a team of superstar analysts to answer questions that are important to the business. >> So will you tell me a little bit about why you think it's important to learn a data analytic language like R or Python versus just using Excel? >> Excel isn't reproducible. When you hide your functions in cells that call each other, it can be really hard later to track down the logic you used to arrive at the answer that you found. So the reason that you would want a reproducible script is so that a peer can check your work and give some more insight into whether you might have made a mistake or what your blind spots are about the best way of doing things. But it's also so that when you have to do that same work again six months from now, you can do it easily. You can pull up your old script, you can understand exactly what you did, and it can really accelerate your speed to insight. >> So R and Python are both open source. What does that mean to you? What are the benefits?
>> It means that I can see every part of the code behind any function that I use. With something like SAS, I have to trust that really brilliant people have built the tools that I need and that they operate in the way they say they operate. And there are a few well-known cases where certain kinds of null hypothesis tests or other kinds of model-fitting behavior were decided by a team, and these are decisions where there's no one right answer. There are only pros and cons, and as an analyst, I should be making those decisions because I have to own them. If I use proprietary software, it can be difficult or impossible to know what decisions were made and what pros and cons were balanced. It can also be impossible to justify the answer or the work that you used to reach it. I learned more about the best ways to do things because really brilliant people have opened their source; they've opened their code to me. And as I create, I get to use the creations of other brilliant people and stand on their shoulders in a way that I couldn't with proprietary software. >> And communicating with all these different people, how do you think the programming or scripting language that you use influences your thought process? >> Yeah, so best practices in each language can be a little bit different. There's an awful lot of overlap, of course, but I have noticed that people who are really strong in MATLAB tend to think of things from a little bit more of a strict mathematical background, because everything you're doing feels like matrix manipulation of some sort. So they tend to think of the model specification, the model interpretation, and the model fitting as some sort of manipulation of a matrix or a vector. That is, of course, what a Python or an R user is doing behind the scenes, but maybe that's not the first way they think about the problem.
A Python user, in my experience, tends to think a little bit more from an object-oriented approach. So they would be thinking about chunks of code that are defined in certain ways and behave in certain ways relative to each other. An R user, by contrast, tends to think more from a functional perspective, where everything is a function. You have a data set that looks a certain way, and every step in your pipeline and your workflow is a transformation of that data set. And at the end, the data set you have contains the answer you need, the information that you need to make a decision. So there's the strict mathematical perspective versus the object-oriented perspective versus the functional perspective. I can see how the scripting languages themselves lead to a different way of approaching the problem. >> You interact with databases. Why not just write the script in the terminal? Or maybe you do this, but why is it important to learn how to interact with databases using R or Python instead of writing out a script in the terminal, running that, and then storing that script somewhere else? Why is it nice to have it all in one place? >> A really good reproducible workflow would allow somebody to open a single file and know the entire logic from nothing to final result. And for me, the answer is notebooks. So I use R notebooks, which allow me to insert chunks of different kinds of scripting languages. I typically start with some sort of a YAML header with some Markdown comments, and occasionally, if I'm writing a formula or something like that, then I have to think about how I want to display the formula. Then I might have a SQL chunk that returns a result set locally onto my computer. And then I might have an R chunk that performs some sort of manipulation and perhaps a model fitting on that chunk. And then I might include a Python chunk, perhaps, if there's a package in Python that I need.
One example is that Python is excellent in the sense that you can actually interpret a string as a raw string, and R doesn't have that capability. So sometimes I run out of room in my brain to write the regex I need in R, and I just have a Python chunk that reads in the raw string. And that's all I need to do. I might have Stan as a probabilistic programming language that's performing some sort of complicated task, maybe fitting a Bayesian model. And I might have a bash script in there somewhere that's setting some sort of environment variable or interacting with my console itself in some important way. So notebooks allow me to interact with many different scripting languages and to keep a record of every single step of a recipe, from knowing absolutely nothing to having the solution that I set out to find. >> All right, Dan, thank you so much. I really appreciate you coming to share your insight, wisdom, and experience with us. >> You're very welcome. >> We are confident that by learning some of the foundational material, you will be able to be much more effective at performing data analytic tasks.
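
As a minimal sketch of the raw-string point raised in the interview, the snippet below shows how a Python raw string (`r"..."`) lets a regular expression be written without doubling every backslash. The sample text, the date format, and the names `PATTERN_ESCAPED`, `PATTERN_RAW`, and `find_dates` are hypothetical and chosen only for illustration; they do not come from the interview.

```python
import re

# In an ordinary string literal, each backslash must be doubled so that
# the regex engine receives "\d" rather than an escape sequence.
PATTERN_ESCAPED = "\\d{2}/\\d{2}/\\d{4}"

# In a raw string, backslashes are kept exactly as typed, so the pattern
# reads the same way it would in regex documentation.
PATTERN_RAW = r"\d{2}/\d{2}/\d{4}"


def find_dates(text):
    """Return every MM/DD/YYYY-style date found in text (assumed format)."""
    return re.findall(PATTERN_RAW, text)


if __name__ == "__main__":
    # Hypothetical sample line, similar to a note on a ledger entry.
    sample = "Invoice 2021-0042 posted to account 4000.10 on 03/15/2021"

    # Both literals describe the same pattern; the raw one is easier to read.
    print(PATTERN_ESCAPED == PATTERN_RAW)
    print(find_dates(sample))
```

Both literals produce the same pattern string, so the raw form is purely a readability aid; it becomes more valuable as patterns accumulate more backslashes.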