Python is one of the most used programming languages, and specifically one of the most popular programming languages in the field of data analytics. Interestingly, it is used more often than programming languages that are specifically designed for statistics and analyzing data. To understand why, let's start by asking, what is Python? Here are four key features of Python.
First, it's a high-level programming language. High-level means that you do not need to worry about writing a lot of ones and zeros to tell your computer what to do. All of that is interpreted and handled for you, so you don't need to know what is called machine language. Python code looks a lot like typical English.
Second, Python is developed under an open-source license, which means that the program is free and available to everyone, and people can make and distribute modules to use in it without paying for a license. This means you can use most of the base Python modules and libraries for any use, including commercial use. Lots of other people's code and modules will follow the same open-source license, but not necessarily. We'll focus on using these freely available libraries, and most are freely available to use, but be aware that if you come across other libraries in the future, they may have different licenses.
Third, it's also an object-oriented language, and every item in Python is an object. Object-oriented programming is a way of structuring and writing your code by creating various classes of objects that track certain information or can do certain things using functions that are specific to that class. For instance, a string object contains text information and has functions that are unique to text, like string.capitalize, which capitalizes the first letter of the text. An integer object doesn't have a dot capitalize function because it doesn't make sense to capitalize numbers. There's a lot more to object-oriented programming, commonly called OOP.
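To make the string and integer comparison concrete, here is a minimal sketch. The variable names and the sample text are made up for illustration; capitalize, type, and hasattr are standard Python.

```python
# A string object holds text and comes with functions that only make sense for text.
greeting = "hello, world"            # sample text, chosen for illustration
capitalized = greeting.capitalize()  # uppercases the first letter, lowercases the rest

# Every item in Python is an object, and every object belongs to a class (its type).
number = 42
string_type = type(greeting).__name__
number_type = type(number).__name__

# An integer object has no capitalize function; a string object does.
str_can_capitalize = hasattr(greeting, "capitalize")
int_can_capitalize = hasattr(number, "capitalize")

print(capitalized)
print(string_type, number_type)
print(str_can_capitalize, int_can_capitalize)
```

Asking an integer to capitalize itself, as in number.capitalize(), would stop the program with an AttributeError, which is Python's way of saying that this class of object cannot do that.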
But don't fret, we don't actually need to know OOP in depth to use Python successfully for data analysis. Don't worry, we'll cover the basics to get you there.
Finally, Python is a general purpose language, meaning that Python is not limited to any one field, but can be and is used across many fields, from tracking finances and building software to designing rockets and sending people into outer space. Python is everywhere. This is different from languages like SQL, which is a query language that can only be used to talk with databases. With Python, you can build almost anything you can think of, and there are usually great free libraries and code frameworks to help you do it.
The fact that Python is not specifically designed for statistics is one of its strongest features in data analytics. This may seem counterintuitive, but it's because it is a general purpose language that analysts can get creative. By now, you've probably heard terms like machine learning or artificial intelligence. These are algorithms and tools that learn from data, and they're very useful to anyone who analyzes data with models, like marketing analysts. Python is the premier language for open-source libraries for machine learning and artificial intelligence. Open-source libraries like Scikit-learn are maintained by thousands of volunteer programmers and scientists, resulting in a library that makes cutting-edge machine learning algorithms available free to anyone and everyone.
There are other languages that are designed specifically for statistics and data analytics. A common popular one is R. However, Python has really overtaken R in popularity because the general purpose nature of Python makes it easier to do every step of the process in Python, from grabbing data from websites using programs like web scrapers to talking to databases, modeling data, and creating data visualizations. You can even connect to APIs to get data or create your own API, all using Python.
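As a small illustration of that general purpose nature, the sketch below has Python talk to a database and then keep working with the result in the same script. It assumes the sqlite3 module from Python's standard library and an in-memory database; the sales table, its columns, and its rows are hypothetical and exist only for this example.

```python
import sqlite3

# Create a throwaway in-memory database with a hypothetical "sales" table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales (region, amount) VALUES (?, ?)",
    [("north", 120.0), ("south", 80.0), ("north", 50.0)],  # made-up rows
)

# SQL handles the part it is built for: querying the database.
rows = conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()
conn.close()

# Python handles the rest: turn the query result into each region's share of sales.
totals = dict(rows)
grand_total = sum(totals.values())
shares = {region: amount / grand_total for region, amount in totals.items()}

for region, share in shares.items():
    print(f"{region}: {share:.0%}")
```

The query itself is still written in SQL. What Python adds is everything around it: loading the data, doing further calculations, and reporting the results, all in one place.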
Rest assured that learning Python as your language of choice will get you far. There are a few core open-source libraries that most people refer to as the Python data science stack. These are libraries that anyone can access and borrow code from. The first is NumPy, for storing data and doing numerical computations. The second is Pandas, for creating data frames, which are basically spreadsheets made out of code. Pandas is actually built on top of NumPy. In other words, it uses code you can find in NumPy. We'll be using this package later, once we get our feet wet with pure Python code. The third is Matplotlib, a library used for producing data visualizations. The fourth is Scikit-learn, a comprehensive open-source library for modeling data using machine learning algorithms.