Let's get started with our first notebook, which will teach us how to read data from our different data sources. Our learning objective for this Jupyter Notebook will be to create a connection to a sample SQL database and read records from that database. We'll also explore some common input parameters for our read_sql function. The packages we'll use are Pandas, from which we'll use the read_sql function, and sqlite3, which connects specifically to an SQLite database. We'll see in a second that it's somewhat interchangeable with any of the other SQL databases you may be using at your workplace.

The first step is to import the necessary libraries: sqlite3 as sq3, pandas.io.sql as pds, and pandas as pd. As I just mentioned, you may be working with a different database at your workplace, such as Microsoft SQL Server, Postgres, MySQL, the DB2 family, and so on. Some common libraries you can interchange here for sqlite3 are SQLAlchemy, which actually works across several of them, including Microsoft SQL Server; psycopg2 for Postgres; and MySQLdb for MySQL.

Since we want to connect to our database, our first step is to say where the database actually lives, and pass that in as a string. We have data/classic_rock.db. That just means we have a folder called data, and in that folder we have a database file called classic_rock.db. We're then going to set our connection: we use our sq3 library and call sq3.Connection, passing in our path as the argument. When I run this code, we've created a connection object, and I can show you here that it's just an SQLite connection. We're going to use that in a second. Now that we have our connection to the database, we can perform queries using SQL syntax and load the results into a Pandas DataFrame.
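The setup above can be sketched like this. So the snippet runs anywhere, an in-memory database with a made-up rock_songs table stands in for the data/classic_rock.db file used in the notebook; the songs and play counts here are invented for illustration.

```python
import sqlite3 as sq3
import pandas as pd

# In the notebook, the path points at the database file on disk:
# path = 'data/classic_rock.db'
# con = sq3.Connection(path)
#
# Here, an in-memory database with invented rows stands in for that file
# so the sketch is self-contained.
con = sq3.Connection(":memory:")
con.execute(
    "CREATE TABLE rock_songs (song TEXT, artist TEXT, release_year TEXT, plays INTEGER)"
)
con.executemany(
    "INSERT INTO rock_songs VALUES (?, ?, ?, ?)",
    [
        ("Caught Up in You", ".38 Special", "1982", 82),
        ("Hold On Loosely", ".38 Special", "1981", 85),
    ],
)
con.commit()

# The connection object itself is just a sqlite3.Connection.
print(type(con))
```

For a different database engine you would swap the connection object (for example, a SQLAlchemy engine) but keep the same pd.read_sql call, which is what makes this approach portable across databases.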
As mentioned, we're going to use SQL syntax: we set the variable query equal to the string SELECT * FROM rock_songs, rock_songs being a table within our classic_rock database. We're then going to execute this query. To do so, we call pd.read_sql; all we're doing is reading in our data, passing in the query as well as the connection we established earlier. When I run this code, we set observations equal to this function, which pulls out the results of that query over that connection and creates our Pandas DataFrame, and we can see its first five rows. These are our different rock songs.

Now, we can run any supported SQL query; all we have to do is write the query using the right SQL syntax. Here we'll select the artist and the release year, select the count of each unique combination and call that the number of songs for that combination, and take the average play count for those songs using the AVG function, calling that average plays. We're pulling from rock_songs, grouping by artist and release year, and ordering by number of songs, descending from largest to smallest. Again, all we have to do is call pds or pd.read_sql, passing in the query and the connection. This is our new query with the same connection as before, and we see, for the first five values, that the Beatles in 1967 released 23 songs with an average of 6.5 plays.

There are a number of common parameters that can be used to read in SQL data with formatting. With coerce_float, we can attempt to force numbers into float types. With parse_dates, we list the columns we know should be brought into our Pandas DataFrame as dates. And with chunksize, we can define the number of rows to include in each chunk. Let's have a look at using some of these parameters.
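Both queries can be sketched together as follows. A tiny invented rock_songs table again stands in for classic_rock.db, with play counts chosen purely for illustration, so the results won't match the notebook's Beatles numbers.

```python
import sqlite3 as sq3
import pandas as pd

# A small made-up stand-in for classic_rock.db so the example is self-contained.
con = sq3.Connection(":memory:")
con.execute(
    "CREATE TABLE rock_songs (song TEXT, artist TEXT, release_year TEXT, plays INTEGER)"
)
con.executemany(
    "INSERT INTO rock_songs VALUES (?, ?, ?, ?)",
    [
        ("Penny Lane", "The Beatles", "1967", 5),
        ("All You Need Is Love", "The Beatles", "1967", 8),
        ("Whole Lotta Love", "Led Zeppelin", "1969", 10),
    ],
)

# Simple query: pull every row into a DataFrame.
observations = pd.read_sql("SELECT * FROM rock_songs;", con)
print(observations.head())

# Aggregation query: number of songs and average plays per artist/year pair,
# ordered from most songs to fewest.
query = """
SELECT artist,
       release_year,
       COUNT(*) AS num_songs,
       AVG(plays) AS avg_plays
FROM rock_songs
GROUP BY artist, release_year
ORDER BY num_songs DESC;
"""
grouped = pd.read_sql(query, con)
print(grouped.head())
```

The key point is that the SQL does the grouping and averaging inside the database; pandas only receives the already-aggregated rows.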
We're going to use the same query as before, and this time we create a generator object, observations_generator, pulling in chunks of five rows: pd.read_sql with our query, our connection, and coerce_float=True. That won't affect this dataset, because the floats were already correctly parsed. For parse_dates, we want release_year to be brought in as a date value. Then chunksize=5 allows for streaming the results as a series of smaller tables: we'll get chunks of five rows at a time, and that's why this is a generator object.

A generator object, just as review, is similar to an iterable such as a list, something you can pass through, but it won't hold the entire sequence in memory; it only produces each value as you ask for it. So you won't waste as much memory using this object as you would with a full list.

Then we write: for index, observations in enumerate(observations_generator). What enumerate does is, as we pull out each object from the generator, also give us a running number: zero with the first object, one with the second object, and so on. First the index will be zero and observations will be the first chunk, then the index will be one and observations the second chunk, and we only do that while the index is less than five. If the index is less than five, we print the index and display the five rows for that chunk from our observations generator.

Let's run that and see. Here we see the five rows that are pulled out: the Beatles, Led Zeppelin, the Beatles. Again, using that group by artist and release year, we now also have the release year; before, it was just a number as a float, but now it's parsed as an actual datetime object. Then we see the next five values, and so on, while our index is less than five.
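The chunked read can be sketched like this. Twelve invented rows stand in for the real table so the streaming behavior is visible: with chunksize=5, the generator yields chunks of five, five, and two rows.

```python
import sqlite3 as sq3
import pandas as pd

# Twelve made-up rows stand in for classic_rock.db so the chunking is visible.
con = sq3.Connection(":memory:")
con.execute("CREATE TABLE rock_songs (artist TEXT, release_year TEXT, plays REAL)")
con.executemany(
    "INSERT INTO rock_songs VALUES (?, ?, ?)",
    [(f"Artist {i}", str(1960 + i), float(i)) for i in range(12)],
)

observations_generator = pd.read_sql(
    "SELECT * FROM rock_songs;",
    con,
    coerce_float=True,             # try to force numbers into float types
    parse_dates=["release_year"],  # bring release_year in as a datetime
    chunksize=5,                   # stream the results five rows at a time
)

chunks = []
for index, observations in enumerate(observations_generator):
    if index < 5:
        print(f"Chunk {index}: {len(observations)} rows")
    chunks.append(observations)
```

Because the chunks are produced lazily, only one five-row table is held in memory at a time, which is what makes this pattern useful for tables too large to load at once.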