0:20

The main reason that we're introducing NumPy,

besides the fact that it allows you to perform numerical operations quickly,

is that it also forms the basis for the Pandas DataFrame and Series data structures.

Thus, to understand how Pandas actually operates, it's useful to know

about NumPy, because NumPy is how things are typically implemented by Pandas.

0:50

We've seen previously that the Python programming

language provides a rich set of data structures.

These included the list, the tuple, the dictionary, and the string.

And you've seen, by now, how these compound or

container data structures can make tasks that might be difficult much easier.

1:10

Now all of these but the string are heterogeneous, which means they can hold

data of different types, so you can combine character data

with numerical data with other containers, all in another container.

This flexibility is powerful, but it comes at a cost,

because it's more expensive, both in computational time and storage, to maintain

this arbitrary collection of data than it is to hold data of a single, predefined type.

And this is where NumPy differs from the standard Python container data structures.

NumPy holds an array of data that is all the same type.

And so, it can make certain assumptions that allow the computer program to

operate more efficiently and to operate faster.
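To make that contrast concrete, here is a minimal sketch. The variable names and values are made up for illustration and are not taken from the notebook. It shows a list holding mixed types, while a NumPy array settles on one data type for every element.

```python
import numpy as np

# A list can mix types freely; each element is a separate Python object.
mixed = [1, "two", 3.0, [4, 5]]
element_types = [type(item).__name__ for item in mixed]

# A NumPy array stores a single data type for all of its elements.
ints = np.array([1, 2, 3])

# Mixing integers and floats upcasts every element to one float type.
upcast = np.array([1, 2.5, 3])

print(element_types)
print(ints.dtype, upcast.dtype)
```

The exact integer type printed for `ints` depends on the platform and NumPy version, so it is not shown here.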

1:53

So that's what this notebook does.

First, it introduces NumPy and the idea of the n-dimensional array.

In this particular lesson we're going to focus on one-dimensional arrays.

A later lesson will focus on two-dimensional arrays.

But basically we start off talking about what NumPy is and why it's used so much:

it's very fast.

It can be very easy, especially if you already know how to use a list.

And it underlies many of the other common libraries in the standard data

science Python distribution.

2:33

So the first part of this notebook walks through an introduction to these ideas.

And then we use the timeit magic to create,

in this case, a list and apply a function to every element in the list.

And we see how fast that operates.

And then we do the same thing, but now with a NumPy array.

And you can see that it's actually quite a bit faster,

in this case five times faster, simply by doing it in NumPy.

We could do other examples and see how the speed compares as well.
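A comparison along these lines might look like the sketch below. It uses the standard-library `timeit` module as a stand-in for the notebook's timeit magic, and the array size, the repeat count, and the choice of a square root as the function are all assumptions made for illustration. The speedup you see will depend on your machine and will not necessarily be five times.

```python
import math
import timeit

import numpy as np

n = 100_000  # hypothetical size, chosen only for illustration
values_list = [float(i) for i in range(n)]
values_array = np.arange(n, dtype=float)


def sqrt_list():
    # Apply the function to each element with a Python-level loop.
    return [math.sqrt(v) for v in values_list]


def sqrt_array():
    # Apply the function to the whole array in one vectorized call.
    return np.sqrt(values_array)


# Time 20 calls of each version; the repeat count is arbitrary.
list_time = timeit.timeit(sqrt_list, number=20)
array_time = timeit.timeit(sqrt_array, number=20)
print(f"list: {list_time:.4f} s, array: {array_time:.4f} s")
```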

3:02

Once we've hopefully convinced you that NumPy is fast and

that it's worthwhile learning, we actually need to start doing things with NumPy.

So the first thing is, how do you create an array?

And there are a number of different functions that allow you to do this.

There's one that creates an empty array.

There's a function that creates an array

where all the elements are initialized to zero.

Or another one that initializes them to one.

And you can read and see how these all work.

And you should play with them to make sure you're familiar with them.
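As a rough sketch of those three creation functions, assuming a length of five purely for illustration:

```python
import numpy as np

# np.empty only allocates memory; the contents are arbitrary until assigned.
uninitialized = np.empty(5)

# np.zeros and np.ones fill every element with 0.0 or 1.0 respectively.
zeros = np.zeros(5)
ones = np.ones(5)

print(zeros)
print(ones)
```

Because `np.empty` skips initialization, its values should not be read before you have written to them.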

3:29

We can also slice elements, and we'll see that in just a minute.

But that same notation can be used with the arange function,

which will create an array of data following a specific pattern.

So if we're going to start with zero and end at ten, and the stride is one,

we'll go 0 1 2 3 4 5 6 7 8 9.

If we have a stride of two, such as this example shows,

you can see that we go 3 5 7 and 9.

Just like before, we don't actually include the end parameter.
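The two sequences just described can be reproduced with a short sketch. The start of 3 and stop of 10 in the second call are assumptions inferred from the values read out, since the notebook's exact arguments are not given here.

```python
import numpy as np

# Start at 0, stop before 10, stride of 1.
unit_stride = np.arange(0, 10, 1)

# Start at 3, stop before 10, stride of 2 (arguments assumed).
odd_values = np.arange(3, 10, 2)

print(unit_stride)  # [0 1 2 3 4 5 6 7 8 9]
print(odd_values)   # [3 5 7 9]
```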

There are also functions that will create arrays whose

elements are linearly spaced, so this is very useful for plotting,

if you want to sample data at a specific set of points.

So for instance, if I need 100 sample points between 0 and 1, this would do that.

You may want them logarithmically spaced because of the way your

analytics is operating, and so you can do the same thing.

But now it's with the logspace function,

which spaces them uniformly on a logarithmic scale.

And this code here just demonstrates that.
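A minimal sketch of both functions follows. The 100 points between 0 and 1 come from the example above; the `logspace` range of four points from 10**0 to 10**3 is an assumption picked to keep the output easy to read.

```python
import numpy as np

# 100 evenly spaced samples from 0 to 1, with both endpoints included.
samples = np.linspace(0, 1, 100)

# Four points spaced evenly in the exponent: 10**0, 10**1, 10**2, 10**3.
decades = np.logspace(0, 3, 4)

print(samples[:3], samples[-1])
print(decades)
```

Note that, unlike `arange`, both `linspace` and `logspace` include the end point by default.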

Arrays have attributes that provide information about them, such as how

many dimensions they have.

So if you have a one-dimensional array, this value will be 1.

Shape gives you the shape of the array.

So if you have a matrix that holds n rows and m columns, it would have shape (n, m).

Size is the total number of elements in the array, which is just the product of n times m.

Dtype is the data type, so is it an integer?

Is it a float?

And NumPy will actually allow you to specify that when you create an array,

so that you can say,

look, I know my numbers are very small, say they're between 0 and 255.

So I want an 8-bit unsigned integer, and

that will minimize the memory footprint of your array.

This can be very important when you start working with very large data and

you want to try to make sure it fits within your computer's memory.
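These attributes can be inspected directly, as in the sketch below. The array shapes and values are made up for illustration; `nbytes` is an extra attribute, not mentioned above, that reports how much memory the elements occupy.

```python
import numpy as np

vector = np.arange(10)       # one-dimensional
matrix = np.zeros((3, 4))    # 3 rows, 4 columns

# Values between 0 and 255 fit in an 8-bit unsigned integer.
small = np.array([0, 128, 255], dtype=np.uint8)
# The same values stored as 64-bit floats take eight times the memory.
large = small.astype(np.float64)

print(vector.ndim, vector.shape, vector.size)
print(matrix.ndim, matrix.shape, matrix.size)
print(small.dtype, small.nbytes, large.dtype, large.nbytes)
```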

5:25

And this is what this section here talks about:

different data types that you might use for an array.

The rest of the notebook talks about different things you might do.

So for instance, this is demonstrating that you can't assign a string to this NumPy

array because it must hold floating-point values.
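A sketch of that failure, assuming a small float array and an arbitrary non-numeric string (the notebook's actual values are not shown here):

```python
import numpy as np

data = np.zeros(3)  # float64 by default

try:
    # A non-numeric string cannot be converted to a float element.
    data[0] = "hello"
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(error_message)
print(data)  # the array is left unchanged
```

A string that does parse as a number, such as `"1.5"`, would be converted rather than rejected, so the error depends on the string's contents.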

5:43

We'll also talk about how to index arrays, including slicing.

NumPy also provides access via a Boolean mask array, which is kind of cool.

We can say, look, let's select all elements where the element is greater than 4.

Then we're going to change those values, so that's what we do.

We say a is 0 1 2 3 4 5 6 7 8 9.

We create a mask array which says which elements in the array are greater

than 4, and then we can change the values in the array based on that mask.

So this is a pretty powerful way of selecting data and

manipulating data based on some condition.
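The masking steps described above can be sketched as follows. The replacement value of -1 is an assumption, since the value the notebook assigns is not stated.

```python
import numpy as np

a = np.arange(10)   # 0 1 2 3 4 5 6 7 8 9

# Boolean array: True wherever the element is greater than 4.
mask = a > 4

# Assign a new value (assumed here to be -1) only where the mask is True.
a[mask] = -1

print(mask)
print(a)  # [ 0  1  2  3  4 -1 -1 -1 -1 -1]
```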

We can also create random data, and this notebook shows that.

We'll use that a lot later on when we talk about probability and statistics.
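One way to generate random data is sketched below using NumPy's `default_rng` generator. The notebook may well use a different interface, and the seed, sizes, and distributions here are all assumptions for illustration.

```python
import numpy as np

# A seeded generator makes the random numbers reproducible; 42 is arbitrary.
rng = np.random.default_rng(seed=42)

uniform = rng.random(5)                          # floats in [0, 1)
normal = rng.normal(loc=0.0, scale=1.0, size=5)  # standard normal draws
dice = rng.integers(1, 7, size=5)                # integers from 1 to 6

print(uniform)
print(normal)
print(dice)
```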

We can also perform basic math operations,

just like we did with the Pandas library, where we operated in a vectorized fashion

and the operation is applied to every element in the array.

You should definitely try these out so you learn. There are summary functions.
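For a feel of what element-wise math and summary functions look like, here is a small sketch with made-up values:

```python
import numpy as np

a = np.arange(1, 6)   # 1 2 3 4 5

# Arithmetic applies to every element; no explicit loop is needed.
shifted = a + 10
scaled = a * 2.5
squared = a ** 2

# Summary functions reduce the whole array to a single value.
print(shifted, scaled, squared)
print(a.sum(), a.mean(), a.min(), a.max())
```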

There are a lot of other things that you could try, including universal functions.

NumPy includes a lot of functions that have been defined to operate on

every element in the array at once.

And this is very nice.

So for instance, this computes the sine of every element in the array.

And that's very nice because it's simple code.

We didn't have to write a loop to do it.

That's the whole benefit of a vectorized function.
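As an illustration, the sketch below computes the sine of every element with `np.sin` and, for comparison, with an explicit loop. The five sample angles are an assumption, not the notebook's data.

```python
import math

import numpy as np

# Five angles from 0 to pi (values chosen only for illustration).
angles = np.linspace(0.0, np.pi, 5)

# Universal function: one call handles every element.
sines = np.sin(angles)

# The equivalent loop, written out for comparison.
looped = [math.sin(x) for x in angles]

print(sines)
```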

7:04

One other thing I wanted to talk about, though, is this idea of a masked array.

This is really powerful because we can create the array.

We can then create masks and say, if this is a bad value, we want to mask it

such that when we do operations on the array, those values will be ignored.

And this can be very useful when we want to do math on data like that.

So for instance, we might look at this and say this is a bad value,

and this is a bad value.

And we want to do some sort of operation on them.

Here we are dividing two arrays and then taking the square root of the result, but

since they're masked arrays, this prevents an error from occurring.

So for instance here, this is a 0, so we're dividing by 0.

We can't do that.

And so, instead of giving us an error, it's just giving us a warning.

So this should show you that it's very powerful to use masked arrays.
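A masked-array calculation of this kind might be sketched as below, using `numpy.ma`. The array values and the choice of which element counts as bad are assumptions. In this sketch, the explicitly masked element and the division by zero both end up masked in the result, so later operations skip them.

```python
import numpy as np
import numpy.ma as ma

# The third numerator is treated as a bad value and masked explicitly.
numerator = ma.array([1.0, 4.0, -9.0, 16.0],
                     mask=[False, False, True, False])
# The second denominator is zero.
denominator = ma.array([1.0, 0.0, 1.0, 4.0])

# Divide, then take the square root; invalid entries become masked.
ratio = numerator / denominator
roots = ma.sqrt(ratio)

print(roots)
print(roots.mask)
print(roots.mean())  # computed from the unmasked elements only
```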

The last part talks about how to read data straight into NumPy via loadtxt

and genfromtxt.

That's less important for us.

Most of the time we're going to be using Pandas DataFrames.
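Even so, the two NumPy readers just mentioned can be sketched briefly. The example reads from in-memory text through `io.StringIO` rather than a real file, and the comma-separated data is made up for illustration.

```python
import io

import numpy as np

# Complete, well-formed numeric text: loadtxt is enough.
clean_text = io.StringIO("1.0,2.0,3.0\n4.0,5.0,6.0\n")
clean = np.loadtxt(clean_text, delimiter=",")

# Text with missing fields: genfromtxt fills the gaps with nan.
gappy_text = io.StringIO("1.0,,3.0\n4.0,5.0,\n")
gappy = np.genfromtxt(gappy_text, delimiter=",")

print(clean)
print(gappy)
```

In practice you would pass a file name in place of the `StringIO` objects.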

But again, you can look through all of this and try everything out.

And as always, if you have any questions, let us know in the course forums.

Good luck.