In this lesson, we're going to talk about the mongoimport command. Oftentimes you're going to have data that isn't already in MongoDB. For those cases when your data is stored in either JSON or CSV format, the mongoimport command gives you a very easy way to get that data into MongoDB. You've likely already seen the mongoimport command before now, but in this lesson we're going to dive into a few specific features of the command. We'll see how we can import CSV data into MongoDB. Moreover, we'll see how we can specify what we want our field names and data types to be. And then finally, we'll cover some pretty cool options for inserting that data into MongoDB. For this lesson, we're going to import an online retail dataset from the UCI Machine Learning Repository. We'll go ahead and click this data folder right here, and then we can click this online retail .xlsx file. Unfortunately, you can see this data is an .xlsx file, which is Microsoft Excel's proprietary format. We really need to get this into a CSV. So, I'm going to go ahead and open this document in Microsoft Excel, and from here I can save it as a CSV and click save. I know that not everyone has access to Microsoft Excel, so this CSV file will be attached as a handout in the lecture notes. I've already cd'd into the directory where I saved my retail.csv file. Let's go ahead and take a quick look at the first 10 lines of the file. You can see that we first have this nice little header line, and below that we have the first nine rows of our orders. Looking at this data, it's pretty relational, and you can tell because the invoice number is repeated for quite a while here. We can go ahead and fix that once we get it into MongoDB by reshaping it into a document model, but for now let's focus on getting this data into MongoDB in the first place. Before we begin importing this data with mongoimport, we'll talk about some of the cool features of mongoimport. 
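As a sketch of that quick look, here's the kind of shell command I'm running, demonstrated against a tiny stand-in file shaped like the real one. The header matches the UCI dataset's columns; the sample row is illustrative, and the file name `retail_sample.csv` is just a stand-in for the real `retail.csv`:

```shell
# Illustrative stand-in for retail.csv: one header line plus one sample order row.
cat > retail_sample.csv <<'EOF'
InvoiceNo,StockCode,Description,Quantity,InvoiceDate,UnitPrice,CustomerID,Country
536365,85123A,WHITE HANGING HEART T-LIGHT HOLDER,6,12/1/10 8:26,2.55,17850,United Kingdom
EOF

# Peek at the first 10 lines, just like we do with the full retail.csv.
head -10 retail_sample.csv
```

With the full file, `head -10 retail.csv` shows the header line plus the first nine order rows mentioned above.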
I first want to point out the help page. By running this command, you can see all of the different options that you can pass to mongoimport, along with their descriptions. We're only going to discuss a few of these in this lesson, but I want to emphasize the importance of being able to read a page like this to learn about all of mongoimport's functionality. So, please feel free to pause the video, run this command for yourself, and read about the different options. I'm going to import this retail dataset into my MongoDB free-tier cluster by clicking on the cluster, and then clicking Command Line Tools. From here, I can scroll down to the data import and export tools and click Copy to copy this mongoimport command. I can then paste this command, but before I fill in these different pieces, we're going to go back to Atlas so that I can copy my connection string, so we can use Compass to view the data in my free-tier cluster. We'll click Overview, and then from Overview we'll click Connect. From here we're going to scroll down to connect with MongoDB Compass and click Copy. Now, when I open MongoDB Compass, I'm immediately prompted to import my connection string. So, I'll click yes, it'll fill in these fields, I'll then type in my super secret password, and hit Connect. You can see I really don't have much going on here. Let's go ahead and fix that by importing our retail CSV. First, I'll replace my password in the mongoimport command, and then I'll change my database to coursera-agg and the collection to whatever we want to call it. What am I going to call it? I'll call it orders. I'm going to change the type to CSV, since mongoimport supports both JSON and CSV types. And then finally, I'll pass in the file path, which is just retail.csv. Now let's execute the command. What happened here? 
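For reference, the filled-in command looks roughly like this. The username, password, and cluster host below are placeholders for your own Atlas values, and this first attempt is the one that fails:

```shell
# First attempt: no field information supplied for a CSV, so mongoimport errors out.
mongoimport --uri "mongodb+srv://myuser:<PASSWORD>@mycluster.mongodb.net/coursera-agg" \
  --collection orders \
  --type csv \
  --file retail.csv
```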
Well, the command failed, and when we read the error output, we can see that we must specify --fields, --fieldFile, or --headerline to import this file type. This is because we're trying to import a CSV file. With JSON, every value of data already has a corresponding field. With CSV this isn't always the case, which is why you need to either pass in a field array or a fieldFile that contains those field names, or specify --headerline and have the first line of your CSV contain the column names. Fortunately for us, our file has a header, so we can just specify --headerline. Let's go ahead and try it again. Awesome. You can see that we were able to import 541,909 documents. Now, we can head back to Compass. To refresh Compass, we can click this little icon right here. And now we can see our new coursera-agg database, and inside it you can see the orders collection with the same number of documents. We can drill in here, and you can see that mongoimport was able to infer the different data types. This one is an integer, and this one is a string. Unfortunately, with fields like this one, we have a date, but the date is actually encoded as a string, which makes sense because mongoimport doesn't know about the format of the date. We'd really prefer this to be a native date data type. Moreover, when we go over to the Schema tab and click Analyze Schema, what this lets us do is see the breakdown of our data. Here we can see that customer ID is an integer 75 percent of the time and a string 25 percent of the time, and we can also see the breakdown of those different values. Let's see how we can fix this with mongoimport. mongoimport has a flag called --columnsHaveTypes, and this allows us to specify the data type for a field by appending it to the field's name. To do this, we're going to slightly modify our CSV file. Like I said, we need to append the data type to the end of each field name. 
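With that fix, the command sketch from before gains one flag (placeholders as before):

```shell
# Second attempt: --headerline tells mongoimport to take the field names
# from the first line of the CSV file.
mongoimport --uri "mongodb+srv://myuser:<PASSWORD>@mycluster.mongodb.net/coursera-agg" \
  --collection orders \
  --type csv \
  --headerline \
  --file retail.csv
```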
Here's the syntax for doing so. First, we separate the field name from the type with a dot. For invoice number, we saw that it was mostly an integer but occasionally a string, so let's make it a string always, for consistency. I'm just going to type string, and then put two parentheses at the end, almost as if this were a function call. We'll do the same for StockCode and Description. In Compass we saw that Quantity was always an integer, so we'll change that to int32. Now, this one is a bit tricky. InvoiceDate is currently a string, but we'd really prefer it to be a date. You'll notice we have a very particular format for our date and time: the month can have one or two digits, the day can have one or two digits, the year has only two digits, and then the time, which at first may not look like it, is actually 24-hour time. So, this is actually 8:26 a.m. The date and the time are separated by a space, the elements of the date are separated by slashes, and the hour is separated from the minute by a colon. We need to tell mongoimport how to parse these date-time values, so we're going to write .date() with parentheses. Inside those parentheses, we're going to use Go's date parsing syntax, and this is because MongoDB's command line tools like mongoimport are written in Go. What this means is that we specify one very specific reference time as our example. Specifically, we're going to give the time of Monday, January 2nd at 03:04:05 p.m. in the year 2006. We need to write that reference date in the format that our file uses. Let me show you what I mean. I'm going to write 1/2/06, which follows the month/day/two-digit-year format our file has been using. Then you give it a space, write the hour in 24-hour time, a colon, and the number of minutes, which for four minutes after 3 p.m. is 15:04. 
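Putting all of those typed column names together, the edited first line of retail.csv ends up looking like the header below. I'm building it with a shell variable here just so it's easy to read; the .date(...) argument is Go's reference time (January 2nd, 2006 at 3:04 p.m.) written in the layout our file actually uses:

```shell
# The typed header line that replaces the original first line of retail.csv.
# Each field name gets a type suffix; InvoiceDate.date(...) carries Go's
# reference time (1/2/06 15:04) written in this file's month/day/year layout.
typed_header='InvoiceNo.string(),StockCode.string(),Description.string(),Quantity.int32(),InvoiceDate.date(1/2/06 15:04),UnitPrice.decimal(),CustomerID.string(),Country.string()'
echo "$typed_header"
```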
And that's how you specify the date. Now we can move on to UnitPrice. Since this is a dollar value, I really want to make sure that we have precision when doing math, so we're going to make it a decimal type. CustomerID we'll make a string, and of course Country we'll make a string as well. Now I'll save this file. Next, I'll add --columnsHaveTypes to the end of our command, and we'll run it again. Great. You can see that we again imported 541,909 documents. Back in Compass, we can click refresh. After refreshing, we can see our new documents. And you'll see that... oh no, wait, this is supposed to be a string. Why is this an integer? And this is supposed to be a date. On closer inspection, we can see that we now have 1.1 million documents instead of about 500,000, and that's because we just re-imported all of those documents again. So now we have a bunch of duplicates. Let's look at how we can fix this. The easiest way is to append the --drop option to our command, which will first drop our collection before importing any documents. After running the command again, we import our 541,909 documents, and when we refresh Compass this time, we can see that our date field is truly of a date data type, and the invoice numbers and stock codes are both strings, like we would expect. Now, let's imagine for a second that this collection already contained some data before we imported anything. That would mean that when we used --drop, we would delete data that we really wanted to keep. Well, there's a really cool trick for these types of situations. You can use the --mode flag and pass an argument for whether you want to insert; upsert, which means we replace an existing document with a new one, or insert the document if no match already exists; or merge, which means we replace only the fields present in the new data that we're importing and keep the other fields the same. 
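So with the typed header in place, the full command we ended up with looks roughly like this (placeholders as before):

```shell
# Third attempt: --columnsHaveTypes reads the type suffixes from the header line,
# and --drop clears the collection first so re-imports don't pile up duplicates.
mongoimport --uri "mongodb+srv://myuser:<PASSWORD>@mycluster.mongodb.net/coursera-agg" \
  --collection orders \
  --type csv \
  --headerline \
  --columnsHaveTypes \
  --drop \
  --file retail.csv
```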
In our case, upsert will work well because we really just want to replace each entire document with the new, proper one. Before running this command, we'll need one other flag, --upsertFields, which tells mongoimport which field or fields to use to determine which document in the collection should be replaced by the new one. For us, that means we need to use both InvoiceNo and StockCode, since together they're unique across all the documents in our collection. Now, when we execute this, you'll notice that this command runs much, much more slowly than our previous commands. This is because for every document that we're inserting, the server needs to query the collection on InvoiceNo and StockCode and fetch the matching document so that it can replace it. Our drop command worked fine, so let's go ahead and cancel this. And that's a pretty good practical overview of mongoimport. Let's recap. In this lesson, we saw how we can import CSV data into Atlas using mongoimport. We also saw how we can specify the field names via a CSV header. Similarly, we saw how we can append data type information to those column names so that information can be passed on to MongoDB. Additionally, we saw how to use the --drop option to prevent duplicates when running multiple re-imports. And finally, we saw a cool trick for updating existing documents in MongoDB by using --mode upsert or --mode merge.
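For completeness, here's a sketch of the upsert variant discussed above, which we started but cancelled in the lesson (placeholders as before; note how --upsertFields names the matching keys):

```shell
# Upsert variant: instead of dropping, match existing documents on
# InvoiceNo + StockCode and replace them; unmatched rows are inserted.
mongoimport --uri "mongodb+srv://myuser:<PASSWORD>@mycluster.mongodb.net/coursera-agg" \
  --collection orders \
  --type csv \
  --headerline \
  --columnsHaveTypes \
  --mode upsert \
  --upsertFields "InvoiceNo,StockCode" \
  --file retail.csv
```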