In this segment, we're going to learn how to import data into Neo4j. We will begin with a fairly simple spreadsheet consisting of only a few rows and three columns, in a format that's fairly typical for importing into a graph database. We will review the Neo4j Cypher script used to perform this import, and then we'll run the script and validate the resulting graph. Then we'll demonstrate a similar process with a more challenging dataset consisting of terrorist network data. We will review this dataset and the script commands necessary for performing the import, and we'll run the script and explore the resulting graph. And finally, we will review a third dataset that you will use yourself to perform similar data import operations.
First, let's take a look at our sample dataset. This spreadsheet consists of just a few rows and three columns. Each column has a heading. The first column heading is Source, the second column heading is Target, and the third column heading is distance. You can imagine that this might represent data from a simple road network in which the Source and Target values represent towns, and the distance values represent the actual distance in miles between the towns.
So let's take a look at the Neo4j Cypher script and see what we will need to do to import our spreadsheet data. The first line of code performs the actual import. The other three lines of code determine how that data is structured in the graph. Since our data file is a comma-separated values, or CSV, file, we will need to specify that in our code. Our file also contains headers. And if you're using a Windows operating system, your file path will look something like this. Since we're running Neo4j in a browser, the file path needs to conform to URL conventions: the word file, followed by a colon, followed by three forward slashes, and then the hard disk letter where the file is located, plus the path to that file.
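The file path convention just described can be sketched in a few lines of Python. This is only an illustration, not the script shown on screen: the helper names `to_file_url` and `load_csv_clause` and both example paths are hypothetical, and the actual paths used in the lecture are not reproduced here.

```python
from urllib.parse import quote


def to_file_url(path: str) -> str:
    """Turn an absolute file path into the file:/// form described above."""
    # Use forward slashes and drop any leading slash so that exactly
    # three slashes follow "file:".
    normalized = path.replace("\\", "/").lstrip("/")
    # Percent-encode spaces and similar characters; keep "/" and the
    # colon after a Windows drive letter as they are.
    return "file:///" + quote(normalized, safe="/:")


def load_csv_clause(path: str, row_var: str = "line") -> str:
    """Build the first line of the import: a CSV load with headers."""
    return f'LOAD CSV WITH HEADERS FROM "{to_file_url(path)}" AS {row_var}'


# Hypothetical paths, one Windows-style and one Unix-style.
print(load_csv_clause("C:\\data\\roads.csv"))
print(load_csv_clause("/Users/me/data/roads.csv"))
```

Depending on the Neo4j version and its configuration, file URLs may be resolved relative to the server's import directory rather than the root of the disk, so the exact path that works on a given machine may differ.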
If you're using Mac OS X, your command will be similar, but the file path will probably look something more like this.
The next three lines specify which nodes will be the source nodes and which will be the target nodes, along with the properties we'll attach to them, and they define the relationships and the properties we'll attach to those. As we read each line of data in, we're going to use the variable name line to refer to the individual line we're currently working on. We use that here at the end of our load command, and so we'll have to continue to use it in the subsequent merge commands. Our source node variable is going to be n, and we'll use the MyNode type, which we've made up ourselves. We're going to add a Name property to each of our source nodes, and we're going to attach the value in the Source column to that particular node. Likewise, for our target nodes, we'll use a variable m. We'll define them as the same type, MyNode. We will give them a Name, and we will extract that name from the Target column on the particular line we're working with. Finally, we need to define our edge relationships. We're going to give each edge a label of the word TO, and we're going to add a property called dist, which represents the distance. And we'll attach the values in the distance column from the particular line we're currently working on.
So let's go ahead and copy this code and paste it into Neo4j, and see what happens. Now, I'm working on a Mac OS X machine, so I'm going to copy this code down here and perform the import operation. So I've pasted the code into Neo4j and I'll click Execute. And it takes a few moments to run. There's not too much data, so it shouldn't take more than just a few seconds. And now we get the results: 11 labels have been added, 11 nodes have been created, 25 properties have been set and 14 relationships have been created. So let's take a look at this graph network. We can see that the nodes are all listed as MyNode, and there's our graph.
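To make the counts in that result easier to check, here is a rough Python sketch. It holds the import script as described in the walkthrough (the file path is a placeholder, and the arrow direction from source to target is an assumption), and it mimics on a few made-up rows what the merge commands do: each distinct town becomes one node with one Name property, and each distinct edge becomes one relationship with one dist property. That is why 11 nodes and 14 relationships add up to 25 properties.

```python
import csv
import io

# The import script as narrated; the path is a hypothetical placeholder.
ROAD_IMPORT = """\
LOAD CSV WITH HEADERS FROM "file:///path/to/roads.csv" AS line
MERGE (n:MyNode {Name: line.Source})
MERGE (m:MyNode {Name: line.Target})
MERGE (n)-[:TO {dist: line.distance}]->(m)
"""

# Made-up rows in the same three-column layout (not the lecture's data).
SAMPLE_CSV = """\
Source,Target,distance
A,B,5
A,C,3
B,C,4
"""


def simulate_merge(csv_text: str) -> dict:
    """Count what the script would create: nothing is created twice."""
    nodes = set()  # distinct Name values
    edges = set()  # distinct (source, target, dist) triples
    for line in csv.DictReader(io.StringIO(csv_text)):
        nodes.add(line["Source"])
        nodes.add(line["Target"])
        edges.add((line["Source"], line["Target"], line["distance"]))
    return {
        "nodes": len(nodes),
        "relationships": len(edges),
        # one Name per node plus one dist per relationship
        "properties": len(nodes) + len(edges),
    }


summary = simulate_merge(SAMPLE_CSV)
print(summary)
```

The simulation only illustrates the counting; it does not talk to Neo4j, and the real numbers come from the summary that Neo4j prints after the import.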
The edge relationships should all have different distance values, and each node should be named with a different letter.
Okay, so let's try a more difficult dataset. Here is the spreadsheet containing terrorist data. The spreadsheet consists of seven columns with headings such as Country, ActorName, ActorType, AffiliationTo, AffiliationStartDate, AffiliationEndDate, and any aliases associated with that particular terrorist. This dataset consists of over 100,000 rows of data. Since that could take a very long time to load into Neo4j, we will be working with a subset of this dataset consisting of the first 1,000 rows. That much data will still include three countries, which should be plenty of data for our purposes.
So here is the script we're going to use to import the subset of terrorist data, which shares similarities with the script we used previously. But since there are more columns in our dataset, we're going to include some additional properties in our graph network. The first line of code is very similar to the load command we used with the previous dataset. The second line of code will use a variable c and a label Country for the particular nodes representing the individual countries in the dataset. In this particular case, we're going to use the name row instead of line to read our data in, but it really doesn't matter. We could use either word as long as we are consistent from command to command. So we're using the term row and associating the value in the Country column with the node that we're working on in that particular row. We will do something similar with nodes that are intended to represent the actors, or the actual individual terrorists. We're going to use a variable a, and we will define a property called Name and associate the ActorName with that property. We'll also define a property named Aliases, and associate the value in the Aliases column with that property.
And finally, we will define a property called Type and associate the values in the ActorType column with that property. We're going to create nodes representing organizations as well, and we will use the variable o and the label Organization for those nodes. We will attach a single property to these nodes called Name, and we'll assign the values in the AffiliationTo column to that property. Then we're going to define relationships between the Actors and the Organizations they're affiliated with. The relationship label is going to be AFFILIATED_TO, and we'll define a property called Start and assign the values in the AffiliationStartDate column to that property. Likewise, we will define a property called End and assign the values in the AffiliationEndDate column to that property. And finally, we're going to create relationships between the countries and the actors. In this case, we will define relationships with the label IS_FROM that will describe the fact that a particular actor is from a particular country.
So if all this makes sense, let's go ahead and copy this script and paste it into Neo4j, and see the results. So here we are in Neo4j, and I'm going to go ahead and paste that script into the command line, and we will execute it. We loaded 1,000 rows of data, which produced 658 labels, 658 nodes, 3,464 properties and 1,403 relationships, and it took about two and a half seconds.
So let's look at a small subset of this network. Here, we see the equivalent of the first 25 rows of the dataset. There's only enough data such that 1 Country, 8 Actors and 15 Organizations are visible. Let's go ahead and change this command to 250. By clicking on the line of code at the top of our panel, it automatically gets pasted into the command line above. So all we need to do is add a zero on the end of our command and execute that again. Now we have a much larger, more complex graph, but we still only have one country.
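For reference, the second import script and the display command can be sketched as follows, again wrapped in Python so the strings can be checked. This is a reconstruction from the narration, not a copy of what is on screen. Several details are assumptions and are marked as such in the comments: the file path, the Actor label, the Name property on Country nodes, the arrow directions, and the exact form of the display query (only its limit of 25, later 250, is stated).

```python
# Second import script, reconstructed from the narration.
# Assumptions: placeholder file path; the Actor label; the Name property
# on Country nodes; and the direction of both relationship arrows.
TERRORIST_IMPORT = """\
LOAD CSV WITH HEADERS FROM "file:///path/to/terrorist_subset.csv" AS row
MERGE (c:Country {Name: row.Country})
MERGE (a:Actor {Name: row.ActorName, Aliases: row.Aliases, Type: row.ActorType})
MERGE (o:Organization {Name: row.AffiliationTo})
MERGE (a)-[:AFFILIATED_TO {Start: row.AffiliationStartDate, End: row.AffiliationEndDate}]->(o)
MERGE (a)-[:IS_FROM]->(c)
"""


def display_query(limit: int = 25) -> str:
    """Build a query returning up to `limit` relationships and their end nodes.

    The query shape is an assumption; the lecture only mentions the limit.
    """
    if limit < 1:
        raise ValueError("limit must be a positive integer")
    return f"MATCH (n)-[r]->(m) RETURN n, r, m LIMIT {limit}"


print(TERRORIST_IMPORT)
print(display_query())     # the first view, limited to 25
print(display_query(250))  # the same command with a zero added
```

Note that every merge command refers to row, the same name introduced at the end of the load command, which is the consistency the walkthrough asks for.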
In order to see more than one country, we'll need to render the entire 1,000 rows of our terrorist data subset. But in so doing, we will have a difficult time viewing the entire graph. The community edition of Neo4j is limited in its ability to navigate a graph network. But I'm going to show you a little trick: editing the HTML behind the scenes to scale the view of our graph. So let's go ahead and render all 1,000 rows.
Most recent versions of the major browsers provide the ability to go behind the scenes and edit the HTML. The trick is to find a place on the viewing panel that does not have any objects, so that when you right-click and inspect the element, you'll be inspecting the viewing area. Neo4j uses SVG, or Scalable Vector Graphics, to render its graph networks. And SVG uses a g element, which can be seen here on the right. So in order to change the scale of our view, we simply need to double-click and add the word scale, an open parenthesis, the scale factor that we'd like, and a close parenthesis. We hit Return, and the graph network is now zoomed out. I'm going to try to position it so we can see at least two countries, and I'll close my HTML panel. And so now we can view two countries. We can see Albania in the upper region, and Afghanistan in the lower right. And we can see that there are actors and organizations that have relationships with both countries. Now, there are add-ons to Neo4j that make navigating a graph network a little easier, but this trick is convenient for those of you who have not added any Neo4j extensions.
The last thing that we're going to do is take a look at the sample dataset of gene-disease associations, and give you an idea of what's going to be expected of you in the accompanying assignment for this module. This data consists of information associating different genes with different diseases.
The spreadsheet consists of columns with headings for geneId, geneSymbol, geneName, diseaseId, diseaseName, the score that represents the extent to which that gene is associated with that particular disease, the NumberofPubmed articles containing that information, the associationTypes (there are up to three different types of associations between a gene and a disease), and then the sources of data and information that confirm this gene-disease relationship. Now, this dataset contains over 400,000 rows of data. So if you have difficulty importing this entire dataset, then you'll be better off extracting the first few thousand rows. So your goal will be to define the load statement, which includes a CSV with headers, that will allow you to import enough data into Neo4j to give you an idea that you've done it successfully.
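If you do decide to extract the first few thousand rows, one way to do it outside of a spreadsheet program is a short Python sketch like the one below. The function name `take_first_rows` is hypothetical, and the demo uses a tiny made-up table held in memory rather than the real gene-disease file, which may use a different delimiter or column set.

```python
import csv
import io
from itertools import islice


def take_first_rows(src, dst, n_rows: int) -> int:
    """Copy the header and the first n_rows data rows from src to dst.

    src and dst are open text files (or file-like objects).
    Returns the number of data rows actually copied.
    """
    reader = csv.reader(src)
    writer = csv.writer(dst)
    writer.writerow(next(reader))  # keep the header line
    copied = 0
    for row in islice(reader, n_rows):
        writer.writerow(row)
        copied += 1
    return copied


# Demo with made-up data; column names are shortened for illustration.
demo_src = io.StringIO("geneId,diseaseId,score\n1,D1,0.2\n2,D2,0.5\n3,D3,0.9\n")
demo_dst = io.StringIO()
copied = take_first_rows(demo_src, demo_dst, 2)
print(copied)
print(demo_dst.getvalue())
```

With real files, open both with `newline=""`, as the Python csv module recommends, and pass a row count of a few thousand. The resulting smaller file still has its header line, so it can be used with a load statement that expects headers.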