You're still here? We're almost done. This is the last course in our sequence in text marketing analytics. My name is Professor Vargo. I am an Associate Professor of Advertising and I'm super excited to talk to you about network analysis today. Have you seen something like this graph before? What does it actually mean to you? This is a network analysis of the relationships that people share in the movie Love Actually. The thicker the line, the more scenes the two actors have appeared in together. This is a great use of network analysis because this movie is known for its nine subplots. Yes, nine, I looked it up. The network analysis shows how the subplots and characters relate to each other. I also love how this network uses labels to describe the relationships that the individuals share. The way in which this network is annotated really helps you extract and extrapolate the meaning that you can gather by watching the movie. Network analysis has been around for a very long time. The first network was created in 1859 by Francis Galton, a cousin of Charles Darwin, to show the relationship between different pesticides used on the land. The first social network analysis, that is, a network of people, was done in 1877 by the French sociologist Gabriel Tarde. He used it to study the spread of gossip and rumors. The first social network analysis software was developed in the 1960s by MIT professors John McCain and Peter Spiegelman. They called it social net. It was used to study how people were connected on campus. In 1973, Mark Granovetter published his paper, The Strength of Weak Ties, which showed that weak ties are often more important than strong ties when you are trying to find a job or get information about a new product or service. This is because weak ties have access to different networks than strong ties do. They can help introduce you, or bridge you, to new people who can help you achieve your goals.
In 1978, John Scott published the paper Social Network Analysis: A Methodological Approach, which showed how social networks could be analyzed using mathematical models like graph theory and matrix algebra. Today, we use networks to understand the relationships in complex data. Networks capture the relationships that people, places, and things share. We use networks to understand how ideas spread through populations. We use network analysis to understand the semantic relationships that documents share. Quid is a network analysis tool that turns text into context by identifying the most important entities and their relationships in collections of news articles. As a result, it's able to extract the most important information from a collection of news articles and present it in an easy-to-understand format. It was developed by researchers at the University of Washington and is available for free. It can be used to analyze any collection of text documents: news articles, blog posts, emails, or tweets. Quid has been used by journalists to analyze large collections of documents relating to the WikiLeaks scandal. The Quid tool works by extracting entities such as people, places, things, or keywords from a document, using natural language processing techniques such as part-of-speech tagging and named entity recognition. The relationships between these entities are then identified using graph theory algorithms that find the clusters of related entities within a network graph. This information is then presented via an interactive visualization that allows users to explore the connections between different entities within a collection of documents. You can conceptualize really all data as having connections. Not everything should be treated as a network analysis, and I'm not trying to assert that you should turn everything into one. However, almost all data can be thought of in terms of the connections that it has to other data.
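To make the idea of entities and their relationships a little more concrete, here is a minimal sketch in Python. It is not how Quid works internally; it simply assumes that the entities have already been extracted, and it connects two entities whenever they appear in the same document. The documents, the entity list, and the function name `cooccurrence_edges` are all hypothetical and exist only for illustration.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy documents and a hand-made entity list.
# In a real pipeline the entities would come from named entity recognition.
documents = [
    "Alice met Bob in Paris",
    "Bob flew to Paris with Carol",
    "Carol emailed Alice",
]
entities = {"Alice", "Bob", "Carol", "Paris"}


def cooccurrence_edges(docs, known_entities):
    """Count how many documents mention each pair of entities together."""
    edges = Counter()
    for doc in docs:
        # Entities found in this document (simple whitespace matching).
        found = sorted(set(doc.split()) & known_entities)
        # Every pair of entities in the same document shares one connection.
        for pair in combinations(found, 2):
            edges[pair] += 1
    return edges


edges = cooccurrence_edges(documents, entities)
for (a, b), weight in sorted(edges.items()):
    print(a, b, weight)
```

Each key in `edges` is a pair of entities and each value is the number of documents they share, which is one simple way to turn a collection of text into connections.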
If you are interested in those connections, then a network analysis is probably an appropriate method to use. A network analysis quantifies the relationships between people or data. If we can capture that a relationship is being shared, we can do a network analysis of that data. A network analysis is also a database tool that explains the relationships that people or data have with each other. It helps you analyze relationships. Network analysis can be used to reveal the most influential people in a group. For instance, this is pretty commonly done in influencer marketing. In influencer marketing, you want to find the person who is most influential, who will reach the broadest network of people possible. That is exactly what network analysis can allow you to do. Network analysis can not only tell you who's influential, it can also uncover an entire network of people and the common bonds that they share, so that you can better understand who's connected to who and why. There are two common types of network analysis. Social network analysis is the analysis of people. It shows you the relationships and the interactions that humans share. It's a bit confusing because the terms social network and social media are related but totally different. However, stay with me here. Most social network analysis examples you'll see online are actually collected from social media data. Why? Because it's really easy to capture the relationships that people share on social media. Social network relationships, such as who follows who, are fairly easy to determine by downloading data from Twitter's API. Know that a social network analysis really isn't about social media. Remember, the first one was done in the 1800s, well before Twitter. It doesn't have to deal with social media at all. That being said, most of the examples you'll see do use social media data. People can be connected in many different ways. Everyone in this Coursera course is now connected because we take this class together.
Now, if that's not interesting, let's try to quantify something else, such as the amount of time you talk to your friends, or the amount of time that we talk to our colleagues. Network analysis can start to capture these human relationships and decipher who is closest to who. There's a lot of terminology in network analysis and we're going to cover it over the next series of lectures. Because there are so many different academic disciplines talking about network analysis, there's really a fragmentation of the terminology and I hate it. But it's true. It's really everyone from sociology to computer science, and everyone likes to use their own words. I tend to refer to the dots on the network as nodes, actors, or people. They can also be called vertices. There's invariably tons of other terminology we use to describe those dots that you see there on the screen. It could be people in the network, or it could be any actor. My advice is to just pick one word that you like to describe a dot and stick with it. You'll hear me say node. Maybe you'll hear me say person or something along those lines too. Whether it's a node, a person, or a person's social media account, the dots have some type of interaction and relationship with other dots on the network. That link is usually referred to as an edge, a connection, or a tie. If a node has a relationship, then we visualize that connection with a line. You can see here that the first node is connected to the second node. On Twitter, not all relationships are reciprocal. You can follow someone like Taylor Swift. Taylor Swift may not even notice, and she may not follow you back. That's a non-reciprocal relationship in a network analysis. Nodes send connections to others, but not every connection in the network is reciprocated. Networks can be directional. Some networks are directional, others are not. We'll look at examples of both in this course. Facebook connections are always non-directional.
You're either Facebook friends with someone or you're not. Not every network analysis has to be directional. Edges can be directional in nature though. For instance, this is a network where it's clear that whatever Person A is sending to Person B is being reciprocated back. The arrows indicate a direction. Do you see the arrows on the graph here? The common example of this is a node that follows another node on Twitter, and that node follows it back. They follow each other. We call this a reciprocal or mutual relationship. They reciprocate with each other because they share that relationship. Here's a famous example on Twitter. I follow Taco Bell on Twitter, so I am exhibiting what is called an outdegree. In the network analysis literature, people who send out a lot of connections or mentions are often referred to as hyperactive members of the network. They are the ones that are talking to everyone. Then there are the people that are receiving those connections, the famous. In this particular network analysis, I have an outdegree of one because I'm sending a relationship. I'm not the one receiving any benefit from this. I'm not getting anything in return. I'm just following someone on Twitter. I'm sending an edge out, but I'm not receiving anything back. So my indegree is zero: no one follows me on Twitter, but I follow someone on Twitter. That's not far from the truth. If we flip the relationship, if Taco Bell follows me on Twitter but I don't follow them back, then see how the indegree and outdegree change for each node. Indegree and outdegree can actually tell you a little bit about who's influential on the network. If someone has a very high indegree, that means a lot of people connect to them and look towards them. If someone has a very high outdegree, that means they talk a lot to a lot of people. In this lecture I have an outdegree: I'm talking to you and you're not talking back, because this is a recording.
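The follow example above can be sketched in a few lines of Python. This is only an illustration under assumed data: the account names are made up, and each tuple is read as (follower, followed). Outdegree counts the edges a node sends, indegree counts the edges a node receives, and a tie is mutual when the edge exists in both directions.

```python
from collections import Counter

# Each tuple is (follower, followed). The account names are illustrative only.
follows = [
    ("professor", "taco_bell"),
    ("student_a", "taco_bell"),
    ("student_a", "student_b"),
    ("student_b", "student_a"),
]

# Outdegree: how many edges each node sends out.
outdegree = Counter(source for source, _ in follows)
# Indegree: how many edges each node receives.
indegree = Counter(target for _, target in follows)

# A tie is mutual (reciprocal) when both directions are present.
edge_set = set(follows)
mutual = {tuple(sorted(edge)) for edge in edge_set
          if (edge[1], edge[0]) in edge_set}

print("professor outdegree:", outdegree["professor"])
print("professor indegree:", indegree["professor"])
print("taco_bell indegree:", indegree["taco_bell"])
print("mutual ties:", mutual)
```

A `Counter` returns zero for nodes that never appear, so a node that follows others but has no followers gets an indegree of zero without any special handling.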
If we're measuring attention, then I have the highest indegree because you're all paying attention to me but I'm not paying attention to you. As I mentioned, a mutual tie or a reciprocal tie is when both actors reciprocate in the same way. This is the type of tie we see between friends, relatives, or neighbors. So here's a visual network of all the places that I used to go in Boulder before COVID. This network is undirected. I'm going to these places; these places can't come to me. So there's no real reverse relationship here, there's no real direction, there's just a connection. So I have a relationship with Whole Foods, and that relationship can be described as the whole paycheck. Networks can be directed and undirected, but they can also be valued and unvalued. We can quantify relationships. Going back to the example that I gave you, if we wanted to take a social network of all of the people taking this MSDS sequence, we could measure the emails that you send to each other, the messages that you send to each other, the times at which you chatted with each other, and so on and so forth. We can also tally those emails and record the total number of interactions that you have with each other as an indication of strength. The more times you sent a message to another person in the MSDS, the more connected you are. The larger the number, the stronger the tie. Website visits are a good example of a valued network. Note, this is an undirected network. I visited these sites but they can't visit me, so there's no clear direction to this network, but it is valued. What do you think the values mean? How many times do I visit Amazon? Thirty times. I visited Twitter 14 times. I only visited HGTV twice. We can represent these valued networks in two ways. We can increase the thickness of the line or we can make the nodes closer together in space. Here I'm only utilizing one thing, the line thickness. Again, this is my website behavior. I've added some additional lines.
What do you suppose they mean? Well, of course, I went to Amazon 24 times. I visited HGTV just once. What does the line with the value seven mean here? It means that I visited Amazon and Twitter together on seven different occasions. This is website data. What do you think this network represents? It represents the different actions that users took on a website during a website session. The weights of the edges, their values, represent the number of users that visited both pages. Network analysis visualized in this way can help make sense of website visits. What sites do people tend to visit together in a session? That's what this network visualizes. We can use this to understand which actions are most closely related to each other. Directional networks can be measured in both directions. Some days, depending on how many muffins or treats I gave Charlotte, my daughter, the connection may only go in one direction. There are days when I feel like the love is flowing in one direction with a four-year-old. However, if there's an opportunity to measure both people, and they both have an equal opportunity to participate in the relationship, then directed is the way to go. If there's no real reason to measure the direction of the relationship, then use an undirected network. We'll learn how to prepare directed and undirected networks in this set of courses.
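The valued, undirected website network described above can be sketched as follows. This is a minimal illustration with made-up sessions, not the data behind the slides: each session is assumed to be the set of sites visited in one sitting, the node value is the number of sessions that include a site, and the edge value is the number of sessions in which two sites were visited together.

```python
from collections import Counter
from itertools import combinations

# Hypothetical browsing sessions; each set holds the sites visited in one session.
sessions = [
    {"amazon", "twitter"},
    {"amazon"},
    {"amazon", "twitter", "hgtv"},
    {"twitter"},
]

visits = Counter()     # node values: sessions that include each site
co_visits = Counter()  # edge values: sessions that include both sites

for session in sessions:
    visits.update(session)
    # Sorting the pair means (a, b) and (b, a) are stored as the same
    # undirected edge.
    for pair in combinations(sorted(session), 2):
        co_visits[pair] += 1

for (site_a, site_b), weight in sorted(co_visits.items()):
    print(site_a, "--", site_b, "weight:", weight)
```

When you draw the network, the values in `co_visits` are what you would map to line thickness: the larger the number, the thicker the line between the two sites.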