[MUSIC] So what we're going to do now is looking an example project that hopefully will inspire you and get you really excited about this capstone. You worked really hard in the courses and specialization up until now and you're ready really to put all of the skills that you've acquired to use in answering a question that care about. So by the end of this video, you'll have seen one example of a project that could be done by someone in this capstone. But hopefully, you'll also get your creative juices flowing about other projects that you might want to think about and implement a design as your own capstone project. So as we mentioned in the introduction, this capstone really is thinking about social networks and the connections that we have with other people in our life and in the world. At this day and age, we have the amazing phenomenon of social networks online and there are all of these different platforms out there like Twitter and Facebook, and Google+ that facilitate this interactions that we in other people through our computers and online. And what this also gives us is a lot of data that exist about how people connected to one another. So in this capstone project what we are going to do is dive in to some of the data. And we think about how we can use graph algorithms and data structures to answer questions about what that data might look like on a local, but also on a global scale. Now these networks are huge many, many millions and sometimes even billions of nodes or people represented in these networks and relationships. And so its really hard to grasp what the structure and the global behavior of this network might look like. But that global behavior is really important. Think about how information spreads in this community. And the worldwide community thinking about videos that become viral online or thinking even about how news of an epidemic might spread. Or even just how the next hot tech item gets buzzed about from one place to another, and people hear about it in various places around the world. And so we'd like to think about how the personal connections that we might have on a pair by pair basis translate to some global trends. And we might think about how to visualize those trends, and see some patterns. So for example, if we're looking at a really big visualization of a network, we might notice that some of the nodes are somehow they seem closer to one another, they seem like they might be clustered more than nodes that are not part of that community. And so we might ask is there a way do detect communities inside a big social network and maybe if we have a lot of informations. So maybe if this network represented people on the world then we knew something about they're geography where they were located. Maybe we'd be able to deduce communities based on people nearby one another or more likely to be friends. And we can try to color their graph based on their locations. But maybe we don't have that data people sometimes can be a little private about sharing their data. And so maybe all we know is whether two people in our network are connected or not. Can we still deduce information about these clusters, these communities? And so let's ask specifically about UC San Diego here on campus thinking about can we find the communities amongst our students? Now what's nice about restricting our attention to just UC San Diego students is that there is a really great data set from 2005. Which is a little bit old but it still has a lot of very interesting data in it that really took a snapshot of the Facebook friend network on campus at that day. Now friendships of course evolve over time and so all of these social networks that we're thinking about will change from one time to another. But with this data we just have a snapshot of how the relationship look like in September 2005. Now Christine and the next video, will tell you a little bit more about Facebook and the details of what those edges look like and what the graph looks like for Facebook. But for now let's think about this data is a proxy for relationships between our students on campus here at some moment in time. And there's a lot of students on campus here, so this particular data set that we have. We've got quite a few vertices and quite a few edges and all we know about these people represented by vertices is whether they're friends with one another. We don't know what high school they went to, we don't know what clubs they belong to, we don't know which residence they live in. But all of these things might contribute something to whether they're in one of these clusters, these communities or not. So now we have to think, now is there some way to try to discover community. So one way to answer this question would be to look for connected components. So if we have a graph structure, then we know that we can partition that graph into components where a component will include a whole bunch of notes that are reachable from one another through puzzles in the graph. And it might makes sense to say, well maybe a community is all one that are friends of friends of friends of friends, and so they are reachable from one another through these puzzles of friendship. The problem with social network data or actually pretty amazing, it's not really a problem at all, is that when you think about the world that we live in especially if we restrict to people who are active in these online social networks. We're all not so far apart from one another in terms of this friends of friends of friends of friends behavior. Turns out that for most of these social networks you'll have one giant component that most of the vertices in the graph belongs to. And so most of the vertices in the graph will be just some path away from another vertex in the graph. And so if we just look at the connected components of the graph, we won't really get a very fine description of that inner structure. We'll just have a whole bunch of advertise's grouped in one of these giant components, it's actually a technical term which is kind of funny. And then a few outliers who aren't really connected to the other people. Which is interesting and if you want to think more about drawing components it might be the basis of an interesting project. But for our example right now, we'd like to find those inner communities and for the drawing components not really going to help us out. But maybe if we think about the meaning of community as not just I'm a friend of a friend of a friend of a friend. But somehow capturing this fact that within a community we should have more links to one another than to people who are outside the community. Maybe if we can formalize that notion of being closely connected to other people in my community, then we can write that down as a graph property that then we could algorithmically look for in our social network. And this was done and in fact, there's different ways of describing what it means to have more links within a particular community than outside to it. And so different definitions of this connectedness and clustering behavior might lead to slightly different algorithms. So we have a reference for you that might be really useful as you're exploring these questions that you want to ask about social networks and look for algorithms that might give you some of these answers. And this book by Easley and Kleinberg has a very nice overview of the research that's really been done in social networks and algorithmic questions about social networks up until this point. And one example that they present in the context of community finding is this study that was done quite a ways back by Wayne Zachary. And in this study, he was analyzing the relationships amongst a karate club on a campus not UC San Diego, but on a different campus. And noticing that for this particular karate club, you could find a difference between the people who are represented by vertices in white from those who are represented by vertices shaded as red. And it turned out that there were more edges interconnecting the ones that were shaded in white from those that were shaded in red. And it turned out that this club really ended up having a schism. It separated into two groups, two separate karate clubs some time after this network was analyzed and it turned out that at the heart of the white shaded vertices was one person who was the original founder of the club. And at the heart of the red shaded vertices was actually the instructor in the club, the karate instructor. And it turned out that these two people had somewhat different visions for how the club would carry out its activities and what the mission of the club was. And so they each ended up splitting up into two separate clubs. And when the researchers followed up with this community of people, they noticed that all of the white shaded vertices actually ended up going with the founder of the club. And all of the red shaded vertices indeed went with the instructor with the exception of one person. And there was a psychological reason for that person as well. So it's really amazing how just the graph structure of the relationships between individual and people modeled in this abstract way can give us insight into how the network will evolve over time. And so what we like to do is have a description of what does it mean for those white vertices to be grouped more and the red vertices to be grouped more. And that definition will lead to algorithms that can figure out those clustering. We'll talk more about those next week. But for now, let's think about applying that algorithm to that UC San Diego data in 2005. And what we see is that using that algorithm, we can find sub communities within that big giant component of people who seemed to have more of a clustering behavior to one another. And then we can ask ourselves a question, what brought these people together? And we can go back to the people on campus and ask them. So did you go to the same high school? So are you both engineering students? And from the graph structure give us insight about how these people interact? So how do we do this? How do we answer these questions? And at the heart of these interesting questions, I really foundational algorithmic computer science mathematical notions. So we're modeling the social network piece of graph. We, as part of the algorithm, we'll do a breadth first search which should hopefully be familiar to you from our earlier courses. We'll be using really interesting data structures like priority queues and queues to implement the algorithms that are necessary for separating out a big graph into its subcommunities. And so hopefully, you're as excited as I am about the kind of questions you can answer about social networks using the tools that we've developed in this course. Sorry, developed in this specialization for courses that you can work through up until now to get to this point. And in the next video, what I'm going to do is outline the steps that you'll take in developing your own project throughout this capstone course.