Hi, I'm Indranil Gupta and I'm a Professor here at the University of Illinois. Those of you who have taken the Cloud Computing Concepts course, or courses rather, parts one and two, would recognize me. I was an instructor in that course. And with me here I have Reza. >> Hi, I'm Reza Farivar. I'm a senior software development engineer at Yahoo. I'm also an adjunct faculty at the University of Illinois. Those of you who took my Cloud Computing Applications course remember my face. >> So we're here to talk to you about the Capstone course for this cloud specialization. And I'm going to tell you a little bit about the goals of the course and we will tell you about why the Capstone is structured the way it is. So today in cloud computing and industry there is a wide variety of systems, there is a diversity of systems. And when you want to write code for storing large data sets or processing large data sets, you have a lot of choice. And some of these choices are more complex than others. Some choices involve working with multiple systems and integrating them, and worrying about whether or not this whole integrated system, or system of systems, is going to work at all. While other choices involve writing code in more contained stacks. So for the first, think of an example like working with HDFS or Cassandra, and for the second, think of an example like writing code in Spark, for Spark batch processing or for Spark Streaming. So we're going to give you a feel for both these kinds of programming, as well as for storing data and for processing data, both batch processing and stream processing. And while doing this you'll get to use real data sets, and we'll make available to you real data sets from the Amazon Web Services free data sets. And you'll be using the cycles on Amazon Web Services as well. >> So I'd like to expand a little bit on one of the concepts that Professor Gupta mentioned here. We have a lot of open source big data systems out there.
In some sectors, you have the dominant standards. For example, for batch processing, pretty much the industry standard is Hadoop. Then there are other areas where you have so many different systems that it's actually kind of hard to choose. For example, databases, right, or key value stores. You have something on the order of 250, I think, the last I checked, different systems. So it's actually not quite easy to pick which one is the best and what they offer. And then there are some newer domains, like streaming systems, where there's still a lot of competition and there's no clear winner at the moment. So we're taking a couple of these different systems, and we're trying to give you suggestions on which systems to use. These are pretty much either the dominant industry standard or what we think could have good potential. >> Right. And as was mentioned, there is a lot of choice out there in terms of what you can use in order to write your programs. And one of the goals of this Capstone is to give you a feel of the complexities involved in making each one of those choices. And this is a very fast moving area, systems are coming out all the time, and each system has a different model than the others, and systems are being improved all the time. So, if you look at, for instance, a particular system like Cassandra, its next version is going to be very different from its previous version. So this means that, as programmers in the cloud space, we need to be very adaptive, and adapt to systems as they're changing and as they're evolving. And part of that is experiencing the current complexity, and that's what this Capstone provides to you. >> And speaking of complexity, another very important part of dealing with the complexity of these systems is trying to figure out how to set them up correctly, how to feed information from one of them to another environment correctly.
In your previous courses, we gave you MPs, machine problems, and everything was kind of set up. In this course we don't want to do that, we want you to kind of feel how practitioners in the industry really have to figure out how to get everything running together. And that's actually a view from the trenches that you won't get if you don't get your hands really dirty and get everything done. >> Right. And in the previous courses, both in the cloud computing concepts courses and the cloud applications course, we gave you, in many cases, code templates and unit tests, and things like that. Here, it's a lot more hands off. Because this experience is meant to be a lot closer to what you might encounter as an employee inside one of these companies that's trying to store and process big data. So, we will give you specification documents, we will give you pointers, we will give you pointers to tutorials. But beyond that you have a lot of flexibility and a lot of freedom in making design choices and reaching the goal that we set for you in the Capstone. >> One important part of this Capstone course, when we were designing it, was that we wanted you to work on real world data sets to gain experience on something that is real and that is valuable for you. So that's why we chose a data set, in particular an airline information data set, to get some interesting questions and knowledge out of it. >> Yeah, and with the airline transportation data set, we will give you a series of queries that you'll need to execute, and you have a little bit of choice in there. And there will be more details in the specification document. >> Right. So, of course we are talking about distributed systems. And to get your hands dirty and gain experience with these distributed systems, you need distributed systems, right, the platform to run your code on. And that's why we have partnered with Amazon.
>> And Amazon Web Services has been gracious enough to give some cycles to the students, and more information about that will be forthcoming soon. >> So the last point I'd like to mention here is that in our previous courses we had a lot of students enrolled, we had on the order of 50,000 or more students enroll throughout the courses. But you guys, now that you guys are seeing this video, you Capstone people, you are the cream of the crop. You are the people who finished up all the previous courses, so we do expect a lot from you, and that's why, basically, we are giving you the design specifications and we are hoping to get fully functional systems. >> Right, and I'll only add to that that the Capstone is going to be challenging, it's going to be more challenging than you think it's going to be, so we highly encourage you to start early and start often, if that makes sense. But do start early because before you know it, deadlines will creep up. And with that, all the best and we will stay in touch online.