Hello, welcome to the Data Mining Capstone offered by the University of Illinois at Urbana-Champaign. I'm ChengXiang Zhai, nicknamed Cheng. I'm a professor of computer science at the University of Illinois at Urbana-Champaign, and I'm one of the three instructors for this capstone. The other two instructors are my colleagues, Professor Jiawei Han and Professor John Hart. If you're taking this capstone now, I assume that you have already taken four to five other courses in this specialization, as shown on this slide. These courses should have given you sufficient knowledge and skills to perform a number of data mining tasks involving algorithms for pattern discovery, text retrieval, cluster analysis, and text mining, as well as techniques for data visualization. This capstone gives you an opportunity to integrate all the knowledge and skills that you have learned from those courses to solve a real world challenge. In this video, we're going to give you a brief introduction to the capstone course, including its objectives, its format, the methods of completion, and an overview and schedule of the project. Before I proceed, I should point out that there may be some differences between the details we cover in this video and the actual details on the course website. So I want to make a disclaimer here: in case there is any difference, please refer to the syllabus, which always takes priority. You should periodically check the syllabus for the most up-to-date details. When we designed this capstone, we had the following objectives in mind. First, we would like to give you the opportunity to integrate and apply the knowledge and skills learned from the individual courses to solve a real world data mining problem. Second, we hope you can use this capstone to understand and experience a typical data mining workflow.
Together, these two objectives will help you bridge the gap between what you learned in these courses and what's needed to solve a real world data mining problem on the job. We have also kept in mind the goal of giving you the opportunity to do research in data mining, so as to deepen your understanding of the techniques or even invent new ones. Research is very important when you solve real world problems, because what you have learned from these courses are standard or basic techniques; they provide a basis for you to explore the more advanced techniques that are often needed to solve real world problems. Now, the format of this course is different from the other courses that you have taken in this specialization, so it's worth saying a little more about it. The main difference is that instead of watching a lot of lecture videos, here you will be working on a data mining project that lasts for six weeks. Your main task is to mine the dataset provided to you to discover interesting and useful knowledge. There will be weekly assignments, each corresponding to a predefined task of the project. Each task is specified with both a general definition, which is general enough to allow you to freely explore any interesting ideas you want, and a minimum task, which is specific enough to allow you to finish the task easily. Your submissions for these assignments will be mostly peer-graded, but some parts may also be graded by an automated grader. In the end, you will also submit a final project report to summarize and highlight the major results of your project; this will be due near the end of the course. There are multiple ways to complete this course, but the basic one is to achieve an average score of 70% or higher on all the tasks. If you can do that, you will get a course achievement badge.
You can also obtain a mastery badge if you achieve an average score of 90% or higher on all the tasks. If you achieve either of these, you will pass the course. There are also a number of additional project badges that we offer; in particular, there are three kinds of badges listed here. The first is the data mining task mastery badge, which is given to those of you who score 90% or higher on a specific data mining task. The second additional badge is the data mining competition leader badge, which is given to the top 30% of learners on the leaderboard that we provide for a particular data mining task. Finally, we also offer best project award badges, which are given to the top 10 project reports as judged by a committee of industry experts. Next, we will give you an overview of the capstone project. The dataset used in this capstone consists of restaurant reviews from Yelp. These are real world data that are available on the web. Your main task is to mine this dataset to discover useful knowledge that helps people make dining decisions. There could be other uses of this dataset, but the focus of this capstone is on discovering knowledge that helps people make dining decisions. In particular, in your project you will discover the following knowledge. First, you will construct a cuisine map to help people understand the landscape of all cuisines and their similarities, that is, to give people a sense of what cuisines exist and which cuisines are similar to each other. Second, you will discover popular dishes for a particular cuisine, which helps people explore an unfamiliar cuisine. Let's say you have decided to explore a particular cuisine based on the cuisine map; then you would be interested in knowing which popular dishes people often order in that cuisine. This task is designed for that purpose.
Next, once you have decided to try a particular dish, the obvious next question is where to eat. For this purpose, you are also going to recommend restaurants for a given dish, that is, rank restaurants for a given dish. You will also work on predicting the hygiene conditions of restaurants, to help people choose a place to dine. Here's a list of the tasks. There are seven tasks listed here, but only six require you to make submissions. Task 0 is to obtain the dataset and the toolkit that we provide, so it involves downloading the dataset and toolkit in one package. Task 1 is to explore the dataset, and Task 2 is about constructing a cuisine map. Next, Task 3 is about dish name mining: you will work on how to mine dish names from the dataset. Tasks 4 and 5 are to mine popular dishes and to rank restaurants for a particular dish, respectively. Finally, Task 6, which lasts longer, is about hygiene prediction. You will work on a prediction task, and we will have an open leaderboard where you can submit your results and have them evaluated quantitatively. Your final project report, which covers the whole project and includes the report on Task 6, will be due at the end of the course. Since this course is different from the other courses that you have taken in this specialization, we would also like to make some suggestions about how to get the most out of it. The first suggestion, which may also be the most important one, is to leverage the forums for collaborative learning. The forums have been very useful in all the courses you have taken so far, and for this capstone project course, which is more interactive than the other courses, we expect them to be even more useful.
So please use the forums to post your questions and also to help answer questions posted by others, so that you can collectively learn about data mining techniques and apply them to solve a real world problem. The second suggestion is to review relevant lectures from previous courses if needed. This is sometimes more efficient than just trying to recall from memory how a technique works. Unless you remember the details clearly, it's often useful to review the relevant parts of the previous courses so that you get a more complete understanding of a technique, which can help you apply it more effectively. The third suggestion is to leverage existing resources as much as you can. You will find many useful resources on the Internet, including various toolkits, software tools, and datasets. We encourage you to leverage all of them, because in a real world job you would be able to do that as well, and it minimizes your effort. You can then build on top of these resources to create something genuinely new. The next suggestion is not to settle for one solution to a problem, but to seek the best possible solution. This is very important for solving real world problems, because the quality of your solution actually matters. So it's important to think about whether there are alternative ways to solve a problem, compare the different options to figure out which one works best, or propose new ideas to improve them. In one of our tasks, we will explicitly ask you to do that. Finally, just be creative and try out any new ideas that you come up with. Don't be afraid to explore your own ideas.
The main difference between this capstone and the other courses that you have taken in the specialization is that here there is a lot of freedom for you to explore new ideas, including your own, and this is by design. We hope you can use what you have learned in our courses to solve a problem in a creative way, because what you have learned from those courses may sometimes not be directly applicable to a problem, but you can adapt it creatively to solve the problem. So we hope that you enjoy this capstone, and good luck with your project. Thanks.