Welcome to the Capstone Project for Big Data! In this culminating project, you will build a big data ecosystem using tools and methods from the earlier courses in this specialization. You will analyze a data set simulating big data generated by a large number of users playing our imaginary game "Catch the Pink Flamingo". During the five-week Capstone Project, you will walk through the typical big data science steps of acquiring, exploring, preparing, analyzing, and reporting. In the first two weeks, we will introduce you to the data set and guide you through some exploratory analysis using tools such as Splunk and Open Office. Then we will move into more challenging big data problems requiring the more advanced tools you have learned, including KNIME, Spark's MLlib, and Gephi. Finally, during the fifth and final week, we will show you how to bring it all together to create engaging and compelling reports and slide presentations. As a result of our collaboration with Splunk, a software company focused on analyzing machine-generated big data, learners with the top projects will be eligible to present to Splunk and meet Splunk recruiters and engineering leadership.



Big Data - Capstone Project
This course is part of Big Data Specialization


Instructors: Ilkay Altintas
18,535 already enrolled
(400 reviews)


There are 7 modules in this course
This week we provide an overview of the Eglence, Inc. Pink Flamingo game, including the data the company has about the game and its users, and the questions we might want that data to answer.
What's included
4 videos, 4 readings
Next, we begin working with the simulated game data by exploring and preparing the data for ingestion into big data analytics applications.
What's included
6 readings, 1 assignment, 1 peer review
This week we do some data classification using KNIME.
What's included
4 readings, 1 peer review
This week we do some clustering with Spark.
What's included
2 readings, 1 peer review, 3 discussion prompts
This week we apply what we learned in the 'Graph Analytics With Big Data' course to simulated chat data from Catch the Pink Flamingo using Neo4j. We analyze player chat behavior to find ways of improving the game.
What's included
2 readings, 1 peer review
What's included
1 video, 1 reading
What's included
1 video, 1 reading, 2 peer reviews




Learner reviews
400 reviews
- 5 stars: 66.25%
- 4 stars: 21.50%
- 3 stars: 5.75%
- 2 stars: 1.75%
- 1 star: 4.75%
Showing 3 of 400
Reviewed on Aug 6, 2017
Watch out for week 4. This is the hardest one out of the whole specialization
Reviewed on Sep 1, 2019
I learned a lot about applying the big data knowledge gained in the previous courses. Thank you!
Reviewed on Jan 6, 2021
A lot more work and time than expected. Some issues with software tools as per expected.
¹ Some assignments in this course are AI-graded. For these assignments, your data will be used in accordance with Coursera's Privacy Notice.




