Data Science Capstone

The capstone project class allows students to create a usable, public data product that they can use to show their skills to potential employers. Projects are drawn from real-world problems and conducted with industry, government, and academic partners.

Status: Predictive Modeling

Status: Data Analysis

Course: 6 hours

Featured reviews

5.0

Reviewed Sep 4, 2018

GREAT opportunity to apply all of the knowledge that I learned over the last year or so. Thank you so much for the guidance and the opportunity to learn something new!!!

5.0

Reviewed Mar 28, 2017

Wow, I finally managed to finish the specialization!! I definitely learned a lot and also discovered the difficulties of building predictors while trying to balance speed, accuracy and memory constraints!!!

5.0

Reviewed Oct 11, 2016

This class was a huge challenge for me, but it pushed me to learn a whole lot and practice many of the skills that I had learned in previous courses! I had a lot of fun, too. Thanks!

5.0

Reviewed Jul 26, 2020

This was a very useful course that helped me delve into the ocean of data science and understand all the basic and slightly advanced topics and concepts.

4.0

Reviewed Nov 27, 2018

I appreciate all the work they put into creating the course. However, it can be frustrating to follow. It would be nice if they would structure it in a more organized fashion.

5.0

Reviewed Jan 2, 2017

Simply brilliant! Fantastic exposure to Natural Language Processing! Learnt a ton about various NLP algorithms for anyone who aspires to be a Data Scientist!

5.0

Reviewed Jan 15, 2020

Great finish to an excellent specialisation. It's actually opened up some excellent career options for me and I am very grateful to the instructors and Coursera for providing the platform.

5.0

Reviewed Jun 20, 2017

This course was very challenging. It resembled a real world task, where an idea is presented and it is up to the user to research methods and processes for the best outcome.

5.0

Reviewed Mar 20, 2021

Course capstone is too tough for beginners. Requires Natural Language Processing knowledge in advance. Also the method used is outdated already since it was launched 4 years ago.

5.0

Reviewed Jan 4, 2017

Clear and easy course that gives a basic understanding of working with data. Great inclusion of the tools that are useful for presenting your data analysis.

4.0

Reviewed Oct 22, 2018

The content of this course is very good and the assignments test the knowledge gained. The video lectures are sometimes boring and lose focus. I think the video lectures need some improvement.

4.0

Reviewed Oct 23, 2018

It is a lot of independent work, with guiding questions but no real help otherwise. If that's not your thing, this course is not for you. If it is, you'll get a very fun end project.

All reviews

Showing: 20 of 324

Marcio Gualtieri

1.0

Reviewed Aug 21, 2017

The whole specialization is a bit of a mixed bag... Many of the courses rely too heavily on teaching R programming and not sufficiently on data science concepts (such as statistics or machine learning). The instructors (especially Peng) spent way too much time detailing R syntax that could have been picked up by the students on their own from other resources available on the web...

The regression models and statistical inference courses are exceptions though: Together with the machine learning course, these are probably the most useful from the whole specialization.

The materials in this capstone project are way sloppier than materials in other courses by the way. They lack structure and feel confusing. I'm not even sure if the instructors tried to implement the proposed project themselves to have a base of reference. Feels like they were already growing tired of the whole thing and put the capstone project together in a hurry without much thought or care.

The theme of the project is indeed interesting (text-mining and NLP), but I think it would have been more productive for me to take an NLP course instead. You are going to use very little of what you have learned from the other courses in the specialization (for the most part the data product course) and you will need to learn text-mining and NLP from scratch on your own to complete the capstone (no videos nor materials available in the course on these subjects).

Also, if I was going to implement the same app on my own these days, I would probably use RNNs, not Katz Back-off and Markov Transition Matrices as in the capstone and I would probably use SparkR. Heck, I might not even use R, probably Scala or Python with Spark instead. In short, data science moves fast and this course already feels very outdated...

The instructors seem quite experienced in statistical analysis, so it's a shame that they decided to focus so heavily on R programming instead... A focus on statistics would have made the specialization more resilient to technological innovations in the field...

The specialization surely could be improved and these issues corrected, but all courses seem pretty much abandoned by the instructors. Most of the courses still have active "mentors" (volunteers not associated with Coursera nor Johns Hopkins), but "mentors" seem to have lost contact with the instructors: For example, a couple of assignments require data that is no longer available (dead links) and "mentors" have provided this data in the discussion fora. I reckon that if "mentors" could contact instructors, the dead links would have been fixed in the materials by now...

The peer-grading doesn't work so well... Most of the submissions I graded were painful to review (extremely low quality). Not surprisingly, the graders were also pretty low-skilled. They can't even understand the requirements (and I suspect not even the English language) and they will take points from correct submissions.

I urge any employers to look at the actual code for this capstone from candidates given the general incompetence and poor skills of the students I graded. The grading criteria are pretty relaxed, so even though I would like to fail them, I still had to give them a passing grade. Such weak grading criteria are detrimental to all people who actually have the skills and put hard work into their submissions. Many undeserving people will, unfortunately, pass and receive a certificate.

Thej

1.0

Reviewed Jul 31, 2019

I spent 80 hrs on this course. I hated so many things. 1. There was a lot of uncertainty in the course. For example we didn't know how far to go with NLP. And I constantly came across forum posts where people were complaining about how there was 0 guidance and had no idea what to do. Saviours were those few people who put up help posts on the forum and shared their treacherous experience going down different paths. 3. The topic was already hard enough, NLP, something I had no clue about, and then there was this additional problem all the fucking time about memory. Jesus! One of the most painful courses, primarily due to overload, lack of clear instructions and their refusal to edit one letter in the course in 5 years! Fuck them!

Roberto Garuti

4.0

Reviewed Dec 2, 2017

This class is challenging and a lot of people complained, so I'll tell you my approach, since I was able to complete it on the first try in my free time from my full-time job. Not having any knowledge of Natural Language Processing, I found YouTube videos and presentations from the Stanford class taught by Dan Jurafsky and Christopher Manning. Study it up to the explanation of n-grams; it should be enough for the class. I completed the first weeks in a few days so I had more time to actually build the model and the app (you'll need more than the scheduled weeks if you have no prior experience). I found valuable resources in the course forum. Then you're pretty much on your own: identify the best packages, learn how to use them, and look on Stack Overflow when you get stuck. Start with a very small set of data so you can quickly build the model and the app until you get something that works. After that you can improve the model by using more data, finding the balance between processing time, app response time and prediction accuracy. Everyone understands the limitations of the project, so prioritize speed over accuracy.

My overall evaluation of the project is a mixed bag. The positive is that it introduces you to a new topic (NLP) and the goal is reasonable: it takes a lot of effort but it's not impossible, and it forces you to learn something meaningful (something easier would not have made me learn something valuable). The negative is that there is no explanation whatsoever about NLP, which was never mentioned in the previous courses, so there's not much teaching or guidance. The involvement of SwiftKey is limited to providing the data.
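The workflow described in the review above (start from a very small sample, build an n-gram model, then scale up) could look roughly like the sketch below. The course itself uses R; Python is used here only for illustration. The function names `sample_lines`, `build_bigram_model` and `predict_next`, and the toy corpus, are hypothetical and are not taken from the course materials.

```python
import random
from collections import Counter, defaultdict


def sample_lines(lines, fraction, seed=0):
    """Keep roughly `fraction` of the lines so the model builds quickly."""
    rng = random.Random(seed)
    return [line for line in lines if rng.random() < fraction]


def build_bigram_model(lines):
    """Count which word follows which, using plain whitespace tokenization."""
    following = defaultdict(Counter)
    for line in lines:
        words = line.lower().split()
        for prev, nxt in zip(words, words[1:]):
            following[prev][nxt] += 1
    return following


def predict_next(model, word, k=3):
    """Return up to k of the most frequent followers of `word`."""
    followers = model.get(word.lower(), Counter())
    return [w for w, _ in followers.most_common(k)]


# Toy corpus (an assumption for illustration, not the course data).
corpus = ["i love data science", "i love r", "i like data"]
model = build_bigram_model(sample_lines(corpus, fraction=1.0))
print(predict_next(model, "i", k=1))  # ['love']
```

Raising `fraction` trades build time and memory for coverage, which is the balance the review describes between processing time, response time and accuracy.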

Paul Ringsted

3.0

Reviewed Mar 21, 2019

The project topic itself is interesting, but the course is longer (structured as 7 weeks); there is not much guidance until you find the right threads from mentors in the discussion forum from a few years ago or repeatedly google Stack Overflow; it is much more technical than the rest of the course; and it doesn't really use much of what was learned during the meat of the specialization's statistics/regression/ML courses, other than data science principles and tools (though new R libraries were needed). These issues aside, the project was an interesting challenge to complete nonetheless. Overall this specialization is now a few years old, and the plethora of 4 and 5 star reviews across all courses seems generous and outdated. Materials are not being updated, forums are a mess of years-old threads with not much current activity; there is a feeling of waning interest and participation. This was clearly cutting-edge material back in 2014-2016; if JH/Coursera intend to continue offering it, the material needs some refresh and reordering, tougher grading rubrics (I saw a lot of inconsistency and poor quality which met the rubric criteria, alongside great quality work), and more active involvement from lecturers and mentors (and, please fix the typos).

Jose Alberto Valdez Crespo

2.0

Reviewed Apr 15, 2016

Very disappointed with this final course. Little to no support. Discussion Forum provides some level of help but you are basically on your own.

Very challenging to come up to speed with Natural Language Processing techniques if you have never taken any class about it.

My recommendation to JHU and Coursera is to add a separate course for NLP where you cover all the basics and then have the Capstone.

Tony Wang

1.0

Reviewed Mar 20, 2017

In my opinion, this course is a waste of time, it simply throws a bunch of links and terminology for you to google and research. The project is interesting but once again, you have to do tons of research and take up other courses to fill the gaps (might as well do the other courses instead of this one).

I do not recommend this course or the specialization.

E. CHRIS

1.0

Reviewed Feb 18, 2017

NLP is a totally different thing and should be a course by itself. I would prefer a large-scale machine learning capstone where we could build models; it would fit real-life situations better! Through all the courses I worked hard only to reach an NLP capstone? This doesn't feel right! Please fix it!

Piyush Verma

4.0

Reviewed Mar 25, 2018

To those who are reading this review of the Capstone Course, I would say: skip everything (videos) and start writing code and building the app right away. Otherwise this course is unnecessarily stretched; it could have been cut way short. I will tell you what I did: I skipped everything, got the gist of the objective, scanned through the code and worked on my idea.

I started the specialization in December of 2015 and I am ending it today, March of 2018. I remember struggling with R in the beginning (I was a novice programmer writing dirty code). Now I can't stop thinking about the plethora of data product opportunities surrounding me.

N T

5.0

Reviewed Mar 5, 2018

Capstone did provide a true test of Data Analytics skills. It's like being left alone in a jungle to survive for a month. Either you succumb to nature or come out alive with a smile and confidence.

Fulvio Barizzone

3.0

Reviewed Dec 4, 2020

It is nice to arrive at the end of the specialization, and I understand that, since this is the final step that has to demonstrate you are able to put together more or less everything presented during the journey, you have to be left a bit alone. However, I think this capstone is now outdated. It does not mention new packages that are now available and perform very well (e.g. tidytext), and some of the references mentioned in the "lessons" are no longer available at the URL given. I think at least these should be maintained.

Marcos da Silva Medeiros

5.0

Reviewed Feb 28, 2022

In data science, two of the best specializations offered on the Coursera platform are from JHU and IBM. There are many comparative assessments to help choose which would be the best fit for the applicant's profile. It's probably worth doing both. In any case, I ratify JHU's data science expertise, in the sense that it is quite rich, goes deep into topics that are rarely explored in regular undergraduate courses, and is directed specifically at statistics and big data. It assumes that the student has a good higher-level foundation in mathematics and statistics and a good knowledge base in R. I say only a good base because R is quite vast, and the specific R programming classes are sequenced practically from scratch, but progress at a fast pace. In some more specific topics, I had to complement the knowledge with very objective, targeted parallel courses, such as those on the DataCamp platform, which work well for unlocking a specific subject. Stack Overflow's help and feedback also come as a great help at all levels of learning, including Professor Roger Peng's first lesson: knowing how and where to seek help to move forward. That is, the first major lesson in data science is the humility of being an astronaut in a virtually infinite universe that is expanding every day. This is the most fascinating thing about Data Science, Biostatistics and R: the themes never run out and become interconnected in the face of the phenomena that surround us daily. From there, the sequence of 10 courses represents a long road (not so long for some) of development, feedback, evaluations and model building in R. Undoubtedly an excellent specialization, which is worth the investment, especially of time. The final part of the specialization represents the last steps, but the steepest of the journey.
In the latter, in particular, you are metaphorically confronted with yourself: a feeling of having been blindfolded in the middle of a dense dark forest and now needing to find your way back using what you have learned so far. For all specialization graduates it is a stage of relief, rather than celebration. The dropout rate from the first course onward is very large. In a master's or doctorate, I believe the specialization is of great support for the publication of studies and the analysis of field data, in order to reach assertive conclusions from hypothesis tests. Upon completion of the specialization, JHU encourages publishing the Certificate of Completion on LinkedIn and, with this, you receive an invitation from Professor Brian Caffo, after a brief verification of authenticity (around 1 week), to join a private Data Science group moderated by him, where there are excellent networking opportunities with other scientists, partnerships, job opportunities and project development.

Carlos Saquel

5.0

Reviewed Nov 19, 2019

I took this specialization a couple of months ago and did not comment as such. Now I turned around to remember some topics and started reading comments.

I found many comments that say the final project has nothing to do with the previous 9 courses and when I did it I thought the same.

Looking at it in perspective, I think the previous courses are absolutely necessary for the final project. The objective of carrying out a project with such characteristics is to apply the knowledge by oneself.

The first courses of programming in R, extraction and cleaning, and exploratory analysis are fundamental to understand the problem. In this case the cleaning has to do with the transformations using regular expressions and tokenization. The exploratory analysis should be done in any data science project, otherwise you may encounter surprises when implementing the models.

Statistical inference was necessary and closely linked to exploratory analysis, especially to select samples well and review distributions, since some machine learning methods may be affected by distributions. I must say that I did not see this when I took this course, but it was because of my lack of experience. Maybe there was a lack of guidance.

The algorithm I used was regression on the n-grams, chosen for simplicity, time and the capacity of my computer, but it could have been combined with other methods such as neural networks or SVM.

Implementing the model in Shiny and then adjusting it because it was very heavy was also interesting.

As a summary, I really liked this specialization and although it was very hard and many times I did not know how to move forward (especially in the capstone), I think the challenge was important for my learning and I was very entertained.
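The cleaning step this review mentions, transformations using regular expressions followed by tokenization, might be sketched as below. The course itself uses R; Python is used here only for illustration. The patterns, the function names `clean_text`, `tokenize` and `ngrams`, and the sample sentence are all assumptions, not the reviewer's actual code.

```python
import re

# Hypothetical cleaning rules; a real project would tune these to its data.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
NON_LETTER_RE = re.compile(r"[^a-z' ]+")
SPACE_RE = re.compile(r"\s+")


def clean_text(text):
    """Lowercase, drop URLs, and replace anything but letters and apostrophes."""
    text = text.lower()
    text = URL_RE.sub(" ", text)
    text = NON_LETTER_RE.sub(" ", text)
    return SPACE_RE.sub(" ", text).strip()


def tokenize(text):
    """Split cleaned text into word tokens."""
    return clean_text(text).split()


def ngrams(tokens, n):
    """Return every run of n consecutive tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


tokens = tokenize("Check https://example.com NOW!!! It's 2 late.")
print(tokens)             # ['check', 'now', "it's", 'late']
print(ngrams(tokens, 2))  # [('check', 'now'), ('now', "it's"), ("it's", 'late')]
```

Keeping the apostrophe preserves contractions such as "it's" as single tokens; whether that helps prediction is a design choice worth checking during exploratory analysis.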

Ken Wood

2.0

Reviewed Nov 16, 2020

To tell you the truth, when I started this capstone, I felt like I was thrown into the deep end of the pool. You are asked to build an NLP app using Shiny and, unfortunately, most, if not all, of the concepts required to design and build the app are not covered in the earlier courses in the specialization. Can you say 'Google'? I would have liked to have seen the instructors walk through the relevant concepts required to successfully complete this project. The videos consist of Dr. Peng basically saying "Good luck!"...a little lame if you ask me.

Jesse Sharp

1.0

Reviewed Apr 29, 2016

Coursera lost my thoughtful 2-star review so I am replacing it with this. I learned a lot through my own efforts and through the efforts of students who bothered to post in the forums. The one mentor disappeared half-way through the course.

Pablo Rueda

5.0

Reviewed Feb 20, 2021

I can't just finish this specialization without commenting on my personal experience with it. I have taken several courses related to data science; however, this specialization has set the quality standard so high that I feel disappointed with most of the courses around the web.

I'm a technical guy who has most of his professional experience in academia. I know the process of creating new courses and of summarizing complex topics into understandable concepts, but the most important thing is to show the application of the knowledge to real-world projects. Yes, all of this is what I got from these courses.

Is the specialization worth it? Absolutely yes! The amount of knowledge that you would get is huge! As with everything in life, it will depend on how much effort you invest in it. There's a big part of the specialization that depends on you, and if you are thinking of becoming a Data Scientist, you had better get used to spending lots of time doing your own research.

Finally, I just wanna let you know that I have started my career as a data scientist and I had several interviews with many companies/clients that were amazed by the quality of the portfolio that I built using all the final projects of this specialization.

I'm really grateful to all the people involved in creating this specialization and I will absolutely recommend it to anyone who has decided to become a data scientist.

cheers,

Pablo

Jerome Cholewa

5.0

Reviewed Sep 13, 2017

Capstone very challenging. Minimal instructions force the students to do a lot of research on the subject. But this is extremely rewarding. Doing a good job is possible (well, my grade is still pending at the time of this comment!) and makes students take a huge leap forward in data exploration, data cleaning, setting up a strategy for analysis and algorithm, making an R Presentation, and creating an online app (by the way, I also created a small app for my company thanks to this training, especially the "Developing Data Products" course).

Ken Koch

5.0

Reviewed Jun 16, 2017

This class provided a good background on the principles and process of Data Science and related research. The R material was very good and the assignments and capstone project will force you to become a good R programmer. The statistical analysis materials were also very thorough. Overall, the courses were well taught and the material was relatively easy to follow and learn.

Fernando Simão e Silva

5.0

Reviewed Jun 17, 2017

Honestly, there is very little guidance for the project and it deals with a whole new type of data: text. That's when you find out that working with quantitative data, like all the previous courses, is easy. I got my ass kicked throughout 3 sessions in order to finish this thing. But you know what? Maybe that's how it should be for one to learn something.

Ben Straub

5.0

Reviewed Apr 19, 2018

Great times! It took me almost four years to get through this!! I had a child, sold a house, went to graduate school in statistics and I'm about to graduate. The DSS classes gave me a lot of great tips for graduate school and really cool reports, apps, ideas to show off to potential employers. Just got to get that job now!!

Francesco Chiaveri

5.0

Reviewed Jun 5, 2018

In my opinion this last course is a great way to conclude the Data Science specialization, because it not only "forces" you to apply a lot of lessons learned during the other 10 courses, but also gives you the opportunity to understand how important it is to frame the problem well before trying to solve it.