Learner Reviews & Feedback for Getting and Cleaning Data by Johns Hopkins University

8,055 ratings

About the Course

Before you can work with data you have to get some. This course will cover the basic ways that data can be obtained. The course will cover obtaining data from the web, from APIs, from databases and from colleagues in various formats. It will also cover the basics of data cleaning and how to make data “tidy”. Tidy data dramatically speed downstream data analysis tasks. The course will also cover the components of a complete data set including raw data, processing instructions, codebooks, and processed data. The course will cover the basics needed for collecting, cleaning, and sharing data....

Top reviews


A lot of insight and practical knowledge about cleaning data that is available in many places on the Internet. I loved this course, and it took me 2 tries to pass the peer-graded assignment. ;)


The 'cleaning data' part was explained pretty well... I do feel he could've gone into more detail for the 'gathering data' part, especially the web scraping part. Other than that, great course!

1 - 25 of 1,308 Reviews for Getting and Cleaning Data

By William S

Feb 4, 2018

It's not really acceptable to make students google new things in order to pass the quizzes. Quizzes should assess knowledge gained through the reading and lectures, not our ability to learn via Google.

By T M

Feb 1, 2019

There is a huge disconnect between the material and the HAR dataset exercise. I would suggest providing some smaller exercises to help explain how to complete it. Yes, I know you're supposed to do research to help figure out problems, and I have. As a matter of fact, I have taken other courses on data wrangling to be able to figure out this problem. Merging two datasets makes this problem very confusing. Why can't you help guide students through a similar problem, instead of throwing them into the fire?

By Matt K

Jul 17, 2018

Prepare to not actually learn anything; rather, you're going to go on a journey through Google to try to find obscure ways to install packages onto your Windows computer. Whether it be packages to read Excel files, SQL files, APIs and more, you'll rarely have the time or patience to put any of this into practice because you'll struggle to just get packages installed.

For the record, I gave the previous two courses in the specialty a good rating, but this is clearly a low effort showing. It's a shame because I really think this might be the most applicable and useful content in the course.

By Sebastián L

Jan 12, 2018

The contents of the course are extremely useful. BUT if your only programming experience is the two previous courses, I think it's a very difficult course, since some material is outdated, not explained in detail, or not explained at all.

To do most of the quizzes, it's not enough to listen to the videos repeatedly. In many cases it's necessary to read a lot of documentation, search for and apply new functions that are not explained in the videos, search forums, and realize that the packages do not work the same way in new versions of R, or that some functions don't work correctly with RStudio but do with RGUI. In other cases a certain argument must be added that was not explained in the videos (e.g., on Windows, "binary" mode in the function download.file, which I still have no idea what it means).

In short, a lot of things mean that certain parts of the evaluations do not measure whether you really learned what was taught in the course, but rather your ability to teach yourself. That is a necessary skill in general (not only in R and data science), but it isn't what I expect this course to teach me.

All this searching is especially difficult for Spanish-speaking people, because it isn't enough to have an intermediate-to-advanced level in the language, rely on Google Translator, and rewind the video many times; to really understand, you need some command of technical language.

By Bhawesh S

Apr 4, 2019

The course is good, but the only problem is that there is no explanation of how to solve different problems. There should be a live example of problems so people who have some trouble can get through.

By Mohammad A A

May 13, 2019

There's too much of a jump from the theory to the practice. I had a difficult time understanding what was being asked of me.

By Thej

Nov 29, 2018

Horrible Assignment. So vague. So much puzzling to do. Students cannot waste their time attempting to understand the loose, vague assignment that was made. The assignment took me 4-5 hrs of pondering and referring to online material just to freaking understand partially what the hell is expected of me. I hate this part of Coursera. It is ugly!


Feb 16, 2019

Swirl practice for Getting and Cleaning Data in this class is terrible. Most of my code works fine in R and RStudio, but Swirl would tell me "That's not the answer I'm looking for, try again". Then I type "skip()" and Swirl gives me the exact answers that I had just typed.

By Les S

Apr 8, 2017

In my experience, this course has a huge gap. The lectures go through all manner of tools, commands, packages, etc. about cleaning. It's like "this is a hammer", "this is a saw". That's fine and probably necessary. But then you get to the final project...

All of a sudden you're plopped onto a construction site with your hammer and saw and a pile of wood and told to build a house. You've never built a house before and don't have the slightest idea where to begin.

What's missing is some principles of "tidy data construction" in the lectures. When introduced to a new data set, what are some approaches to understanding that data? How do you develop a plan to build a tidy data set (i.e. how do you think about developing a plan, not looking for a step by step instruction set)?

It's disappointing that the only way for someone new to data science, R, etc. to tackle the problem is to read someone's third party monologue about how to make your reviewers happy.

By Pietro P

Jan 25, 2019

Modules 1 and 2 are horrible, so much to cover (several types of files) and so little actual information from the course. Yet, quizzes demand one knows every detail of each file type. Scripts and links are not available from the slides, although I did manage to find a repository with all scripts of the course (after much trouble). Why not make it available from the main page of the course? Anyways, some links were broken and could not be used to follow classes. Classes themselves are very dull, no interaction whatsoever.

By Javier R L G

Nov 17, 2018

By Kyle R

Jun 1, 2020

So far the worst of the series. The material is good and to the best of my knowledge it is useful to serve as a baseline for a data science career. However, the lectures are structured very poorly. Many of the links provided are outdated. The lectures have blue "links" in them where the data is discussed or subsetted, but that data is not provided and neither is the link. How is this considered reproducible? Is that not a component of data science? Also, the syntax used is continuously changed (the assignment operator is not consistent) and spacing is also inconsistent between classes. Just not the quality I'd expect for a course taught by experts in this field.

By Erin A

Dec 9, 2019

This is my third course completed in the Data Science Specialization offered by Johns Hopkins. In all three, I feel the lectures, quizzes, and swirl exercises are easily accessible, and then the final project makes me feel like I am seeing R for the first time. One review of the course made a brilliant suggestion: go through the videos as quickly as you can, and then look at what will be asked of you in the final project. Then, go back through the videos and quizzes with a different set of eyes.

I feel like there is just so much to learn with R that sometimes you need a lens to help you focus on a subset of things that you absolutely will need, while getting a "taste" for all that R has to offer.

Overall, I am enjoying the courses, but the final projects are indeed a different kind of challenge.

By Mathew K

Dec 29, 2019

Pros: I learned a ton about cleaning data, the challenges involved, and how to tackle new problems. The quizzes and projects throw you into the deep end, asking you to import some data set and report some features of it, and you often need to figure out what package to use and how to work with it on your own.

Cons: The videos in this course are basically useless. You get superficial coverage of how to use some package without a lot of explanation of what each part does, and basically all of the examples are broken, because the data have been updated or the site has changed or no longer exists. The instructors very annoyingly bat away any responsibility in the forum by saying it would be too expensive to fix anything. Too expensive? This isn't a Michael Bay movie, this is a guy talking over a PowerPoint.

By Jennifer S

Jul 22, 2020

This course is not for beginners. Much of what is expected in the assessments is not covered in the lectures.

By Autumn C

Apr 9, 2020

Not helpful, can barely hear lectures, and lectures are outdated. I can't stand this course.

By Viktor K

Mar 23, 2020

There is a big gap between class material and practical exercises.

By Dan K H

Feb 2, 2016

Easy, mostly instructive course. The assignments and quizzes are quite good, and illustrate the lessons very well.

See the videos for a general presentation, but spend your energy on the exercises.

By Moshe P

Mar 13, 2019

The material in this course is very condensed. The Data Table lecture was very much a copy of someone else's information on the web and was so terse that I would imagine even people from programming backgrounds had to listen to it many times just to understand what was going on. Expect to put a good 8-10 hours a week into this course if you want to become proficient in the course material.

By Akshay K

Apr 9, 2018

Week 1 could be more detailed, in line with what is expected in the quiz. The main idea of following a course is that we get all the material about a topic together in one place. But here we are given just the names of topics and told to research and read about them ourselves.

By Hugo S

May 3, 2020

This course provides an introduction to some important concepts and tools for a very important aspect of data science: cleaning and organizing data before any analysis. A must for any data scientist.

By Nima A

Jun 8, 2020

A very useful course. The audio quality of some lectures (especially those by the main instructor) was not good. This course completes the sister course of R programming and they work together.

By Asit R

May 14, 2018

Loved the structure of the course. Learned a lot. The course project seemed a little funky, especially creating the codebook for an already existing set of data, but it was a useful teaching aid.

By Nelson M

Feb 24, 2019

This course is a nice introduction to the complex process of getting and cleaning data in R. It introduces you to some fundamental tools in the area, such as the dplyr and tidyr packages, and touches upon the most important aspects of data gathering and transforming. The final project is an interesting mix of technical challenges with a touch of intelligent practices in data handling and sharing. Whatever your level in R programming and data science, this course is an enjoyable hands-on experience.

By Kris B

Jan 28, 2019

More challenging than the "R Programming" course. The instructions for the final project were a little vague, but I think maybe this was intentional to promote discussion. Definitely give yourself plenty of time to complete the final project if you take this course. The principles of a tidy data set might seem like common sense, but in practice it's more challenging than you might think. I highly recommend taking this course even if you think you know what a tidy data set is.