
Learner Reviews & Feedback for Getting and Cleaning Data by Johns Hopkins University

8,048 ratings

About the Course

Before you can work with data you have to get some. This course will cover the basic ways that data can be obtained. The course will cover obtaining data from the web, from APIs, from databases and from colleagues in various formats. It will also cover the basics of data cleaning and how to make data “tidy”. Tidy data dramatically speed downstream data analysis tasks. The course will also cover the components of a complete data set including raw data, processing instructions, codebooks, and processed data. The course will cover the basics needed for collecting, cleaning, and sharing data....

Top reviews


May 2, 2020

This course provides an introduction to some important concepts and tools for a very important aspect of data science: cleaning and organizing data before any analysis. A must for any data scientist.


Oct 25, 2016

This course is really challenging and compulsory for anyone who wants to be a data scientist or work with any sort of data. It teaches you how to make a very palatable data set from messy data.


26 - 50 of 1,307 Reviews for Getting and Cleaning Data

By Bantwale D E

Oct 26, 2016

This course is really challenging and compulsory for anyone who wants to be a data scientist or work with any sort of data. It teaches you how to make a very palatable data set from messy data.

By William G C

Nov 1, 2016

This course is amazing! I have spent the majority of my time in R merely doing analytics. This course taught me the tools needed to go out and grab the data that I need for those analytics.

By Andrea C

Jan 22, 2023

Unfortunately, this course had lots of potential but turns out to have a lot of flaws. The material presented is so dense that it would have needed a much more detailed discussion: getting data from the web in CSV format, XML, JSON, databases, etc. All these different aspects should have been discussed in much more detail.

The second and most important flaw is the fact that virtually all data are out of date, and it is often impossible to follow along with the code, because the data is no longer available. Since the material presented is so much, it would have been useful to follow along, keeping the code for further analysis and for understanding what is actually going on, but that has not been the case, which is very disappointing.

Finally, some of the packages, like plyr, are presented so quickly, and the code explained sometimes doesn't work as expected. Another example is xlsx, which does not work on Mac and requires some personal Google research to fix the issues. I had high expectations for this course, as I was curious to better understand data scraping and the link between R and databases, but the course has been a total disappointment. Once I have finished the specialisation, I will have to find some other courses to complement this, which is also extremely disappointing. From such a prestigious university, I'm afraid I had expected more.

By Maria S

Mar 29, 2021

I did not get anything out of this course. This course was pointless because it wasn't really a course just a random scavenger hunt. If I wanted to wander around the Internet aimlessly trying to solve random problems by hacking away, I could have just done that on my own. I signed up for the class because I was looking for a structured way to learn the content and get in some exercises to practice & drill in the skills learned. This course is a waste of time -- if you are interested in learning R, go through some tutorials online. If you want to learn data science principles, try one of the other Data Science specializations. If you want to mimic this class but have more fun, pick some problems that you are interested in, find some data that could help you solve those problems, and try to clean that data.

By Narin P D

Jun 29, 2018

The course is very helpful when it comes to exploring commonly used R packages and learning certain best practices involved in data cleaning. I'd definitely recommend it to any data science enthusiasts. One area with slight scope for improvement could be the final project. The instructions are quite open to interpretation, which means that the final grade which you get via peer review is always going to be debatable. Other than that, I have no complaints whatsoever :)

By Marc P H

May 18, 2021

It was an effective course, in which we were given the right amount of knowledge to know how to find information. R is a difficult language for me, (I'm a C++/Java and Rails developer) but the projects increased my confidence and my ability to find the information I needed. That being said - the course needs to be updated - many of the links were 404s.

By Anton

Mar 20, 2016

Great course; it requires a little bit of programming background, though nothing rigidly specific.

By Alessandro V

May 22, 2020

I found this course very useful for my learning needs; nevertheless, I have one remark. The time estimates provided for each section are quite inaccurate. For instance, 3h for a swirl exercise is really excessive (45 minutes is probably more realistic), but the main problem is underestimation. For the final assignment in particular, I spent more than 20h, and part of that time went to convincing myself that a negative standard deviation was acceptable for the assignment's goals. The provided estimate, by contrast, is 2h (<< 20h!!).

By cristian b

Mar 8, 2022

This course really explains all the programming tools available to get and clean data from different sources. However, I feel it is missing some extra activities to consolidate the knowledge shared in the course.

By Raw N

Apr 7, 2017

Would have preferred if there were programming assignments that incorporated reading from data sources on the web.

For those planning to take the course, note the following:

The course covers reading data from a myriad of sources, but largely in passing, superficial detail. These sources include XML files, MySQL databases, HDF5 files, csv files, txt files with various formats (for example fixed-width files), JSON objects, and web APIs.

However, the course project only involves reading data from several txt files and combining them into a single R dataset.

Course topic order: In the first two weeks of the course, a lot of information is glossed over in passing. This information involves reading from the various file formats mentioned above. Week 3 involves subsetting, sorting, reshaping and merging data. Some of this may be review for you if you've taken the R programming course or the "R Programming Environment" course in the "Mastering Software Development in R" specialization. Week 4 involves string manipulation, regular expressions and working with dates. A lot of this is covered in Roger Peng's ebooks "R Programming for Data Science" and "Mastering Software Development in R" (both are freely available; google them).

Assessments: The only assessments in the course are 4 quizzes- each of which involves about 5 short programming exercises- and a final project which only involves topics from weeks 3 and 4 (specifically- subsetting data, sorting data, reshaping data, and working with regular expressions). So you can do the course project without understanding anything covered in weeks 1 and 2 of the course.

Mentor David Hood is fantastic for providing valuable resources to aid you with each assessment and so is Xing Su for providing a complete set of course notes. USE THE DISCUSSION FORUMS IF YOU GET STUCK!

By Vladimir C

Apr 9, 2021

Although the subject covered is important, and I learned something, I cannot recommend this course. The course is 7 years old and is badly in need of updating. The R language is very dynamic and rapidly evolving, and the course covers many packages and functions that are deprecated, retired or superseded by newer, more efficient tools. If this is meant to be an online course, it needs to stand the test of time or be updated regularly. Data sources used as examples were from webpages that are no longer available, and there is no reason to expect that they would be after 7 years. A different approach is needed for an online course. I spent a significant amount of time troubleshooting outdated course material on user forums and searching the web. If you read the user forums, you will see lots of frustrated people commenting on this. Unless the course has recently been updated, I recommend learning the material using a more up-to-date course.

By Pamela M

Feb 26, 2016

I would have given just one star except the swirl() assignments are actually very good. The videos are just a (poorly) narrated glossary. Topics I learned in another course were presented here in such a way I actually got confused. Can you imagine? My knowledge was actually worsened, not improved, by this course. (!!) // If the swirl() functions were made the centerpiece of the course, and the videos were described as just a narrated glossary, at least our expectations would be in line with reality. // Even so, I come to Coursera because I WANT to be taught by an instructor. If I'd wanted a curated list of tutorials so I could teach myself, I would have done that already. Anyone who pays for this should get their money back. NOT recommended for beginners. // I'm going to complete it because I'm stubborn that way, but it is an unpleasant experience for me and everyone within earshot, as I have to vent my frustration often just to make it through. // After week 2 I resorted to just reading the pdf of the slides and stopped watching the videos. The videos added NOTHING to my understanding. More often than not they put me to sleep. And what's worse, the narrator mispronounces "attribute". There IS a difference. I atTRIbute certain ATtributes to native speakers who mispronounce important vocabulary.

By Liam C

Jan 29, 2020

Week 1 and 2 are completely worthless. They're cursory 5-10m introductions to topics that show you HOW to start to do something, but don't explain any commands or what is going on, it's just instructions to follow. This leaves you completely unprepared to do any actual work. Then you get the assignments and you basically have to go learn everything independently. The course info is useless. I skipped these. When I want to do the type of work they cover, I'll watch some tutorials and read documentation to actually learn it. They need to focus in on one or two topics (e.g. APIs, MySQL) and actually teach you the basics of them. The lecture videos even use weird syntax without explanation (e.g. using = instead of <-. Using par(), etc.).

Like the other courses in this specialization, you'll spend almost all of your time learning independently, and not using any of the materials provided. The discussion board is sometimes useful, but you can see how little work is done to improve the course there, as people point out errors and issues which are still outstanding months/years later.

By Md. Z M

Jun 8, 2020

Pros: After putting in many hours of effort in understanding the problem statement and then actually solving it, the sense of achievement is fulfilling. I learnt a lot of skills in this course. Those skills are very important for understanding the data before starting the analyses, but are usually ignored when data science is taught to a beginner.

Cons: The course project is extraordinarily difficult and you won't get any help from the discussion forums as there are no TAs live. However, there are some threads that can help understand the problem statement. So, sift through the thread dump to find the topics relevant to you.

The quality of the video lectures is very bad; many of the packages referenced in the lectures are outdated and require you to search for their alternatives on your own, which is helpful in the long run but demands many hours of googling and reading through the documentation.

Overall, I would recommend this course for understanding the skills required in data cleaning.

By Alex F

Dec 20, 2021

The content on downloading files needs to be explained much better. Including more practice with the different file types would have been great. It also needs a demonstration and lecture on what makes a good codebook and readme file. The content with dplyr was really well done though. For something so important in data science I would expect this course to have been done so much better.

By laurent h

Nov 18, 2020

Content is fundamental, but the teaching was below expectations.

By Anthony B K

Jun 16, 2021

This was, by far, one of the worst courses I've ever taken. Considering that I have three degrees and completed military PME, I've taken many. The content of the course was significantly out of date. If you're going to teach a course in computer science or on a programming language, you should be updating your lectures at least annually. The websites referenced in the lectures here were either missing or they had changed to the point that the "examples" that were presented were useless. That means that at best, those lectures were a waste of time. If I wanted to spend hours and hours on the web trying to figure out what you meant to do, then I could have bought a book and taught myself this material. Secondly, some languages (including R) are actively being developed; this means that because the lecture material was so dated, the methods presented were (in some cases) obsolete. From a presentation point of view, the lectures were sub-optimal because the slides themselves were just images. Having to re-type long lines of code where you can easily make "fat finger" mistakes isn't helpful; those slides should contain text that can be copied and pasted into either notes or into an R session. I'd also suggest a better microphone or better sound levels, but that's minor compared to the terrible content.

By Neil J

Jul 23, 2016

R is really just the worst, and the instructors do not make it better. The code in this class is unreadable:

- too many one liners, because "it's faster to write", though harder for other people to read

- variables are named cryptic things like spIns or x, rather than names with meaning (again, "because it's faster to type")

- way too many cases of "there is more than one way to do it", which just makes things confusing because the other ways tend not to be equivalent

What I'm most concerned about is that I've seen lots of poorly written code in many different languages: Java, C++, C, Python, Perl, and now R. But I've also seen really well-written code in all the languages *but* R. I have yet to see any code in R that is flexible, maintainable, and clear, which leads me to think that no such code exists, or that it's so rare that it doesn't matter. It is clear to me that if I am to do data analysis, then I will need a different set of tools; but because this specialization is taught entirely around R (the lectures are about R, not about higher-level concepts), this specialization is not useful to me.

By Jake S

Jun 29, 2016

There is a lot of fluff in this course, and at the same time it assumes that you have knowledge and skills that are not covered in this course or in the previous two (e.g. github). I'm really disappointed in the quality of this course, specifically at how vague many of the instructions were in the quiz questions and the final project, and that most of the time when explanations were asked for on the message board the professors just did some hand waving and said that figuring it out was part of the assignment. That isn't teaching (online or otherwise). And if your instructions aren't clear, you aren't doing the job of an instructor when you pass the buck and try to sell it as "part of the learning experience." I hope this fall-off in quality isn't reflective of the rest of the courses in the data spec.

By Maria D

Oct 13, 2020

If you are wondering is it worth paying - the answer is "NO".

The course is badly outdated; the lectures are useless and do not even help to complete the quizzes. Too much like real life: information is old, incomplete or wrong, and you need to sort through dozens of additional sources looking for an answer.

I suppose that the reason why we want to learn before going for real tasks is that it is much more productive to go step by step, using reliable instruments, and proceed to troubleshooting only with a good knowledge of working solutions.

This is not the case with this course; here you need to troubleshoot from the very beginning. That is an exercise in frustration and googling, seriously.

I was going to take the entire specialization, but I changed my mind and am stopping now.

By Yusof A

Mar 17, 2021

Horrible lectures which have not been updated, even though the websites that are referenced, and the options on those websites for data required for the course, may have changed. For example, there is no way to use "download.file" to download as Excel a file that is only available as .csv, since the Excel option was removed after these lectures were made. I finished the first 2 courses and had high expectations for this one. It started off well, but in the middle of week 1 we realize this can be a very frustrating experience. The PDFs could have been revised and updated, but this is probably the same material from 7 years ago with the same website references from then. Worst course I have encountered on Coursera to date.

By Lindsay E M

Jun 9, 2020

The first two courses in this specialization were good, but the third course, Getting and Cleaning Data, was honestly very disappointing. The lectures are extremely out of date (made in 2013, and it's already June 2020...), and a lot of the code in the lectures and examples no longer works correctly because of this. Beyond that, the "updates" posted by the mentors in the discussion forums are also out of date (2016) and have limited usefulness. This is a course that is meant to teach you how to acquire and clean data in the R program, and methods and technology from 7 years ago are not the standard that I expected - technology constantly changes and updates, and this course should reflect that (but clearly doesn't).

By Ash S

May 3, 2021

So much of the material is out of date. As other people in the forums have mentioned, the course doesn't cover the necessary information needed to succeed and is also at a much higher level than listed (course says beginner level, but it's not). I have since switched to a different course and there are so many basic things that were explained that never were in this course. Even after taking a different intro level course, this course is still too difficult for me. This should at least be listed in the information so that people don't waste their time and money.

By Daniel G

May 12, 2022

This course was the final straw for me on Coursera. I will not be continuing my subscription if this is the quality of instruction I can expect to receive, which is none. The disconnect between the lectured material and what is expected on the quizzes with no instruction in between is just mind-boggling. And to see that students have been complaining about these issues in the forums FOR YEARS and the instructors have done nothing to update or change their material has put me off entirely.

By Laura G M

Apr 22, 2022