
Learner Reviews & Feedback for Getting and Cleaning Data by Johns Hopkins University

4.5 stars, 7,923 ratings, 1,310 reviews

About the Course

Before you can work with data you have to get some. This course will cover the basic ways that data can be obtained. The course will cover obtaining data from the web, from APIs, from databases and from colleagues in various formats. It will also cover the basics of data cleaning and how to make data “tidy”. Tidy data dramatically speed downstream data analysis tasks. The course will also cover the components of a complete data set including raw data, processing instructions, codebooks, and processed data. The course will cover the basics needed for collecting, cleaning, and sharing data....

Top reviews

HS

May 2, 2020

This course provides an introduction of some important concepts and tools on a very important aspect of data science: cleaning and organizing data before any analysis. A must for any data scientist.

DH

Feb 1, 2016

Easy, mostly instructive course. The assignments and quizzes are quite good and illustrate the lessons very well.

Watch the videos for a general presentation, but spend your energy on the exercises.

1051 - 1075 of 1,273 Reviews for Getting and Cleaning Data

By Mrunal 1

May 8, 2021

-

By Dimitri d

Feb 23, 2017

.

By Sergio R

Nov 27, 2016

4

By sugyoo

Oct 22, 2016

A

By Borja C

May 7, 2016

.

By Miguel C

Mar 25, 2020

I really enjoyed and learned a lot in this course.

I feel a lot more comfortable with looking for and reading data. I learned how to clean data and get it ready for further analysis. I think the course project was particularly good for completely understanding the process of tidying data and all the aspects it involves, such as writing a code book and a README file to accompany it.

Furthermore, I believe I further developed my R programming skills, by learning how to code new things or things I already knew but in a more efficient way, by using new packages and techniques.

Moreover, I found Professor Jeffrey Leek quite engaging, very easy to understand and I had complete confidence in his knowledge on this subject.

However, I believe the course is slightly outdated. I was often disheartened and frustrated by not being able to replicate what was being done in the lecture videos. For example, many links no longer worked, and some of the information simply wasn't correct anymore. I found the discussion forums and many mentors' responses to be very helpful. I think this can easily be fixed by writing up a list of errata or updating the lecture videos.

By John O

Apr 20, 2022

The course had good content and lectures, and the quizzes were at an appropriate level for the material covered.

Some problems I had:

The lectures are quite old so many of the data sets used in the examples are unavailable now. This makes it difficult and frustrating to find new data sets that have the characteristics needed to follow along with the exercises.

The first swirl exercise has problems with some of its files that make the lesson malfunction, so it can't be completed. Looking through past reviews and discussion threads, this has been a problem for years that still hasn't been fully addressed.

The final project was quite difficult and I felt it was a bit beyond the level of the course. It wasn't as bad as in the R Programming course, but I still think that examinations and assignments should be at an appropriate level for the material covered in the course. What's the point in testing people on skills and material that you haven't taught them? What's the point in signing up to the course and doing all the lectures and exercises if we have to teach ourselves new skills using external resources to pass the final exam?

By Tomasz J

Dec 11, 2017

The course teaches you some principles of tidy data and cleaning data, but it's very messy.

There is no systematic approach to the plyr and dplyr libraries. The teachers pick some functions from one library and some from the other, but without any clear principle. It looks as if Prof. Leek and Prof. Peng are presenting their favorite functions without consulting each other. What they do is correct, though very confusing. On the other hand, loading data is approached very encyclopedically.

Assignments not only check what was taught in the videos but also sometimes require new skills and going through Stack Overflow etc. (e.g. code to read Fortran files). This is not how you construct good courses.

Additionally, some instructions for the final assignment are provided in the submission part! (Expected names of the files should be provided in the assignment description, not on the submission page.)

Prof. Peng and Prof. Leek are very skilled and know their job, but they don't know how to teach efficiently. Nonetheless, if you are motivated to learn, this course may be very helpful.

By Rahul G

Aug 3, 2021

The course is great and essential to becoming a good data scientist. Provided me with valuable lessons in managing and processing raw data.

A major issue is that the course has not been updated in a while and has a lot of missing/expired data sources. You will, however, be able to find and download the data if you do a bit of searching online. All the data used in the course is still available online and you just need to find it.

The course will also need you to do a bit of research and exploration on your own to use some of the methods specified in lectures. All the details are not handed to you in the video lecture. I have seen other people complain about this, but my personal view is that you will anyway need the skill to research and adapt in a practical scenario outside the course and this is the best place to learn this skill as well.

It would have been great for the course to be updated and made easier for beginners but alas. It will be good to have your basics in R, using libraries and a bit of data analysis under your belt already before jumping into this course.

By Guillermo A G

Oct 23, 2017

This course was quite challenging in comparison with the first two. I felt that the material provided by the instructors was not enough to approach the quizzes and assignments, so it's necessary to spend a lot of time researching on your own in other sources. I struggled with the Course Project Assignment because I didn't understand what I was supposed to do exactly. Fortunately, the forum threads were really helpful. Nevertheless, the course's intention is very valuable, and if you are patient and go all the way through it you will improve your data science skills and learn very useful techniques and habits, especially if you're a beginner. But I strongly suggest that the instructors make the course contents more explicit and helpful.

By John Y

Aug 4, 2017

Great class for an important piece of data analytics and data science. One issue I've been noticing with R compared to using Anaconda/Python is that a lot of the libraries required for the class aren't explicitly mentioned. That's fine if you're experienced with these environments and able to read error codes with familiarity. Minor annoyance to me when I run a script and realize I don't have a library installed.

I'd imagine, though, it's extremely frustrating for beginners who may have written perfectly good code but haven't figured out that they simply need to install certain packages to answer quiz or homework questions. Perhaps having a full library or package list for Course 1 of this series would be helpful.

By Dan S

Oct 9, 2020

There was a lot to like about this course but there were several aspects that kept the course from being stellar. This course attempts to convey a lot of important information without connecting the dots. The videos did not align well with the Swirl exercises and the videos did not attempt to address foundational principles of the course topics. Rather, many topics were covered on a superficial level without in-depth examples that demonstrate application of the topics. I think the course needs to be updated as some of the quiz questions and material seem to be out of date. The final project description had aspects that were vague and hard to interpret given the data set we had to work with.

By Michael M J

Nov 11, 2020

The material that's ultimately learned from this course is very powerful and provides a major milestone in one's data science journey. However, I had to mostly teach myself. The instructor's video lectures only teach at the most basic level, and the quizzes are much more advanced - thus creating a very large gap which I needed to fill by lots of self learning and research. The course project is a great opportunity to put the course's objectives into practice, but it wasn't explained clearly - it took me many hours to understand what needed to be done. All in all, I am satisfied with the course's material, but not the way it's communicated to the students. Three stars.

By Joshua S

Jul 10, 2020

Lecture content is very drab and filled with "...and here's another thing you can do...". I think it would be a lot more effective with more problems and various solutions. There should be a project every week along with the quizzes.

I also found the peer review process for the course project to be sub-par. I personally put a LOT of work into the course project, and put together what was a really thoughtful and well-written README and CODEBOOK just to have my project downgraded because someone wasn't sure the run_analysis did what it was supposed to. It's not a big deal but I think if an instructor saw it they would've found it very thoughtful, thorough and accurate.

By Jaymes P

Oct 23, 2020

This course was largely helpful and the instructor was clear in the lectures. However, much like many of the courses in this specialization, the quizzes and the final project were misaligned with what was taught in the lectures. For example, we sit through a whole week of lectures on the details of reading in many different data types, but never talk about fixed-width format once, it's never even mentioned. And that is what is on the quiz. The final project was challenging and fun to figure out once I was on the right track, but the directions were not sufficiently clear and I wasted a lot of time needlessly simply trying to figure out what the instructor wanted.

By Anton K

Aug 8, 2019

This is a very brief course; many of the topics deserve a much more thorough explanation. This part of the data analysis (i.e. data cleaning and acquisition) is in fact a complex subject and subjects are not covered in this course. There were also technical issues. For instance, the audio quality of lectures by Prof. Jeff Leek is very poor. And the other major problem that I had with this course is the ambiguity of the requirements, although it wasn't difficult to finish. But if you are planning on taking this course, be ready to spend a considerable amount of time on understanding the structure of the final submission's materials.

By Harris W

May 27, 2020

Like all courses in this specialization, there is an incredible lack of practice and application for a large amount of the skills taught in lecture. I would say that only about 60 percent, or maybe less, of the content in the lecture is assessed in the quizzes and assignments. Additionally, the peer review process is wildly flawed for the final project. I did not receive any constructive criticism from anyone, and I doubt they even truly looked at my code to make sure it worked. I don't blame them, they have little incentive. Rather I place blame on the system of grading and lack of feedback.

By Rashaad J

Jul 24, 2017

I have 2 key concerns with this course. First, I don't feel like the material presented adequately prepares you for the quizzes. For at least 2 of the 4 quizzes, I had to spend a substantial amount of time locating and reviewing other resources to answer the questions. My second concern is that for the final course assignment, there is a lack of specificity with the instructions. Not being able to answer a question is vastly different from not understanding what the question is asking and I found myself spending more time doing the latter (which is wasteful) and less time with the former.

By Constantin S

Feb 20, 2016

In some weeks there is only about an hour of input, and several topics have already been covered in R Programming. That's very little value for money.

The final course project again feels like it was done in a rush and without another review: the submitted dataset should be checked automatically. It's simply impossible to derive from it whether the student did everything right, but it could easily be done programmatically. Some of the questions have wording and grammar issues that make them hard to understand. There are also slightly contradictory instructions between the task and review descriptions.

By Angela W

Jul 14, 2017

I did learn a lot, but I thought the first half of the course (getting the data) was very challenging.

What does annoy me though is that links aren't clickable, sometimes they're wrong, there are typos on the slides etc. The response to these complaints in the forums is that these lectures were recorded a while ago and it takes time to change things and so on - but for $50 a month, I don't think it's too much to ask that the course materials be kept up to date!

So honestly, I feel like I'm being ripped off a bit here.

I did really enjoy the course project though.

By Carla P

Oct 25, 2021

In my opinion, the lessons are just a basic overview of some concepts and do not give you the competences you need to pass the quizzes and the peer-graded assignment. Therefore, for most of the assignment questions, you need to look for the tools you need somewhere else on the web! On the one hand, without the lessons you would probably not know what to look for on Google; on the other hand, the lessons are not enough to achieve a good grade in the assignments! Also, the peer-graded assignment takes too long to be evaluated!

By Luis P

Jan 25, 2018

The most challenging so far of the 9 courses on the Data Scientist track. Would like to see some errors removed from slides. Some parts of the lectures seemed rushed. Would like to see the non-self-evident usage of some functions described in a little more detail. I found myself having to look at multiple online areas to really understand some of the functions that were glossed over. Otherwise, this was a very helpful course that should be taught to all disciplines involving any amount or type of data.

By Raymond B

Sep 27, 2020

The "Reading from..." lessons from week 1 and week 2 were extremely frustrating, since we did not get much info on where we would see them most often or the benefit of using one over the others. Instead, we simply sat for hours listening to lectures moving from one type of document to the next before being handed the quiz. The dplyr and data manipulation lectures were great and I really anticipate using them frequently in the future. I think regular expressions deserved more lecture time/practice.

By Chanchal D

Jul 8, 2020

The course design is good; however, I give it three stars for the following reasons.

The sound quality is very poor. I have to turn my speakers up to full volume just to make it clear and audible, which makes other PC programs very loud at the same volume.

Many topics in the course, like factors, were not clear in the tutorial videos, and I had to go out of my way to find their meaning and uses.

The rest of the course is top quality. Thank you for the course.

By Paul R

Mar 11, 2019

This is really R part 2, getting into file/API handling, data frames, regular expressions etc. The specialization focuses on data frames, though there is little coverage of the data tables needed for the capstone. Some of the ordering of the materials was confusing, e.g. this course revisits date/time handling, which was started in the previous course. Assignments are interesting and Swirl exercises are useful. All in all, the combination of these R courses gets you up to speed.