This course provides an introduction to some important concepts and tools for a very important aspect of data science: cleaning and organizing data before any analysis. A must for any data scientist.
Easy, mostly instructive course. The assignments and quizzes are quite good, and illustrate the lessons very well.
See the videos for a general presentation, but spend your energy on the exercises.
By Narin P•
The course is very helpful when it comes to exploring commonly used R packages and learning certain best practices involved in data cleaning. I'd definitely recommend it to any data science enthusiast. One area with slight scope for improvement could be the final project. The instructions are quite open to interpretation, which means that the final grade you get via peer review is always going to be debatable. Other than that, I have no complaints whatsoever :)
By Bantwale D E•
This course is really challenging and compulsory for anyone who wants to be a data scientist or work with any sort of data. It teaches you how to make a very palatable dataset from messy data.
Great course. It requires a little bit of programming background, though nothing specific.
By Alessandro V•
I found this course very useful for my learning needs; nevertheless, I have one remark. The time estimates provided for each section are quite inaccurate. For instance, 3h for a swirl exercise is really excessive (45 minutes may be more realistic), but the main problem is underestimation! For the final assignment in particular, I spent more than 20h, and part of that time went into convincing myself that a negative standard deviation was acceptable for the assignment goals. The provided estimate is instead 2h (<< 20h!!)
By Raw N•
Would have preferred if there were programming assignments that incorporated reading from data sources on the web.
For those planning to take the course, note the following:
The course covers reading data from a myriad of sources, but largely in passing, superficial detail. These sources include XML files, MySQL databases, HDF5 files, csv files, txt files with various formats (for example fixed-width files), JSON objects, and web APIs.
However, the course project only involves reading data from several txt files and combining them into a single R dataset.
Course topic order: In the first two weeks of the course, a lot of information is glossed over in passing; this information involves reading from the various file formats mentioned above. Week 3 involves subsetting, sorting, reshaping and merging data. Some of this may be review for you if you've taken the R programming course or the "R Programming Environment" course in the "Mastering Software Development in R" specialization. Week 4 involves string manipulation, regular expressions and working with dates. A lot of this is covered in Roger Peng's ebooks "R Programming for Data Science" and "Mastering Software Development in R" (both are freely available; google them).
Assessments: The only assessments in the course are 4 quizzes, each of which involves about 5 short programming exercises, and a final project which only involves topics from weeks 3 and 4 (specifically: subsetting data, sorting data, reshaping data, and working with regular expressions). So you can do the course project without understanding anything covered in weeks 1 and 2 of the course.
Mentor David Hood is fantastic for providing valuable resources to aid you with each assessment and so is Xing Su for providing a complete set of course notes. USE THE DISCUSSION FORUMS IF YOU GET STUCK!
By Vladimir C•
Although the subject covered is important, and I learned something, I cannot recommend this course. The course is 7 years old and is badly in need of updating. The R language is very dynamic and rapidly evolving, and the course covers many packages and functions that are deprecated, retired or superseded by newer, more efficient tools. If this is meant to be an online course, it needs to stand the test of time or be updated regularly. Data sources used as examples were from webpages that are no longer available, and there is no reason to expect them to be after 7 years. A different approach is needed for an online course. I spent a significant amount of time troubleshooting outdated course material on user forums and searching the web. If you read the user forums, you will see lots of frustrated people commenting on this. Unless the course has recently been updated, I recommend learning the material using a more up-to-date course.
By Pamela M•
I would have given just one star except the swirl() assignments are actually very good. The videos are just a (poorly) narrated glossary. Topics I learned in another course were presented here in such a way that I actually got confused. Can you imagine? My knowledge was actually worsened, not improved, by this course. (!!)
If the swirl() functions were made the centerpiece of the course, and the videos were described as just a narrated glossary, at least our expectations would be in line with reality.
Even so, I come to Coursera because I WANT to be taught by an instructor. If I'd wanted a curated list of tutorials so I could teach myself, I would have done that already. Anyone who pays for this should get their money back. NOT recommended for beginners.
I'm going to complete it because I'm stubborn that way, but it is an unpleasant experience for me and everyone within earshot, as I have to vent my frustration often just to make it through.
After week 2 I resorted to just reading the pdf of the slides and stopped watching the videos. The videos added NOTHING to my understanding. More often than not they put me to sleep. And what's worse, the narrator mispronounces "attribute". There IS a difference. I atTRIbute certain ATtributes to native speakers who mispronounce important vocabulary.
By Liam C•
Weeks 1 and 2 are completely worthless. They're cursory 5-10 minute introductions to topics that show you HOW to start to do something, but don't explain any commands or what is going on; it's just instructions to follow. This leaves you completely unprepared to do any actual work. Then you get the assignments and you basically have to go learn everything independently. The course info is useless. I skipped these. When I want to do the type of work they cover, I'll watch some tutorials and read documentation to actually learn it. They need to focus in on one or two topics (e.g. APIs, MySQL) and actually teach you the basics of them. The lecture videos even use weird syntax without explanation (e.g. using = instead of <-, using par(), etc.).
Like the other courses in this specialization, you'll spend almost all of your time learning independently, and not using any of the materials provided. The discussion board is sometimes useful, but you can see how little work is done to improve the course there, as people point out errors and issues which are still outstanding months/years later.
By Md. Z M•
Pros: After putting in many hours of effort in understanding the problem statement and then actually solving it, the sense of achievement is fulfilling. I learnt a lot of skills in this course. Those skills are very important for understanding the data before starting the analysis, but are usually ignored when data science is taught to a beginner.
Cons: The course project is extraordinarily difficult and you won't get any help from the discussion forums as there are no TAs live. However, there are some threads that can help understand the problem statement. So, sift through the thread dump to find the topics relevant to you.
The quality of the video lectures is very bad; many of the packages referenced in the lectures are outdated and require you to search for their alternatives on your own, which is helpful in the long run, but demands many hours of googling and reading through the documentation.
Overall, I would recommend this course for understanding the skills required in data cleaning.
By laurent h•
Content is fundamental, but the teaching was below expectations.
By Anthony B K•
This was, by far, one of the worst courses I've ever taken. Considering that I have three degrees and completed military PME, I've taken many. The content of the course was significantly out of date. If you're going to teach a course in computer science or on a programming language, you should be updating your lectures at least annually. The websites referenced in the lectures here were either missing or they had changed to the point that the "examples" that were presented were useless. That means that at best, those lectures were a waste of time. If I wanted to spend hours and hours on the web trying to figure out what you meant to do, then I could have bought a book and taught myself this material. Secondly, some languages (including R) are actively being developed; this means that because the lecture material was so dated, the methods presented were (in some cases) obsolete. From a presentation point of view, the lectures were sub-optimal because the slides themselves were just images. Having to re-type long lines of code where you can easily make "fat finger" mistakes isn't helpful; those slides should contain text that can be copied and pasted into either notes or into an R session. I'd also suggest a better microphone or better sound levels, but that's minor compared to the terrible content.
By Neil J•
R is really just the worst, and the instructors do not make it better. The code in this class is unreadable:
- too many one liners, because "it's faster to write", though harder for other people to read
- variables are named cryptic things like spIns or x, rather than names with meaning (e.g., sprays.by.insect), again "because it's faster to type"
- way too many cases of "there is more than one way to do it", which just makes things confusing because the other ways tend not to be equivalent
What I'm most concerned about is that I've seen lots of poorly written code in many different languages: Java, C++, C, Python, Perl, and now R. But I've also seen really well-written code in all the languages *but* R; I have yet to see any code in R that is flexible, maintainable, and clear. This leads me to think that no such code exists, or that it's so rare that it doesn't matter. It is clear to me that if I am to do data analysis, then I will need a different set of tools; but because this specialization is taught entirely around R (the lectures are about R, not about higher-level concepts), this specialization is not useful to me.
By Maria S•
I did not get anything out of this course. This course was pointless because it wasn't really a course, just a random scavenger hunt. If I wanted to wander around the Internet aimlessly trying to solve random problems by hacking away, I could have just done that on my own. I signed up for the class because I was looking for a structured way to learn the content and get in some exercises to practice and drill in the skills learned. This course is a waste of time. If you are interested in learning R, go through some tutorials online. If you want to learn data science principles, try one of the other Data Science specializations. If you want to mimic this class but have more fun, pick some problems that you are interested in, find some data that could help you solve those problems, and try to clean that data.
By jake s•
There is a lot of fluff in this course, and at the same time it assumes that you have knowledge and skills that are not covered in this course or in the previous two (e.g. github). I'm really disappointed in the quality of this course, specifically at how vague many of the instructions were in the quiz questions and the final project, and that most of the time when explanations were asked for on the message board, the professors just did some hand waving and said that figuring it out was part of the assignment. That isn't teaching (online or otherwise). And if your instructions aren't clear, you aren't doing the job of an instructor when you pass the buck and try to sell it as "part of the learning experience." I hope this falloff in quality isn't reflective of the rest of the courses in the data spec.
By Mariia D•
If you are wondering whether it is worth paying for, the answer is "NO".
The course is badly outdated, and the lectures are useless and do not even help you complete the quizzes. It is too much like real life: information is old, incomplete or wrong, and you need to sort through dozens of additional sources looking for an answer.
I suppose that the reason we want to learn before taking on real tasks is that it is much more productive to go step by step, using reliable instruments, and proceed to troubleshooting only with a good knowledge of working solutions.
This is not the case with this course; here you need to troubleshoot from the very beginning. That is an exercise in frustration and googling, seriously.
I was going to take the entire specialization, but I changed my mind and am stopping now.
By Yusof A•
Horrible lectures which have not been updated, even though the websites that are referenced may have changed and the options on those websites for data required for the course may have changed. For example, there is no way to download as Excel using "download.file" a file that is only available as .csv, since the Excel option was removed after these lectures were made. I finished the first 2 courses and had high expectations for this one. It started off well, but in the middle of week 1 we realize this can be a very frustrating experience. The PDFs could have been revised and updated, but this is probably the same material from 7 years ago with the same website references from then. Worst course I have encountered on Coursera to date.
By Lindsay E M•
The first two courses in this specialization were good, but the third course, Getting and Cleaning Data, was honestly very disappointing. The lectures are extremely out of date (made in 2013, and it's already June 2020...), and a lot of the code in the lectures and examples no longer works correctly because of this. Beyond that, the "updates" posted by the mentors in the discussion forums are also out of date (2016) and have limited usefulness. This is a course that is meant to teach you how to acquire and clean data in the R program, and methods and technology from 7 years ago are not the standard that I expected - technology constantly changes and updates, and this course should reflect that (but clearly doesn't).
By Ash S•
So much of the material is out of date. As other people in the forums have mentioned, the course doesn't cover the necessary information needed to succeed and is also at a much higher level than listed (course says beginner level, but it's not). I have since switched to a different course and there are so many basic things that were explained that never were in this course. Even after taking a different intro level course, this course is still too difficult for me. This should at least be listed in the information so that people don't waste their time and money.
By Mohamed A•
It was difficult to understand and there were not enough exercises. In addition, the questions are very hard to answer and need a lot of digging to find the correct way to answer. Personally, I am not happy with this course and do not intend to continue the DS program, because the materials are old and do not reflect the exam questions.
By Christian B•
No idea what they want for the project, and the discussion forum is clogged with people asking for peer reviews. The previous courses at least provided you with an understanding of what the final product should be; in this case it's "make tidy data," but with no idea of how that data should look.
By Najib B•
Online course design at its worst.
By Ramalakshmanan S P•
Thanks for this wonderful session on Getting and Cleaning Data. I would like to convey my sincere thanks to Professors Roger D. Peng, Brian D. Caffo and Jeff Leek and my fellow learners for their excellent help in completing the project to generate the tidy dataset. I would like to name Mr. Luis Sandino for his help and effort in putting together a help guide for this assignment. I followed it and got the assignment completed. The step-by-step procedure helped me and other fellow learners to complete the assignment on time.
Though this course is over, we still have doubts about the dimensions of the tidy dataset, whether it is 180 by 68 or 180 by 88, as the total number of "mean" variables considered varies. I request that the mentors or TAs help us arrive at the correct dimensions and help us understand the reason behind them.
This course has shown the need for support from TAs and mentors. Their help and support were very valuable in understanding the subject.
Thanks to Coursera, my Professors, mentors and TAs of this course for their insight, guidance, support and effort.
Wishing Coursera and Professors all the best and Success.
The SWIRL component for learning the subject is the best, and I wish there were SWIRL support for all the heavy courses. Special thanks to those who made the SWIRL course material possible for The Data Scientist's Toolbox.
With Best Wishes,
By Carlos C•
Excellent course to build upon the knowledge from the "R Programming" course. Learning to use functions from the Tidyverse packages is essential if you want to learn Data Science in R. In my opinion, most of the time these are stronger and easier to use compared to Pandas, Numpy, etc., from Python. Despite the bad reviews at the top with lots of upvotes, I do think this was a great course overall. People tend to complain and don't take responsibility for working and finding solutions if they don't understand something. My humble advice is that, if you wish to immerse yourself in the Data Science field, you should get used to researching a lot, going to other forums like StackOverflow if an error appears, etc. Thanks to Jeff Leek, Roger Peng and the others from Johns Hopkins University!
By Pouria T•
Thank you for giving me the opportunity to learn. This material (or this class) would have been super difficult if it had been taught through the same traditional channels, based on my academic experience. Yet the materials were presented in such an amazing way that I wasn't overwhelmed by the difficulty of the subjects; rather, I became more focused on learning more and being challenged. Thank you for letting me get 3 free online certificates. It means a lot to me and it has given me hope through this difficult time. I feel accomplished. It's a great feeling, and it is the best and the only gift that I have received and will probably receive this holiday.
By Alfonso R R•
I learned so much of R with this course. Thanks Johns Hopkins. Thanks Coursera.
The course final project was so challenging that it made me research R tools I didn't know existed, such as generating MD files from RMD markdown notebooks so I could mix live code with text. That's how I produced my CodeBook.md. Then I learned that there are a bunch of libraries for pretty-printing tables. I discovered even more about dplyr. And I also learned how to return multiple objects from a function.
You can really write papers with all these tools in R once you gain expertise in knitr and pandoc.
Thank you Jeff and team for putting together such a quality course.