
Exploratory Data Analysis


Learn the essential exploratory techniques for summarizing data. This is the fourth course in the Johns Hopkins Data Science Specialization.

About the Course

This course covers the essential exploratory techniques for summarizing data. These techniques are typically applied before formal modeling commences and can help inform the development of more complex statistical models. Exploratory techniques are also important for eliminating or sharpening potential hypotheses about the world that can be addressed by the data. We will cover in detail the plotting systems in R as well as some of the basic principles of constructing data graphics. We will also cover some of the common multivariate statistical techniques used to visualize high-dimensional data.

Course Syllabus

After successfully completing this course, you will be able to: make visual representations of data using the base, lattice, and ggplot2 plotting systems in R; apply basic principles of data graphics to create rich analytic graphics from different types of datasets; construct exploratory summaries of data in support of a specific question; and create visualizations of multidimensional data using exploratory multivariate statistical techniques.
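
As a rough illustration of what an exploratory summary in R's base plotting system might look like, here is a minimal sketch. The dataset (`airquality`, which ships with R) and the variable names are assumptions chosen for illustration; they are not taken from the course materials. The lattice and ggplot2 systems are not shown.

```r
# Illustrative only: airquality is a built-in dataset in R's datasets package.
data(airquality)

# Numeric summary of the ozone readings, ignoring missing values.
ozone <- airquality$Ozone
ozone_summary <- fivenum(ozone, na.rm = TRUE)  # min, lower hinge, median, upper hinge, max
n_missing <- sum(is.na(ozone))                 # how many readings are missing

# Base plotting system: distribution of ozone, then ozone by month.
hist(ozone, main = "Ozone readings", xlab = "Ozone (ppb)")
boxplot(Ozone ~ Month, data = airquality,
        xlab = "Month", ylab = "Ozone (ppb)")
```

Pairing a numeric summary with one or two quick plots is a common first pass before any formal modeling, since it surfaces missing values and skewed distributions early.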

Suggested Readings

The e-book Exploratory Data Analysis with R covers all of the material presented in this course. It is available for download from Leanpub.

Course Format

There will be weekly video lectures, quizzes, and peer assessments.

As part of this class, you will be required to set up a GitHub account. GitHub is a tool for collaborative code sharing and editing. During this course and other courses in the Specialization, you will submit links to files that you have placed publicly in your GitHub account as part of peer evaluation. If you are concerned about preserving your anonymity, you will need to set up an anonymous GitHub account and take care not to include any information that you do not want made available to peer evaluators.


How do the courses in the Data Science Specialization depend on each other?

We have created a handy course dependency chart to help you see how the nine courses in the specialization depend on each other.

Will I get a Statement of Accomplishment after completing this class?

Free statements of accomplishment are not offered in this course. If you are not enrolled in Signature Track, participation and performance documentation will be reported on your Accomplishments page, but you will not receive a signed statement of accomplishment.

What resources will I need for this class?

Students must have the latest versions of R and RStudio installed.

How does this course fit into the Data Science Specialization?

This is the fourth course in the track. We recommend that you first take The Data Scientist's Toolbox and R Programming.