Johns Hopkins University

Introduction to the Tidyverse

Shannon Ellis, PhD
Stephanie Hicks, PhD
Roger D. Peng, PhD

Instructors: Shannon Ellis, PhD; Stephanie Hicks, PhD; Roger D. Peng, PhD


5,121 already enrolled

Gain insight into a topic and learn the fundamentals.
4.4 (53 reviews)

Beginner level

7 hours to complete
Flexible schedule
Learn at your own pace

What you'll learn

  • Distinguish between tidy and non-tidy data

  • Describe how non-tidy data can be transformed into tidy data

  • Describe the Tidyverse ecosystem of packages

  • Organize and initialize a data science project

Details to know

Shareable certificate

Add to your LinkedIn profile

Assessments

5 assignments¹

AI-graded (see note ¹ below)
Taught in English


Build your subject-matter expertise

This course is part of the Tidyverse Skills for Data Science in R Specialization
When you enroll in this course, you'll also be enrolled in this Specialization.
  • Learn new concepts from industry experts
  • Gain a foundational understanding of a subject or tool
  • Develop job-relevant skills with hands-on projects
  • Earn a shareable career certificate

There are 6 modules in this course

Before we can discuss all the ways in which R makes it easy to work with tidy data, we first have to be sure we know what tidy data are. Tidy datasets, by design, are easier to manipulate, model, and visualize because the tidy data principles that we’ll discuss in this course impose a general framework and a consistent set of rules on data. As Hadley Wickham put it in a well-known quote, “tidy datasets are all alike but every messy dataset is messy in its own way.” A consistent tidy data format allows tools to be built that work well within this framework, ultimately simplifying data wrangling, visualization, and analysis. If you start with data that are already tidy, or spend time at the beginning of a project getting data into a tidy format, the remaining steps of your data science project will be easier.

What's included

6 readings, 2 assignments

It’s important to discuss what tidy data are and what they look like because, out in the world, most data are untidy. If you are not the one entering the data but are instead handed data by someone else for a project, more often than not those data will be untidy. Untidy data are often referred to simply as messy data. To work with these data easily, you’ll have to get them into a tidy format, which means you’ll need to recognize untidy data and understand how to tidy them. The following common problems seen in messy datasets again come from Hadley Wickham’s paper on tidy data (http://vita.had.co.nz/papers/tidy-data.pdf). After briefly reviewing each common problem, we will look at a few messy datasets. Finally, we’ll touch on the concepts behind tidying untidy data, but we won’t actually do any practice yet. That’s coming soon!

What's included

3 readings, 1 assignment

With a solid understanding of tidy data and how tidy data fit into the data science life cycle, we’ll take a bit of time to introduce you to the tidyverse and tidyverse-adjacent packages that we’ll be teaching and using throughout this specialization. Taken together, these packages make up what we’re referring to as the tidyverse ecosystem. The purpose of the rest of this course is not for you to understand how to use each of these packages (that’s coming soon!), but rather to help you become familiar with which packages fit into which part of the data science life cycle. Note that the official tidyverse packages below are shown in bold. All other packages are tidyverse-adjacent, meaning they follow the same conventions as the official tidyverse packages and work well within the tidy framework and structure of data analysis.

What's included

5 readings

Data science projects vary quite a lot, so it can be difficult to give universal rules for how they should be organized. However, there are a few ways to organize projects that are commonly useful. In particular, almost all projects have to deal with files of various sorts: data files, code files, output files, and so on. This section talks about how files work and how projects can be organized and customized.

What's included

6 readings, 2 assignments

Throughout this specialization, we’re going to make use of a number of case studies from Open Case Studies to demonstrate the concepts introduced in the course. We’ll generally use the same case studies throughout the specialization, providing continuity so that you can focus on the concepts and skills being taught (rather than the context) while working with interesting data. Each of these case studies addresses a public-health question, and all of them use real data.

What's included

2 readings, 2 ungraded labs

In this project, you will create a new project and organize the files needed for a future data analysis.

What's included

1 peer review

Earn a career certificate

Add this credential to your LinkedIn profile, resume, or CV. Share it on social media and in your performance review.

Instructors

Instructor ratings
4.3 (17 ratings)
Shannon Ellis, PhD
Johns Hopkins University
5 courses, 6,767 learners
Stephanie Hicks, PhD
Johns Hopkins University
5 courses, 6,767 learners
Roger D. Peng, PhD
Johns Hopkins University
37 courses, 1,662,358 learners

Offered by Johns Hopkins University

Learner reviews

4.4

53 reviews

  • 5 stars: 73.58%

  • 4 stars: 13.20%

  • 3 stars: 3.77%

  • 2 stars: 1.88%

  • 1 star: 7.54%

¹ Some assignments in this course are AI-graded. For these assignments, your data will be used in accordance with Coursera's Privacy Notice.