Johns Hopkins University

Modeling Data in the Tidyverse

Shannon Ellis, PhD
Stephanie Hicks, PhD
Roger D. Peng, PhD


1,575 already enrolled

Gain insight into a topic and learn the fundamentals.
2 weeks to complete at 10 hours a week
Flexible schedule: learn at your own pace
What you'll learn

  • Describe different types of data analytic questions

  • Conduct hypothesis tests of your data

  • Apply linear modeling techniques to answer multivariable questions

  • Apply machine learning workflows to detect complex patterns in your data

Details to know

Shareable certificate

Add to your LinkedIn profile

Assessments

8 assignments¹

AI-graded (see disclaimer)
Taught in English

Build your subject-matter expertise

This course is part of the Tidyverse Skills for Data Science in R Specialization
When you enroll in this course, you'll also be enrolled in this Specialization.
  • Learn new concepts from industry experts
  • Gain a foundational understanding of a subject or tool
  • Develop job-relevant skills with hands-on projects
  • Earn a shareable career certificate

There are 11 modules in this course

Developing insights about your organization, business, or research project depends on effective modeling and analysis of the data you collect. Building effective models requires understanding the different types of questions you can ask and how to map those questions to your data. Different modeling approaches can be chosen to detect interesting patterns in the data and identify hidden relationships.

What's included

16 readings, 1 assignment

Inferential analysis is what analysts carry out after they have described and explored their dataset. With a better understanding of the dataset, analysts often try to infer something from the data, typically by using statistical tests. We have touched on how models can be used for inference and for prediction, but what does this mean in practice?

What's included

3 readings, 1 assignment

Linear models are the most commonly used models in data analysis because of their computational efficiency and their ease of interpretation. Having a solid understanding of linear models and how they work is critical for any work in data science. The tidyverse provides a set of tools for making linear modeling more efficient and streamlined.

What's included

12 readings, 1 assignment

Multiple linear regression is needed when you want to include confounding factors or other predictors in your model for the response. R provides a straightforward way to do this via the formula interface to the lm() function.
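
As a rough illustration of the formula interface, the sketch below fits a model with two predictors. It assumes the built-in mtcars dataset and an arbitrary choice of variables (mpg as the response, wt and hp as predictors); none of these come from the course materials.

```r
# Minimal sketch of multiple linear regression with lm().
# mtcars ships with base R; the variables used here are an
# illustrative assumption, not taken from the course.
fit <- lm(mpg ~ wt + hp, data = mtcars)

# Each term on the right-hand side of the formula gets its own
# coefficient, estimated while holding the other predictor fixed.
coefs <- coef(fit)
print(coefs)

# summary() reports standard errors, t statistics and p-values.
print(summary(fit))
```

Adding another predictor or potential confounder is a matter of appending it to the right-hand side of the formula, for example `mpg ~ wt + hp + cyl`.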

What's included

1 reading, 1 assignment

While we've focused on linear regression in this lesson on inference, linear regression isn't the only analytical approach out there. It is, however, arguably the most commonly used. Beyond that, many statistical tests and approaches are slight variations on linear regression, so a solid foundation in linear regression makes these other tests and approaches much simpler to understand. For example, what if you didn't want to measure the linear relationship between two variables, but instead wanted to know whether the observed average differs from an expected value?
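
The closing question is the setting of a one-sample t-test. The sketch below is one way to run it in base R; the simulated data and the expected value of 5 are assumptions made purely for illustration and do not come from the course.

```r
# Minimal sketch of a one-sample t-test with base R.
# The sample is simulated and the expected mean (mu = 5) is a
# hypothetical value chosen for illustration only.
set.seed(1)
x <- rnorm(30, mean = 5.5, sd = 1)

# Null hypothesis: the true mean of x equals 5.
res <- t.test(x, mu = 5)

print(res$estimate)   # observed sample mean
print(res$statistic)  # t statistic
print(res$p.value)    # two-sided p-value

# The same test expressed as an intercept-only linear model:
# the intercept estimates how far the mean lies from 5.
fit0 <- lm(I(x - 5) ~ 1)
print(summary(fit0)$coefficients)
```

The intercept-only model yields the same t statistic as `t.test()`, which is one concrete sense in which such tests are variations on linear regression.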

What's included

3 readings

Hypothesis testing describes a family of statistical techniques for determining whether the data you collect provides evidence for the value of an unknown parameter of interest. The goal of hypothesis tests is to make inferences while accounting for variability in the data that can lead to spurious results.

What's included

3 readings, 1 assignment, 1 plugin

Prediction modeling is an essential activity in data science and involves building systems for making predictions based on previously observed data. These models are typically very flexible and can capture a range of different relationships.

What's included

12 readings, 1 assignment

There are incredibly helpful packages available in R thanks to the work of RStudio. There are hundreds of different machine learning algorithms, and the tidymodels R packages put many of them into a single framework, allowing you to use many different machine learning models easily.

What's included

5 readings, 1 assignment

This case study demonstrates an approach to building a model that predicts outdoor air pollution concentrations in the United States.

What's included

17 readings, 1 ungraded lab

The tidymodels collection of packages can be overwhelming at first glance. Here, we provide a quick summary chart to help navigate all of the packages and when they should be used.

What's included

1 reading

In this project, you will practice building models with the tidyverse for classifying consumer complaints data from the Consumer Financial Protection Bureau (CFPB). This project includes both a Peer Review step in which you'll upload R Markdown and knitted HTML files AND a Quiz step in which you'll answer questions about the predictions made by your classification algorithm.

What's included

1 reading, 1 assignment, 1 peer review

Earn a career certificate

Add this credential to your LinkedIn profile, resume, or CV. Share it on social media and in your performance review.

Instructors

Shannon Ellis, PhD
Johns Hopkins University
5 courses, 6,767 learners
Stephanie Hicks, PhD
Johns Hopkins University
5 courses, 6,767 learners
Roger D. Peng, PhD
Johns Hopkins University
37 courses, 1,662,358 learners

¹ Some assignments in this course are AI-graded. For these assignments, your data will be used in accordance with Coursera's Privacy Notice.