When you enroll in this course, you'll also be enrolled in this Specialization.
There are 11 modules in this course
Developing insights about your organization, business, or research project depends on effective modeling and analysis of the data you collect. Building effective models requires understanding the different types of questions you can ask and how to map those questions to your data. Different modeling approaches can be chosen to detect interesting patterns in the data and identify hidden relationships.
This course covers the types of questions you can ask of data and the various modeling approaches that you can apply. Topics covered include hypothesis testing, linear regression, nonlinear modeling, and machine learning. With this collection of tools at your disposal, as well as the techniques learned in the other courses in this specialization, you will be able to make key discoveries from your data for improving decision-making throughout your organization.
In this specialization we assume familiarity with the R programming language. If you are not yet familiar with R, we suggest you first complete R Programming before returning to complete this course.
What's included
16 readings, 1 assignment
16 readings•Total 195 minutes
Course Textbook•10 minutes
The Purpose of Data Science•5 minutes
Types of Data Science Questions•10 minutes
Data Needs•5 minutes
Number of observations is too small•5 minutes
Dataset does not contain the exact variables you are looking for•10 minutes
Variables in the dataset are not collected in the same year•5 minutes
Dataset is not representative of the population that you are interested in•10 minutes
Some variables in the dataset are measured with error•5 minutes
Variables are confounded•10 minutes
Descriptive and Exploratory Data Analysis•15 minutes
Missing Values•10 minutes
Shape•25 minutes
Identifying Outliers•20 minutes
Evaluating Variables•20 minutes
Evaluating Relationships•30 minutes
1 assignment•Total 30 minutes
Modeling Data Basics Quiz•30 minutes
Inference
Module 2•1 hour to complete
Inferential analysis is what analysts carry out after they have described and explored their dataset. With a better understanding of the data, analysts often try to infer something from it, typically using statistical tests. Earlier material touched on how models can be used for inference and for prediction; this module explains what that means.
What's included
3 readings, 1 assignment
3 readings•Total 30 minutes
Inference•10 minutes
Uncertainty•10 minutes
Random Sampling•10 minutes
1 assignment•Total 30 minutes
Inference Quiz•30 minutes
Linear Modeling
Module 3•2 hours to complete
Linear models are the most commonly used models in data analysis because of their computational efficiency and their ease of interpretation. Having a solid understanding of linear models and how they work is critical for any work in data science. The tidyverse provides a set of tools for making linear modeling more efficient and streamlined.
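As a rough illustration of why linear models are cheap to compute and easy to interpret, the sketch below fits a one-predictor least-squares line from closed-form sums. It is written in Python with only the standard library for the sake of a self-contained example; the course itself works in R, and the function names and data here are hypothetical, not taken from the course materials.

```python
from statistics import mean


def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    x_bar, y_bar = mean(x), mean(y)
    # Sum of cross-deviations and sum of squared deviations of x.
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def r_squared(x, y, intercept, slope):
    """Share of the variance in y explained by the fitted line."""
    y_bar = mean(y)
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


# Made-up data for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

intercept, slope = fit_line(x, y)
r2 = r_squared(x, y, intercept, slope)
print(f"intercept={intercept:.3f} slope={slope:.3f} r_squared={r2:.4f}")
```

The slope reads directly as the expected change in the response per unit change in the predictor, which is the interpretability the description refers to.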
What's included
12 readings, 1 assignment
12 readings•Total 119 minutes
Linear Regression•15 minutes
Assumptions•20 minutes
Association•15 minutes
Association Testing in R•10 minutes
Fitting the Model•2 minutes
Model Diagnostics•10 minutes
Tree Girth and Height Example•10 minutes
Interpreting the Model•10 minutes
Variance Explained•5 minutes
Using broom•5 minutes
Correlation Is Not Causation•7 minutes
Confounding•10 minutes
1 assignment•Total 30 minutes
Linear Regression Quiz•30 minutes
Multiple Linear Regression
Module 4•1 hour to complete
Multiple linear regression is needed when you want to include confounding factors or other predictors in your model for the response. R provides a straightforward way to do this via the formula interface to the lm() function.
What's included
1 reading, 1 assignment
1 reading•Total 15 minutes
Multiple Linear Regression•15 minutes
1 assignment•Total 30 minutes
Multiple Linear Regression Quiz•30 minutes
Beyond Linear Regression
Module 5•23 minutes to complete
While this lesson on inference has focused on linear regression, it is not the only analytical approach available. It is, however, arguably the most commonly used, and many statistical tests and approaches are slight variations on it, so a solid understanding of linear regression makes those other tests and approaches much easier to learn. For example, what if you did not want to measure the linear relationship between two variables, but instead wanted to know whether the observed average differs from an expected value?
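The question of whether an observed average differs from expectation is commonly framed as a one-sample t-test. The sketch below computes only the t statistic, in Python with the standard library; the course works in R, where `t.test(x, mu = ...)` reports this statistic together with a p-value. The function name and sample values are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def one_sample_t(sample, mu0):
    """t statistic for testing whether the sample mean differs from mu0."""
    n = len(sample)
    # Standard error of the mean, using the sample standard deviation.
    standard_error = stdev(sample) / sqrt(n)
    return (mean(sample) - mu0) / standard_error


# Made-up measurements; the expected value under the null hypothesis is 5.0.
sample = [5.1, 4.9, 5.6, 5.8, 5.2, 5.4]
t_stat = one_sample_t(sample, mu0=5.0)
print(f"t = {t_stat:.3f} on {len(sample) - 1} degrees of freedom")
```

A t statistic far from zero suggests the observed mean is unlikely under the expected value; turning it into a p-value requires the t distribution, which this sketch leaves out.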
What's included
3 readings
3 readings•Total 23 minutes
Beyond Linear Regression•3 minutes
Mean Different From Expectation?•5 minutes
Testing Mean Difference From Expectation in R•15 minutes
Hypothesis Testing
Module 6•1 hour to complete
Hypothesis testing describes a family of statistical techniques for determining whether the data you collect provides evidence for the value of an unknown parameter of interest. The goal of hypothesis tests is to make inferences while accounting for variability in the data that can lead to spurious results.
What's included
3 readings, 1 assignment, 1 plugin
3 readings•Total 27 minutes
More Statistical Tests•2 minutes
Hypothesis Testing•10 minutes
The infer Package•15 minutes
1 assignment•Total 30 minutes
Hypothesis Testing Quiz•30 minutes
1 plugin•Total 15 minutes
Common statistical tests are linear models (or: how to teach stats)•15 minutes
Prediction Modeling
Module 7•3 hours to complete
Prediction modeling is an essential activity in data science and involves building systems for making predictions based on previously observed data. These models are typically very flexible and can capture a range of different relationships.
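Because a prediction model is judged on data it has not seen, a basic step in such a workflow, and the subject of the Data Splitting reading listed for this module, is holding out part of the data. The sketch below shows one simple way to do that in Python with the standard library; the course works in R, and the function name, the 25% test fraction, and the seed are assumptions chosen for illustration.

```python
import random


def train_test_split(rows, test_fraction=0.25, seed=42):
    """Shuffle the rows reproducibly, then hold out a fraction for testing."""
    rng = random.Random(seed)  # local generator, so global state is untouched
    shuffled = list(rows)      # copy, so the caller's data is not reordered
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Stand-in for 20 observations; real rows would be records or tuples.
rows = list(range(20))
train_rows, test_rows = train_test_split(rows)
print(len(train_rows), "training rows,", len(test_rows), "test rows")
```

The model is fitted on `train_rows` only, and `test_rows` is kept aside until the final performance check.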
What's included
12 readings, 1 assignment
12 readings•Total 133 minutes
Prediction Modeling•10 minutes
What is Machine Learning?•10 minutes
Machine Learning Steps•10 minutes
Data Splitting•10 minutes
Train, Test, Validate•10 minutes
Train•3 minutes
Test•5 minutes
Validate•10 minutes
Variable Selection•15 minutes
Model Selection•5 minutes
Regression vs. Classification•30 minutes
Model Accuracy•15 minutes
1 assignment•Total 30 minutes
Prediction and Machine Learning Quiz•30 minutes
The tidymodels Ecosystem
Module 8•2 hours to complete
Many helpful R packages exist thanks to the work of RStudio. There are hundreds of different machine learning algorithms, and the tidymodels R packages bring many of them into a single framework, making it easy to use many different machine learning models.
What's included
5 readings, 1 assignment
5 readings•Total 100 minutes
The tidymodels Ecosystem•5 minutes
Benefits of tidymodels•5 minutes
Packages of tidymodels•15 minutes
Example of Continuous Variable Prediction•45 minutes
Example of Categorical Variable Prediction•30 minutes
1 assignment•Total 30 minutes
tidymodels Quiz•30 minutes
Case Studies
Module 9•5 hours to complete
This case study demonstrates an approach to building a model that predicts outdoor air pollution concentrations in the United States.
What's included
17 readings, 1 ungraded lab
17 readings•Total 305 minutes
Case Study #1: Predicting Annual Air Pollution•5 minutes
The Data•5 minutes
Data Import•5 minutes
Data Exploration and Wrangling•20 minutes
Evaluate Correlation•15 minutes
Splitting the Data•10 minutes
Making a Recipe•30 minutes
Running Preprocessing•30 minutes
Specifying the Model•20 minutes
Assessing the Model Fit•15 minutes
Model Performance: Getting Predicted Values•15 minutes
Visualizing Model Performance•5 minutes
Quantifying Model Performance•10 minutes
Assessing Model Performance on v-folds Using tune•30 minutes
Random Forest•30 minutes
Model Tuning•30 minutes
Final model performance evaluation•30 minutes
1 ungraded lab•Total 5 minutes
Case Study #1: Predicting Annual Air Pollution•5 minutes
Summary of tidymodels
Module 10•5 minutes to complete
The tidymodels collection of packages can be overwhelming at first glance. Here, we provide a quick summary chart to help navigate all of the packages and when they should be used.
What's included
1 reading
1 reading•Total 5 minutes
Summary of tidymodels•5 minutes
Project: Modeling Data in the Tidyverse
Module 11•2 hours to complete
In this project, you will practice building models with the tidyverse for classifying consumer complaints data from the Consumer Financial Protection Bureau (CFPB). This project includes both a Peer Review step in which you'll upload R Markdown and knitted HTML files AND a Quiz step in which you'll answer questions about the predictions made by your classification algorithm.
What's included
1 reading, 1 assignment, 1 peer review
1 reading•Total 10 minutes
Important information before you start the quiz•10 minutes
1 assignment•Total 30 minutes
Course Project Prediction Quiz•30 minutes
1 peer review•Total 60 minutes
Modeling Data in the Tidyverse Course Project•60 minutes
Earn a career certificate
Add this credential to your LinkedIn profile, resume, or CV. Share it on social media and in your performance review.
The mission of The Johns Hopkins University is to educate its students and cultivate their capacity for life-long learning, to foster independent and original research, and to bring the benefits of discovery to the world.
When will I have access to the lectures and assignments?
To access the course materials and assignments and to earn a Certificate, you will need to purchase the Certificate experience when you enroll in a course. You can try a Free Trial instead, or apply for Financial Aid. The course may instead offer 'Full Course, No Certificate'. This option lets you see all course materials, submit required assessments, and get a final grade, but it also means that you will not be able to purchase a Certificate experience.
What will I get if I subscribe to this Specialization?
When you enroll in the course, you get access to all of the courses in the Specialization, and you earn a certificate when you complete the work. Your electronic Certificate will be added to your Accomplishments page - from there, you can print your Certificate or add it to your LinkedIn profile.
Is financial aid available?
Yes. In select learning programs, you can apply for financial aid or a scholarship if you can’t afford the enrollment fee. If financial aid or a scholarship is available for your learning program selection, you’ll find a link to apply on the description page.