When you enroll in this course, you'll also be asked to select a specific program.
Learn new concepts from industry experts
Gain a foundational understanding of a subject or tool
Develop job-relevant skills with hands-on projects
Earn a shareable career certificate
There are 6 modules in this course
In this capstone course, you will apply various data science skills and techniques that you have learned as part of the previous courses in the IBM Data Science with R Specialization or IBM Data Analytics with Excel and R Professional Certificate.
For this project, you will assume the role of a Data Scientist who has recently joined an organization and be presented with a challenge that requires data collection, analysis, basic hypothesis testing, visualization, and modeling to be performed on real-world datasets. You will collect and understand data from multiple sources, conduct data wrangling and preparation with Tidyverse, perform exploratory data analysis with SQL, Tidyverse and ggplot2, model data with linear regression, create charts and plots to visualize the data, and build an interactive dashboard.
The project will culminate with a presentation of your data analysis report, with an executive summary for the various stakeholders in the organization.
In this module, you will be introduced to the capstone project scenario and the real-world problem you will solve throughout this course. You will begin applying the data acquisition techniques learned in earlier courses to collect project data from multiple sources. You will gather data using web scraping methods to extract information from HTML pages and use API requests to retrieve external data such as weather information. The collected datasets will be organized into structured formats, preparing them for further analysis in the subsequent stages of the project.
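The labs in this module do this in R. As a language-neutral illustration, the Python sketch below shows the request-and-flatten pattern for weather data. It builds a request URL for OpenWeather's current-weather API and flattens part of a parsed JSON response into a single row. The endpoint and the response field names (`weather`, `main`, `wind`) follow OpenWeather's public API. The helper names `build_weather_url` and `flatten_weather` are hypothetical, and the sketch makes no network call.

```python
from urllib.parse import urlencode

# OpenWeather's current-weather endpoint; the lab may use a different endpoint
# (for example, the 5-day forecast), so treat this URL as an assumption.
BASE_URL = "https://api.openweathermap.org/data/2.5/weather"


def build_weather_url(city: str, api_key: str, units: str = "metric") -> str:
    """Build the GET request URL for one city's current weather."""
    params = {"q": city, "appid": api_key, "units": units}
    # urlencode escapes values, e.g. "New York" becomes "New+York".
    return f"{BASE_URL}?{urlencode(params)}"


def flatten_weather(city: str, payload: dict) -> dict:
    """Copy a few fields from a parsed JSON response into one flat row."""
    return {
        "city": city,
        "weather": payload["weather"][0]["main"],
        "temp": payload["main"]["temp"],
        "humidity": payload["main"]["humidity"],
        "wind_speed": payload["wind"]["speed"],
    }
```

To fetch real data, you would pass the URL to `urllib.request.urlopen`, parse the body with `json.load`, and call `flatten_weather` once per city. Collecting the rows in a list gives a table-shaped dataset that is ready for wrangling. The API key and city names are placeholders.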
What's included
2 videos, 1 reading, 1 assignment, 3 app items, 5 plugins
2 videos•Total 4 minutes
Introduction to Data Science with R Capstone Project•2 minutes
Weather and Bike-Sharing Demand Data Collection•2 minutes
1 reading•Total 2 minutes
Course Overview•2 minutes
1 assignment•Total 6 minutes
Checkpoints•6 minutes
3 app items•Total 180 minutes
(Optional) Obtain an IBM Cloud Feature Code•60 minutes
Hands-on Lab: Complete the Data Collection with Web Scraping Notebook•60 minutes
Hands-on Lab: Complete the Data Collection with OpenWeather API Notebook•60 minutes
5 plugins•Total 75 minutes
Capstone Overview•15 minutes
(Optional) Hands-on Lab: Creating an IBM Cloud Account•15 minutes
Data Collection Overview•15 minutes
(Optional) Hands-on Lab: Complete the Data Collection with Web Scraping Notebook•15 minutes
(Optional) Hands-on Lab: Complete the Data Collection with OpenWeather API Notebook•15 minutes
Module 2 - Data Wrangling
Module 2•4 hours to complete
Module details
In this module, you will apply data wrangling techniques learned in previous courses to clean and prepare the collected datasets for analysis. Working with the data gathered in Module 1, you will transform raw data into a structured and analysis-ready format. You will clean text data, standardize variables, handle missing values, and perform data transformations such as encoding and normalization. By the end of this module, you will have prepared a reliable dataset that supports meaningful exploration and modeling in later stages of the project.
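The labs perform these steps in R with the Tidyverse. The Python sketch below is a simplified, language-neutral illustration of three of the transformations named above: filling missing values, min-max normalization, and one-hot encoding. Mean imputation is only one possible strategy for missing values and is an assumption here. The helper names are also hypothetical.

```python
def impute_mean(values: list) -> list:
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]


def min_max_normalize(values: list) -> list:
    """Rescale numbers linearly into the [0, 1] range."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column carries no spread; map every value to 0.0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def one_hot(categories: list) -> dict:
    """Turn a categorical column into 0/1 indicator columns, one per level."""
    levels = sorted(set(categories))
    return {level: [1 if c == level else 0 for c in categories] for level in levels}
```

For example, `min_max_normalize([10, 20, 30])` returns `[0.0, 0.5, 1.0]`, and `one_hot(["Spring", "Winter", "Spring"])` returns `{"Spring": [1, 0, 1], "Winter": [0, 1, 0]}`.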
What's included
1 video, 1 assignment, 2 app items, 3 plugins
1 video•Total 3 minutes
Data Wrangling•3 minutes
1 assignment•Total 12 minutes
Checkpoints•12 minutes
2 app items•Total 120 minutes
Hands-on Lab: Complete Data Wrangling with Regular Expressions Notebook•60 minutes
Hands-on Lab: Complete Data Wrangling with dplyr Notebook•60 minutes
3 plugins•Total 125 minutes
Data Wrangling Overview•5 minutes
(Optional) Hands-on Lab: Complete Data Wrangling with Regular Expressions Notebook•60 minutes
(Optional) Hands-on Lab: Complete Data Wrangling with dplyr Notebook•60 minutes
Module 3 - Performing Exploratory Data Analysis with SQL, Tidyverse & ggplot2
Module 3•4 hours to complete
Module details
At this stage of the capstone project, you will apply the data collection and data wrangling skills developed in the previous modules, along with your prior experience in SQL querying and data visualization. This module focuses on performing Exploratory Data Analysis (EDA) to better understand the patterns, relationships, and trends within the prepared datasets.
You will work with the datasets generated in earlier modules to explore key variables, identify meaningful insights, and prepare the data for predictive modeling. If you encountered challenges in earlier steps, prepared datasets are available to help you continue progressing through the project. In this module, you will complete a series of hands-on labs that guide you through the essential stages of exploratory analysis.
What's included
1 video, 1 assignment, 3 app items, 3 plugins
1 video•Total 2 minutes
Exploratory Data Analysis•2 minutes
1 assignment•Total 12 minutes
Checkpoints•12 minutes
3 app items•Total 180 minutes
Hands-on Lab: Complete the EDA with SQL lab using RSQLite•60 minutes
(Optional) Hands-on Lab: Complete the EDA with SQL lab using RODBC with IBM DB2•60 minutes
Hands-on Lab: Complete the EDA with Data Visualization Lab•60 minutes
3 plugins•Total 45 minutes
(Optional) Hands-on Lab: Load Data into Db2 on IBM Cloud•15 minutes
(Optional) Hands-on Lab: Complete the EDA with SQL lab on IBM Watson Studio•15 minutes
(Optional) Hands-on Lab: Complete the EDA with Data Visualization Lab•15 minutes
Module 4 - Predictive Analysis
Module 4•4 hours to complete
Module details
In this module, you will apply regression modeling techniques to build predictive models for bike-sharing demand using the prepared dataset. Drawing on modeling concepts learned earlier, you will construct and refine multiple regression models to improve prediction accuracy. You will evaluate model performance using appropriate statistical metrics and interpret the contribution of different predictor variables. This stage represents the transition from data exploration to predictive analysis within your capstone workflow.
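The labs build and refine these models in R with several predictors. To make the underlying computation concrete, the Python sketch below fits an ordinary least-squares line with a single predictor. It then scores the fit with two common metrics: root mean squared error (RMSE) and R². This is a simplified illustration, not the labs' code, and the text above does not state which metrics the labs report.

```python
import math


def fit_simple_ols(x: list, y: list) -> tuple:
    """Least-squares fit of y = b0 + b1 * x for a single predictor."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b1 = sxy / sxx                # slope: covariance over variance of x
    b0 = mean_y - b1 * mean_x     # intercept: line passes through the means
    return b0, b1


def rmse(y_true: list, y_pred: list) -> float:
    """Root mean squared error, in the same units as the target."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r_squared(y_true: list, y_pred: list) -> float:
    """Share of the target's variance explained by the predictions."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot
```

For example, fitting x = [0, 1, 2, 3] and y = [1, 3, 5, 7] gives b0 = 1.0 and b1 = 2.0. Scoring those predictions, which match y exactly, yields an RMSE of 0.0 and an R² of 1.0.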
What's included
1 video, 1 assignment, 2 app items, 2 plugins
1 video•Total 2 minutes
Regression Models•2 minutes
1 assignment•Total 12 minutes
Checkpoints•12 minutes
2 app items•Total 120 minutes
Hands-on Lab: Complete the Building a Baseline Regression Model Lab•60 minutes
Hands-on Lab: Complete the Improving the Linear Model lab•60 minutes
2 plugins•Total 135 minutes
Reading: Predict Bike-Sharing Demand Using Regression Models•15 minutes
(Optional) Hands-on Lab: Complete the Improving the Linear Model lab•120 minutes
Module 5 - Building an R Shiny Dashboard App
Module 5•4 hours to complete
Module details
In this module, you will apply your data visualization and application development skills to create an interactive dashboard that presents the results of your predictive analysis. Using R Shiny and visualization tools, you will design a dashboard that enables users to explore predicted bike-sharing demand across locations. This module focuses on transforming analytical results into interactive visual tools that support data-driven decision-making.
(Optional) Lab: Getting started with Posit Cloud•15 minutes
(Optional) Hands-on Lab (Part A): Build a bike-sharing demand prediction app with R Shiny and Leaflet (using Posit Cloud)•90 minutes
(Optional) Hands-on Lab (Part B): Enhance the Bike-Sharing Demand Prediction App with City Details Plots (using Posit Cloud)•90 minutes
Module 6 - Present Your Data-Driven Insights
Module 6•3 hours to complete
Module details
In this final module, you will consolidate the results of your capstone project into a professional presentation that communicates your workflow, analysis, insights, and predictive results. You will prepare a structured presentation that highlights the project problem, methodology, key findings, and conclusions. This module represents the culmination of your learning journey, where you demonstrate your ability to apply data science skills to solve a real-world problem and communicate your results effectively.
At IBM, we know how rapidly tech evolves and recognize the crucial need for businesses and professionals to build job-ready, hands-on skills quickly. As a market-leading tech innovator, we’re committed to helping you thrive in this dynamic landscape. Through IBM Skills Network, our expertly designed training programs in AI, software development, cybersecurity, data science, business management, and more, provide the essential skills you need to secure your first job, advance your career, or drive business success. Whether you’re upskilling yourself or your team, our courses, Specializations, and Professional Certificates build the technical expertise that ensures you, and your organization, excel in a competitive world.
What specific R packages and ecosystems are utilized in this capstone?
This project uses the main R libraries for data manipulation and visualization. You will use the Tidyverse ecosystem, including dplyr, extensively for programmatic data wrangling, data transformation, and text cleaning. For statistical exploration and data visualization, you will use ggplot2 to build custom charts. You will also use SQL, run from R through RSQLite (or optionally RODBC with IBM Db2), to query and slice your structured datasets before feeding them into your predictive models. To present your results, you will build an interactive dashboard with R Shiny and Leaflet.
How does this project handle complex spatial and environmental variables for predictive modeling?
The core objective of this capstone is to build a predictive model for bike-sharing demand. You will programmatically gather and merge data from multiple sources. You will use web scraping to extract information from HTML pages and REST API requests to pull in environmental variables such as weather conditions. From there, you will apply feature engineering techniques, such as normalization, binning, and categorical encoding. You will then build and refine linear regression models and evaluate how accurately they predict demand.
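Binning turns a continuous variable into a small number of categories. The Python sketch below shows equal-width binning, one common variant. The course may bin variables differently, for example with hand-picked cut points, so the choice of equal-width intervals is an assumption. The helper name is hypothetical.

```python
def equal_width_bins(values: list, n_bins: int) -> list:
    """Assign each value a bin index 0..n_bins-1 using equal-width intervals."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    if width == 0:
        # All values are identical, so they share the first bin.
        return [0 for _ in values]
    # The maximum would land in bin n_bins, so clamp it into the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]
```

For example, `equal_width_bins([0, 5, 10, 15, 20], 2)` returns `[0, 0, 1, 1, 1]`. The clamp keeps the maximum value in the last bin instead of creating an extra one.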
What kind of interactive dashboard will I build to display my analytics?
You will turn your statistical models into a working web application by building an interactive R Shiny dashboard. To visualize location-based demand, you will integrate a Leaflet map layer into the application. This map lets users explore predicted bike-sharing demand across locations. You will also enhance the app with plots of city details. The project culminates in an executive-level data analysis report for the organization's stakeholders.
When will I have access to the lectures and assignments?
To access the course materials and assignments, and to earn a Certificate, you will need to purchase the Certificate experience when you enroll in a course. You can try a Free Trial instead, or apply for Financial Aid. The course may offer 'Full Course, No Certificate' instead. This option lets you see all course materials, submit required assessments, and get a final grade. This also means that you will not be able to purchase a Certificate experience.
What will I get if I subscribe to this Certificate?
When you enroll in the course, you get access to all of the courses in the Certificate, and you earn a certificate when you complete the work. Your electronic Certificate will be added to your Accomplishments page, from which you can print your Certificate or add it to your LinkedIn profile.