In the final course of the statistical modeling for data science program, learners will study a broad set of more advanced statistical modeling tools. These tools include generalized linear models (GLMs), which provide an introduction to classification (through logistic regression); nonparametric modeling, including kernel estimators and smoothing splines; and semi-parametric generalized additive models (GAMs). Emphasis will be placed on a firm conceptual understanding of these tools. Attention will also be given to ethical issues raised by using complicated statistical models.
Prerequisites: calculus, linear algebra, and probability theory.
University of Colorado Boulder
CU-Boulder is a dynamic community of scholars and learners on one of the most spectacular college campuses in the country. As one of 34 U.S. public institutions in the Association of American Universities (AAU), we have a proud tradition of academic excellence, with five Nobel laureates and more than 50 members of prestigious academies.
Syllabus - What you will learn from this course
An Introduction to Generalized Linear Models Through Binomial Regression
In this module, we will introduce generalized linear models (GLMs) through the study of binomial data. In particular, we will motivate the need for GLMs; introduce the binomial regression model, including the most common binomial link functions; correctly interpret the binomial regression model; and consider various methods for assessing the fit and predictive power of the binomial regression model.
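As a rough illustration of the binomial regression model with the logit link (logistic regression), the following is a minimal sketch rather than course material. The course itself works in R; this sketch uses standard-library Python. The function names `sigmoid` and `fit_logistic`, the single-predictor setup, and the toy data are all assumptions made for illustration.

```python
import math

def sigmoid(z):
    """Inverse of the logit link: maps a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(x, y, n_iter=25):
    """Fit P(y = 1 | x) = sigmoid(b0 + b1 * x) by Newton-Raphson.

    Hypothetical helper for one predictor; assumes the data are not
    perfectly separable, so that a finite maximum likelihood estimate exists.
    """
    b0, b1 = 0.0, 0.0
    for _ in range(n_iter):
        # g: gradient of the log-likelihood; h: Fisher information entries
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = sigmoid(b0 + b1 * xi)
            w = p * (1.0 - p)  # binomial variance function
            g0 += yi - p
            g1 += (yi - p) * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        # Newton step: solve the 2x2 system [h00 h01; h01 h11] d = g
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

# Made-up binary outcomes, for illustration only
x_obs = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0]
y_obs = [0, 0, 0, 1, 0, 1, 0, 1, 1, 1]

b0, b1 = fit_logistic(x_obs, y_obs)
odds_ratio = math.exp(b1)  # multiplicative change in odds per unit of x
print(b0, b1, odds_ratio)
```

Under the logit link, `b1` is the change in the log-odds of success per unit increase in the predictor, so `exp(b1)` is read as an odds ratio. In R, the analogous fit would typically use `glm` with `family = binomial`.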
Models for Count Data
In this module, we will consider how to model count data. When the response variable is a count of some phenomenon, and when that count is thought to depend on a set of predictors, we can use Poisson regression as a model. We will describe Poisson regression in some detail and apply it to real data. Then, we will describe situations in which Poisson regression is not appropriate, and briefly present solutions for those situations.
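To make the idea of Poisson regression concrete, here is a minimal sketch, assuming a single predictor and the log link, so that the expected count is exp(b0 + b1 * x). The course itself works in R; this sketch uses standard-library Python. The function name `fit_poisson` and the toy counts are hypothetical, chosen only for illustration.

```python
import math

def fit_poisson(x, y, n_iter=25):
    """Fit E[y | x] = exp(b0 + b1 * x) by Newton-Raphson.

    Hypothetical helper for one predictor; assumes the mean count is positive.
    """
    # Start from the intercept-only fit: a constant mean equal to the sample mean
    b0, b1 = math.log(sum(y) / len(y)), 0.0
    for _ in range(n_iter):
        # g: gradient of the log-likelihood; h: Fisher information entries
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            mu = math.exp(b0 + b1 * xi)  # fitted mean, which is also the variance
            g0 += yi - mu
            g1 += (yi - mu) * xi
            h00 += mu
            h01 += mu * xi
            h11 += mu * xi * xi
        # Newton step: solve the 2x2 system [h00 h01; h01 h11] d = g
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

# Made-up counts, for illustration only
x_cnt = [0, 1, 2, 3, 4, 5]
y_cnt = [1, 1, 3, 4, 8, 12]

c0, c1 = fit_poisson(x_cnt, y_cnt)
rate_ratio = math.exp(c1)  # multiplicative change in expected count per unit of x
fitted = [math.exp(c0 + c1 * xi) for xi in x_cnt]
print(c0, c1, rate_ratio)
```

Because the Poisson model forces the variance to equal the mean, one common situation where it fits poorly is overdispersion, where the observed counts vary more than the fitted means allow. In R, the analogous fit would typically use `glm` with `family = poisson`.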
Introduction to Nonparametric Regression
In this module, we will introduce the concept of a nonparametric regression model. We will contrast this notion with the parametric models that we have studied so far. Then, we’ll study particular nonparametric regression models: kernel estimators and splines. Finally, we will introduce additive models as a blending of parametric and nonparametric methods.
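One common kernel estimator is the Nadaraya-Watson estimator, which predicts the response at a point as a weighted average of nearby observations. The following is a minimal sketch, assuming a Gaussian kernel and a fixed bandwidth; the source does not say which kernel estimator the course covers, so treat this choice as an assumption. The course itself works in R; this sketch uses standard-library Python, and the function names and toy data are hypothetical.

```python
import math

def gaussian_kernel(u):
    """Standard normal density, used here as the kernel weight function."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def kernel_smooth(x0, x, y, bandwidth):
    """Nadaraya-Watson estimate of E[y | x = x0].

    Hypothetical helper; assumes x0 is close enough to the data that the
    weights do not all underflow to zero.
    """
    weights = [gaussian_kernel((x0 - xi) / bandwidth) for xi in x]
    return sum(w * yi for w, yi in zip(weights, y)) / sum(weights)

# Made-up observations, for illustration only
x_pts = [0.0, 1.0, 2.0, 3.0, 4.0]
y_pts = [1.0, 3.0, 2.0, 5.0, 4.0]

# A small bandwidth follows the data closely; a large one averages broadly
wiggly = kernel_smooth(2.5, x_pts, y_pts, bandwidth=0.3)
smooth = kernel_smooth(2.5, x_pts, y_pts, bandwidth=2.0)
print(wiggly, smooth)
```

No parametric form is assumed for the regression function here. The bandwidth alone controls the trade-off between a flexible fit and a smooth one, which is the main contrast with the parametric models in the earlier modules.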
Introduction to Generalized Additive Models
Some models, such as linear regression, are easily interpretable but inflexible, in that they don't capture many real-world relationships accurately. Other models, such as neural networks, are quite flexible but very difficult to interpret. Generalized additive models (GAMs) strike a balance between flexibility and interpretability. In this module, we will further motivate GAMs, learn the basic mathematics of fitting GAMs, and implement them on simulated and real data in R.
Statistical modeling lies at the heart of data science. Well-crafted statistical models allow data scientists to draw conclusions about the world from the limited information present in their data. In this three-credit sequence, learners will add some intermediate and advanced statistical modeling techniques to their data science toolkit. In particular, learners will become proficient in the theory and application of linear regression analysis; ANOVA and experimental design; and generalized linear and additive models. Emphasis will be placed on analyzing real data using the R programming language.
