Learn what generalized linear models are, how to design your own, and how you can choose the right type of model for your data.
Generalized linear models make it possible to analyze complex, real-world data by extending the capabilities of traditional linear regression.
Researchers are increasingly using generalized linear models and their extensions to handle complex data. A systematic review of psychology articles published between 2014 and 2018 found that the use of generalized linear mixed models increased over time and that 86.4 percent of the reviewed articles appeared in first- or second-quartile journals [1].
Generalized linear models can handle a wide range of outcome types, including binary, ordinal, count, and other non-normally distributed data. Extensions such as generalized linear mixed models also handle longitudinal data.
You can specify the random component, systematic component, and link function to tailor your model to your use case.
Learning how to use generalized linear models enables you to analyze a broader range of data types, allowing you to explore more complex research questions. If you’re ready to start exploring more data analytics techniques, you can learn key analytical skills and tools with the IBM Data Analyst Professional Certificate. With this 11-course series, you can gain technical experience through hands-on labs and projects and build a portfolio to showcase your work.
When you work with a generalized linear model (GLM), you have three core building blocks: the random component, the systematic component, and the link function. Understanding each can help you design models that adequately represent the relationships among your variables.
The random component describes the outcome data you want to model. In statistical terms, this is the probability distribution that your outcome variable follows. By correctly specifying your random component, you ensure your regression model matches the type of outcome data you’re analyzing.
For example, if you have a linear regression, you would assume the outcome follows a normal distribution, which looks like a bell curve. If you have a logistic regression, you would assume the outcome is binary (yes/no, success/failure), meaning it follows a binomial distribution. For a Poisson regression, you would assume the outcome is a count (such as the number of customer complaints per week) and follows a Poisson distribution. Each of these data distributions behaves differently, meaning correctly identifying your random component is an important step in ensuring your model is accurate.
The systematic component of your model combines your explanatory variables (predictors) linearly, meaning each predictor is multiplied by a weight (coefficient) that indicates its contribution, and the results are added together. This sum is often referred to as the “linear predictor.”
Your systematic component will look different depending on what you’re studying. When examining a disease outcome, you may consider predictors such as stage of disease, age, family history, and access to medical care. For math test outcomes, you might have predictors such as study hours, previous grades, and attendance.
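As a minimal sketch of the linear predictor, assuming made-up coefficients for the math test example (the function name and every number below are illustrative, not taken from a fitted model), each predictor is multiplied by its weight and the results are added to an intercept:

```python
def linear_predictor(intercept, coefficients, predictors):
    """Return intercept + sum of coefficient * predictor value."""
    if len(coefficients) != len(predictors):
        raise ValueError("need exactly one coefficient per predictor")
    return intercept + sum(b * x for b, x in zip(coefficients, predictors))


# Hypothetical weights for study hours, previous grade, and attendance rate.
coefficients = [2.0, 0.5, 20.0]
# Hypothetical student: 5 study hours, previous grade of 80, 90% attendance.
student = [5.0, 80.0, 0.9]

eta = linear_predictor(10.0, coefficients, student)
print(f"linear predictor: {eta:.1f}")
```

In a real analysis, the software estimates the intercept and coefficients from your data. Here they are fixed by hand only to show the arithmetic.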
The link function in your model is the bridge between the random and systematic components. It transforms the expected value (mean) of your outcome so that it can be expressed as the linear predictor. Depending on your outcome type, you’ll choose different link functions:
Identity link: Used in linear regressions where your outcome is continuous
Logit link: Used in logistic regression to transform the probability of your outcome into log-odds
Log link: Used in Poisson regressions to connect count outcomes to your predictors
Together with the choice of outcome distribution, the link function is what makes your GLM “generalized,” extending the basic idea of regression to different variable types and relationships.
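The three link functions listed above can be sketched in a few lines of Python. This is an illustration using only the standard library, and the function names are chosen here for readability rather than taken from any statistics package. Each link maps the mean of the outcome onto the scale of the linear predictor, and each inverse maps a linear predictor value back to a mean:

```python
import math


def identity_link(mu):
    """Identity link: the mean is used as-is (linear regression)."""
    return mu


def logit_link(p):
    """Logit link: probability -> log-odds (logistic regression)."""
    return math.log(p / (1.0 - p))


def log_link(mu):
    """Log link: positive mean count -> log of the mean (Poisson regression)."""
    return math.log(mu)


def inverse_logit(eta):
    """Map a linear predictor value back to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-eta))


def inverse_log(eta):
    """Map a linear predictor value back to a positive mean count."""
    return math.exp(eta)


# The same hypothetical linear predictor value read on each scale.
eta = 0.5
print("as a mean (identity):", identity_link(eta))
print("as a probability (inverse logit):", round(inverse_logit(eta), 3))
print("as a mean count (inverse log):", round(inverse_log(eta), 3))
```

The inverse functions show why the choice matters: the inverse logit always returns a value between 0 and 1, and the inverse log always returns a positive number, so predictions stay in the valid range for the outcome type.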
When you choose to use a GLM, you can select from a family of models that allow you to work with different types of outcomes. You’ll pick the GLM that best fits the nature of your outcome variable, such as continuous, binary, or counts of events. Some of the most common GLMs you may use in practice include the following.
Linear regression is the simplest form of a GLM. With this model, you’re looking at how one predictor (simple linear regression) or several predictors (multiple linear regression) relate to a continuous outcome variable. For example, you might look at how study hours relate to scores on an exam, or how days of sunshine relate to centimeters of plant growth.
When designing your simple linear regression model, your three components would be as follows.
Random component: Your outcome variable has a normal distribution with a mean that depends on your predictors and a constant variance.
Systematic component: Your predictor variable enters the model linearly (for a multiple linear regression, you would have multiple predictor variables combined into a linear predictor).
Link function: You would use an identity link function.
If you have a binary outcome, such as true/false or yes/no, you can use a binary logistic regression to model the odds of “success.” For example, you could assess the odds of answering a test question correctly based on study hours, or the odds of a flower blooming based on the number of days with sunlight.
When designing your binary logistic regression model, your three components would be as follows.
Random component: Your outcome variable has a binomial distribution.
Systematic component: Your predictor variable(s) combine to create a linear predictor.
Link function: You would use a logit link function.
If you have count data, which is the number of times something happens in a certain interval, you can use a Poisson regression. For example, you could model the number of students who scored an A on a test based on in-class instruction hours, or the number of plants that sprouted the first month based on sunlight hours.
When designing your Poisson regression model, your three components would be as follows.
Random component: Your outcome variable has a Poisson distribution.
Systematic component: Your predictor variable(s) combine to create a linear predictor.
Link function: You would use a log link function.
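To make the Poisson components concrete, here is a small sketch in plain Python. The intercept, slope, and predictor value are hypothetical numbers chosen for illustration. The log link means the expected count is the exponential of the linear predictor, and the Poisson distribution then gives the probability of each possible count:

```python
import math


def expected_count(intercept, slope, x):
    """Inverse of the log link: mean count = exp(intercept + slope * x)."""
    return math.exp(intercept + slope * x)


def poisson_pmf(k, mu):
    """Probability of observing exactly k events when the mean count is mu."""
    return math.exp(-mu) * mu ** k / math.factorial(k)


# Hypothetical model: log(mean sprouts) = 0.2 + 0.1 * sunlight_hours
mu = expected_count(0.2, 0.1, 8.0)
print(f"expected count at 8 sunlight hours: {mu:.3f}")

for k in range(6):
    print(f"P(count = {k}) = {poisson_pmf(k, mu):.3f}")
```

Because the slope acts on the log scale, each additional unit of the predictor multiplies the expected count by exp(slope) rather than adding a fixed amount.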
The primary purpose of using a GLM is to model variable relationships in situations where traditional linear regression doesn’t apply. Because real-world data isn’t always normally distributed, learning to use GLMs allows you to model outcomes accurately and make predictions that can inform decision-making.
Common data types that benefit from a GLM include binary data (e.g., yes/no data) and count data (e.g., the number of events in a given period). Extensions of the GLM, such as generalized linear mixed models, accommodate clustered data (e.g., students in schools or patients in hospitals) and longitudinal data (e.g., repeated measures such as patient data over time). These models allow you, regardless of your field, to develop statistical models that accommodate a wider range of data types and distributions.
You can fit a GLM in R using the glm() function. To use this function, you specify the model formula, reference your data set, and set the family (family = gaussian() for linear regression, family = binomial() for logistic regression, family = poisson() for Poisson regression). Once you run the code, you can call summary() on the fitted model to view its coefficients and related output.
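GLM software estimates the coefficients for you, typically with an iterative procedure. R’s documentation describes the default fitting method of glm() as iteratively reweighted least squares. To give a rough sense of that idea, the following sketch fits a one-predictor Poisson regression in plain Python. It is a simplified illustration under stated assumptions, not R’s implementation: the data are made up, the function name is hypothetical, and there are no safeguards for difficult data.

```python
import math


def fit_poisson_irls(x, y, iterations=25, tol=1e-10):
    """Fit log(mean) = b0 + b1 * x by iteratively reweighted least squares."""
    # Start from an intercept-only guess: the log of the average count.
    b0 = math.log(sum(y) / len(y))
    b1 = 0.0
    for _ in range(iterations):
        s_w = s_wx = s_wxx = s_wz = s_wxz = 0.0
        for xi, yi in zip(x, y):
            eta = b0 + b1 * xi          # linear predictor
            mu = math.exp(eta)          # inverse log link
            w = mu                      # weight for Poisson with log link
            z = eta + (yi - mu) / mu    # working response
            s_w += w
            s_wx += w * xi
            s_wxx += w * xi * xi
            s_wz += w * z
            s_wxz += w * xi * z
        # Solve the 2x2 weighted least squares system for the new coefficients.
        det = s_w * s_wxx - s_wx * s_wx
        new_b0 = (s_wxx * s_wz - s_wx * s_wxz) / det
        new_b1 = (s_w * s_wxz - s_wx * s_wz) / det
        converged = abs(new_b0 - b0) + abs(new_b1 - b1) < tol
        b0, b1 = new_b0, new_b1
        if converged:
            break
    return b0, b1


# Made-up data: instruction hours and the number of events observed.
x_hours = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
y_counts = [1, 2, 4, 6, 12, 19]

b0, b1 = fit_poisson_irls(x_hours, y_counts)
print(f"intercept: {b0:.3f}, slope: {b1:.3f}")
print(f"multiplicative change in expected count per hour: {math.exp(b1):.3f}")
```

In practice you would rely on glm() in R or a comparable routine in another package, which also reports standard errors and fit statistics that this sketch leaves out.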
Professionals in data-driven fields often use inferential approaches, such as GLMs, to guide decision-making. In psychology, for example, a systematic review of studies from 2014 to 2018 found increasing use of generalized linear mixed models (an extension of GLMs), with 86.4 percent of the reviewed articles published in first- or second-quartile journals [1].
Careers that involve statistical modeling often use GLMs to analyze relationships and make predictions. Common titles you might find in this area include:
Actuary
Economist
While the job titles may differ, the underlying modeling framework remains similar. One of the key differences is in what kind of data is being analyzed and for what purpose. For example:
If you work in health care, you might use GLMs to predict disease trajectory based on patient symptoms or predict the cost of disease state based on patient-level data.
If you work in insurance, you might use GLMs to assess risk based on business characteristics or customer retention based on demographic or behavioral variables.
If you work in real estate, you might use GLMs to predict housing prices based on market movements and area data.
If you work in public health, you might use GLMs to build models that assess which environmental factors best predict health outcomes.
Like any statistical tool, GLMs have strengths and limitations that make them best suited for certain applications. Understanding the types of data and use cases that best fit this model can help you strategically match your research questions with the appropriate modeling technique.
Can accommodate a variety of data types: You can explore research questions using a wide variety of outcome types, such as continuous, binary, and count data.
Lends itself to flexible modeling approaches: You can choose your link function, random component, and systematic components separately to adapt your model to different use cases.
Provides straightforward interpretation: You can clearly communicate your output, its meaning, and the certainty of your estimate.
Responses must be independent: If your responses aren’t independent, as with clustered or repeated measurements, extensions such as generalized linear mixed models may be more appropriate.
Need to use linear predictors: GLMs rely on the assumption that your predictors combine additively into a single linear predictor.
To interpret the results of a generalized linear model, a good place to start is the coefficients. These show you the direction and size of the relationship between each predictor and the outcome. Next, translate each coefficient into plain language based on the model type (e.g., a unit change for linear regression, an odds ratio for logistic regression). Following this, check the confidence intervals and p-values to assess statistical significance.
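As a hedged illustration of that translation step, the sketch below converts a logistic regression coefficient into an odds ratio with an approximate 95 percent confidence interval. The coefficient and standard error are hypothetical values, not output from a real model, and the interval assumes the common normal approximation of plus or minus 1.96 standard errors on the log-odds scale:

```python
import math


def odds_ratio_with_ci(coefficient, standard_error, z=1.96):
    """Return (odds ratio, lower bound, upper bound) for a logistic coefficient."""
    estimate = math.exp(coefficient)
    lower = math.exp(coefficient - z * standard_error)
    upper = math.exp(coefficient + z * standard_error)
    return estimate, lower, upper


# Hypothetical output: coefficient of 0.40 for study hours, standard error 0.15.
odds_ratio, lower, upper = odds_ratio_with_ci(0.40, 0.15)
print(f"odds ratio: {odds_ratio:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```

With these made-up numbers, each additional study hour multiplies the odds of success by the odds ratio. An interval that excludes 1 corresponds to a coefficient that differs from zero at roughly the 5 percent level.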
When starting with GLMs, the first step is to clearly define your research question and understand the type of outcome you’re working with (e.g., continuous, binary, counts). This will guide you toward the right kind of GLM, whether it’s linear regression, logistic regression, or Poisson regression.
Following this, you can begin to explore statistical software that estimates GLMs. Programs like R, Python (statsmodels package), SAS, or SPSS provide built-in functions that allow you to fit models and interpret results. You can start with simple data sets and research questions, and expand to more complex models after you master the basics.
1. Frontiers. “Report Quality of Generalized Linear Mixed Models in Psychology: A Systematic Review,” https://www.frontiersin.org/journals/psychology/articles/10.3389/fpsyg.2021.666182/full. Accessed October 1, 2025.
Editorial Team