Service level indicators (SLIs) and service level objectives (SLOs) are fundamental tools for measuring and managing reliability. In this course, students learn approaches for devising appropriate SLIs and SLOs and managing reliability through the use of an error budget.
Site Reliability Engineering: Measuring and Managing Reliability
Instructor: Google Cloud Training
Sponsored by Google
55,622 already enrolled
(926 reviews)
What you'll learn
How to make systems reliable
Quantifying risks to and consequences of SLOs
Understanding SLIs, SLOs and SLAs
Skills you'll gain
- Service Management
- Business Process Improvement
- Performance Measurement
- Operations Management
- Service Level Agreement
- Operational Performance Management
- Statistical Modeling
- Key Performance Indicators (KPIs)
- DevOps
- Mathematical Modeling
- Process Improvement
- Risk Modeling
- Operational Excellence
- Applied Mathematics
- Continuous Improvement Process
- Service Level
- Goal Setting
- Reliability
- Service Improvement
- Site Reliability Engineering
Details to know
16 assignments
There are 7 modules in this course
This module is intended to bring you up to speed on the concepts underpinning SRE, CRE, and SLOs. If you're already familiar with these concepts, you may still find new information and perspectives in this module, but it is not necessary to complete it.
What's included
11 videos, 1 assignment
In this module we’re going to talk about how you measure the desired reliability of a service. We will address what to consider when setting SLOs for your application within your organization. We'll look at the three principles we use to measure the desired reliability of a service: figuring out what you want to promise and to whom, figuring out the metrics you care about that make your service reliability “good”, and finally, deciding how much reliability is good enough.
What's included
7 videos, 4 assignments
In this module, we’ll start by introducing a mechanism for quantifying unreliability using something called an error budget. We'll show how error budgets help you decide when to focus on making a service more reliable. And then we'll learn about some of the engineering and operational improvements that can help you do that.
What's included
7 videos, 3 assignments
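As a rough illustration of the error budget idea described in this module: an error budget is commonly taken to be the unreliability an SLO still permits, that is, 1 minus the SLO target, applied to a measurement window. The sketch below is not taken from the course materials. The function names, the 28-day default window, and the event counts are assumptions chosen for illustration.

```python
def error_budget_minutes(slo_target: float, window_days: int = 28) -> float:
    """Minutes of total unavailability the SLO permits over the window.

    slo_target is a fraction such as 0.999; the 28-day window is an
    assumed default, not a value prescribed by the course.
    """
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    return (1.0 - slo_target) * window_days * 24 * 60


def budget_remaining(slo_target: float, good_events: int, valid_events: int) -> float:
    """Fraction of an event-based error budget still unspent.

    Returns 1.0 when no budget is used, 0.0 when it is exactly spent,
    and a negative number when the SLO has been missed.
    """
    if valid_events <= 0:
        raise ValueError("valid_events must be positive")
    allowed_bad = (1.0 - slo_target) * valid_events
    actual_bad = valid_events - good_events
    return 1.0 - actual_bad / allowed_bad


if __name__ == "__main__":
    # Hypothetical numbers: a 99.9% SLO and one million valid requests.
    print(round(error_budget_minutes(0.999), 2))
    print(round(budget_remaining(0.999, 999_500, 1_000_000), 3))
```

A remaining fraction near zero or below it is the kind of signal that might prompt a team to shift effort from feature work toward reliability work.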
In this module we will start off by taking a look at some characteristics of monitoring metrics that can make them useful as SLIs and contrast these against other metrics that are less useful. Because the choice of where to measure an SLI is a key variable, we'll cover the five main ways you can measure an SLI and compare their pros and cons.
What's included
14 videos, 3 assignments, 5 discussion prompts
In this module, we'll start off with an overview of our four-step process for developing SLOs and SLIs for a user journey. We'll introduce the fictional company that created our example mobile game, the infrastructure that we'll be working with, and the simple user journey we'll be applying the four-step process to.
What's included
7 videos, 2 assignments, 2 peer reviews
In this module we'll be taking a critical look at the availability risks for our example service. We want to answer the question: "Are our SLO targets and error budgets realistic?"
What's included
4 videos, 2 peer reviews
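One way to approach the question of whether an error budget is realistic is to estimate the expected downtime each risk contributes per year and compare the total with the budget. The sketch below assumes a simple model (incident frequency, times time to detect plus time to resolve, times the fraction of users affected). The model, the function names, and the sample risks are illustrative assumptions, not the course's own method or data.

```python
MINUTES_PER_YEAR = 365 * 24 * 60


def expected_bad_minutes_per_year(
    incidents_per_year: float,
    minutes_to_detect: float,
    minutes_to_resolve: float,
    fraction_of_users_affected: float,
) -> float:
    """Expected user-weighted bad minutes per year for a single risk."""
    outage_minutes = minutes_to_detect + minutes_to_resolve
    return incidents_per_year * outage_minutes * fraction_of_users_affected


def annual_budget_minutes(slo_target: float) -> float:
    """Minutes of unavailability per year that the SLO target permits."""
    return (1.0 - slo_target) * MINUTES_PER_YEAR


if __name__ == "__main__":
    # Hypothetical risk catalogue:
    # (name, incidents/year, minutes to detect, minutes to resolve, fraction affected)
    risks = [
        ("bad release", 4, 15, 45, 0.5),
        ("database failover", 1, 5, 55, 1.0),
    ]
    total = sum(expected_bad_minutes_per_year(*r[1:]) for r in risks)
    budget = annual_budget_minutes(0.999)
    print(f"expected bad minutes: {total:.1f}, budget: {budget:.1f}")
    print("within budget" if total <= budget else "over budget")
```

If the estimated total exceeds the budget, either the SLO target is too ambitious for the current system or some of the listed risks need mitigation.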
In this module, we'll cover best practices for documenting your SLOs, the rationale behind a formal error budget policy, and how best to create one. Finally, we'll look at an example error budget policy to understand the trade-offs and incentives that play out during the negotiations involved in writing one.
What's included
9 videos, 3 assignments, 3 discussion prompts
Learner reviews
926 reviews
- 5 stars: 69.65%
- 4 stars: 21.05%
- 3 stars: 5.18%
- 2 stars: 2.15%
- 1 star: 1.94%
Reviewed on Jun 7, 2020
The course was very good and informative. The only improvement needed, I think, is the quality of the recording. It was very fast in some instances and the voice quality was distorted.
Reviewed on Sep 27, 2020
Excellent course on SRE principles. Peer reviews are awkward due to lack of metric information, but the content attempts to reinforce the principles and provide practical experience to the learner.
Reviewed on Jun 5, 2019
Content is great. I mark course as 4/5 because it seems that teachers simply read text from paper and this really complicates understanding information.