About this Course
71,874 recent views

100% online

Start instantly and learn on your own schedule.

Flexible deadlines

Reset deadlines in accordance with your schedule.

Advanced Level

Approx. 22 hours to complete

Suggested: 4 weeks of study, estimated 2 hours per week....

English

Subtitles: English

What you will learn

  • How to make systems reliable

  • Understanding SLIs, SLOs and SLAs

  • Quantifying risks to and consequences of SLOs

Syllabus - What you will learn from this course

Week 1
27 minutes to complete

Introduction to SRE

This module is intended to bring you up to speed on the concepts underpinning SRE, CRE, and SLOs. If you're already familiar with these concepts, you may still find new information and perspectives in this module, but it is not necessary to complete it.

9 videos (Total 15 min), 1 quiz
9 videos
Introduction (15s)
Intro (10s)
CRE's Three Reliability Principles (3m)
Reliability in the Cloud (3m)
How SLOs help your business make decisions (1m)
How SLOs help you build features faster (1m)
How SLOs help you balance operational and project work (1m)
Making SLOs work for your organization (59s)
1 practice exercise
DevOps/SRE (1m)
1 hour to complete

Targeting Reliability

In this module we’re going to talk about how you measure the desired reliability of a service. We will address what to consider when setting SLOs for your application within your organization. We'll look at the three principles we use to measure the desired reliability of a service: figuring out what you want to promise and to whom, figuring out the metrics you care about that make your service reliability “good”, and finally, deciding how much reliability is good enough.

7 videos (Total 14 min), 4 quizzes
7 videos
SLOs vs SLAs (2m)
The happiness test (2m)
How do we measure reliability? (3m)
Edge cases (2m)
100% is the wrong target (1m)
Iterating (1m)
4 practice exercises
A working service (5m)
SLOs and SLAs (7m)
Reliability and iterating (1m)
Targeting Reliability Assessment (7m)
1 hour to complete

Operating for Reliability

In this module, we’ll start by introducing a mechanism for quantifying unreliability using something called an error budget. We'll show how error budgets help you decide when to focus on making a service more reliable. And then we'll learn about some of the engineering and operational improvements that can help you do that.
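As a rough illustration of the idea, an error budget is commonly taken to be the complement of the SLO target: a 99.9% target leaves 0.1% of events free to fail. The sketch below assumes an event-based SLO over a fixed window; the function names and the sample numbers are hypothetical and are not taken from the course material.

```python
def error_budget(slo_target: float) -> float:
    """Fraction of events allowed to fail under the SLO (1 - target)."""
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    return 1.0 - slo_target


def budget_remaining(slo_target: float, total_events: int, bad_events: int) -> float:
    """Fraction of the error budget still unspent; negative means overspent."""
    if total_events <= 0:
        raise ValueError("total_events must be positive")
    allowed_bad = error_budget(slo_target) * total_events
    return 1.0 - bad_events / allowed_bad


if __name__ == "__main__":
    # Hypothetical window: 99.9% SLO, 1,000,000 requests, 400 of them failed.
    remaining = budget_remaining(0.999, 1_000_000, 400)
    print(f"{remaining:.1%} of the error budget remains")
```

A remaining fraction near or below zero is the kind of signal a team might use to shift effort from feature work toward reliability work.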

7 videos (Total 19 min), 3 quizzes
7 videos
Error budgets (3m)
Everything is a trade-off (3m)
Error budgets: advanced concepts (2m)
Axes of improvement (4m)
Operational approach to increasing reliability (2m)
Module summary (50s)
3 practice exercises
Error budgets (5m)
Increasing reliability (3m)
Operating for Reliability Assessment (5m)
Week 2
1 hour to complete

Choosing a Good SLI

In this module we will start off by taking a look at some characteristics of monitoring metrics that can make them useful as SLIs and contrast these against other metrics that are less useful. Because the choice of where to measure an SLI is a key variable, we'll cover the five main ways you can measure an SLI and compare their pros and cons.

14 videos (Total 41 min), 3 quizzes
14 videos
User happiness in metric form (1m)
The properties of good SLI metrics (4m)
Ways of measuring SLIs (4m)
The SLI menu (2m)
The SLI equation (1m)
Request / Response SLIs (5m)
Data processing SLIs (6m)
"But my system is really complex!" (2m)
Managing complexity with aggregation (2m)
Managing complexity with bucketing (3m)
Achievable SLOs (1m)
Aspirational SLOs (1m)
Continuous improvement (1m)
3 practice exercises
Measuring happiness (1m)
Commonly used SLIs (2m)
Correctness and Coverage (2m)
Week 3
5 hours to complete

Developing SLOs and SLIs

In this module, we'll start with an overview of our four-step process for developing SLOs and SLIs for a user journey. We'll introduce the fictional company that created our example mobile game, the infrastructure that we'll be working with, and the simple user journey we'll be applying the four-step process to.

7 videos (Total 18 min), 4 quizzes
7 videos
The 4 step process (1m)
Our example game (1m)
Loading the profile page (1m)
Refining SLI specifications (4m)
Looking for observability gaps (2m)
Failure modes (4m)
2 practice exercises
Postmortem! (15m)
Setting Achievable SLO targets (15m)
Week 4
4 hours to complete

Quantifying Risks to SLOs

In this module we'll take a critical look at the availability risks for our example service. We want to answer the question: "Are our SLO targets and error budgets realistic?"
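One way to make that question concrete is to estimate how many "bad minutes" per year each risk is expected to cost and compare the total with the budget the SLO allows. The sketch below assumes a simple model (incident frequency × time to detect and repair × fraction of users affected); the model, the class and function names, and every figure are illustrative assumptions, not the spreadsheet used in the course.

```python
from dataclasses import dataclass

MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600


@dataclass
class Risk:
    """A hypothetical availability risk and its assumed cost per incident."""
    name: str
    incidents_per_year: float
    minutes_to_detect: float
    minutes_to_repair: float
    fraction_of_users: float  # 0.0 to 1.0

    def expected_bad_minutes(self) -> float:
        """Expected user-weighted downtime per year from this risk."""
        outage = self.minutes_to_detect + self.minutes_to_repair
        return self.incidents_per_year * outage * self.fraction_of_users


def budget_minutes(slo_target: float) -> float:
    """Minutes of full unavailability per year that the SLO tolerates."""
    return (1.0 - slo_target) * MINUTES_PER_YEAR


def is_realistic(slo_target: float, risks: list[Risk]) -> bool:
    """True if the summed expected bad minutes fit inside the budget."""
    total = sum(r.expected_bad_minutes() for r in risks)
    return total <= budget_minutes(slo_target)


if __name__ == "__main__":
    # Made-up risks, for illustration only.
    risks = [
        Risk("bad release", 12, 10, 30, 0.5),
        Risk("database outage", 1, 15, 120, 1.0),
    ]
    for target in (0.999, 0.9995):
        print(target, round(budget_minutes(target), 1), is_realistic(target, risks))
```

With these made-up inputs the risks sum to 375 expected bad minutes a year, which fits a 99.9% target but not a 99.95% one.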

4 videos (Total 20 min), 2 quizzes
4 videos
Is your error budget realistic? (3m)
Modeling risks in our spreadsheet (5m)
Analyzing risk (9m)
1 hour to complete

Consequences of SLO Misses

In this module, we'll cover best practices for documenting your SLOs, the rationale behind a formal error budget policy, and how best to create one. Finally, we'll look at an example error budget policy to understand the trade-offs and incentives that play out when negotiating such a policy.

9 videos (Total 21 min), 3 quizzes
9 videos
No surprises (2m)
A dashboard example (1m)
Why an error budget policy? (2m)
Fundamentals of an error budget policy (3m)
How to draft an error budget policy (3m)
Example policy thresholds (3m)
A hypothetical policy scenario (3m)
Course conclusion and video wrap up (47s)
3 practice exercises
Error budget policies (1m)
Error budget policy -- considerations (2m)
Consequences of SLO Misses (1m)
4.5
20 Reviews

Top reviews from Site Reliability Engineering: Measuring and Managing Reliability

By RA, May 4th 2019

This is an excellent course that covers in-depth topics on Site Reliability Engineering.

About Google Cloud

We help millions of organizations empower their employees, serve their customers, and build what’s next for their businesses with innovative technology created in—and for—the cloud. Our products are engineered for security, reliability, and scalability, running the full stack from infrastructure to applications to devices and hardware. Our teams are dedicated to helping customers apply our technologies to create success....

Frequently Asked Questions

  • Once you enroll for a Certificate, you’ll have access to all videos, quizzes, and programming assignments (if applicable). Peer review assignments can only be submitted and reviewed once your session has begun. If you choose to explore the course without purchasing, you may not be able to access certain assignments.

  • When you purchase a Certificate you get access to all course materials, including graded assignments. Upon completing the course, your electronic Certificate will be added to your Accomplishments page - from there, you can print your Certificate or add it to your LinkedIn profile. If you only want to read and view the course content, you can audit the course for free.

More questions? Visit the Learner Help Center.