What Does a Site Reliability Engineer Do? Your Guide

Written by Coursera • Updated on

Site reliability engineers ensure that apps and websites run smoothly and reliably. Learn more about this emerging career and what skills you’ll need to get started.

[Featured image] A site reliability engineer (SRE) works on his desktop computer.

A site reliability engineer (SRE) works to make websites and apps reliable, efficient, and scalable, often by building automated solutions that improve how a site operates. As we go online for more and more tasks in our daily lives, it’s increasingly important to keep these technologies up and running. Let’s take a closer look at this emerging career, including the skills you need to get started.

Did you know? SREs began appearing in 2003 when Google formed a team of software engineers for the sole purpose of improving the reliability and scalability of the company’s sites. The approach was so effective that other leading tech companies, including Netflix and Amazon, soon followed suit.

What is a site reliability engineer? 

The SRE role is tasked with making sure the site is equipped with the functions it needs to provide users with the requested services. In today’s automated world, that task includes building self-service tools that provide greater availability, performance, and efficiency for users. 

According to Google’s VP of Engineering, Ben Treynor, SRE is “what happens when you ask a software engineer to design an operations function.” Most SREs spend time on both operations tasks and development projects (developing new features, automating processes, scaling systems, etc.).

Tasks and duties for site reliability engineering roles might include:

  • Collaborating with software developers, engineers, and operations teams

  • Monitoring sites and software to make sure they’re performing properly (including on-call shifts)

  • Anticipating potential problems before they occur (and coming up with solutions)

  • Conducting post-incident reviews

  • Documenting your work to turn findings into repeatable actions

  • Coding automation within a site infrastructure 

  • Mentoring and coaching junior engineers

SRE vs. DevOps: What's the difference?

The disciplines of SRE and DevOps overlap in many ways, but they also have one key difference. DevOps teams define what needs to be done to minimize gaps between software development and operations. SRE teams translate the pillars of DevOps into practices. If DevOps is the “what,” SRE is the “how.”

Site reliability engineer skills

Success in this role often entails being a proactive problem solver with an eye for both software development and operations. These are some of the skills that will serve you well in this job:

  • Understanding of development and operations

  • Familiarity with production monitoring systems

  • Attention to detail

  • Analytical and problem-solving skills

  • Ability to collaborate across multi-functional teams

  • Coding in Java, Python, Perl, or Ruby

  • Technical writing skills

Why should I pursue a career as a site reliability engineer?

If you’re looking for a software-centric role in an emerging, in-demand field, a career as an SRE might be a good fit. The 2020 LinkedIn Emerging Jobs Report found that site reliability engineering jobs are growing by 34 percent annually [1]. According to Glassdoor, the average annual base pay for a site reliability engineer in the United States is $119,119 (March 2022) [2].

Site reliability engineer career path

Site reliability engineering is typically a mid-level role, which makes it a good option for those with a few years of experience as a systems administrator or software developer. Most companies require at least a bachelor’s degree in computer science or a related field. Additional certifications and experience with different operating systems and programming languages are also an advantage.

If you’re starting out, a junior-level position on a site reliability engineering team is a good way to learn and grow. In this collaborative environment, you can work with others to solve issues while building your own skill sets. As you gain experience and technical knowledge, you can often advance your career into more senior positions. 

Get started with Coursera

Build skills you'll need as an SRE with Site Reliability Engineering: Measuring and Managing Reliability, offered by Google Cloud on Coursera. Develop a deeper understanding of how service level indicators (SLIs) and service level objectives (SLOs) are used to manage and measure reliability. Upon course completion, you'll have a certificate to share on your resume.
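To make the idea of SLIs and SLOs more concrete, here is a minimal Python sketch. It assumes one common way of measuring reliability: an availability SLI calculated as the share of requests that succeeded, compared against an SLO target. The names `availability_sli`, `check_slo`, and `SloReport`, along with the request counts and the 99.9 percent target, are hypothetical and chosen only for illustration. They are not taken from the course.

```python
from dataclasses import dataclass


@dataclass
class SloReport:
    sli: float         # measured fraction of successful requests
    slo_target: float  # objective the SLI is compared against
    met: bool          # True when the SLI is at or above the target


def availability_sli(good_requests: int, total_requests: int) -> float:
    """Return the fraction of requests that succeeded."""
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")
    return good_requests / total_requests


def check_slo(good_requests: int, total_requests: int, slo_target: float) -> SloReport:
    """Compare the measured SLI with the SLO target."""
    sli = availability_sli(good_requests, total_requests)
    return SloReport(sli=sli, slo_target=slo_target, met=sli >= slo_target)


# Hypothetical numbers: 99,950 successful requests out of 100,000,
# checked against an assumed 99.9 percent availability objective.
report = check_slo(good_requests=99_950, total_requests=100_000, slo_target=0.999)
print(f"SLI: {report.sli:.4%}, SLO met: {report.met}")
```

In practice, teams may define SLIs around latency, error rate, or other signals instead of simple availability. The comparison step stays the same: measure the indicator, then check it against the objective.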

Article sources

1. LinkedIn. “LinkedIn 2020 Emerging Jobs Report, https://business.linkedin.com/content/dam/me/business/en-us/talent-solutions/emerging-jobs-report/Emerging_Jobs_Report_U.S._FINAL.pdf.” Accessed March 28, 2022.

2. Glassdoor. “Salary: Site Reliability Engineer, https://www.glassdoor.com/Salaries/site-reliability-engineer-salary-SRCH_KO0,25.htm.” Accessed March 28, 2022.

This content has been made available for informational purposes only. Learners are advised to conduct additional research to ensure that courses and other credentials pursued meet their personal, professional, and financial goals.