In the previous lesson, we covered the four-step process for developing SLIs and SLOs and introduced a simple user journey from our example game. The first step towards having SLOs defined for that journey is to state clearly, in high-level terms, how we want to measure the performance of our service against the expectations of our users. Since this is a request-response interaction, we almost certainly want to measure the availability and latency experienced by players when they load the profile page. Put another way, they will expect the profile page to load successfully and quickly. But these expectations raise questions: what do "successful" and "quickly" really mean, and how will we measure these concepts?

It's really important to be precise when defining SLI implementations. Your final SLI should be clear about where it is measured, what is being measured (including any units), and which attributes of the underlying monitoring metrics are included or excluded. Remember, we want the final implementation to contain enough detail that someone could create monitoring configuration or software to measure the SLI based upon it.

Let's see how this process of refinement works. We start with our high-level availability specification: the proportion of valid requests that were served successfully. Our web servers serve far more than just these profile page requests, but only these requests are valid for this particular user journey. We can identify profile page requests from the HTTP request path, which we know from our sequence diagram will be either /profile/user or /profile/user/avatar. Let's update our specification: the proportion of HTTP GET requests for /profile/user or /profile/user/avatar that were served successfully.

On to a definition of success. Our company just wants to get something simple measured quickly, so we're going to use the HTTP status code as an indicator of success.
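To make the "valid requests" part of the specification concrete, the filter could be sketched as a simple predicate over request attributes. This is an illustration only, not part of the lesson's monitoring stack; the function name and its parameters are hypothetical.

```python
# Hypothetical sketch: which HTTP requests are "valid" for the
# profile-page user journey described in the specification.
VALID_PATHS = {"/profile/user", "/profile/user/avatar"}

def is_valid_request(method: str, path: str) -> bool:
    """A request counts toward this SLI only if it is a GET for one
    of the profile-page paths from the sequence diagram."""
    return method == "GET" and path in VALID_PATHS
```

Any request that fails this predicate, such as a POST or a request for an unrelated path, is simply excluded from the SLI rather than counted as a failure.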
Rather than enumerate every possible code, we'll just assume that any 500-class codes are bad, and include 429 for good measure because that's what our servers return to users when they're overloaded. Plugging this back into the specification gives us: the proportion of HTTP GET requests for /profile/user or /profile/user/avatar that have 200, 300 or 400 response codes (excluding 429).

The last question to answer is where exactly we will record these response codes and measure the SLI. Since our load balancers have visibility over all of our incoming requests and are already recording metrics of HTTP response codes, we can graph our SLI at once if we use this data. This gives us our final SLI implementation: the proportion of HTTP GET requests for /profile/user or /profile/user/avatar that have 200, 300 or 400 response codes (excluding 429), measured at the load balancer.

We can go through a similar process for our latency SLI. Setting the specific threshold for "too slow" is something we'll do later when setting SLO targets for the SLI; for now, we can leave it as X milliseconds. So we end up with an SLI implementation that looks like: the proportion of HTTP GET requests for /profile/user that send their entire response within X milliseconds, measured at the load balancer. Again, we're choosing to measure latency at the load balancer because it's already exporting useful metrics. We've chosen to include only profile page latency in the SLI, since that's what matters most to users. Serving avatars, on the other hand, could involve pulling large amounts of image data from a blob storage service, and may have a noticeably different latency profile.

In the next lesson, we'll start thinking about SLO targets for these SLIs. But we've run into a little problem. Because we haven't walked through how our infrastructure serves the profile page to check whether the SLIs we've proposed cover all the possible failure modes, we've suffered an outage that wasn't detected by our new SLIs.
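Putting the two implementations side by side, here is a rough sketch of how both SLIs could be computed over a batch of load-balancer request records. The record format and its field names (method, path, status, latency_ms) are assumptions for illustration, not a real load-balancer API, and the 300 ms threshold is just a stand-in for X.

```python
# Hypothetical sketch of the availability and latency SLI implementations,
# computed over request records exported by the load balancer.
VALID_PATHS = {"/profile/user", "/profile/user/avatar"}
LATENCY_THRESHOLD_MS = 300  # placeholder for the X we will set with the SLO

def is_valid(record):
    return record["method"] == "GET" and record["path"] in VALID_PATHS

def is_successful(record):
    # Success = any 200-, 300- or 400-class code, except 429 (overloaded).
    return 200 <= record["status"] < 500 and record["status"] != 429

def availability_sli(records):
    """Proportion of valid requests served successfully."""
    valid = [r for r in records if is_valid(r)]
    if not valid:
        return None
    return sum(is_successful(r) for r in valid) / len(valid)

def latency_sli(records):
    """Proportion of /profile/user requests answered within the threshold.
    Avatar requests are deliberately excluded from this SLI."""
    valid = [r for r in records
             if r["method"] == "GET" and r["path"] == "/profile/user"]
    if not valid:
        return None
    return sum(r["latency_ms"] <= LATENCY_THRESHOLD_MS for r in valid) / len(valid)
```

Note how the two SLIs deliberately use different "valid request" filters: availability covers both profile-page paths, while latency covers only /profile/user, for the reasons given above.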
What can we do? We've also got another, more complex user journey for you to practice the four-step process on. Can you specify some SLIs for this journey and then refine them into implementations?