Welcome back. Now that you've created a new metric, we're ready to compile the results of our test and draw conclusions. In this section, I'll show you what it means to aggregate a proportion metric, and then we'll run the results through an A/B testing calculator tool.

In this video, we're going to talk about how to actually analyze a test. In the previous section, we talked a little bit about metrics and even built out some of our own metrics tables. The example we went through was order binary, which answers the question: did a user make an order? The values are either one or zero, so the average is somewhere between zero and one, and we can interpret that average as the proportion of users who made an order. We can use this metric to answer the question: did the variant cause more users to place an order?

Another type of metric we can use is a mean metric, for example, the number of orders. This helps us answer the question: how many orders did a user make? The values are non-negative integers, so the average is somewhere between zero and infinity. We could use a metric like this to answer a slightly different question: did the variant cause users to create more orders? Both metric types are sketched in SQL below.

In this class, we're going to focus on proportion metrics, so let's walk through how to aggregate one. We're going to use the order binary metric from the previous exercise to complete the following: get a count of the users in each treatment group, just for test_id 7, and then count the number of users who have an order in each treatment group.

I've got the table we completed in the last exercise already in here. Now, I just want to put it into a subquery. So: Select Star, From, and then I'm going to put some parentheses around it. I'll scroll to the bottom and close that up. We'll alias this subquery user_level, because it holds user-level metrics. Let's leave the Select Star there for now, go to the bottom, and add a Where clause to look just at test_id 7. We'll check and make sure that that runs. Okay, spacing. Cool. Running it. Great. Now we've got results just for test_id 7.

Let's come back to the top and figure out what columns we want. We want the test_assignment, but we don't need the test_id, because we already know which test it is. Then we count the user IDs; that's the number of users assigned to each group. We also sum the order binary, and we'll call that users_with_orders. Because we're using aggregation functions, we need to group by test assignment, so I'll add Group By test_assignment at the bottom and get rid of that trailing comma. All right, there we go: you can see how many users had orders in each group. The assembled query is recapped in the second sketch below.

Now we'll take our results from Mode and enter them into this A/B testing web tool. I'm going to rename our labels to Control and Treatment. Here, the number of trials is the number of users, and the number of successes is the users with orders, so I'm going to copy those over from Mode. The users with test_assignment one are in Treatment, so I'll put them there. Then I'll do the same thing with Control, putting that count in the number of trials.
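Before we read off the results, here are the two sketches I promised. First, a rough version of the user-level metrics table from the previous exercise, showing both metric types we discussed. To be clear, this is a sketch: the table and column names (test_assignments, orders) are assumptions, and yours may differ.

```sql
-- Sketch: user-level metrics, one row per user per test.
-- Assumes hypothetical tables test_assignments(user_id, test_id, test_assignment)
-- and orders(user_id, ...).
SELECT
    ta.user_id,
    ta.test_id,
    ta.test_assignment,
    CASE WHEN COUNT(o.user_id) > 0 THEN 1 ELSE 0 END AS order_binary,  -- proportion metric: 0 or 1
    COUNT(o.user_id) AS order_count                                    -- mean metric: 0, 1, 2, ...
FROM test_assignments ta
LEFT JOIN orders o
    ON o.user_id = ta.user_id
GROUP BY ta.user_id, ta.test_id, ta.test_assignment;
```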
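Second, the aggregation query we just assembled on top of it, with the user-level query tucked into the subquery:

```sql
-- Sketch: aggregating the proportion metric for one test.
SELECT
    test_assignment,
    COUNT(user_id)    AS users,             -- number of trials for the calculator
    SUM(order_binary) AS users_with_orders  -- number of successes for the calculator
FROM (
    -- the user-level metrics query from the previous exercise (see sketch above)
    SELECT
        ta.user_id,
        ta.test_id,
        ta.test_assignment,
        CASE WHEN COUNT(o.user_id) > 0 THEN 1 ELSE 0 END AS order_binary
    FROM test_assignments ta
    LEFT JOIN orders o ON o.user_id = ta.user_id
    GROUP BY ta.user_id, ta.test_id, ta.test_assignment
) user_level
WHERE test_id = 7
GROUP BY test_assignment;
```

The users column is what goes into the calculator as the number of trials, and users_with_orders is the number of successes.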
Back in the calculator: the number of users is the number of trials, and the successes are the users with orders. That Treatment value goes in first, and then we pull in the Control value. I'm running this at a 95 percent confidence interval. We can see that the p-value is 0.059, and there is an observed lift of five percent; we believe the actual lift is somewhere between negative 0.2 percent and 10 percent. Since the p-value is just above 0.05 and the confidence interval includes zero, this result falls just short of statistical significance at the 95 percent level. So we've analyzed our first test. If you're curious where these numbers come from, there's a sketch of the math at the very end of this section.

Let's think a little bit about what could go wrong with the analysis, things that maybe we didn't even see here. For one, there could be errors or bias introduced in the assignment process. You might see this if the number of users is really different between Treatment and Control. It could also be that the metrics we're calculating are not relevant to the hypothesis being tested: if this was a test about account creation, we shouldn't really expect big differences in order binary, or maybe we would see some. The metrics could also be calculated incorrectly; that happens sometimes. And it certainly could be that the statistics are not calculated properly.

The link that I've given you to the Abba calculator will work for binary metrics. But when we're aggregating mean metrics, we're going to need more statistics. We need the average, which we could get by taking the sum and dividing it by the number of users, and we also need the standard deviation, which is what lets us figure out our p-values. We're not going to go really deep into the statistics in this class, so we're not actually going to analyze any mean metrics in this exercise or the final project; that's not the focus of the class.

One extra thing that can go wrong with mean metrics is that computing the standard deviation can be a little tricky, especially if you're trying to segment results. If you were looking just at mobile users but using the standard deviation for everybody, you would be computing it incorrectly. So I'll add that to the list of things that can go wrong with the analysis of a test, specifically for mean metrics. A sketch of those aggregates in SQL follows.
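Here's a sketch of the extra aggregates you'd need for a mean metric like order_count, reusing the same hypothetical tables as before. STDDEV is the usual spelling in Postgres-style dialects; yours may call it STDDEV_SAMP.

```sql
-- Sketch: aggregates for a mean metric. For each treatment group we need
-- the count, the average, and the standard deviation to compute p-values.
WITH user_level AS (
    SELECT
        ta.user_id,
        ta.test_id,
        ta.test_assignment,
        COUNT(o.user_id) AS order_count
    FROM test_assignments ta
    LEFT JOIN orders o ON o.user_id = ta.user_id
    GROUP BY ta.user_id, ta.test_id, ta.test_assignment
)
SELECT
    test_assignment,
    COUNT(user_id)      AS users,
    AVG(order_count)    AS avg_orders,  -- the mean we compare between groups
    STDDEV(order_count) AS sd_orders    -- needed for the p-value on a mean metric
FROM user_level
WHERE test_id = 7
-- If you segment (say, mobile users only), add that filter here, e.g.
-- AND platform = 'mobile' (a hypothetical column), so the standard deviation
-- is recomputed within the segment rather than borrowed from everybody.
GROUP BY test_assignment;
```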
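Finally, for those curious where the calculator's p-value and interval came from earlier: a tool like this typically runs a pooled two-proportion z-test on the trials and successes. As a sketch of the standard version of that math (not necessarily exactly what the Abba tool implements), with $n_C, n_T$ the trials and $x_C, x_T$ the successes in each group:

$$
\hat{p}_C = \frac{x_C}{n_C}, \qquad \hat{p}_T = \frac{x_T}{n_T}, \qquad \hat{p} = \frac{x_C + x_T}{n_C + n_T},
$$

$$
z = \frac{\hat{p}_T - \hat{p}_C}{\sqrt{\hat{p}\,(1 - \hat{p})\left(\frac{1}{n_C} + \frac{1}{n_T}\right)}}.
$$

The p-value comes from comparing $z$ against the normal distribution, and the 95 percent interval for the lift is roughly $(\hat{p}_T - \hat{p}_C) \pm 1.96\sqrt{\hat{p}_T(1-\hat{p}_T)/n_T + \hat{p}_C(1-\hat{p}_C)/n_C}$.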