[SOUND]. In this lecture, we're going to talk about A/B Testing. This is a type of testing that doesn't rely on any cognitive or psychological understanding or model of user behavior. Rather, you just show users two or more options, which is where the name comes from: option A and option B. And you put both of those options live on the web or live in your application, and measure how well they perform. You can actually find tremendous differences in performance based on very tiny changes in the way the interface works. This can be done to measure which sorts of security features, like authentication mechanisms, are more effective, but it can be used in all kinds of ways. The process is quite simple. You show people different options and you measure the performance, but I think it's easiest to understand by looking at some good examples.
I'd like us to start with this example from the Obama campaign. There's a reading for this week that links to a blog entry about exactly this campaign, so you can get more details from the company that ran this test and see more about what they did. But let's start with this page. This was the main welcome page when people came to Barack Obama's presidential website in the 2008 campaign. And this is the default page. So, you came there, you were asked to put in your information, and you can note that at the bottom you have this option to continue to the website. So, you don't actually need to fill out your information here, but you can, and that puts you on their mailing list. The first thing measured in this study was whether changing the text of this sign up button would make a difference. So there were four buttons that they used. Sign up was the default, the original button, and they tried replacing it with buttons that said learn more, sign up now, or join us now. A certain percentage of users saw each kind of button. They were randomly assigned to one button or another when they came to the site. And I think an interesting thing when I talk about A/B testing in class is to see what your perceptions are versus the reality. Because often, I have students who are completely wrong in every guess they have about which thing performed better. So, I'm always going to show you the options first, ask you to guess which one of these things did best, and then we'll actually take a look at the results. So, sign up is the default, and we have three alternative buttons. Think about which one you think would do better, and then let's look at the results.
So, compared to the response rate that they were getting with the sign up button, using the button that said learn more led to an 18.6% increase in the number of people who signed up. However, using sign up now led to a 2.38% decrease in the number of sign ups, and join us now led to a 1.3% increase, which is a small increase, but nothing as dramatic as what we got with learn more. So that's pretty interesting. And this gets to one point about A/B testing which is both frustrating in some respects and simple in other respects. We have no idea why one of these performed better than the other. We can come up with theories, but ultimately there's no insight into the why. There's no explanation that comes from A/B testing. We simply know the result that one button performed better.
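As a quick aside before we move on to the images and videos, here is a minimal sketch, in Python, of the mechanics just described: visitors are randomly assigned to one of the button variants, and each variant's sign-up rate is compared against the default as a relative lift. The variant names and counts below are hypothetical, chosen only to illustrate how a figure like the 18.6% increase is computed; they are not the campaign's actual numbers.

```python
import random

# Hypothetical variant names, modeled on the button test described above.
VARIANTS = ["sign_up", "learn_more", "sign_up_now", "join_us_now"]

def assign_variant():
    """Randomly assign an incoming visitor to one of the button variants."""
    return random.choice(VARIANTS)

def signup_rate(signups, visitors):
    """Fraction of visitors who signed up after seeing a given variant."""
    return signups / visitors

def relative_lift(variant_rate, baseline_rate):
    """Percent change in sign-up rate relative to the default button."""
    return (variant_rate - baseline_rate) / baseline_rate * 100

# Hypothetical counts, purely to show the arithmetic behind a reported lift.
baseline = signup_rate(signups=800, visitors=10_000)    # "Sign up" (default)
candidate = signup_rate(signups=949, visitors=10_000)   # "Learn more"

print(f"Relative lift: {relative_lift(candidate, baseline):+.1f}%")
# Relative lift: +18.6%
```

In a real deployment the assignment would also be logged, so that each sign-up can be attributed back to the variant the visitor actually saw.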
In this same test, the researchers looked at replacing this main image with either a video or an alternative image. What we're going to do now is look at the videos and images that they had, and then we'll take a guess about which one did better and look at the results.
[MUSIC] >> Now even as we speak, there are those who are preparing to divide us. The spin masters, the negative ad peddlers, who have raised the politics of anything goes. Well, I say to them tonight, there is not a liberal America and a conservative America. There is the United States of America. There is not a Black America, and a White America, and Latino America, and Asian America, there's the United States of America. [APPLAUSE] The pundits, the pundits like to slice and dice our country into red states and blue states. Red states for Republicans, blue states for Democrats, but I've got news for them too. We worship an awesome God in the blue states and we don't like federal agents poking around in our libraries in the red states. [NOISE] [MUSIC] >> It's been a long time since I wanted to hear what a politician had to say. >> You came here because you believe in what this country can be. Hi, I'm Barack Obama. Welcome to barackobama.com, the official website of my campaign for president. Here, you can sign up for invitations to campaign events, and learn more about our movement for change.
>> And finally, there are two images: this one, called the family image, and this last one, called the change image. All of these were shown as replacements for the main image that we saw originally. So which performed better? Make your guess, and now we'll look at the results. So our first row here shows the original image; obviously that doesn't improve on itself. And then we see the order of performance for the other options. The family image, by far, did the best, with a 13.1% improvement in the number of people who signed up. The change image did a little bit better than the original, with a 3.85% increase. And all the videos led to a decrease in sign ups. And in fact, the interesting result is that the videos designed to be more inspirational, Sam's video, which was the long one that we saw first, and the Springfield video, both had really dramatic decreases in the number of people who signed up. Again, we can come up with theories about why this is. For example, people might get tired of watching the video and just leave the site instead of signing up, or they might click through using that link that doesn't require them to sign up. But we don't really know. We don't get any explanation of the why with A/B testing. We just get the results.
Let's look at another example. This is the National Alert Registry, and it's a registry that people can sign up for to get updates. This is their main page, so you can see registered sex offenders listed in the area, you can sign up to get a report, and there's credit card information at the bottom. If you look at the bottom of this page, there's actually a link here that will take you to a website that talks about the design and redesign of this page. So, they started off with this original design and they created a couple of alternatives. The second option looks like this; it's a pretty small change from the previous one. If we click back and forth a couple of times, you can see the font changes, and there's a little bit of layout difference. The text here, on the original, changes to a slightly shorter bit of text. But overall, the site looks about the same. Finally, they tried a third layout.
This goes into a two-column format, so the sign up information is more clearly in the center, and some of the information that was at the top is now moved into a parallel column. Which one of these three do you think had the biggest impact on the number of people who signed up? Well, if we look, here are our three options side by side so you can compare them. Basically, the first two are very similar, with a few changes in language; the third one changes the layout. And what we see is that for the period that they tested, the original generated 244 new sales. The second option, which had this click here text added, increased sales by 15.5%; they had 282 new sign ups. With this third option, with the parallel column of text, new sales were only 114. That's a 53.5% decrease in the number of people who signed up. So, these very small changes actually can have a huge impact on the way users behave on the site.
Here's one more really simple example. Dustin Curtis is a UI designer with a blog, and he did his own A/B test on his blog asking people to follow him on Twitter, with 25% of people assigned to see each of these four different options. The first one just says, I'm on Twitter, with a link to Twitter. The second one says follow me on Twitter. The third says, you should follow me on Twitter. And the fourth says you should follow me on Twitter here. Now, when I talk about this with HCI people, they all kind of cringe at this last option, because we're taught that you shouldn't use the word here as a link. If people are scanning through text for the underlines and they see the word here underlined, it doesn't tell them anything about where they are going. The three other options have Twitter as the link, and that lets them know that it's going to Twitter. Which one of these four options do you think got the most people to follow Dustin on Twitter? Well, here's the performance. Here we're seeing the percentage of people who followed him out of everyone who saw each of these options. The first option, I'm on Twitter, did the worst. And the last option that has that here underlined, even though it goes against some guidelines of good web design, had a much, much better performance, performing about three times as well as the original option.
Here's an example from a presentation given at Microsoft called Practical Guide to Controlled Experiments on the Web. Here, they were studying how often people filled out feedback in their Office Online feedback form. This is the original form, so the response rate here gave them their baseline for comparison; this was Option A. They looked at two alternatives. Option B had a two-stage response. You could click on a star, and then after you clicked on it, you were given a box that had your rating and you could write a comment. Option C asked you a simple question: Was this information helpful? Yes, no, or I don't know. And then, based on which of those buttons the user clicked, they were given a different box with a question where they could actually write a comment and submit it. Which of those three options do you think did best? So, among options A, B, and C: option B, with the two-stage response, had double the response rate of A, the default. And option C had 3.5 times the response rate of B, so over seven times the response rate of A, which really shows that even though A is a simple interface, and in fact offers the same options as the other two, the way it's presented has a huge impact on how people respond.
And here's one final example. It's from the same presentation, and the researchers here were comparing the performance of MSN Live Search. Now, there are two windows shown here, and they have the two variations that were being tested. Can you tell what those variations are, just by looking at the window? It's actually really subtle. The main difference is in the colors, and you can see it most clearly in the text of the links. Here, it's a kind of tealish blue, and it's a bolder blue in the second example. If there's any performance difference, it's one of those things that is actually quite mysterious to us as user interface designers. The differences in color really don't have an impact here on the clarity of what's going on in the site. It's not like one text color indicates a link more clearly than the other. It's pretty straightforward in both examples. And in fact, just looking at it, it's almost impossible for us as users to even see the difference between these two sites. So, why would there possibly be a difference in the performance that you see from users on each of these two sites? Is there one? In fact there is, and it's pretty big. The number of queries per user is 0.9% higher with the second example, and that may not sound big, but when you have millions and millions of people coming to your site, that results in thousands of additional searches. More importantly, the number of ad clicks per user was 3.1% higher with the slightly bolder colors in the second example than in the first example. Why is this the case? We don't know. And I think this example in particular highlights some of the frustrating parts of doing A/B testing. There are clear and obvious differences in the performance here, but there's very little to suggest why that might be the case.
So here are some details on how to run an A/B test. First, you want to start with a small percentage of visitors trying your experimental conditions. Don't create two conditions and launch with half your visitors seeing one and half seeing the other. Start with, say, a tenth of a percent of people seeing the experimental condition and then ramp it up. You want to do that so you can automatically stop testing if any condition has very bad performance. For example, if you have something that drives away half your visitors, or half your customers, you don't want to have that test running on a huge group of people for a long time. Start it with a small group, and if it looks like it's doing okay, then increase that number until eventually you're testing with half your visitors. Finally, let people consistently see the same variations so they don't get confused. If you're changing the appearance and layout of your site and a person comes to the site and sees option A, then comes back later that day and sees option B, it can be really confusing. So you want to make sure that once a user has been put in an experimental condition, they remain in that same condition for return visits. Google Analytics allows you to easily do A/B testing, and it can keep track of which condition people have been put in and keep the interface consistent for them on return visits.
So in conclusion, small tweaks to an interface can lead to big differences in user behavior. We saw that in all the examples that we looked at today. A/B testing allows you to check for those differences in behavior by showing alternative versions of the site to people. So, different groups see different versions of the site.
You measure the performance of those groups, and that allows you to draw conclusions about which version of your site is most effective. You don't get any explanation of why a particular version leads to the change in behavior, but it's still really useful data for tweaking your site and improving its performance.
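To make the deployment advice from this lecture concrete, here is a minimal sketch, assuming a site where each visitor carries a stable identifier such as a user ID stored in a cookie. It is not how any particular tool like Google Analytics works internally; it simply shows one common way to get consistent assignment: hash the identifier to decide, deterministically, whether a visitor falls inside the current rollout percentage and, if so, which variation they see. The function names and rollout values are illustrative.

```python
import hashlib

VARIANTS = ["original", "alternative"]

def _hash01(key: str) -> float:
    """Deterministically map a string to a number in [0, 1)."""
    digest = hashlib.sha256(key.encode()).hexdigest()
    return int(digest[:8], 16) / 2**32

def choose_variation(user_id: str, experiment: str, rollout: float) -> str:
    """Return the variation this visitor should see.

    `rollout` is the fraction of visitors currently included in the
    experiment. Start it very small (say 0.001) and raise it only if the
    experimental condition is not performing badly. Because both decisions
    hash the user ID instead of drawing a fresh random number on every
    visit, a visitor's assignment is stable across return visits, and
    raising `rollout` only pulls new visitors into the experiment.
    """
    if _hash01(f"{experiment}:include:{user_id}") >= rollout:
        return "original"  # not in the experiment; sees the default site
    # Inside the experiment: pick a variant with a second, independent hash.
    index = int(_hash01(f"{experiment}:variant:{user_id}") * len(VARIANTS))
    return VARIANTS[index]

# Example: the same visitor gets a consistent answer as the rollout grows.
for rollout in (0.001, 0.01, 0.5):
    print(rollout, choose_variation("user-42", "signup-button", rollout))
```

The same idea extends to automatically stopping a bad condition: as long as results are logged per variant, a monitoring job can drop `rollout` back toward zero if the experimental condition's performance falls below some threshold.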