Counterbalancing has a couple of nice properties. For starters, you can treat it as a completely between-subjects design if you look at only the first task that people do. And hopefully, the benefit you get from practice or the detriment of fatigue is roughly the same in both conditions, so it'll even itself out. Counterbalancing washes out some of those concerns. There are times when this isn't the case, and that's a great time to use a between-subjects design. To minimize learning effects, it's probably best to make the first and second tasks different.
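For two alternatives, counterbalancing just means a random half of the participants get the A-then-B order and the other half B-then-A. A minimal sketch in Python (the function and condition names here are made up for illustration):

```python
import random

def counterbalance(participants, cond_a, cond_b):
    """Randomly split participants in half; each half does the two
    conditions in the opposite order."""
    shuffled = random.sample(participants, len(participants))
    half = len(shuffled) // 2
    orders = {p: [cond_a, cond_b] for p in shuffled[:half]}
    orders.update({p: [cond_b, cond_a] for p in shuffled[half:]})
    return orders

orders = counterbalance(["p1", "p2", "p3", "p4"], "A", "B")
```

Looking at only each participant's first task then recovers a pure between-subjects comparison, as described above.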

So in this case, we might have people clean the d.school as their first task and clean the Gates building as their second. This also has the added benefit of giving us some more ecological validity: by trying out a couple of different environments, we know it's not just the Gates building that's better cleaned by robots. It also has the benefit that both the d.school and the Gates building end up clean. How about

individual differences? Should we try to balance for shirt color, hair length, gender, or whether the picture in someone's icon is squarish or more rectangular? It depends on whether you think it's going to make a difference in the study, or whether one might plausibly believe that it would. The simplest thing to do is truly random assignment; we'll talk about more sophisticated techniques in a couple of slides. Okay, so now we know a couple of ways of dealing

with two different alternatives; what if we have three? With three alternatives, you can use something called a Latin Square. I'll explain how this works with three conditions; it generalizes to more. In a classic Latin Square design, each person is going to use all three conditions. You randomly divide your participants into three different groups. The first group gets the first condition first, the second one second, and the third one third. The second group gets the order 2, 3, 1, and the third group 3, 1, 2. So again, everybody gets all three conditions, but their order changes.
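Those three orderings are just cyclic shifts of 1, 2, 3, so the square is easy to generate. A sketch (assuming conditions are simply numbered 1 through n):

```python
def latin_square(n):
    """Row i is the cyclic ordering starting at condition i+1; each
    condition appears exactly once in every row and every column."""
    return [[(row + col) % n + 1 for col in range(n)] for row in range(n)]

square = latin_square(3)
# square == [[1, 2, 3], [2, 3, 1], [3, 1, 2]]
```

Each group of participants is assigned one row as its ordering. One caveat: in this cyclic square, condition 2 always immediately follows condition 1; a balanced Latin square, which also equalizes which condition precedes which, is often preferred when carryover between specific conditions is a worry.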

And if you look at any particular ordering segment, so what people see first, what people see second, what people see third, you can see that each condition is also evenly balanced across the three positions. Whether you choose a between-subjects approach or a within-subjects approach, the most important thing is to make sure that the odds of any particular participant ending up in a particular condition, or a particular condition ordering, are even. We can illustrate this with an example. Say you wanted to find out

whether people were faster typing in the morning or in the afternoon, and you let people come in whenever they want. What if people who have a preference for the morning, morning people, are faster typists than people who prefer the afternoon? Your conclusion would be that morning was faster. But that's not right: it's just that the morning people were faster, not that there's something about the morning itself. Or maybe it is the morning; with a confound, you can't say. It's possible that the causal reason was a population difference and not the experimental manipulation. This confound is why a lot of economics is so hard: you're computing correlations, but there's no manipulation. Random assignment is tool number one in achieving an effective manipulation. So in our typing case, you would want to randomly assign people to be in either the morning condition or the afternoon condition.
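A sketch of that assignment, flipping a fair (virtual) coin for each participant as they sign up (the function name is made up for illustration):

```python
import random

def assign_condition(participant_id):
    """Every participant gets the same 50/50 odds of each condition,
    regardless of when they would prefer to come in."""
    return random.choice(["morning", "afternoon"])

schedule = {pid: assign_condition(pid) for pid in range(8)}
```

Because the assignment ignores each person's preference, morning people and afternoon people end up spread across both conditions, so a speed difference can be attributed to the time of day rather than to the population.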

The morning-and-afternoon example can seem kind of stylized, but it shows up a lot in the real world. For example, if you're running a website, one easy thing you might do is show everybody one alternative on one day and the other alternative on the next. Well, there may be a difference between those days. We'll talk more about running experiments later on, but for now I wanted to point out that the key, if you're going to do something like this, is to make sure that people are randomly assigned. The easiest way in most cases is a between-subjects design where you assign people as they come in to see either one interface or the other. Here's

another example of the importance of random assignment. In the 1930s, some studies were run at a Western Electric factory outside Chicago, called Hawthorne. The plan was pretty simple: find out whether changes in lighting levels affected productivity. So, experimenters came in and raised the lighting levels; productivity went up. Then they tried lowering the light levels; productivity went up. They tried a whole bunch of combinations, and after each intervention, productivity went up. The conclusion, of course, was that it was the act of intervening, rather than the light levels themselves, that was the major cause behind the productivity change. Presumably the workers feeling that people cared about them, or the excitement of the experiment, or whatever, was the driving factor. In recent years, some economists have questioned whether there actually was a Hawthorne effect at the Hawthorne plant; if you're curious, you can Google more about that. Either way, the name stuck, and it refers to a case where the effect you're seeing is a result of the intervention itself rather than the thing you were trying to study. You can avoid this with random assignment. We've talked

about counterbalancing the order of conditions that participants experience. You can also counterbalance how you assign people to conditions. Say, for example, you're worried that typing speed will differentially affect something in your interface; you're building a new spreadsheet or something like that. You could use a pretest to establish typing speed ahead of time and use that to assign people to conditions. There are many techniques for doing this. The simplest is to just split people into high-speed typists versus low-speed typists. The key, no matter what, is that each participant has an equal chance of ending up in either condition. Let's walk through an example. If you can pretest everyone

ahead of time, one slick thing you can do is form matched pairs. So, say we get the typing speeds that we see here, and after ordering they look like this. We can group them into pairs, and then for each pair we can conceptually flip a coin to decide which member lands in which of the two conditions. I got one of these dollar coins in the ticket machine the other day; it will do well for flipping. So, for 35 and 37, it's heads, so we'll put 35 down here in the first condition and 37 in the second. For 57 versus 59, it's tails, so 59 goes here. The third one's heads, so that gives us 61 and 68. And tails, so 99 goes here and 70 goes here. By doing these matched pairs, you're approximately balancing out the performance of people in each condition, and by having some randomness in there, you're ensuring that you don't get accidental statistical artifacts that would creep in by, say, assigning all your odd-ranked people to one condition and all your evens to the other. But say

people are coming in online, so you can't pretest people for the experiment. Well, what you can do is pick some threshold that you think is about in the middle. For typing, you might say 65 words per minute. As people come in, you can check whether they're above or below that threshold and label them as high or low, fast or slow typists. So, 35 we would say is low, 40 is low, 90 is high, 68 is high, and so on. And you can assign them to your two conditions, call them A and B, by high and low. So, our first low person to come in, 35: tails, so they go to B. Then, to balance that out, 40 goes to A. Next time we get a pair of lows, we can flip the coin again, and we're going to do the same for the highs. So, the highs come in: heads. Our first high will go to A and our second high will go to B. You don't need to make sure that you have even numbers of high and low typists, unless you're worried about that

making a material difference on the outcome. In fact, if you have enough participants, you can look at this two-by-two grid and compare the outcomes in the four cells. What you do need to make sure is that there are the same number of fast (high) typists in A and in B, and the same number of slow (low) typists in A and in B. There are lots of ways to do this kind of counterbalancing. In general, all of them are doing the same thing, which is trying to help the law of large numbers that you get in a between-subjects study work a little bit faster. Now, there's a danger in assigning people based on a pretest that I'd like to warn you about. Say we wanted to pretest for coins that were more likely to come up heads.

So, I have some coins here, and I can flip each of them a couple of times. Heads, tails... okay, that one was heads more. Tails, yes, that was tails. Heads. And tails. So, we have three coins with a heady tendency and three coins with a taily tendency. Now we can feed them a snack: a bagel. In fact, this is the same bagel from the last lecture, so it's not such a fresh bagel anymore, but nonetheless we think it'll give our coins some sustenance. Each coin can eat some of the bagel, and we want to see whether our heady coins become more heady and our taily coins become more taily. Since the natural tendency of these coins is in one direction, I think feeding them may make them more so. So, the question that we're going to

ask is whether snacking increases the natural tendency of coins. We could re-flip all of these, and I won't inflict that on you right now, but you can try it yourself. What you'll see, of course, is that none of these coins really had a tendency toward heads or tails; it's just that on a small number of samples, some come out with a few more heads and some with a few more tails. And if you flip them again, there will be no correlation between their heady tendency in the first half and their heady tendency in the second half. To make it a little more exciting than just the six coins I had in my pocket, I decided to generate 30 coins in Excel, and I flipped each of them 21 times; you can see the results here of how many heads each got out of 21 tosses. I picked an odd number so each coin would have to be either heady or taily on the whole.
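This simulation is easy to reproduce. A sketch using Python's random module instead of Excel, where the "snack" is just a second, independent round of flips:

```python
import random

def heads_out_of(n_flips=21):
    """Count heads in n_flips fair coin tosses."""
    return sum(random.random() < 0.5 for _ in range(n_flips))

random.seed(1)
first = [heads_out_of() for _ in range(30)]    # the pretest round
second = [heads_out_of() for _ in range(30)]   # "after the snack"

heady = [i for i in range(30) if first[i] > 10]   # 11+ heads in round one
taily = [i for i in range(30) if first[i] <= 10]  # 10 or fewer

def group_avg(idx, scores):
    return sum(scores[i] for i in idx) / len(idx)
```

By construction the heady group averages well above 10.5 heads in the first round, but because the second round is independent, both group averages should land near the expected 10.5: regression to the mean, with no snack effect at all.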

And what we can see, if we rank them by number, is that it turns out fifteen of them had a head tendency and fifteen a tail tendency. The average of our head tendencies is 12.9 and the average of our tail tendencies is 8.3. Now, after I feed them the snack, I can flip them all again. I've again highlighted in yellow and bolded the numbers that have a head tendency, and there's essentially no correlation between the two rounds. It does turn out that our initial heads have an average of 10.7 after eating the bagel, and our initial tails an average of 9.6. So, we can see that both of these regress toward the mean: both are closer to the expected value of ten and a half. We may see a perceived difference here between the 9.6 and the 10.7; in a future lecture, we'll learn how to test whether that's a real difference or a statistical mirage. This danger of regression is very real, and it shows up all the time in statistical analyses of things like low-performing schools improving the next year or high-performing schools declining. The assignment problem that produces regression to the mean arises when you make something like a dividing line and use it for assignment, so everybody who scores high goes in one group and everybody who scores low goes in the other, and then you measure their performance subsequently. If instead you use the pretest to counterbalance, like we did with the typists, putting high- and low-speed typists in both conditions, then you no longer have that worry about regression to the mean. So, the major

question we tackled here is: should every participant use every alternative? And with it emerged three major strategies. In a within-subjects design, everybody tries all the options. This has big benefits in terms of recruiting participants: you get more work out of each person. And it works really well when you're not worried about learning, practice, or exposure issues, where trying one version would pollute the other. In a between-subjects study, each person tries one condition. This requires more people, and you may want to consider counterbalancing for fair assignment. It has the benefit, of course, that each participant is uncorrupted, and for this reason it's the most common thing we see in things like web studies. And if you use a between-subjects design, you can use counterbalancing to help even things out. What I'm offering today is just a really high-level overview, and I've necessarily glossed over a whole bunch of important details, but I wanted to give you an initial lay of the land for running studies. If you're interested in more reading in this area, my favorite book is David Martin's Doing Psychology Experiments. I'll see you next time.