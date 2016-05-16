Hi, welcome back, Caren Stalburg here. We've been talking in this unit about assessments. When we create assessments, we also need to think about how we're gonna score them and what those scores mean. So, today, we're gonna talk about how to create scoring rubrics. And then I'd like you to become a little bit familiar with the techniques that most people use when setting standards for exams. So let's make sure we're all talking about the same things. When I'm talking about a score, what I mean is the individual's actual performance on the actual assessment. When I'm talking about a standard, or standard setting, what I mean is the score that is acceptable to indicate the desired level of performance. So we could score an exam and say that somebody scored 60%, and that may or may not meet our standard for acceptability. It will depend on a number of things. So let's back up for a second and talk about scoring rubrics. What we're trying to do is look at performance within an assessed domain and describe what that performance looks like at each level. So let me break it down a little bit better for you. If we're talking about a novice learner taking a history, that individual will obtain the most relevant points in the history, but they may miss some key components. Now, you can create a nine-point scale, or you can create an eight-point scale. Really, it depends on what you feel is an appropriate scoring dimension. In this case, the scale gives the evaluator three broad areas: yes, this person is a novice; or they're approaching intermediate, which means that they got most of the relevant history and most of the key elements, but not all of it; or they were actually really competent and included all of the relevant components of a patient history. So this is one way to create a scoring rubric. 
Another way is the one I showed you in the previous section on verbal skills. Again, we've got the domain that we're looking for, the numerical score, and the actual behavior or skill that needs to be demonstrated. And we may say, okay, a 3 is the standard passing score, so anything at or above a 3 is acceptable for passing. The higher levels may provide gradations of excellence, which we may or may not use to assign higher grades, like A, B, or C. Or we may simply treat 3 as the cutoff: below it, you did not display the skill appropriately. So if you were difficult to hear, had poorly positioned audio equipment, or completely lost the audience during your talk, then you did not meet the performance needed to check off that skill. So it can get confusing. Let me talk a little bit about standards. The issue around standards is basically: what does the score mean? Does it mean that the person is competent? Does it mean that the person knows enough to proceed to the next section? All of these standards are defined by you as the educator. But the important thing is to make sure that the standard reflects a thoughtful judgment made by experts. You don't want it to be capricious, right? The experts need to know the content of the assessment and the purpose of the exam. A high-stakes exam, like a licensure exam, should have more stringent, more reliable, and more reproducible standards than a low-stakes exam. But no matter what standard you set for your examination, the important thing is to make sure that the criteria you use can be explained and justified. 
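To make the rubric-plus-cutoff idea concrete, here is a minimal Python sketch. It assumes a hypothetical five-point rubric for a single verbal-skills domain; the descriptors and the names `RubricLevel`, `describe`, and `passes` are illustrative, and only the convention that a 3 is the standard passing score comes from the discussion above.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class RubricLevel:
    score: int
    descriptor: str

# Hypothetical five-point rubric for one verbal-skills domain (audibility and engagement).
# The descriptors are illustrative; only "3 = standard pass" comes from the lecture.
AUDIBILITY_RUBRIC = (
    RubricLevel(1, "Difficult to hear; audio equipment poorly positioned; audience lost"),
    RubricLevel(2, "Often hard to hear; audience attention drifts"),
    RubricLevel(3, "Clearly audible throughout; audience follows the talk"),
    RubricLevel(4, "Clearly audible; voice varied for emphasis"),
    RubricLevel(5, "Clearly audible; audience actively engaged throughout"),
)

PASSING_SCORE = 3  # scores at or above this level count as passing

def describe(score, rubric=AUDIBILITY_RUBRIC):
    """Return the behavior the rubric attaches to a score."""
    for level in rubric:
        if level.score == score:
            return level.descriptor
    raise ValueError(f"score {score} is not on this rubric")

def passes(score, rubric=AUDIBILITY_RUBRIC, passing_score=PASSING_SCORE):
    """True if the score is on the rubric and meets the passing level."""
    describe(score, rubric)  # raises ValueError for off-scale scores
    return score >= passing_score

for s in (2, 3):
    print(s, passes(s), describe(s))
```

Here `passes(3)` returns `True` and `passes(2)` returns `False`. Keeping each descriptor next to its number is a deliberate choice: the evaluator sees the behavior a score stands for, not just the number.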
So one example: suppose you were creating an exam for people to read EKGs. Some skills, while important for folks to learn, such as distinguishing a right bundle branch block from a left bundle branch block, might be higher-order skills that someone at a more expert level could attain, and that would be fine. But the minimally passing individual should be able to identify an acute myocardial infarction. You also want to make sure that you understand the learners, or the group being tested. So, in my EKG example, if we were testing first-year emergency medical technician students, whether or not they could tell a right bundle branch block from a left bundle branch block may not be an appropriate standard to hold them to. However, we would perhaps argue that all of them should be able to recognize an acute MI. So where do we set those cutoff scores? Again, the cutoff score is the number below which the individual's performance is deemed unacceptable. Now remember, where you place your cutoff score can have really significant ramifications for you: how many people you fail, how many people have to take the examination again, and what that means for efficiency and cost. But more importantly, the cutoff score can have significant ramifications for the individual. If your cutoff score is too high, you're gonna have too many people failing, and perhaps that's not really an appropriate cutoff score, or what you want for individuals. And if it's too low, then you run the risk of saying that people are able to do things that they actually aren't capable of doing. When our psychometrician colleagues look at cutoff scores and standards, they talk about relative standards, which are norm-referenced, or based on the performance of the group of individuals who take the assessment. 
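One way to see how much the cutoff matters is to compute the fail rate a given cutoff produces. The sketch below assumes a hypothetical set of ten percent-correct scores and counts anyone strictly below the cutoff as failing, matching the definition of a cutoff score above; `fail_rate` is an illustrative name.

```python
def fail_rate(scores, cutoff):
    """Fraction of examinees scoring strictly below the cutoff."""
    if not scores:
        raise ValueError("need at least one score")
    return sum(s < cutoff for s in scores) / len(scores)

scores = [52, 58, 61, 64, 66, 68, 70, 73, 77, 85]  # hypothetical percent-correct scores
for cutoff in (55, 65, 75):
    print(f"{cutoff}: {fail_rate(scores, cutoff):.0%} fail")
# 55: 10% fail
# 65: 40% fail
# 75: 80% fail
```

On the same performance, moving the cutoff from 55 to 75 takes the fail rate from 10% to 80%, which is why the choice needs to be deliberate.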
So we may say that the exam mean, whatever it turns out to be, is set at a C grade, and whoever falls in the bottom 10 percent will fail. So if the mean is 60%, people scoring around 60% will still pass, because it's a relative standard. It may be that the exam was very tough. It may be that the individuals taking the exam were novices, so this was their first go-around and performance was low. But it's a relative standard, as opposed to an absolute standard, which is criterion-referenced and therefore independent of the group's performance. For example, we might decide that a learner who gets 70% of the exam questions correct has mastered enough of the material to have an adequate performance. And if nobody in the class scores 70%, then no one passes. You don't adjust based on the group's performance. There are formal ways to do this, and some of you may be familiar with them. But I want to mention them because people use this language when discussing how to set a standard. There are two methods we're going to talk about: one is the Angoff method and the other is the Hofstee method. First, the Angoff method is a test-centered method for setting your standard. Again, you convene your panel of experts. They understand the content, they understand your students, and they understand the goal: what the stakes are and how rigorous this exam is. Then, for each actual test question, they judge how a borderline student would perform on that question. So the panel of experts judges each question and might say, this is a really hard question, and probably only the superstars would get it right, so I would expect 5% of borderline students to get this question correct. Each judge thus estimates, item by item, the proportion of borderline students who would answer correctly. 
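The contrast between a relative and an absolute standard can be sketched in a few lines of Python. Assumptions: the scores are hypothetical (ten learners on a tough exam with a 60% mean), the relative rule fails anyone in the bottom 10 percent by percentile rank, and the absolute rule uses the 70% criterion from the example. The function names are illustrative.

```python
from statistics import fmean

def norm_referenced_pass(scores, bottom_fraction=0.10):
    """Relative standard: fail anyone whose percentile rank is in the bottom fraction.
    Percentile rank here is the share of examinees scoring strictly lower."""
    n = len(scores)
    return [sum(other < s for other in scores) / n >= bottom_fraction for s in scores]

def criterion_referenced_pass(scores, cutoff=70.0):
    """Absolute standard: pass only at or above a fixed percent-correct cutoff."""
    return [s >= cutoff for s in scores]

scores = [45, 50, 54, 57, 60, 61, 63, 66, 68, 76]  # hypothetical; mean is 60
print(f"mean: {fmean(scores):.0f}%")
print("relative passes:", sum(norm_referenced_pass(scores)))
print("absolute passes:", sum(criterion_referenced_pass(scores)))
```

On these hypothetical scores the relative rule passes 9 of 10 learners, while the absolute rule passes only 1, because only one learner reached 70% regardless of how the group did.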
Then, for each item on the test, you average the raters' estimates. So, for example, if there are ten items and five expert judges, then for each item you average the five judges' estimates. You then average those item averages across the entire exam, and that overall average becomes your cutoff score. The Hofstee method is a blend of relative and absolute standards. You ask your judges to make four judgments about the exam as a whole: What would be the lowest and the highest acceptable passing score for this exam or assessment? And what would be the lowest and highest acceptable fail rate? Each of these four judgments is averaged across the judges, and those averages, the lowest and highest acceptable scores and the lowest and highest acceptable fail rates, are then plotted against the cumulative score distribution. So let me show you what this looks like. If you plot the fail rate against the number of items correct, your judges will come up with the highest and lowest acceptable fail rates. So you may say, okay, the absolute lowest fail rate we could tolerate on this exam would be 3% of the students, and the absolute highest fail rate we could tolerate would be about 15%. Then we would also look at the number of items that we think should be correct. So the highest passing score that would be acceptable would be 70 questions correct, and the lowest would be 60. So you end up with a box showing the range where the cut point should be. Then you plot that against the curve of actual student performance, and the point where the box's diagonal, running from the lowest acceptable score at the highest fail rate to the highest acceptable score at the lowest fail rate, intersects that curve is your pass-fail cut point. So you could say from this graph that our panel of experts said that we don't want less than 3% failing, and we don't want more than 15% of our students failing. And we think that the passing number of items correct on a 100-item exam is somewhere between 60 and 70. 
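The Angoff averaging can be written out directly. This is a minimal sketch assuming a hypothetical panel of five judges rating four items (shortened from the ten-item example to keep it readable); each rating is a judge's estimate, from 0 to 1, of the proportion of borderline examinees who would answer that item correctly. The function name `angoff_cut_score` is illustrative.

```python
from statistics import fmean

def angoff_cut_score(ratings):
    """ratings[j][i] is judge j's estimate that a borderline examinee gets item i right.
    Returns the per-item averages and the cutoff as a proportion correct."""
    n_items = len(ratings[0])
    if any(len(row) != n_items for row in ratings):
        raise ValueError("every judge must rate every item")
    # Step 1: average the judges' estimates for each item.
    item_means = [fmean(row[i] for row in ratings) for i in range(n_items)]
    # Step 2: average the item averages across the exam to get the cutoff.
    return item_means, fmean(item_means)

# Hypothetical ratings: 5 judges (rows) x 4 items (columns).
ratings = [
    [0.05, 0.60, 0.80, 0.70],
    [0.10, 0.50, 0.90, 0.60],
    [0.05, 0.55, 0.85, 0.65],
    [0.00, 0.65, 0.80, 0.70],
    [0.05, 0.60, 0.90, 0.60],
]
item_means, cut = angoff_cut_score(ratings)
print([round(m, 2) for m in item_means])  # [0.05, 0.58, 0.85, 0.65]
print(f"cutoff: {cut:.0%} correct")       # cutoff: 53% correct
```

Summing the item averages instead of averaging them expresses the same standard as an expected raw score for a borderline examinee (here 2.13 of 4 items).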
And if we look at how our learners actually performed, the place where the curve crosses is at about 66 to 67% correct. That means about 10 to 15% of the students may fail. And we have to decide, as a group and as a panel of experts, whether that's an acceptable number: when we administer this exam, we know there may be up to 15% who fall below that cut point. So again, where you set your cut point can have unique ramifications. You may fail too many students, or you may pass a student who shouldn't have passed. There are always the practical aspects of administering a test. Some people would say 15% is way too high: I can't redo that standardized patient performance, or I can't re-administer a test to 15% of 20,000 people. And remember that the Hofstee cut point depends on the rectangle and the curve of student performance actually intersecting. Sometimes learners don't perform on the exam the way your panel of experts wanted or predicted. So you may have to shift the cut point, or go back and ask: Is this really measuring what we think we're measuring? Is this exam correct? What else do we need to do? So standard setting is very, very difficult. It certainly won't be crystal clear to you from this short video. My intent here is to make you aware of the different options and to raise your awareness of things to consider when creating assessments around intentional instructional methods. Remember that the higher the stakes, the more intentional your standard-setting process needs to be. Your standards, no matter what you decide, need to be defensible. They need to make sense. They need to be reproducible and appropriate to the context that you're in. And as things evolve, your standard may need to be revisited or changed over time. 
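Here is a minimal sketch of the Hofstee procedure: average each of the four judgments across judges, draw the diagonal of the resulting box from (lowest acceptable score, highest acceptable fail rate) to (highest acceptable score, lowest acceptable fail rate), and find where the observed fail-rate curve crosses it. Assumptions: scores are percent correct on a 100-item exam, the judges' inputs and the twenty scores are hypothetical, `hofstee_cut_score` is an illustrative name, and the search only checks whole-number cut scores rather than interpolating the exact crossing.

```python
import math
from statistics import fmean

def hofstee_cut_score(scores, judgments):
    """judgments: one (lowest_cut, highest_cut, lowest_fail, highest_fail) tuple per judge.
    Returns (cut_score, fail_rate) at the crossing, or None if there is no crossing in the box."""
    k_min = fmean(j[0] for j in judgments)  # lowest acceptable cut score
    k_max = fmean(j[1] for j in judgments)  # highest acceptable cut score
    f_min = fmean(j[2] for j in judgments)  # lowest acceptable fail rate
    f_max = fmean(j[3] for j in judgments)  # highest acceptable fail rate

    def fail_rate(cut):
        # Observed cumulative distribution: share of examinees scoring below the cut.
        return sum(s < cut for s in scores) / len(scores)

    def diagonal(cut):
        # Straight line from (k_min, f_max) down to (k_max, f_min).
        return f_max - (f_max - f_min) * (cut - k_min) / (k_max - k_min)

    if fail_rate(k_min) > f_max:
        return None  # curve is already above the box at its left edge
    # fail_rate can only rise and the diagonal only falls as the cut increases,
    # so the first cut where fail_rate reaches the diagonal is the crossing.
    for cut in range(math.ceil(k_min), math.floor(k_max) + 1):
        if fail_rate(cut) >= diagonal(cut):
            return cut, fail_rate(cut)
    return None  # curve stays below the diagonal inside the box

judgments = [(60, 70, 0.03, 0.15), (58, 72, 0.02, 0.15), (62, 68, 0.04, 0.15)]
scores = [55, 65, 68, 70, 71, 72, 73, 74, 75, 76,
          77, 78, 79, 80, 81, 82, 84, 86, 88, 92]
print(hofstee_cut_score(scores, judgments))  # (66, 0.1)
```

With these inputs the averaged box runs from 60 to 70 items correct and from 3% to 15% failing, as in the lecture's example, and the sketch returns a cut score of 66 with 10% of these hypothetical examinees failing. A `None` result corresponds to the case where the curve and the box don't intersect and the panel has to revisit its judgments or the exam.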
So I know that's a lot of information for today. We'll get into the meat of the course starting with the next unit, which talks more specifically about individual instructional methods. Great to see you, take care.