[MUSIC] >> Gavin, we've been talking a lot over the whole course, weeks one, two, three, and four, about assessment and about multiple choice. And in these two weeks, you talk a bit more about subjective assessment. Well, that sounds pretty dodgy - "subjective assessment", because we've had all this objective assessment. So, what are the virtues, if you like, and what are the cautions around this whole notion of subjective assessment? >> Well, I suppose the only thing really objective about multiple choice is the agreement on what the answer is. It's very subjective in phrasing the question and choosing the answers. So, there's subjectivity and quality decision-making that goes into designing those tasks. What's strikingly different about the more subjective things is that marking them is much more labor-intensive and involves much more interpretation and weighted judgement. That weighted judgement went into the design of the objective questions, but with open-ended questions it has to be exercised in the scoring process. And the tragedy of being a human being is that we're really bad at scoring. We're really terrible at it. So many studies have been done giving university professors essays to mark and then, three weeks later, giving them the same essays to mark again, and you're lucky if people get within three marks out of twenty of the mark they gave the first time. >> Seriously worrying. >> Yeah, higher education marking is, you know, a lottery in some cases. So, the problem is that as humans we're easily distracted by environmental things. In the West, everyone will tell you how they're constantly interrupted by their cell phones and their texts and their emails and Twitter. And when you're not paying attention, or you're paying attention to the wrong things, you just make poor-quality judgments.
That's why they ban answering your cell phone in your car while you're driving, because next thing you know you're not paying attention. >> So, when it comes to schooling and the workloads for marking: I remember being an English teacher, and the workloads were often very large, because we would mark our own class's work and then, if it was a mock exam or an assignment that counted towards a final grade, we would switch marking, so I would have to mark another class's as well. Now, that was fine if you were a math teacher and you just went "Tick, seven, nine, seven." But when you actually have to read something the student's written and make a weighted judgement as to its merits, that's a complicated thing. It's not just a question of judging which one's taller or shorter, or which one's heavier or lighter - you're asking, on a multi-dimensional scale, what are the various virtues and qualities of a piece of student work. And this is just difficult to do. We have mechanisms that improve the quality of our marking, like guided rubrics or exemplars that show us what quality looks like. But even with those tools, it's easy to get distracted when you're tired, it's 11 p.m., you have to be up at whatever hour, you're still marking, the boss is saying the marks have to be handed in tomorrow at 8 a.m., and you're only halfway there. You can bet that there's a huge error component as time goes on, and the consistency of your marking is rapidly diminishing. >> Yes, you trigger a thought there, if you know the research into judges giving sentences. They give tougher sentences in the morning and then, I think, in the afternoon when they're tired-- >> Or maybe they've had one too many during lunch! >> Absolutely, that's part of it. So, the answer here is don't have one too many when you're marking. >> Oh, definitely not. The other thing is there's a wonderful study that I came across at last year's AERA conference by a young doctoral student in Vancouver.
And he showed that if they could make the teacher feel sad or unhappy, the average marks they gave were lower than if they made the teacher feel happy. >> So, you know, having happy, smiling kittens around you before you start marking might give your students more credit than they, uh-- >> So, we have a proposal here. >> Yeah. >> Happy, smiling kittens when you're doing the marking. >> Yeah, I mean, humans are-- we're soft-willed beings, we're easily led astray. And this is the tragedy: we're making decisions about children's lives that matter to them, whether it's for an external qualification, a grade that makes a difference for promotion or streaming, or simply to identify "What feedback should I give this kid?" It's a difficult, demanding, intellectual task to weigh up the merits. One of the reasons journals invite authors to review is that you've jumped through the hoops; you know how to get published, so when you read someone else's manuscript you must understand what it takes to get over the line. Maybe part of the problem is that, as teachers, we're not doing enough of the things we're asking children to do? Can you really teach and mark student writing if you're not writing regularly? Should you be teaching literature if you don't read regularly? I mean, you know, it just seems to me that if you're going to be any good at marking it, you have to be kind of good at the thing itself. >> Yeah, there are some very reassuring messages in there and some slightly worrying messages as well, about this whole business of assessment, which you've obviously devoted a great deal of your life to exploring. I think the whole course really gets down to some of those key issues and then actually assesses people at the same time. One of the things it does, of course, is have three people mark each essay.
And there you're looking for, I suppose, inter-rater reliability. >> Yes. >> I've just come from a Coursera conference in London, actually, just two days ago, and what they were saying was that students marking one another's essays, subjective marking, proved to be just about as good as when the professors mark them. I kind of sat back a bit and thought "That's... hmm... a bit surprising." Is that surprising? >> The toughest marker in the university is the newly minted grad student; the professor is usually much more relaxed. So, if the students take seriously the content and the standards being promulgated, and they're writing an essay task around that and have engaged with it, they should be able to judge each other against those criteria. And if the content is reasonably - well, I guess it depends on the design of the task. If the task lends itself to a fairly clear description of what you have to do in the essay, it's much easier to mark than if it is very open and broad, like "Reflect on and give your personal opinion on...". If it's more of an academic display of knowledge, or interpretation and understanding based on evidence, then it's much easier for students to mark in a similar way to the teacher, the expert, because it's more constrained. Is Shakespeare art? I think most people agree that Shakespeare is art because we've been told it's art. But when it actually comes down to reading Shakespeare, we might have a wide variety of opinions as to whether it's good. And, you know, the same goes for judging movies and music: so much more subjective, so much less concrete in terms of the standards we're going to use. >> Yeah, I just watched The Wolf of Wall Street and wondered why my judgment of that movie is so at odds with-- >> The critics? >> --with the critics' judgement, for example, yeah. >> Or the other example that you give, which is a nice one about subjective judgement, is figure skating. >> Yes.
>> Elaborate a bit on that. >> Sure. Well, Olympic judging has to happen in the moment and on the fly, because each sub-skill in a routine, whether it's diving or figure skating, is performed in a very short period of time. So the people sitting on the panel have to decide very quickly, on an impression basis: how good was this? And because there's tension and the probability of bias, they cancel out some of the bias by removing the highest score and the lowest score, so that if a judge is being bribed to give a high score or a low score, we just get rid of it. That gets rid of some extreme responses. Then there's a panel of five or seven judges left over, and we average those scores. But we don't just rely on one set of scores. In figure skating there's technical merit and artistic impression, and there are also multiple performances: there are compulsory figures as well as free skating. So what you get is multiple sets of data, judged multiple times by multiple experts, and by the end you have a pretty good idea of who is actually best. The difference between first and second is often in the hundredths of a point, but the judging is consistent enough across all of those instances to lead to a reasonably robust judgment. There have been cases of cheating, and at least the Olympic committees have found them and disclosed them. The Salt Lake City figure skating scandal comes to mind, when Canada was cheated out of the gold, but we got it back after the cheating came to light. Without those multiple judges and multiple instances, it's hard to say very much about the quality of work. Researchers in generalizability theory have shown that to get reliable scoring of student writing, for example, you need anywhere between three and five pieces of writing, judged by anywhere between three and seven judges.
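The trimming step Gavin describes (drop the highest and the lowest score, then average the rest) can be sketched in a few lines of Python. This is a minimal illustration, not the official scoring formula of any sport: the function names `trimmed_panel_score` and `combined_score` are hypothetical, the scores are invented, and real Olympic scoring rules are more elaborate and have changed over time.

```python
def trimmed_panel_score(scores: list[float]) -> float:
    """Drop the single highest and single lowest score, then average the rest.

    Hypothetical helper illustrating the trimming idea from the conversation;
    not the official scoring rule of any judging body.
    """
    if len(scores) < 3:
        raise ValueError("need at least three scores to trim both extremes")
    ordered = sorted(scores)
    kept = ordered[1:-1]  # discard one lowest and one highest score
    return sum(kept) / len(kept)


def combined_score(components: list[list[float]]) -> float:
    """Sum trimmed panel scores over several judged components or performances.

    Assumption: each inner list holds the whole panel's scores for one element,
    e.g. technical merit, artistic impression, or a separate routine.
    """
    return sum(trimmed_panel_score(panel) for panel in components)


if __name__ == "__main__":
    # Seven invented scores; 3.0 plays the role of a suspiciously low outlier.
    panel = [5.5, 5.6, 5.7, 5.8, 5.9, 6.0, 3.0]
    print(round(trimmed_panel_score(panel), 2))  # prints 5.7
```

Trimming one score from each end limits how far a single biased judge can move the result, but if two judges inflated the same skater, one inflated score would still remain in the average. That is part of why the multiple components and multiple performances described above matter.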
And in one medical school study, they found that you needed a minimum of three to four hours of testing before you could make a reliable judgment about a student's competence. So these are difficult skills to judge, and if we're going to use these judgments robustly to make big decisions, we'd better be pretty confident of their quality. >> Yeah. >> And we used to-- in my high school teaching, we used to be satisfied with two teachers marking: you mark my class and I mark your class. We used to mark our students' essays out of 20, and generally we had a rule of thumb: if we were within three marks of each other out of 20, we just split the difference, and if we were within only one and a half marks, we always took the higher of the two. But if we were more than three marks apart, then we would have to argue why it was higher or lower than the other person suggested, and we would compromise after listening to the insights the other marker had. We found that, in general, because we had contributed to writing the questions and to discussions as a teaching team about what good quality looks like, we were usually discrepant on only 25% of the essays. Now, 25% is still a lot of work to go over and debate, and you can get really passionate about it: "No way is this anything higher than a bare pass", while the other person thinks it's an amazing piece of work, and you go "No, no, no, no. It hasn't done this, it hasn't done this", but she's going "It's done this, it's done this." And this is the wonderful debate where we learn what to value, how to value it, and how to identify it. I suppose the tragedy for children is that we give them marks while we're still learning. >> You know, you go to a physician and ask, "How long have you been in practice?" You think, "Wait a minute, I don't want you to practice on me. I want you to be good."
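The two-marker rule of thumb from Gavin's high school can be written as a small decision procedure. This is a sketch under stated assumptions: marks are out of 20, a gap of one and a half marks or less takes precedence over splitting the difference (any such gap is also within three), both boundaries are treated as inclusive, and the names `reconcile`, `Outcome` and `discrepancy_rate` are hypothetical. The sample marks are invented for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Outcome:
    final_mark: Optional[float]  # None when the two markers must discuss the essay
    needs_discussion: bool


def reconcile(mark_a: float, mark_b: float) -> Outcome:
    """Combine two independent marks out of 20 using the rule of thumb above.

    Assumed thresholds, with the narrower rule checked first:
      gap <= 1.5      -> take the higher mark
      1.5 < gap <= 3  -> split the difference (average the two marks)
      gap > 3         -> no mark yet; the markers argue it out
    """
    gap = abs(mark_a - mark_b)
    if gap <= 1.5:
        return Outcome(max(mark_a, mark_b), False)
    if gap <= 3:
        return Outcome((mark_a + mark_b) / 2, False)
    return Outcome(None, True)


def discrepancy_rate(pairs: list[tuple[float, float]]) -> float:
    """Fraction of essays whose two marks are more than 3 apart."""
    if not pairs:
        raise ValueError("need at least one pair of marks")
    flagged = sum(1 for a, b in pairs if reconcile(a, b).needs_discussion)
    return flagged / len(pairs)


if __name__ == "__main__":
    # Invented marks from two teachers for four essays.
    marks = [(14, 15), (12, 14.5), (9, 16), (17, 17)]
    for a, b in marks:
        print(a, b, reconcile(a, b))
    print(f"discussion needed for {discrepancy_rate(marks):.0%} of essays")
```

`discrepancy_rate` counts the share of essays that would go to discussion, the same kind of figure as the 25% mentioned above.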
And teachers are still in practice, in a sense, and that's a sensible position to take. We're in practice. That means I'm better this year than I was last year, and I'll probably be better in two years than I am now, if I keep paying attention professionally to what I'm doing. >> Yeah, that's not really reassuring for me being in your class, however. >> Well, fortunately my marking is cross-checked by someone else, and the final decision is based on the consensus of those two markers. So in that case there is some protection for the individual. But, yeah, it's the "What if I got the weakest teacher in the school?" effect. >> Mm-hm. >> And that is a real tension. Johnny's parents don't want him taught by the weak teacher. Johnny wouldn't want to be taught by the weak teacher, and I don't want to be the weak teacher. But somebody's the weak teacher in my school, and so it's up to us to be professional and say, "Well, how can we build systems that help compensate for weaknesses in our competencies?", provided those weaknesses are not so severe that we shouldn't be teachers in the first place. Most countries have entry standards and entry requirements, and if you can't pass those, you don't get to be a teacher. >> This is where, in a Commonwealth context, of course, the whole thing becomes extremely fraught, because we have such a range: from expert teachers in Canada and New Zealand, who are at the very peak of their profession, to teachers in some African countries who have no qualifications and have 150 children in their class. We're trying to serve all of them; that is our constituency. These are the people these courses are for, and they come online and they-- But the wonderful thing about the whole program is that people in, say, Ghana or another African country who are struggling
don't necessarily get the best advice from the people in New Zealand or Canada, but from people who are in similar situations and are able to say to them, "Now, have you thought of the small things you can do in your context?" But I think everything that you've done and talked about here, Gavin, on this course, is going to be extremely provocative, in the best sense of the word, for the people taking this course. I think it's an extremely valuable course to have, and the insights from the discussions we've had are, I think, going to contribute immensely. So, thank you very much indeed: first, for agreeing to do the whole course, and second, for allowing us to have these kinds of conversations. >> Sure, you're welcome. It's been a pleasure. [MUSIC]