In unit six, we're going to discuss the publication process as well as some ethical issues in scientific publication. In this first module, I'm going to talk about the problem of plagiarism. I'll start with an example. There have been a number of high-profile plagiarism scandals in politics. For example, in 2016, Melania Trump gave a speech at the Republican National Convention that directly plagiarized a 2008 speech by Michelle Obama. I'm showing you here some of the text from Melania Trump's speech. You can see that it is a nearly direct copy of Michelle Obama's words. Obama said in 2008, "You work hard for what you want in life. That your word is your bond and you do what you say you're going to do. That you treat people with dignity and respect." Trump, in 2016, said, "From a young age my parents impressed on me the values that you work hard for what you want in life. That your word is your bond and you do what you say and keep your promise. That you treat people with respect." The chances that two people independently came up with these same words from scratch are infinitesimally small. Trump has cut and pasted Obama's words and just made a few minor changes. This is plagiarism.
Plagiarism is when you try to pass off another person's writing as your own. I think most people are aware that you shouldn't do this, that it's unethical to plagiarize. Where it gets murky, however, is that I think many students aren't completely sure exactly what constitutes plagiarism. In this module, I'm going to try to make that clear. Most people know, at a high level, that you shouldn't be stealing large chunks of text from others, but plagiarism can be more subtle than that. Cutting and pasting a sentence, or even part of a sentence, from another source and putting it into your own work as if you had written it: that's plagiarism.
Taking somebody else's work, cutting and pasting it into your document, and then slightly rearranging the words or changing a word here or there: that's plagiarism. Borrowing descriptions or definitions directly from Wikipedia: that's plagiarism. When you are writing, you need to put things completely into your own words, or, if you are going to borrow someone else's words, you need to put that material in quotation marks and cite the source.
Many years ago, one of the first classes I taught at Stanford was on women's health. I assigned essays, and I was shocked when I got the first essays back from students. Many students had taken sentences, phrases, sometimes even whole paragraphs from the assigned readings for the course, without quotation marks and without citations. Of course, I recognized all the words, since I had assigned those readings and was very familiar with them. I realized that many students were unaware that taking bits and pieces of other people's words and stringing them together in your essay, without proper citations and quotation marks, is plagiarism. So I think many students are a bit hazy on the concept.
Let me give you a concrete example. This is a hypothetical example. I made it up, but it's representative of what I've seen students do. This is a Wikipedia entry on Ernest Hemingway. It says, "Ernest Miller Hemingway was an American author and journalist. His economical and understated style had a strong influence on 20th-century fiction, while his life of adventure and his public image influenced later generations." Now, imagine a student is writing about Ernest Hemingway. They go to Wikipedia, start cutting and pasting bits and pieces from the Wikipedia entry, and come up with the following: "Ernest Hemingway's thrifty and understated style strongly influenced 20th-century fiction. His audacious lifestyle and public image also influenced later generations." This is plagiarism.
Notice that when I made up this version, I substituted a few words. I substituted "thrifty" for "economical." I substituted "audacious lifestyle" for "life of adventure," and I rearranged a few things and changed a few other things, like "had a strong influence" to "strongly influenced." But these changes are just cosmetic. These are not my words. These are not my thoughts. These are the words and thoughts of the original author. I could have rewritten that text without having any understanding of Hemingway, of what a thrifty or understated style even means, or of what his lifestyle was. In order to write something original about Hemingway, I can't just go to Wikipedia. I have to go to enough sources that I can figure out his style and influence for myself. I have to understand Hemingway well enough to be able to generate new prose about him, and that's what you need to do to avoid plagiarism. When you're writing about others' ideas or work, or you're writing from sources, you need to understand and digest the material well enough that you can put it in your own words. This means you're probably going to need to go to multiple sources to adequately understand the subject matter.
As you're collecting source material, it's okay to cut and paste passages from other people's work to get everything in one place, like in one Word document. But put those passages in quotation marks so you don't later forget that those aren't your words. And when you go to write about it, work from memory. Never start with someone else's text and just move their words around. By the time you're ready to write, you should understand the material well enough to be able to start from scratch. Draw your own conclusions, and come up with your own ideas on the topic. Don't trust that the person on Wikipedia has got it right. And don't just mimic the original author's sentence structure or rearrange the original author's words. That's still plagiarism.
It's easy to detect plagiarism with Google.
If you ever suspect plagiarism, take a full sentence from the suspect material, put it in quotes, and then put that into Google. If Google returns that same sentence in another paper, it's almost certainly plagiarized. The statistical probability of two people independently coming up with an identical or near-identical string of seven to ten words is astronomically low. There are common phrases and clichés that may be four or five words long and are widely used. But by the time you get to about seven words in a row, and certainly by the time you get to ten, the chances of two people independently stringing together those same words in a row are infinitesimally small.
In one case, I was editing a review article that a colleague had asked me to look at. She had commissioned it from an expert, and she wanted someone to edit it. As I was reading it, I got suspicious about plagiarism for various reasons, and I started doing some sleuthing. I took random sentences from the paper and put them into Google, in quotes, and sure enough, all sorts of plagiarism popped up. The author had pulled one sentence from one paper, two sentences from another paper, and another few sentences from a third paper. The article was completely plagiarized. And once I started looking for it, it wasn't hard to sleuth out that plagiarism.
I'll share with you another example of plagiarism that I happened to come across. A while back, I was writing a paper on the use of estrogen to treat young women with bone loss due to eating disorders. There were only a few previous randomized trials looking at this question, and I had all the papers on this subject in front of me. I was reading this 1995 paper that you see here, in which the authors had found a negative result. Estrogen in their study did not help the bones of women with anorexia nervosa.
And as I was reading through the discussion, I read this paragraph that you see here, in which the authors laid out possible explanations for the negative results. They listed the explanations with numbers. They said one possibility is this, the second possibility is this, the third possibility is this, and a fourth and likely explanation is this. That structure actually stuck in my head. A few days later, I was reading through a 2002 paper, and I got a sense of déjà vu. The discussion section of that later paper had the same numbered list of explanations. So I rifled back through my pile of papers, found the 1995 paper, and started comparing the two papers.
Here's what I found. The 2002 paper was by a different set of authors, and it was published in a different journal. But it was a similar study, another randomized trial putting anorexic women on estrogen, and they also had a negative result. I put the two papers side by side, and I'm showing you here in red the only differences between the 1995 paper and the 2002 paper in that particular passage. Basically, they were identical. The 2002 authors just changed a few words here and there. Instead of "dose of estrogen," they said "estrogen dose." Instead of "bone mass can," they said "bone mass may." Obviously, those changes are not sufficient. This is still blatant plagiarism. One funny thing here is that I think they were trying to cover their tracks a little, so they took out the third explanation. The 1995 paper had four explanations. In the 2002 paper they cut it down to three, but it's still plagiarism.
I got curious then, and I went through both papers carefully, comparing their contents. The 2002 paper was almost entirely a direct copy of the 1995 paper. The data were different, but almost all of the text of the 2002 paper was copied straight from the 1995 paper. Just to show you a few examples, I'm showing you here the concluding sentences of both papers.
You can compare them on your own, but you can see that they are nearly identical. Here is just one more example from this same case, if you want to look at it: again, another example of the blatant plagiarism in that 2002 paper. To be fair, the 2002 paper was coming from a foreign country, and my guess is that the authors weren't confident in their written English. So they decided to take that 1995 paper and use it as a template. Of course, that doesn't excuse the behavior, but it might explain what happened. This is a very extreme case of plagiarism, but it was actually published in the literature. Hopefully, modern plagiarism detection software will prevent such extreme cases in the future.
Another thing that students are often less aware of is the concept of self-plagiarism. This is recycling your own writing or your own data from one published paper to the next. That constitutes self-plagiarism, and it's also unethical. The problem is that if you're just rehashing old material, why are you publishing a new paper at all? If you have to rehash old material, that means you don't have anything new to say. Plus, you may be violating the copyright of the journal that owns the published paper. You are not supposed to plagiarize even from your own work. Now, one possible exception to this is that there may be some duplication of text within the materials and methods section. For example, there may be only so many ways to explain what you did to the cells. So if you've used the same experiment across multiple studies, there may be some repeated text. It shouldn't be a complete copy, but if you explain some of the materials and methods in the same way in multiple papers, journal editors are usually fine with that. But everything else should be new. The introduction section shouldn't just be a rehash of a previously published introduction, for example.
Authors also sometimes engage in something that's akin to plagiarism, which is duplication of their old results and their old data. Sometimes an author will take a data set that's already published, maybe add a little bit of new data or tweak the data slightly, and present the whole thing as a new paper. This is misleading because it puts the two papers out there as if they were two independent pieces of evidence supporting a hypothesis when they aren't. It's padding the medical literature with superfluous papers, and there's also, of course, a copyright issue. So self-plagiarism is also not allowed.
People have actually tried to go through the literature and figure out exactly how prevalent plagiarism is in the scientific literature. There was a story in Nature reporting that, in pilot tests of the plagiarism detection software CrossCheck, some journals found that 6% to 23% of submitted manuscripts had to be rejected outright due to plagiarism. 6% to 23% is a surprisingly high percentage, I think. Now, this includes both plagiarism and self-plagiarism. Another study used automatic detection software and then manually confirmed plagiarism in submitted papers. They found that 8% of the papers had plagiarized other people's work and about 3% involved self-plagiarism. So again, we're looking at around 10% of papers containing significant plagiarism. That's a pretty big number.
Another group of researchers did a study of plagiarism in the personal statements for residency applications. These were doctors who were applying to do their residency at Brigham and Women's Hospital, which is a very prestigious institution. The researchers used plagiarism detection software to review about 5000 personal statements, and then they confirmed suspected plagiarism manually. They found that 5% of essays had clear evidence of plagiarism, meaning 1 in 20 of the doctors applying for this prestigious residency had plagiarized.
So plagiarism is a significant problem. Be careful in your own work to put everything in your own words.
I want to share one last example that's a bit more subtle. I suspect that this kind of plagiarism is more common than the more blatant examples that I shared earlier, so it's worth wrapping up with. I was reading a 2009 paper, and I came across this sentence: "Recent registry-based and hospital-based studies have documented a statistically significant increased risk of melanoma after breast cancer with standardized incidence ratios ranging from 1.4 to 2.7." I was writing about this topic for the lay public, and I wanted to verify the numbers so that I could use them in my story. So I pulled some of the references from the 2009 paper. When I pulled their fifth reference, a paper from 2004, I found the exact same sentence, which I'm underlining here. The 2009 authors had copied the summary of the 2004 authors word for word and reference for reference: one, two, three, and four, the references are exactly the same. Now, this is subtle, and to be fair, I didn't find any other evidence of plagiarism elsewhere in the paper. But this is plagiarism, and there are two problems with it. First, the 2009 authors have borrowed the work and the exact words of the previous authors. The previous authors went through the literature, summarized everything, synthesized it, and put it into words. And the 2009 authors are just stealing that. The second problem is that the 2009 authors are completely trusting that the 2004 authors have accurately summarized the literature. That's a leap of faith I wouldn't take. As I've told you before, miscitations and inaccurate citations are rampant in the literature. I would never trust a secondary source like this. I would always go back to the primary sources and check the numbers for myself. Plus, the 2009 authors are assuming that nothing new has happened between 2004 and 2009.
They haven't bothered to update the literature search. Now, the authors of the second paper could simply have put the copied sentence in quotation marks and cited the 2004 paper. That would acknowledge that they are borrowing the work and words of the previous authors. It would also get them off the hook with regard to accuracy, because by quoting, you are saying these are the numbers according to someone else. You are not saying that you have checked or agree with those numbers. But even better, what the 2009 authors should have done is dig through the literature themselves and summarize it de novo, rather than simply copying somebody else's summary.
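To make the seven-to-ten-word rule of thumb from earlier in this module concrete, here is a minimal Python sketch that looks for runs of consecutive words shared by two passages. It is only an illustration. It is not how Google or plagiarism detection software such as CrossCheck works, and the function names (`words`, `shared_runs`, `longest_shared_run`) and the default threshold of seven words are assumptions made for this example. The two passages are the Obama and Trump excerpts as quoted at the start of this module.

```python
import re


def words(text):
    """Lowercase the text and split it into words, dropping punctuation."""
    return re.findall(r"[a-z0-9']+", text.lower())


def shared_runs(text_a, text_b, n=7):
    """Return the set of n-word runs that appear in both texts.

    Assumption: n=7 is the low end of the seven-to-ten-word rule of thumb.
    """
    def runs(tokens):
        return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

    return runs(words(text_a)) & runs(words(text_b))


def longest_shared_run(text_a, text_b):
    """Return the longest run of consecutive words found in both texts."""
    a, b = words(text_a), words(text_b)
    best = []
    # prev[j] holds the length of the shared run ending at a[i-2], b[j-1]
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                curr[j] = prev[j - 1] + 1
                if curr[j] > len(best):
                    best = a[i - curr[j]:i]
        prev = curr
    return best


# Excerpts as quoted in this module (not full speech transcripts).
obama_2008 = (
    "You work hard for what you want in life. That your word is your bond "
    "and you do what you say you're going to do. That you treat people "
    "with dignity and respect."
)
trump_2016 = (
    "From a young age my parents impressed on me the values that you work "
    "hard for what you want in life. That your word is your bond and you "
    "do what you say and keep your promise. That you treat people with respect."
)

run = longest_shared_run(obama_2008, trump_2016)
print(len(run), "words:", " ".join(run))  # 21 words in a row are shared
print(len(shared_runs(obama_2008, trump_2016)), "shared 7-word runs")
```

A shared run of 21 consecutive words is far beyond the seven-to-ten-word range where independent authorship becomes implausible. Notice, though, that a check like this only catches copied strings. A cosmetically reworded passage, like the made-up Hemingway example, can slip under the threshold and still be plagiarism, so treat a match as a reason to look closer and not the absence of a match as proof of originality.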