Welcome to week ten of our medical software class. Today we're going to finish off the medical software life cycle. We're going to spend most of our time talking about validation, but then we'll have short segments on the back end of the cycle: deployment, maintenance and retirement as well. So the goals of this week's class are to talk about software validation, to introduce clinical trials, and then to have some short segments at the end about deployment, maintenance and retirement, the back end of the medical software life cycle.

So we have five segments. In the first two we'll do an introduction to validation and then an introduction to clinical trials. The third segment will give some concrete advice for creating a validation plan. Then we'll do the back part of the cycle: in segment four we'll talk about deployment, and segment five will at least highlight some issues involved in the maintenance and retirement of software. All of those last topics, deployment, maintenance and retirement, are huge topics in their own right, but really beyond the scope of this introductory class. This week's material is based on chapters 15 and 16 of the textbook we've been following throughout this course.

So let's start with validation, and we're going to talk about the process of validation. Just remember that validation is the process of establishing that our software meets the needs of our user. Not that the software works correctly, but that it meets the needs of the user. So working correctly is a prerequisite for validation, but not sufficient; it is necessary but not sufficient. Fundamentally we're interested in: does this software allow a user to do what they need to do? Not what we thought they needed to do, that's verification, but what they actually need to do; that is validation. So here's the V-model, and we're climbing up the back end of the V. We're here at validation, and this corresponds to the needs of our user captured in use cases.
That is the symmetry of the V: validation corresponds to the use cases. Again, if we go back to the FDA's own mission statement about efficacy, safety and security, I've highlighted efficacy because that's going to be the emphasis here. So in this lecture we'll focus on efficacy. When it comes to safety in software, the primary consideration is usability, and we talked about that in week seven. When it comes to security, a lot of the interesting work and a lot of the problems come from cybersecurity, and we discussed that in week three. So primarily here we'll focus on efficacy.

What does efficacy mean? Well, if this is a first-of-its-kind device, efficacy means being good enough: better than some kind of standard of care that is currently in the field. If there's a competing technology, efficacy means better, or just as good but cheaper, or requiring less skill to use. These are subjective, qualitative terms, and we have to figure out how to make numbers out of them, to have quantitative criteria we can demonstrate. So that's the fundamental problem: to prove efficacy, what we really need are objective, quantitative criteria. Objective meaning that they would stand up to an external evaluation, and quantitative meaning they are quantities you can measure to demonstrate efficacy.

The key discussion that we're going to follow here comes from a document created originally by the International Medical Device Regulators Forum (IMDRF) on the clinical evaluation of Software as a Medical Device (SaMD). This was issued in June of 2017 and reissued by the FDA in December of 2017 with just an additional cover page; it is the same exact document. This is the guidance our discussion is based on, the guidance we will follow. The guidance defines clinical evaluation as having three components: valid clinical association, analytical validation and clinical validation. So let me summarize what these three things are now, and we'll pick each of them up in its own slides.
Valid clinical association means we measure something relevant to the disease. If our software measures tumor volume and our disease is brain cancer, we need to demonstrate that tumor volume is somehow related to brain cancer. And this may sound obvious here, but imagine if we were measuring tumor shape: we would have to demonstrate that that predicts something. Analytical validation means that our software does what it does well: if we're measuring tumor volume, we're measuring it accurately. That's analytical validation. And clinical validation means that the thing we measure helps our clinical users do something, do something better than they were able to do before. So: we're measuring something relevant, we're measuring it accurately, and what we're measuring helps our users. Those are the three pillars of this document. So let's take them one at a time.

First, valid clinical association. The table you see above is straight out of the document, a screenshot of the document. The question it asks is: is there a valid clinical association between your SaMD's output (our software's output) and your SaMD's targeted clinical condition? We have to answer the question: how much meaning and confidence can be assigned to the clinical significance of the SaMD's output? So what does this process involve? It may be something as simple as literature searches. If we're measuring tumor volume, we want to look up the literature to demonstrate that tumor volume is a good predictor of outcome in cancer, that it's relevant. We may have to do some original research. We may have to look at professional society guidelines that establish that these things are true. And we may need to create new evidence in terms of secondary data analysis, using existing data to generate new results. Or we may have to go all the way to performing clinical trials to demonstrate that the thing we're trying to measure is actually relevant to the disease. So this is valid clinical association. The second step is analytical validation.
And the question we're asked here is: does your SaMD correctly process input data to generate accurate, reliable and precise output data? So does your SaMD meet its technical requirements? This is asking whether the software is correctly constructed, meets its specifications, and conforms to the needs of the user. The evidence generation for this step is the standard process of verification and validation, right? What we've always talked about so far: that the software is accurate and meets the needs of our users.

The third step is clinical validation. This is the final step of the process. Does the use of your SaMD's accurate, reliable and precise output data achieve your intended purpose in your target population in the context of clinical care? So we have to answer the question: does your SaMD achieve its intended clinical purpose? Part of this involves showing that the software has been tested in the target population for your intended use, and that the users achieve clinically meaningful outcomes. So that final hurdle is: not only is our software producing relevant output, not only is the output accurate, but does it actually help our users produce clinically meaningful outcomes? Does it help with the diagnosis of patients? Does it help, ultimately, with the outcomes of patients? That is the final hurdle.

So let's take an example from computer-aided diagnosis. We have two basic types of validation testing. We have standalone testing, where we evaluate the performance of the software by itself. So in computer-aided diagnosis, let's say we have an image and we're trying to detect whether we see cancer in that image. We have a piece of software that says, okay, the probability of having a cancer in this image is some percentage, and it highlights the suspected cancer. This is standalone testing. The second type assesses computer-aided diagnosis when it functions as an aid to an expert.
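Standalone testing of a detector like this is typically summarized with metrics such as sensitivity, specificity and false-positive counts, which come up again in the interview below. As a minimal illustrative sketch (the probability values, labels and decision threshold here are invented for illustration; none of this comes from the lecture or the guidance):

```python
# Sketch: standalone performance metrics for a hypothetical CAD detector.
# Labels: 1 = cancer present, 0 = absent. Predictions come from thresholding
# the software's per-image probability output (the 0.5 threshold is an
# arbitrary assumption for this example).

def standalone_metrics(probs, labels, threshold=0.5):
    """Return (sensitivity, specificity, false_positive_count)."""
    preds = [1 if p >= threshold else 0 for p in probs]
    tp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 1)
    tn = sum(1 for p, y in zip(preds, labels) if p == 0 and y == 0)
    fp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 0)
    fn = sum(1 for p, y in zip(preds, labels) if p == 0 and y == 1)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return sensitivity, specificity, fp

# Invented test-set outputs and ground-truth labels:
probs = [0.9, 0.4, 0.3, 0.8, 0.2, 0.6]
labels = [1, 1, 0, 1, 0, 0]
sens, spec, fp = standalone_metrics(probs, labels)
print(sens, spec, fp)  # sensitivity, specificity, false positives
```

Benchmarking against pre-set criteria then amounts to checking, for example, that `sens` exceeds the sensitivity the sponsor claimed and `fp` stays below the allowed false-positive count.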
So we get the result from the software, and an expert uses that as an input, or as a side-by-side comparison, or as a second vote alongside their own diagnosis. And there we have the user-plus-system testing; that is the actual clinical performance. So the next step here is reader testing, the user-plus-software testing. This measures the performance of the whole solution, as opposed to the software alone. To discuss this, we have a clip from an interview with Dr. Nicholas Petrick of the FDA, who is going to walk us through this type of example.

>> We're again talking about a computer-aided detection or diagnosis assist device, so the clinician is still involved in the diagnostics. We would follow a similar pathway: we would do that standalone testing, because what we want to do with that testing is to benchmark the performance and make sure it meets its performance criteria. So there are some criteria set out, say we think it can produce this amount of sensitivity or this number of false positives, and we check that it meets those criteria. So we do the standalone testing to benchmark performance. The other reason we do standalone testing is because when you put it out in the clinic, what the clinicians want to know is: how well should I expect this to perform? How many marks should I expect to see on a case, or what's the sensitivity and specificity of this device? The second phase is then the clinical assessment, and what we want to do now is assess not the device itself, but the clinician: how well does the clinician perform when they use that device? And so again, we would use, as we talked about before, usually a type of multi-reader, multi-case study design, where we would acquire data and the clinicians would read it without the aid and then with the aid. That could be in different sessions, or it could be within the same session, depending on the type of AI being implemented and how it's actually supposed to be applied. Is it a second reader? Is it a concurrent read? Is it a first-read type of device? And so forth.
And then once we do that clinical study, the goal would typically be to show that it's better than the standard of care that's happening now. So how well do those clinicians perform under the standard of care, and when you add this aid, do they do better than that standard of care in a statistically and clinically significant way? Again, those are the two main pieces, the two types of studies we would do. There are lots of details and nuances in how you would set those up, what the appropriate reference standards are, how you establish ground truth, and things along those lines that are really important. But the basic concept is that those two basic studies are used as the main data sources for determining whether it's a safe and effective product.

>> The final concept we'll touch on here is independence of review, and the graphic shown is a slide that comes from the guidance; that is the highlight here. The important point is that it's best for software to be evaluated by a team independent of the team that created it. This ensures that the review is performed appropriately; it really mimics the perspective of a user. A key consideration here is that the reviewers should have nothing to gain from the success of the software. This will often be a separate company that gets contracted, or a group the hospital has contracted, to do the evaluation. They have nothing to gain whether it succeeds or fails, and this gives you an objective assessment. What the graphic shows, if you remember the four risk categories from the IMDRF, classes 1 to 4, is that independence of review gets more important as we move toward the higher classes, where we get to situations that are more critical for the patient. It's more important to have this kind of extra level of review as the software becomes more critical.

Finally, we're going to talk a little bit about the issues to do with artificial intelligence and machine learning.
There are some extra complications here: we have to have independence of the performance evaluation. What this means is that the data used for testing should ideally be completely different from the data used for training the model. Ideally the testers should also be separate entities, not the same people who created the model, and they should bring their own data. We're going to have a lot more to say about this next week. When it comes to these machine learning and artificial intelligence programs, it is very easy to get artificially inflated results if one is not careful and the data gets contaminated. If the data sets we use for testing and training are the same, we're going to get artificially high performance, so that's something to be very careful about. And we'll touch a lot more on this in the discussion of those topics in next week's lecture.

So this concludes this introduction to validation. In the next segment we'll take a slight detour to talk about clinical trials and their organization, and in the segment after that we'll discuss the creation of a validation plan. Thank you.
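As a concrete illustration of the data-independence point made above, here is a minimal sketch of keeping the test set strictly disjoint from the training set (the case names, split fraction and random seed are invented for illustration; none of this comes from the lecture or the guidance):

```python
import random

def train_test_split(cases, test_fraction=0.3, seed=0):
    """Partition cases into disjoint (train, test) lists.

    Keeping the two sets disjoint is the whole point: evaluating a
    model on cases it has already seen during training contaminates
    the data and inflates the measured performance.
    """
    rng = random.Random(seed)          # fixed seed for reproducibility
    shuffled = list(cases)
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

# Hypothetical case identifiers:
cases = [f"patient_{i:03d}" for i in range(10)]
train_cases, test_cases = train_test_split(cases)
assert not set(train_cases) & set(test_cases)  # no overlap allowed
print(len(train_cases), len(test_cases))  # 7 3
```

In real SaMD evaluation the bar is higher still, as the lecture notes: ideally the test data is not just a held-out split but a separate data set brought by an independent evaluation team.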