In the last video, I gave you some motivation for why it is important to use data standards and what they can do for us. In this video, I will expand on this by going through some theoretical and practical concepts that are central to how and why standards are developed, and I will introduce how we can find out more about specific ones using the Coursera platform. As I said in the last video, the benefit of standards goes beyond just data exchange. Adopting standards can help in all steps of the data management pipeline. As you see in this slide, standards can help with your storage and retrieval needs: specifically, the question of how you can represent and store your facts, and how to aggregate, retrieve, and meaningfully analyze them. It turns out that this is not a straightforward task, for many reasons, including the sheer volume and complexity of biomedical knowledge. These two questions are indeed the focus of a very active area of research in medical informatics. Before we proceed any further, I would like to expand on these concepts and provide some basic definitions. First, I will define the term terminology. A terminology is an organized set of terms in a specific subject field whose meanings have been defined or are generally understood in the relevant field. Without terminology, can you make sense of the following responses in your database of emergency department cases? The clinic staff may have recorded in the database the trauma cases of some of the patients that they see. For example: broken left shin, fractured right femur, non-displaced fracture of the left lateral malleolus, which is a way to say broken ankle, etcetera. Is that data usable for research? Without terminology to constrain your data, your database simply has lumps of text that need to be read by humans before they make sense for analysis.
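To make the problem concrete, here is a minimal Python sketch of what happens when you try to count free-text trauma entries, and what changes once each phrase is mapped to a coded concept. The phrases come from the examples above; the concept IDs (C001, C002, C003), their labels, and the `encode` helper are made up for illustration and do not come from any real standard.

```python
from collections import Counter
from typing import Optional

# Free-text entries as clinic staff might type them.
free_text_entries = [
    "broken left shin",
    "Broken left shin ",
    "fractured right femur",
    "non-displaced fracture of the left lateral malleolus",
]

# Counting raw strings treats every spelling variant as its own category.
raw_counts = Counter(free_text_entries)

# Hypothetical local terminology: arbitrary concept IDs with agreed labels.
CONCEPT_LABELS = {
    "C001": "fracture of left tibia",
    "C002": "fracture of right femur",
    "C003": "fracture of left lateral malleolus",
}

# Hypothetical mapping from recognized phrases to concept IDs.
PHRASE_TO_CONCEPT = {
    "broken left shin": "C001",
    "fractured right femur": "C002",
    "non-displaced fracture of the left lateral malleolus": "C003",
}


def encode(entry: str) -> Optional[str]:
    """Return the concept ID for a phrase, or None if it is not recognized."""
    return PHRASE_TO_CONCEPT.get(entry.strip().lower())


# Counting concept IDs groups entries that mean the same thing.
coded_counts = Counter(encode(entry) for entry in free_text_entries)

for concept_id, count in coded_counts.items():
    label = CONCEPT_LABELS.get(concept_id, "unrecognized")
    print(concept_id, label, count)
```

In this sketch the raw strings produce four separate categories, while the coded version groups the two shin entries under one concept. A phrase that is not in the mapping comes back as None, which is exactly the kind of entry a human would still have to read.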
Had these findings been recorded using a standard set of terms with agreed-upon meanings, this data would have been much easier to tackle on a larger scale. Before I go any further, here are some additional definitions. A vocabulary is a terminological dictionary that contains designations and definitions from one or more specific subject fields. A nomenclature is a terminology structured according to pre-established naming rules. For example, you need one term from column A and one term from column B. LOINC is an example of a nomenclature. LOINC stands for Logical Observation Identifiers Names and Codes. It's a universal code system for identifying lab and clinical observations that we will describe in more detail later. Each LOINC term is given a formal, distinct, and unique six-part name. These parts refer to a component, which is the measured, evaluated, or observed entity; the kind of property, which is a characteristic of that component; etcetera. An interface terminology is a systematic collection of related phrases or terms that supports human entry of information into computer programs. A reference terminology is a terminology where each term has a formal definition, designed for data aggregation and retrieval. Formal terminological systems represent concepts with a set of symbols and rules that create a structured and coded system that is computable. Formal and structured rules enhance the internal consistency of a terminology and facilitate its evolution and maintenance. A classification is an arrangement of concepts into classes and their subdivisions to express the semantic relations between them. The classes are represented by means of a notation. So you can have drugs, you can have people, you can have different types of things that belong to these different classes. Granularity.
The granularity of a term is a measure of its specificity and refinement. Multiple granularities are needed for multipurpose terminologies. A subsumption hierarchy is an organization of terms into types, sub-types, sub-sub-types, etcetera, for the purpose of making generalizations and specializations explicit. This refers to whether the terminology has consistent semantics for the parent-child relationship. The "is a" relationship is usually an indicator of what's called a strong taxonomy. When learning about controlled medical terminologies, it's always good to reference the Desiderata for controlled medical terminologies article by Jim Cimino. It makes a good case for the following desirable attributes needed in controlled medical terminologies. First, content. Adequate expressiveness is critical: you need concepts that cover all the meanings you are trying to represent in this domain. Content must be added using a formal methodology in order to prevent creating a patchwork of terms with inconsistent granularity and organization. Concept orientation means that terms must correspond to at least one meaning, that's called non-vagueness, and no more than one meaning, that's called non-ambiguity. Meanings must correspond to no more than one concept, that's called non-redundancy, and each concept in the terminology must have a single coherent meaning. Concept permanence. The meaning of a concept, once created, should be permanent. Its preferred name may evolve, and it may be flagged as inactive or archaic, but its meaning must remain the same. Once you use a term to define a concept, you should not use that same term to mean something else. A terminology must have unique concept IDs that are free of hierarchical or other implicit meaning. Hierarchical arrangement of controlled medical terminologies is necessary to locate concepts, group concepts, and convey meaning.
Multiple valid arrangements of concepts serving different processes exist. Agreement on a single essential hierarchy is unlikely and, fortunately, not necessary. Formal definitions mean that the terms need to also be represented in a form that can be manipulated by a computer, and not just in narrative text definitions whose audience is human readers. Reject "not elsewhere classified". This is bad practice. Catchall terms can only be defined by exclusion. As the terminology evolves, the meaning of "not elsewhere classified" would also change. Additionally, "not elsewhere classified" can never have a formal definition. Therefore NEC, or "not elsewhere classified", cannot be considered a valid term in your terminology. As we mentioned, the granularity of a term is a measure of its specificity and refinement. For example, diabetes mellitus is more coarsely granular than diabetes mellitus type two. Multiple granularities are needed for multipurpose terminologies. Poly-hierarchical terminologies with multiple levels of granularity must not permit inconsistent views of a concept. Controlled medical terminologies should include formal, explicit information about how the concepts are to be used. The content and structure of controlled medical terminologies must change over time to handle additions, refinement, disambiguation, obsolescence, discovered redundancy, and minor name changes. Clear, detailed descriptions of these changes are necessary. Synonymy is a type of redundancy in a terminology that is desirable because it helps people recognize the terms they associate with a particular concept. Because synonyms map to a single concept, the coding is not considered redundant.
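Several of these attributes can be sketched in a few lines of Python: concept IDs that carry no meaning, synonyms that all map to one concept, and a poly-hierarchy in which "is a" can be computed. Everything in this sketch is an assumption made for illustration: the IDs, the labels, the parent links, and the helper names `lookup`, `ancestors`, and `is_a` are hypothetical and are not taken from any real terminology.

```python
# Hypothetical concepts. The IDs are arbitrary and say nothing about hierarchy.
CONCEPTS = {
    "C100": "disease",
    "C101": "endocrine disease",
    "C102": "disorder of glucose metabolism",
    "C103": "diabetes mellitus",
    "C104": "diabetes mellitus type 2",
}

# "is a" links from child to parents. C103 has two parents (poly-hierarchy).
PARENTS = {
    "C101": {"C100"},
    "C102": {"C100"},
    "C103": {"C101", "C102"},
    "C104": {"C103"},
}

# Synonymy: several terms, one concept, so the coding is not redundant.
SYNONYMS = {
    "diabetes mellitus": "C103",
    "dm": "C103",
    "diabetes mellitus type 2": "C104",
    "type 2 diabetes": "C104",
    "t2dm": "C104",
}


def lookup(term):
    """Map a term or synonym to its concept ID, or None if unknown."""
    return SYNONYMS.get(term.strip().lower())


def ancestors(concept_id):
    """Collect every concept reachable by following parent links upward."""
    seen = set()
    stack = [concept_id]
    while stack:
        for parent in PARENTS.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def is_a(child_id, parent_id):
    """True if child_id is the same as, or is subsumed by, parent_id."""
    return child_id == parent_id or parent_id in ancestors(child_id)


coarse = lookup("DM")
fine = lookup("T2DM")
print(CONCEPTS[fine], "is a", CONCEPTS[coarse], "->", is_a(fine, coarse))
```

With this structure, data coded at the finer granularity can still be retrieved by a query at the coarser one, and both parent paths lead to a consistent view of the same concept.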
Here are some additional concepts that are a subject for more in-depth study of terminologies. They are beyond the scope of this course. In short, a terminology must be able to distinguish when two items are the same; that's called identity. A terminology should also be able to distinguish precisely how two non-identical terms are similar. This is where the poly-hierarchical representation, formal definitions, classification, and subsumption concepts that we talked about come into play. Post-coordination and pre-coordination: this refers to the ability in a terminology to bring in complex concepts at different levels of detail and compose them together as needed from more fundamental concepts. So in post-coordination, the concept left femur fracture is composed of the concepts left, femur, and fracture. This gives you more flexibility and choice in what you can represent. The rules for how things are related to each other are implied, and you will need to define exactly how they are combined. This may be inefficient, and it may allow you to compose things inappropriately. On the other hand, pre-coordination is another way of doing things. In pre-coordination, all levels of detail are modeled as distinct concepts. So you would have one concept in your terminology that says, this is a left femur fracture. Another one would say right femur fracture, left tibial fracture, right tibial fracture, etcetera. These are all distinct entries. Pre-coordination offers no flexibility. You have limited choice, and you need to say explicitly how these four concepts are related. So there's no way for a computer or software to know, without you explicitly telling it, that these four concepts, left femur fracture, right femur fracture, left tibial fracture, right tibial fracture, are all, for example, fractures.
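The contrast between the two approaches can be sketched in Python. This is only an illustration under assumptions: the codes P1 to P4, the `PostCoordinated` class, the allowed-value sets, and the helper names are all hypothetical, and real terminologies define their own composition rules.

```python
from dataclasses import dataclass

# Pre-coordination: every combination is its own distinct entry.
PRE_COORDINATED = {
    "P1": "left femur fracture",
    "P2": "right femur fracture",
    "P3": "left tibial fracture",
    "P4": "right tibial fracture",
}

# The software only knows these are fractures because we state it explicitly.
PRE_IS_A = {"P1": "fracture", "P2": "fracture", "P3": "fracture", "P4": "fracture"}


def pre_is_fracture(code):
    """Look up the explicitly stated parent of a pre-coordinated code."""
    return PRE_IS_A.get(code) == "fracture"


# Post-coordination: an expression is composed from more fundamental concepts.
@dataclass(frozen=True)
class PostCoordinated:
    finding: str
    site: str
    laterality: str


# Hypothetical composition rules, needed to block inappropriate combinations.
ALLOWED_FINDINGS = {"fracture"}
ALLOWED_SITES = {"femur", "tibia"}
ALLOWED_LATERALITIES = {"left", "right"}


def compose(finding, site, laterality):
    """Build a post-coordinated expression, rejecting combinations not allowed."""
    if finding not in ALLOWED_FINDINGS:
        raise ValueError(f"unknown finding: {finding}")
    if site not in ALLOWED_SITES:
        raise ValueError(f"site not allowed for {finding}: {site}")
    if laterality not in ALLOWED_LATERALITIES:
        raise ValueError(f"unknown laterality: {laterality}")
    return PostCoordinated(finding, site, laterality)


def post_is_fracture(expression):
    """The finding component itself says what kind of thing this is."""
    return expression.finding == "fracture"


left_femur_fracture = compose("fracture", "femur", "left")
print(post_is_fracture(left_femur_fracture), pre_is_fracture("P1"))
```

Notice that the composed expression carries its fracture component with it, while the pre-coordinated codes need the separate `PRE_IS_A` table. The price of that flexibility is the validation step in `compose`, which has to be written and maintained by someone.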
In post-coordination, by contrast, the shared fracture component already carries that relationship. Semantic and syntactic interoperability. Syntactic interoperability just means that, when you are sharing data, the structure of the data, the syntax of the variables and what you code in the data, is the same and can be consumed by the other machine. However, you also sometimes need to address the meaning that's inside the message, and that is semantic interoperability. So standardized terminology, with semantic interoperability, enables computers to utilize clinically meaningful information. Finally, if you want to dive even deeper into knowledge representation in medicine, I recommend the article listed here. In this article, you can also learn about the Ogden-Richards semiotic triangle of meaning, which basically describes the relationship between symbols, concepts, which are products of human thought, and the actual things in the world, and how there may not always be a one-to-one correspondence between the three components of this triangle. So the take-home message from all of this: terminologies are needed to make sense of your digital content. Otherwise, all your information systems are just electronic filing cabinets, as my mentor Doctor Steve Brown would say. Another lesson is that simple lists don't work. You can't just enumerate the things you need to represent; that would only work on a small scale. Some form of computable structure, as is typically provided by the standard terminologies that we have, is needed for your work to scale. You need structure to link terms and expressions that mean the same thing. You need structure to avoid the introduction of duplicate concepts and to aid and validate data creation. Finally, controlled medical terminologies, like software, must be designed to support selected tasks based on functional requirements. So which standard should I use? And the answer, like with many things in this course, is: it depends.
What are you using the data for? Are you using it for a clinical trial? Do you plan to submit the data to the FDA or some other agency with known standards? What is the source of the data? What types of data do you have, and how does the data look when you're combining it together? Are you pulling data out of lab machines, or out of physician order entry systems that have a lot of drug information in them? Are you using the EMR, with diagnosis codes for example? Electronic data capture? Patient-reported data, where you are capturing data in surveys? Are you building your database once, for one study, or are you building a database that you plan to use to support multiple, even unforeseen, studies in the future? In the latter case, you would want to use an existing standard terminology. What resources do you have? Some complex terminologies require access to programmers. You may be building complex data extraction systems, and sometimes you might even be using natural language processing, like we said earlier, to find the facts in your text. Are you doing manual data entry? Because that could be very cumbersome if your data entry personnel are relying on very large, complex coding lists, etcetera. So, in this course, we'll cover some of the known standards in clinical research in more detail, and for that we'll use the wiki. We're going to leverage the interactive nature of the Coursera platform. Here are some of the standard terminologies that we will cover in more detail on the wiki. CDISC is a very commonly used one; the acronym stands for Clinical Data Interchange Standards Consortium. It's actually composed of a suite of standards.
The CDISC mission is to develop and support global, platform-independent data standards that enable information system interoperability to improve medical research and related areas of health care. Some of the standards in that suite include CDASH, which defines a minimum set of data collection fields for 16 domains. It harmonizes element names for data entry, along with their definitions and metadata. The objective is to establish a standardized data collection baseline across all submissions. Further down in the data management pipeline, there's the SDTM standard, which stands for Study Data Tabulation Model. It's the model recommended for FDA regulatory submissions since 2004. For statistical analysis, ADaM is a subset of CDISC. It is the standard for analysis: it provides a data model for data analysis, that's what ADaM stands for, and it supports the detailed statistical analysis performed on clinical trial results. ODM is an XML-based model for data exchange. These are all within CDISC, and we'll talk about them in more detail. LOINC stands for Logical Observation Identifiers Names and Codes. Health Level Seven is a messaging standard that a lot of the electronic systems in health care use to send information around. Then there is the International Statistical Classification of Diseases and Related Health Problems, ICD. There are existing standards based on that: the ICD-9, the ICD-10, and the ICD-O-3, which is used for tumor registries. It's a medical classification list developed by the World Health Organization. It codes for diseases, signs and symptoms, abnormal findings, complaints, social circumstances, and external causes of injury or disease. So we'll talk about those in more detail. SNOMED CT is an acronym for Systematized Nomenclature of Medicine, Clinical Terms.
It's a systematically organized, computer-processable collection of medical terms, providing codes, terms, synonyms, and definitions used in clinical documentation and reporting. The primary purpose of SNOMED CT is to encode the meanings that are used in health information and to support the effective clinical recording of data, with the aim of improving patient care. So the main thing about SNOMED is that it's computer-processable, and it provides you with a lot of explicit relationships between the different concepts in there. We'll talk about many others as well, and we'll provide more details about them elsewhere.