Date: 2021-07-01

Welcome to the video on data analytics transforming legal studies and research. This video is the third and last in the series A Tale of Three Axes: Law in the Digital Age. We're joined by our esteemed guest, Dr. Ittai Bar-Siman-Tov from Bar-Ilan University in Ramat Gan, Israel. He will take us through the third axis of the interrelationship between the digital age and the law and explore how data science and AI are transforming legal studies and research. As we shall see, the output of data science methodologies impacts how we study law and conduct empirical legal research. After watching this video, you will be able to explain how data science and AI can change legal research and legal studies. Dr. Ittai Bar-Siman-Tov, welcome back. How exactly are data science and AI relevant for legal research? As we explained in the previous video, law has a lot to say about how to regulate AI and other technologies. Therefore, there is much room for normative legal scholarship in this field. Yet in this video, I want to focus on how data science and AI methods can contribute to research about the law, in particular, how they are relevant to the empirical study of law. Now, traditional methods of processing and studying legal information are based on close reading and analysis by human readers. This is very costly, time-consuming, and highly dependent on legally trained experts. As a result, empirical legal scholarship has traditionally depended on costly and complex human annotation, and these endeavors are therefore usually limited to relatively small samples. Automated analysis tools, in contrast, can enable collecting and analyzing massive amounts of legal documents quickly, easily, and relatively cheaply.
For example, instead of studying case law or legislation through close reading of each legal document and its analysis by human readers, data science methods can enable analyzing the entire corpus of all judicial decisions by all courts, or all laws, in a certain jurisdiction. This may allow taking a much broader perspective on an entire legal system. To give but one example, the eurlex R package facilitates access to vast amounts of EU law data for researchers. To get a sense of what I mean by vast amounts, as of 2020 this package included approximately 52,000 decisions, 4,000 to 300 directives, 300 and 150 recommendations, 140,000 regulations, 5,000 international agreements, and 31,000 court rulings. The combination of massive amounts of legal data available in digital format with rapid advances in data analytic tools provides endless opportunities to analyze the law in ways that were unimaginable in the past. The application of AI and data science methods therefore holds immense potential to improve the quality and efficiency of collecting, processing, and analyzing legal data. This enables conducting large-scale research projects that were practically impossible before the digital revolution and generating novel insights on the law. It has the potential to revolutionize legal research. Wow, those numbers do speak for themselves. But you're talking about potential; don't you think that this revolution is already here? Well, I think this revolution is in its incipient stages. To be sure, there are AI scientists and legal scholars who have been stressing the potential synergy between AI and law for some time now, and in recent years we have seen an explosion of interest in this field. But, for a variety of reasons, law has generally lagged behind other domains in its use of AI and machine learning. Until quite recently, there have been relatively few legal studies employing AI methods.
Until, I'd say, two or three years ago, the use of AI and machine learning in legal scholarship was consistently described as largely uncharted territory, a new frontier. In the past five years, we have seen important advances and a dramatic rise in studies using various data science tools to study the law. But I think it would be too early to say that these developments have already revolutionized the study of law; it's more accurate to say that they have the potential to do so. Fair enough. Can you give us some specific examples of legal studies that employ data science tools, perhaps from your own law data lab? Yeah, sure. Maybe I'll mention two projects out of many that we're currently conducting. Remember that in one of our previous videos, we talked about the problem of terms of service and privacy policies of mobile apps that nobody reads but that can have very important implications for our rights. We have a couple of research projects in our lab that use various data science approaches to deal with this problem. In one project, Ayala Barzialy, Alon Singer, and I, together with two computer scientists, Jonathan Azaria and David Sarnoff, used an unsupervised machine learning approach to analyze privacy policies. Our crawler went over nearly 870,000 apps in the Google Play Store, and we generated a new dataset of privacy policies. After various processes of filtering, cleanup, and segmentation, we were left with over half a million paragraphs, which we analyzed through topic modeling. Through this process, we were able to identify a more comprehensive and nuanced list of privacy policy topics compared to previous works. Our mostly automated methodology has significant advantages compared to many classification and supervised machine learning techniques that were previously applied to privacy policy analysis.
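To make the pipeline just described more concrete (segment policies into paragraphs, clean and filter them, then group them by topic without human labels), here is a minimal Python sketch under stated assumptions. It is an illustration, not the lab's code: it assumes policies arrive as plain text with paragraphs separated by blank lines, the five-word cut-off and exact-duplicate removal are hypothetical choices, and because the project used topic modeling, note that the grouping step swaps in a simpler unsupervised technique, spherical k-means clustering over bag-of-words vectors, which likewise infers categories and assigns each paragraph to one of them.

```python
import math
import re
from collections import Counter

MIN_WORDS = 5  # hypothetical cut-off: drops headings and fragments such as "Contact us"

def segment(policy):
    """Split one privacy policy into paragraphs on blank lines."""
    return [p for p in re.split(r"\n\s*\n", policy) if p.strip()]

def clean(paragraph):
    """Remove leftover HTML tags and collapse runs of whitespace."""
    return re.sub(r"\s+", " ", re.sub(r"<[^>]+>", " ", paragraph)).strip()

def build_corpus(policies):
    """Segment, clean, filter, and deduplicate paragraphs across many policies."""
    seen, corpus = set(), []
    for policy in policies:
        for para in map(clean, segment(policy)):
            key = para.lower()  # identical boilerplate shared by many apps is kept once
            if len(para.split()) >= MIN_WORDS and key not in seen:
                seen.add(key)
                corpus.append(para)
    return corpus

def vectorize(text):
    """L2-normalised bag-of-words vector, stored as a word -> weight dict."""
    counts = Counter(re.findall(r"[a-z]+", text.lower()))
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {w: c / norm for w, c in counts.items()}

def cosine(a, b):
    """Cosine similarity of two already-normalised sparse vectors."""
    return sum(x * b.get(w, 0.0) for w, x in a.items())

def centroid(vectors):
    """Normalised sum of a cluster's vectors (the spherical k-means update)."""
    total = Counter()
    for v in vectors:
        total.update(v)
    norm = math.sqrt(sum(x * x for x in total.values()))
    return {w: x / norm for w, x in total.items()}

def cluster(paragraphs, k, iterations=20):
    """Group a non-empty list of paragraphs into k clusters, with no human-coded labels."""
    vectors = [vectorize(p) for p in paragraphs]
    # Deterministic farthest-point start: the first paragraph, then repeatedly
    # the paragraph least similar to every centroid chosen so far.
    centroids = [vectors[0]]
    while len(centroids) < k:
        centroids.append(min(vectors, key=lambda v: max(cosine(v, c) for c in centroids)))
    labels = None
    for _ in range(iterations):
        # Assignment step: each paragraph joins its most similar centroid.
        new = [max(range(k), key=lambda j: cosine(v, centroids[j])) for v in vectors]
        if new == labels:  # converged: no paragraph changed cluster
            break
        labels = new
        # Update step: recompute each centroid from its current members.
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = centroid(members)
    return labels

if __name__ == "__main__":
    policies = [  # made-up toy policies
        "<p>We share your location data with advertising partners.</p>\n\nContact us",
        "Advertising partners may receive your location data.\n\nYou may delete your account at any time.",
        "Delete your account from the settings page at any time.",
    ]
    paragraphs = build_corpus(policies)
    print(cluster(paragraphs, k=2))  # -> [0, 0, 1, 1]
```

On this toy input, the two paragraphs about sharing data with advertisers land in one cluster and the two about deleting an account land in the other; a real topic model would instead give each paragraph a mixture of topics rather than a single hard label.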
This methodology requires considerably less effort, making it a practical and scalable tool for analyzing large, dynamically changing legal corpora, such as privacy policies, which naturally change all the time. Well, that sure does sound exciting, and the numbers again speak for themselves. What's the other project you wanted to elaborate on? In the second project, with [inaudible], we again worked with a data scientist called [inaudible], and we tried to teach an algorithm to identify ethical problems in the terms of service and privacy policies of mobile apps. Here we used a different approach. Instead of an unsupervised approach, we used a supervised machine learning approach that teaches an algorithm to emulate the work of human qualitative analysis. As a newbie, can you explain the difference between supervised and unsupervised machine learning? Sure. I'll make it pretty simple, I hope. Simply put, in supervised machine learning, human coding is used to teach a machine to replicate a familiar manual coding task. That is, at the first stage, human coders code a sample set of the corpus and classify it into several categories. Then the coded documents are used as a training set for training an automatic classifier. The algorithm learns how to sort documents into categories using the training set, and after validation, once it reaches an acceptable level of accuracy, the algorithm classifies the remaining documents. This is supervised machine learning. Maybe to give a simple example, not of a legal text: one example that data scientists use is pictures. How do you teach an algorithm to recognize a chair? It could be difficult to tell it that a chair has four legs, because many chairs don't have four legs. It might be problematic to say that it has a back, because some don't have a back, etc.
What they do is have humans go over thousands and thousands of pictures, maybe more, and simply classify whether each one is a chair, a tractor, a car, or whatever. Then all those pictures annotated by humans are fed to the algorithm, and the algorithm learns to identify what is a chair and what is not. It then receives new pictures that it has not seen before and tries to emulate the human classification based on the training corpus that was given to it. This is supervised machine learning. In unsupervised machine learning, in contrast, no preliminary human coding is required. Unsupervised learning methods discover the underlying features of the text without a manually coded training set or predetermined categories. Instead, they use modeling assumptions and properties of the text to estimate a set of categories and simultaneously assign documents to these categories. Wow, that picture example really did make it more concrete for me. It had me thinking about the example that Google used, I think, of distinguishing between a wolf and a dog that looks like a wolf, so it really hit home with me. You talked about using data science and machine learning in legal research. What about legal education? Do such data science methods have a place in the legal curriculum? I think definitely. Remember that in the previous video, we talked about how data science and AI transform legal practice. Law students need to learn these tools even if they aspire to be practicing lawyers rather than legal researchers in academia. In my faculty, for example, in addition to courses and seminars on law and technology, we offer a course on legal analytics and a course on data science for lawyers.
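The supervised workflow described earlier (human coders label a sample, a classifier is trained on it, validated, and only then applied to the remaining documents) can be sketched in Python as follows. This is a hedged illustration, not the lab's classifier or coding scheme: the paragraphs and the labels "problematic" and "ok" are made up, the 0.9 accuracy threshold is hypothetical, and multinomial Naive Bayes is used only because it is one of the simplest text classifiers.

```python
import math
import re
from collections import Counter

def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())

class NaiveBayes:
    """Multinomial Naive Bayes text classifier with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.classes = sorted(set(labels))
        self.n_docs = len(labels)
        self.doc_counts = Counter(labels)
        self.word_counts = {c: Counter() for c in self.classes}
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = set().union(*self.word_counts.values())
        self.totals = {c: sum(wc.values()) for c, wc in self.word_counts.items()}
        return self

    def log_score(self, text, c):
        score = math.log(self.doc_counts[c] / self.n_docs)  # log prior of class c
        for w in tokenize(text):
            if w in self.vocab:  # words never seen in training are ignored
                score += math.log((self.word_counts[c][w] + 1) / (self.totals[c] + len(self.vocab)))
        return score

    def predict(self, text):
        return max(self.classes, key=lambda c: self.log_score(text, c))

def supervised_pipeline(coded, validation, remaining, min_accuracy=0.9):
    """Train on human-coded examples, validate, then classify the remaining documents."""
    texts, labels = zip(*coded)
    model = NaiveBayes().fit(texts, labels)
    hits = sum(model.predict(t) == y for t, y in validation)
    accuracy = hits / len(validation)
    if accuracy < min_accuracy:
        return accuracy, None  # not good enough yet: code more examples and retrain
    return accuracy, [model.predict(t) for t in remaining]

# Stage 1: a toy sample hand-coded by human annotators (made-up examples).
CODED = [
    ("We may sell your personal data to third parties.", "problematic"),
    ("We share your contacts with advertisers without notice.", "problematic"),
    ("You can delete your account and data at any time.", "ok"),
    ("We encrypt your data and never sell it.", "ok"),
]
# Stage 2: held-out coded examples used to check accuracy before trusting the model.
VALIDATION = [
    ("We may sell your contacts to advertisers.", "problematic"),
    ("You can delete your data at any time.", "ok"),
]
# Stage 3: the uncoded rest of the corpus.
REMAINING = [
    "Your personal data may be sold to third parties.",
    "We never sell your data and you can delete it.",
]

if __name__ == "__main__":
    print(supervised_pipeline(CODED, VALIDATION, REMAINING))  # -> (1.0, ['problematic', 'ok'])
```

The validation gate mirrors the step in the explanation where the classifier is applied to the remaining documents only after it reaches an acceptable level of accuracy; with a real corpus, the validation set would be much larger and the threshold a methodological choice.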
Of course, the goal is not for law students to be trained as data scientists themselves. Instead, the goal is to give them a basic understanding so that they are able to use simple off-the-shelf packages and tools to conduct basic analysis themselves, or so that they gain sufficient understanding to work with data scientists. I could not agree with you more on that point; it should have a place in the legal curriculum as well. Thank you so much, Dr. [inaudible]. It was a pleasure to have you here, and I am sure we will be reading a lot about your research. And maybe, who knows, we'll do a second MOOC on legal analytics, which already has my vote. Who knows? You have now learned about some of the ways that data science and machine learning could transform legal research. This video has given you a first taste, the tip of the iceberg. For a broader and more detailed overview of the use of computational methods in legal research, please read Frankenreiter and Livermore, "Computational Methods in Legal Analysis," Annual Review of Law and Social Science, vol. 16, pp. 39-57. Then answer the brief quiz below. By watching this video and completing these assignments, you have completed the first module about the three axes in the relationship between law and data. Keep the themes discussed in this introductory module in mind as you work through the next modules. In the following videos, we will touch upon certain topics raised by Dr. [inaudible]. Thank you for joining us for this series, A Tale of Three Axes: Law in the Digital Age, and we will see you in the following module.