To adopt machine learning approaches in health care applications, and in particular in clinical decision support systems, it is important to address privacy concerns. In this video, we are going to discuss how advanced machine learning models can inherently expose sensitive information, even if this is unintentional. Notably, identity information remains in the data even when sensitive fields have been removed or pseudonymized. We will also emphasize that it is important to consider methods that filter out sensitive information early in the processing pipeline.

Deep learning models perform extremely well with correlated data, which has contributed to substantial improvements in several fields; in particular, this has been demonstrated in computer vision. Neural networks represent a form of memory mechanism, with compressed representations of the training data stored within their weights. This is an unintended memorization process, and it makes it possible to reconstruct parts of the training data from the weights themselves. In other words, deriving a good model from training data is not necessarily a one-way street: in order to learn a mapping from specific features to corresponding labels, the model needs to remember in its parameters some information about the data it was trained on. This property can be exploited in model inversion or reconstruction attacks and can cause catastrophic data leakage. For example, it has been shown that images can be reconstructed with impressive accuracy and detail, allowing visualization of the original training data.

Earlier, we discussed privacy issues in relation to electronic health records, and we mentioned simple strategies to protect the privacy of subjects, such as anonymization, which removes personally identifiable information from a dataset.
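The point that model parameters remember training data can be made concrete with a minimal sketch. This is an illustrative toy, not the attack from the literature: we train a two-feature logistic regression on two synthetic "patient" classes and then check that the learned weight vector points along the difference of the class means, so information about the training distribution is recoverable from the weights alone. All names and the data-generation setup here are assumptions made for the example.

```python
import math
import random

random.seed(0)

# Two synthetic "patient" classes with different feature means.
def draw(mean, n):
    return [[random.gauss(m, 1.0) for m in mean] for _ in range(n)]

X = draw([0.0, 0.0], 100) + draw([3.0, 1.0], 100)
y = [0] * 100 + [1] * 100

# Train a tiny logistic regression by gradient descent.
w, b = [0.0, 0.0], 0.0
for _ in range(2000):
    gw, gb = [0.0, 0.0], 0.0
    for xi, yi in zip(X, y):
        p = 1 / (1 + math.exp(-(w[0] * xi[0] + w[1] * xi[1] + b)))
        err = p - yi
        gw[0] += err * xi[0]
        gw[1] += err * xi[1]
        gb += err
    w = [w[j] - 0.1 * gw[j] / len(y) for j in range(2)]
    b -= 0.1 * gb / len(y)

# The learned weight vector aligns with the difference of the class
# means: the parameters leak information about the training data.
d = [3.0, 1.0]  # true mean difference used to generate the data
cos = (w[0] * d[0] + w[1] * d[1]) / (math.hypot(*w) * math.hypot(*d))
print(round(cos, 2))  # close to 1.0: high alignment
```

A real model inversion attack scales this idea up, optimizing inputs against a trained network rather than reading a linear weight vector directly.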
Pseudonymization, for example, replaces personally identifiable information in a dataset with a synthetic entry, and in this way protects the privacy of the patient. However, neural networks are powerful at determining an individual's identity despite anonymization. This identification is based on other information present in the dataset, even when the direct identifiers have been removed. For example, it is possible to identify a subject by exploiting similarities to other datasets in which the same individual is contained. It has been reported that large-scale re-identification attacks on medical records, and the sale of the resulting information, have become a business model for data mining companies. For example, health insurance companies are interested in re-identifying patient records: wishing to reduce their financial risk, they can discriminate against individuals with certain illnesses. In any case, de-identification by naive anonymization or pseudonymization alone must be viewed as a technically insufficient measure against identity inference.

Another type of privacy attack is the dataset reconstruction attack. This is based on deriving an individual's characteristics from the results of computations performed on the dataset, without necessarily having access to the dataset itself. Finally, a tracing attack, or membership inference attack, refers to determining whether an individual is present in the dataset or not, without necessarily determining their exact identity.

At the algorithm level, the most common privacy attacks are adversarial attacks and model inversion, or reconstruction, attacks. Adversarial attacks can compromise the computational result by introducing malicious training examples; we call this model poisoning. Normally it is difficult to detect these samples, because they differ from legitimate ones only by noise-like perturbations, yet they can affect the effectiveness of the model.
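The membership inference attack mentioned above is easiest to see against a model that has fully memorized its training set. The following hedged sketch, with all names my own, uses a 1-nearest-neighbour "model" as an extreme case of memorization: records the model was trained on sit at distance zero from its stored memory, so a simple threshold on that distance tells members apart from non-members.

```python
import random

random.seed(1)

# An overfit "model": a 1-nearest-neighbour classifier is literally
# its own training set, the extreme case of memorization.
train = [[random.random() for _ in range(5)] for _ in range(50)]
other = [[random.random() for _ in range(5)] for _ in range(50)]

def distance_to_model(x, memory=train):
    # Squared distance to the closest record the model has memorized.
    return min(sum((a - b) ** 2 for a, b in zip(x, m)) for m in memory)

# Membership inference: members score ~0, non-members score higher,
# so thresholding the score reveals who was in the training set.
threshold = 1e-9
guesses = [distance_to_model(x) < threshold for x in train + other]
truth = [True] * 50 + [False] * 50
accuracy = sum(g == t for g, t in zip(guesses, truth)) / 100
print(accuracy)  # 1.0 on this toy setup
```

Practical attacks on neural networks use the model's confidence or loss on a record instead of a raw distance, but the principle is the same: memorized records look "too easy" to the model.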
On the other hand, model inversion attacks refer to algorithms that try to derive information about the dataset stored within the neural network weights by observing the algorithm's behavior.

Information leakage can also be viewed in relation to the user's actions and intentions. For example, a passive user interacts with a trained model as intended by design and in compliance with the protocols. Nevertheless, if the model has vulnerabilities, these can compromise privacy and can later be exploited by a malicious attacker. Involuntary leakage types include feature leakage, memorization, and plain overfitting. Overfitting is itself a type of memorization, and it is easy to detect by comparing the accuracy on the training and testing data. Feature leakage is also related to memorization, but is more specific to certain sensitive properties. A malicious attacker would exploit vulnerabilities of machine learning models using more sophisticated approaches. They might, for example, be able to reconstruct the training data through a model inversion attack, or aim to reconstruct the machine learning model itself. Even if they are not able to reconstruct the input data exactly, they might still be able to infer whether a patient's data are included in the dataset, or extract specific information relating to a patient.

Here we see statistics on papers about data leakage, broken down by the data type of the training dataset and by the type of leakage. We see privacy attacks in relation to images, tabular data, text, and time series; images and tabular data are the most explored domains. We also observe that membership inference attacks and reconstruction attacks have attracted a lot of attention in the research community. Model extraction attacks, which are designed to steal the trained model's functionality, follow in terms of popularity.
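The claim that overfitting is easy to detect from the train/test accuracy gap can be sketched directly. In this toy setup (all names are mine), the labels are pure coin flips, so there is nothing to learn; a 1-nearest-neighbour model still scores perfectly on its training set, and the large gap to near-chance test accuracy is the signature of memorization.

```python
import random

random.seed(2)

def make_split(n):
    # Random features with purely random labels: nothing learnable.
    return [([random.uniform(-1, 1) for _ in range(3)],
             random.choice([0, 1])) for _ in range(n)]

train, test = make_split(100), make_split(100)

def predict(x, memory=train):
    # 1-nearest-neighbour: copy the label of the closest stored point.
    return min(memory, key=lambda m: sum((a - b) ** 2
                                         for a, b in zip(x, m[0])))[1]

def accuracy(split):
    return sum(predict(x) == y for x, y in split) / len(split)

# Perfect on training data, near-chance on test data: the gap is
# the simplest signal that the model has memorized rather than learned.
print(accuracy(train), accuracy(test))
```

Feature leakage is harder to spot this way, because a model can generalize well overall while still memorizing a specific sensitive attribute.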
Finally, poisoning and property inference attacks have attracted less attention. On the right-hand side, we see similar statistics in relation to the methods of inference; in particular, we are looking at classification, regression, generation, and machine learning as a service. Generation reflects the synthesis of realistic, high-quality data that could solve the shortage of open-access data in the medical domain. Machine learning as a service, on the other hand, separates the data from the machine learning model; this adds complexity by making it difficult to detect and track the source of a problem that results in bias, unfairness, or privacy attacks. Here we see that research in privacy is mostly focused on classification tasks.

Disentanglement in representation learning refers to the ability to break down data features into key categories and represent them in their own latent spaces. Our recent work has demonstrated that it is possible to separate the latent representation of emotions from the identity representation in human pose data. Human pose data, like most biological data, carry signatures of human identity, which are called biometrics. To achieve subject data disentanglement, we adopt a cross-subject training approach: we maximize the inference of the property of interest while minimizing the variance due to subject identity. In other words, latent representations of the subject data are transferred between subjects.

Here we see an example of the application of this approach. On the top left, we see a visual character that has been used to reconstruct human gait data that reflect anger, happiness, sadness, or a neutral state. In this scenario, we are interested in developing an algorithm that can dynamically filter streams of data to eliminate unnecessary information that could compromise privacy. Here we achieve that by disentangling biometrics from affect.
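The cross-subject idea can be sketched with a deliberately simple linear stand-in for the latent spaces; this is my own toy assumption, not the actual model from the work described. Each "gait sample" is an identity offset (biometrics) plus an affect signal; the "encoder" takes the per-subject mean as the identity code and the residual as the affect code. Transferring an affect code onto another subject's identity code reconstructs that subject's version of the sample, and a nearest-mean probe shows that the filtered affect codes no longer reveal who the subject is.

```python
import random

random.seed(3)

# Toy data model (illustrative assumption): sample = identity + affect.
identity = {"s1": [2.0, 0.0], "s2": [-2.0, 1.0]}
affect = {"happy": [0.0, 1.0], "sad": [0.0, -1.0]}

def sample(subj, emo):
    return [i + a + random.gauss(0, 0.1)
            for i, a in zip(identity[subj], affect[emo])]

data = [(sample(s, e), s, e) for s in identity for e in affect
        for _ in range(25)]

# "Encoder": identity code = per-subject mean (emotions are balanced,
# so the mean isolates identity); affect code = the residual.
means = {s: [sum(x[k] for x, s2, _ in data if s2 == s) / 50
             for k in range(2)] for s in identity}

def affect_code(x, subj):
    return [a - m for a, m in zip(x, means[subj])]

# Cross-subject transfer: decode an s1 sample's affect with s2's
# identity code, as in cross-subject training.
x, s, e = next(d for d in data if d[1] == "s1")
swapped = [m + c for m, c in zip(means["s2"], affect_code(x, s))]
target = [i + a for i, a in zip(identity["s2"], affect[e])]
err = sum((a - b) ** 2 for a, b in zip(swapped, target)) ** 0.5

# Privacy probe: nearest-mean subject classification on raw samples
# versus on the identity-filtered affect codes.
def probe(v):
    return min(means, key=lambda s2: sum((a - b) ** 2
                                         for a, b in zip(v, means[s2])))

raw_acc = sum(probe(xi) == si for xi, si, _ in data) / len(data)
filt_acc = sum(probe(affect_code(xi, si)) == si
               for xi, si, _ in data) / len(data)
print(round(err, 2), raw_acc, filt_acc)
```

The transfer error stays small, the probe identifies subjects perfectly from raw samples, and its accuracy collapses toward chance on the filtered codes; this drop in subject classification is exactly the measurable privacy test discussed next.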
Here we see how the data points look before and after disentanglement with respect to affect. We also see that by disentangling biometrics and affect, we can dramatically improve the classification performance for affect, and we compare this with other approaches. Finally, we have a measurable way to test whether the approach also protects the privacy of the subject: we see that the ability of the algorithm to classify the subject after disentanglement is indeed very low compared with the original data.

Summarizing, deep neural networks are powerful at memorizing information about their training data. This property results in inherent vulnerabilities that can be exploited by a malicious attacker to compromise patient privacy. It is therefore important to consider privacy early in the development of a clinical decision support system. In this way, we can separate biometrics from features of interest and filter identity information early in the processing pipeline.