An introduction to data integration and statistical methods used in contemporary Systems Biology, Bioinformatics and Systems Pharmacology research. The course covers methods to process raw data from genome-wide mRNA expression studies (microarrays and RNA-seq) including data normalization, differential expression, clustering, enrichment analysis and network construction. The course contains practical tutorials for using tools and setting up pipelines, but it also covers the mathematics behind the methods applied within the tools. The course is mostly appropriate for beginning graduate students and advanced undergraduates majoring in fields such as biology, math, physics, chemistry, computer science, biomedical and electrical engineering. The course should be useful for researchers who encounter large datasets in their own research. The course presents software tools developed by the Ma’ayan Laboratory (http://icahn.mssm.edu/research/labs/maayan-laboratory) from the Icahn School of Medicine at Mount Sinai, but also other freely available data analysis and visualization tools. The ultimate aim of the course is to enable participants to utilize the methods presented in this course for analyzing their own data for their own projects. For those participants that do not work in the field, the course introduces the current research challenges faced in the field of computational systems biology.

From the lesson

Principal Component Analysis, Self-Organizing Maps, Network-Based Clustering and Hierarchical Clustering

This module is devoted to various method of clustering: principal component analysis, self-organizing maps, network-based clustering and hierarchical clustering. The theory behind these methods of analysis are covered in detail, and this is followed by some practical demonstration of the methods for applications using R and MATLAB.