This course is about learning the fundamental computing skills necessary for effective data analysis. You will learn to program in R and to use R for reading data, writing functions, making informative graphs, and applying modern statistical methods.

In this course you will learn how to program in R and how to use R for effective data analysis. You will learn how to install and configure the software necessary for a statistical programming environment and discuss generic programming language concepts as they are implemented in a high-level statistical language. The course covers practical issues in statistical computing, including programming in R, reading data into R, creating informative data graphics, accessing R packages, creating R packages with documentation, writing R functions, debugging, and organizing and commenting R code. Topics in statistical data analysis and optimization will provide working examples.

A student who has completed this course is able to:

- Read formatted data into R
- Subset, remove missing values from, and clean tabular data
- Write custom functions in R that implement new functionality and make use of control structures such as loops and conditionals
- Use the R code debugger to identify problems in R functions
- Make a scatterplot/boxplot/histogram/image plot and modify a plot with custom annotations
- Define a new data class in R and write methods for that class
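
To give a flavor of a few of these skills, here is a minimal sketch in base R. It is not taken from the course materials: the inline data and the names `csv_text`, `count_above`, `new_summary`, and `value_summary` are hypothetical and chosen only for illustration. It touches on reading formatted data, removing missing values, subsetting, writing a function with a loop and a conditional, and defining a simple S3 class with a print method. Plotting and the debugger are left out to keep it short.

```r
# Hypothetical inline data standing in for a formatted file on disk.
csv_text <- "id,group,value
1,a,2.5
2,a,NA
3,b,4.0
4,b,5.5"

# Read formatted data into a data frame.
measurements <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# Remove rows with missing values, then subset to a single group.
complete <- measurements[complete.cases(measurements), ]
group_b <- complete[complete$group == "b", ]

# A custom function that uses a loop and a conditional:
# count how many elements of x exceed a threshold.
count_above <- function(x, threshold) {
  n <- 0
  for (v in x) {
    if (v > threshold) {
      n <- n + 1
    }
  }
  n
}

# A new S3 class (name is an assumption) with a constructor and a print method.
new_summary <- function(x) {
  structure(list(mean = mean(x), n = length(x)), class = "value_summary")
}

print.value_summary <- function(x, ...) {
  cat("value_summary: n =", x$n, "mean =", x$mean, "\n")
  invisible(x)
}

s <- new_summary(complete$value)
print(s)  # dispatches to print.value_summary
```

In an interactive session, calling `debug(count_above)` before running the function would step through it line by line, which is one way the debugging outcome above could be practiced.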

Some familiarity with programming concepts will be useful, as will basic knowledge of statistical reasoning. At Johns Hopkins, this course is taken by first-year graduate students in Biostatistics.

- Software for Data Analysis: Programming with R (Statistics and Computing) by John M. Chambers (Springer)
- S Programming (Statistics and Computing) by Brian D. Ripley and William N. Venables (Springer)
- Programming with Data: A Guide to the S Language by John M. Chambers (Springer)

The course will consist of lecture videos broken into 8-10 minute segments,
with approximately 3 hours of video content per week. There will be four
graded quizzes and four graded programming assignments.

**What resources will I need for this class?** You will need a computer on which the R software environment can be installed (a recent Mac, Windows, or Linux computer is sufficient).

**Is there a textbook for the class?** There is no required textbook for the class and all materials will be provided. There are, however, a few suggested readings.

**How is this course different from “Data Analysis”?** This course will focus on developing the programming skills necessary for managing data and for implementing statistical methods. The course will not focus on teaching properties of specific statistical algorithms unless they are used to demonstrate important programming techniques. Some of the topics covered in this course are relevant to the “Data Analysis” course, but the two do not need to be taken in sequence.