So far, we have looked at methods that use the covariates and/or the propensity score, but we have not modeled the regression function. That may seem rather strange, because under unconfoundedness the expected value of the potential outcome given X is identified: it equals the conditional expectation of the observed outcome, the regression function. So you can model the regression function, use it to estimate the average effect of treatment at each covariate value X, and then average those estimates over the sample to get an estimate of the average treatment effect. Similarly, you can do that to get an estimate of the average effect of treatment on the treated. You'll use the imputations, but of course you can use observed values in place of imputed values when you actually observe them. So, for the treated observations, you observe their outcomes under treatment and you can use those; you just need to impute their control values. Okay. Now, the same estimator can also be used to estimate the sample average treatment effect and the finite-population average treatment effect. In fact, it does estimate those as well; just the standard errors will be different. To estimate the effect of treatment on the treated, you don't need to model the regression function in the treatment group, and this just gives that estimate. So you only need to model the control group. Okay. In empirical work, linear regression is still often used to model the regression function, but when the regression function is misspecified, estimates of treatment effects can be very biased, and they're going to be inconsistent in general. So you'll recall from earlier in this course that this is the concern that motivated many of the approaches based on the propensity score in the first place.
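As a concrete illustration, here is a minimal sketch of the imputation estimators just described, fitting an ordinary least squares regression separately in each arm. The function names and the linear specification are illustrative assumptions, not notation from the lecture.

```python
import numpy as np

def _fit_ols(X, y):
    # Ordinary least squares with an intercept column.
    Xd = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(Xd, y, rcond=None)
    return beta

def _predict(beta, X):
    return np.column_stack([np.ones(len(X)), X]) @ beta

def regression_impute_ate(X, z, y):
    """ATE: model the regression function in each arm, impute both
    potential outcomes for every unit, and average the differences."""
    b1 = _fit_ols(X[z == 1], y[z == 1])  # treated-arm regression
    b0 = _fit_ols(X[z == 0], y[z == 0])  # control-arm regression
    return float(np.mean(_predict(b1, X) - _predict(b0, X)))

def regression_impute_att(X, z, y):
    """ATT: keep the observed treated outcomes; only the control-arm
    regression function is needed, to impute each treated unit's Y0."""
    b0 = _fit_ols(X[z == 0], y[z == 0])
    treated = z == 1
    return float(np.mean(y[treated] - _predict(b0, X[treated])))
```

Note that `regression_impute_att` never touches the treated-arm regression, matching the point above that only the control group needs to be modeled for the effect of treatment on the treated.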
However, we also saw, when we discussed weighting by the inverse of the propensity score, that if the model for the propensity score is misspecified, this will also lead to biased estimates of treatment effects. Recent theoretical and computational advances help to address these concerns about both the propensity score and the regression function: they can both be modeled nonparametrically. For example, although logistic regression, sometimes with interactions and quadratic terms, remains the most common way of modeling the propensity score, you could use other methods like CART; and as for the regression function, Jennifer Hill models it using Bayesian additive regression trees. So you can do better. But it turns out that, on the theoretical side, it is advantageous to combine the use of models for both functions. More precisely, it is possible to consistently estimate treatment effects if either of these functions, i.e., the propensity score or the regression function, is modeled correctly. This property is referred to as double robustness, and I've given some citations for accessible introductions. I should also say that if both models are correct, the estimates are efficient. So let's have a look at how this works. Suppose we specify the regression function as g and the propensity score as e, where the regression depends on parameters beta and the propensity score on parameters alpha. Let's look at the following estimator. For now, we're only estimating the expected value of Y1, so for the treatment outcomes. We'll call it E-hat-DR: DR is for double robust, and the hat is for estimation. Okay. So here's the estimator. It looks quite formidable, but we're going to break it down, and we're going to suppose that both alpha-hat and beta-hat are consistent for alpha and beta. You can see from the formula that the estimator is estimating the following functional.
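The formula itself is not reproduced in this transcript, but the estimator being described is the standard augmented inverse-probability-weighting form; with Z the treatment indicator, it can be written as

```latex
\hat{E}_{\mathrm{DR}}[Y_1]
  = \frac{1}{n}\sum_{i=1}^{n}\left[
      \frac{Z_i Y_i}{e(X_i,\hat\alpha)}
      \;-\; \frac{Z_i - e(X_i,\hat\alpha)}{e(X_i,\hat\alpha)}\, g(X_i,\hat\beta)
    \right],
```

which, with alpha-hat and beta-hat consistent for alpha and beta, estimates the functional

```latex
E\!\left[\frac{Z\,Y}{e(X,\alpha)}
  \;-\; \frac{Z - e(X,\alpha)}{e(X,\alpha)}\, g(X,\beta)\right].
```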
Now, let's start to break this down. Suppose that the model for the propensity score is properly specified. Then, what did we see before? We saw that, assuming unconfoundedness, the first term of this equation, by which I mean the expected value of ZY over e(X, alpha), is equal to the expectation of Y1, the treatment outcomes. The corresponding sample term is the inverse probability of treatment weighting estimator that we previously discussed. Right? So we're in business there. All right, so now we have to look at the second term. What's going on there? The first thing we're going to do is use iterated expectations, and the second thing is that, since we're conditioning on X, we can pull g out. Then we're looking at the expected value of Z minus e(X, alpha), given X. But remember that the expected value of Z given X is just e(X, alpha): we're assuming the propensity score is properly specified, and so that's zero. Intuitively, conditioning on X lets you separate these two pieces, and Z minus e(X, alpha) is the residual from the propensity score model. So that gives you the explanation for the case where the propensity score is correctly specified. Now, let's go to the other part and think about the regression function. Suppose that it is correctly specified, and now we're going to rewrite the estimator in another way. We want to look at the first term of this equation, and you can see that if the expectation of this first term is zero, then we're left with just g, which is what we want. So let's break down that first term.
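Spelling out the two steps just described: when the propensity score is correctly specified, iterated expectations and pulling g(X, beta) out of the inner conditional expectation give

```latex
E\!\left[\frac{Z - e(X,\alpha)}{e(X,\alpha)}\, g(X,\beta)\right]
  = E\!\left[\frac{g(X,\beta)}{e(X,\alpha)}\,
      E\big[Z - e(X,\alpha)\,\big|\,X\big]\right] = 0,
```

since E[Z | X] = e(X, alpha). And the rewrite of the functional used for the regression-function case is the algebraic identity

```latex
E\!\left[\frac{Z\,Y}{e(X,\alpha)}
  - \frac{Z - e(X,\alpha)}{e(X,\alpha)}\, g(X,\beta)\right]
  = E\!\left[\frac{Z\,\big(Y_1 - g(X,\beta)\big)}{e(X,\alpha)}\right]
    + E\big[g(X,\beta)\big],
```

whose first term is the one broken down next, and whose second term equals E[Y1] when g is correct.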
So, the first thing we do is rewrite this in terms of iterated expectations, and then unconfoundedness allows us to pull the Z and the Y1 apart. At the end of the day, we're left with the expected value of Z over e(X, alpha) given X, which is just one, multiplied by the expected value of Y1 minus the g function, given X. And what's that? That's just the residual from the regression function at the value X, so that's zero. Okay. So, if either the propensity score or the regression function is specified correctly, then this doubly robust estimator of the treatment outcome is consistent. Then, by the same kind of reasoning, which you can work through yourself (and it would be illustrative to do so), you get a similar estimator for the expectation of the control outcomes that is doubly robust for that. If you difference the two, you get a doubly robust estimator of the average treatment effect. Very straightforward. As for standard errors, they're given below in the formula, and there's a SAS macro for doing this, as well as a Stata program. Okay. Finally, I just want to mention that you can modify this to get a doubly robust estimator of the effect of treatment on the treated, and you can go through the same kind of reasoning and work out that that works as well. Okay, double robustness.
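To make the differenced estimator concrete, here is a minimal Python sketch of the doubly robust (AIPW) ATE estimator, taking fitted propensity scores and regression predictions as inputs. The function name and interface are illustrative, not from the lecture or the SAS/Stata implementations it mentions.

```python
import numpy as np

def doubly_robust_ate(z, y, e_hat, g1_hat, g0_hat):
    """Doubly robust (AIPW) estimate of the average treatment effect.

    z, y   : treatment indicator and observed outcome
    e_hat  : fitted propensity scores e(X, alpha-hat)
    g1_hat : fitted regression predictions g(X, beta-hat), treated arm
    g0_hat : the same for the control arm

    Consistent if EITHER the propensity score model OR the regression
    models are correctly specified.
    """
    # Doubly robust estimate of E[Y1]: IPW term minus the augmentation
    # built from the propensity residual times the regression prediction.
    mu1 = np.mean(z * y / e_hat - (z - e_hat) / e_hat * g1_hat)
    # Symmetric doubly robust estimate of E[Y0] for the control arm.
    mu0 = np.mean((1 - z) * y / (1 - e_hat)
                  + (z - e_hat) / (1 - e_hat) * g0_hat)
    return float(mu1 - mu0)
```

Feeding in a correct propensity model with a deliberately wrong regression, or correct regressions with a deliberately wrong propensity model, still recovers the true effect in large samples, which is exactly the double robustness property derived above.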