22  Treatment Effect Estimation

This chapter presents CATE estimation using the econml package (Battocchi, Hei, and Dillon 2019). The causalml package developed at Uber (Chen et al. 2020) is currently less complete than econml, so we do not cover it.

from econml.dml import DML, LinearDML, SparseLinearDML, NonParamDML, CausalForestDML
from econml.metalearners import TLearner, SLearner, XLearner, DomainAdaptationLearner
from econml.sklearn_extensions.model_selection import GridSearchCVList

from sklearn.ensemble import RandomForestRegressor, RandomForestClassifier
from sklearn.ensemble import GradientBoostingRegressor, GradientBoostingClassifier
from sklearn.linear_model import Lasso
from sklearn.model_selection import RepeatedKFold, train_test_split
from numpy.random import binomial, multivariate_normal, normal, uniform
import numpy as np

#=== ignore warnings ===#
import warnings
warnings.filterwarnings("ignore")

Let’s also generate a synthetic dataset using a custom data generating process (DGP) defined below.

# Define DGP
def generate_data(n, d, controls_outcome, treatment_effect, propensity):
    """Generates population data for given untreated_outcome, treatment_effect and propensity functions.
    
    Parameters
    ----------
        n (int): population size
        d (int): number of covariates
        controls_outcome (func): untreated outcome conditional on covariates
        treatment_effect (func): treatment effect conditional on covariates
        propensity (func): probability of treatment conditional on covariates
    """
    # Generate covariates
    X = multivariate_normal(np.zeros(d), np.diag(np.ones(d)), n)
    # Generate treatment
    T = np.apply_along_axis(lambda x: binomial(1, propensity(x), 1)[0], 1, X)
    # Calculate outcome
    Y0 = np.apply_along_axis(lambda x: controls_outcome(x), 1, X)
    treat_effect = np.apply_along_axis(lambda x: treatment_effect(x), 1, X)
    Y = Y0 + treat_effect * T
    return (Y, T, X)

# controls outcome, treatment effect, propensity definitions
def generate_controls_outcome(d):
    beta = uniform(-3, 3, d)
    return lambda x: np.dot(x, beta) + normal(0, 1)

treatment_effect = lambda x: (1 if x[1] > 0.1 else 0)*8
propensity = lambda x: (0.8 if (x[2]>-0.5 and x[2]<0.5) else 0.2)

# DGP constants and test data
d = 5
n = 1000
n_test = 250
controls_outcome = generate_controls_outcome(d)
X_test = multivariate_normal(np.zeros(d), np.diag(np.ones(d)), n_test)
delta = 6/n_test
X_test[:, 1] = np.arange(-3, 3, delta)
Y, T, X = generate_data(n, d, controls_outcome, treatment_effect, propensity)
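
Since the DGP is known here, we can also compute the true treatment effects for the test data directly from the treatment_effect function defined above; they serve as a benchmark for the CATE estimates below (true_te is a name introduced here for illustration).

#=== true treatment effects for the test data ===#
true_te = np.apply_along_axis(treatment_effect, 1, X_test)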

22.1 Average Treatment Effect

The average treatment effect (ATE) can be estimated with DML, implemented in the doubleml package.

import doubleml
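
As a minimal sketch of ATE estimation with doubleml (assuming its DoubleMLData and DoubleMLPLR classes; the argument names ml_l and ml_m follow recent versions of the package, and the random forest nuisance models are illustrative), a partially linear regression model can be fit to the data generated above:

from doubleml import DoubleMLData, DoubleMLPLR
from sklearn.ensemble import RandomForestRegressor, RandomForestClassifier

#=== wrap the generated arrays in a DoubleMLData object ===#
dml_data = DoubleMLData.from_arrays(X, Y, T)

#=== partially linear regression with random forest nuisance models ===#
dml_plr = DoubleMLPLR(
  dml_data,
  ml_l = RandomForestRegressor(),  # estimates E[Y|X]
  ml_m = RandomForestClassifier()  # estimates E[T|X]
)

#=== fit and report the estimated ATE ===#
dml_plr.fit()
print(dml_plr.summary)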

22.2 S-, T-, and X-learner

This section shows how to train the S-, T-, and X-learners. See Chapter 12 for how these learners work; understanding them will help you see what you need to specify for each learner.

22.2.1 S-learner

To train an S-learner, you need to specify only one estimator, which estimates \(E[Y|T, X, W]\). This can be done using overall_model in SLearner.

#=== specify the overall model ===#
overall_model = GradientBoostingRegressor(
  n_estimators=100,
  max_depth=6,
  min_samples_leaf=10
)

#=== set up an S-learner ===#
S_learner = SLearner(overall_model=overall_model)

#=== train ===#
S_learner.fit(Y, T, X=X)
<econml.metalearners._metalearners.SLearner object at 0x2fc2c2f10>

Estimate \(\theta(X)\) using the effect method,

S_te = S_learner.effect(X_test)

#=== see the first 10 ===#
print(S_te[:10])
[-0.96345437  0.49912816  0.32140746  0.32493562  0.52117931  0.39061349
  1.12985202  0.61109233 -0.05638891  0.74819205]

22.2.2 T-learner

To train a T-learner, you need to specify only one estimator, which estimates \(E[Y|T=1, X, W]\) and \(E[Y|T=0, X, W]\) separately. This can be done using models in TLearner.

#=== set up an estimator ===#
models = GradientBoostingRegressor(
  n_estimators=100, 
  max_depth=6, 
  min_samples_leaf=10
)

#=== set up a T-learner ===#
T_learner = TLearner(models=models)

#=== train ===#
T_learner.fit(Y, T, X=X)
<econml.metalearners._metalearners.TLearner object at 0x2fc3bedc0>

Estimate \(\theta(X)\) using the effect method,

T_te = T_learner.effect(X_test)

#=== see the first 10 ===#
print(T_te[:10])
[1.52115791 2.06799171 0.64311159 1.39796878 1.1590299  1.47958283
 2.18855353 1.82882139 1.65287604 2.99781446]

22.2.3 X-learner

To train an X-learner, you need to specify two estimators:

  1. An estimator that estimates \(E[Y|T=1, X, W]\) and \(E[Y|T=0, X, W]\) separately, just like the T-learner.
  2. An estimator that estimates \(E[T|X, W]\) for propensity-score weighting.

This can be done using models (for the first) and propensity_model (for the second) in XLearner.

#=== set up an estimator for 1 ===#
models = GradientBoostingRegressor(
  n_estimators=100, 
  max_depth=6, 
  min_samples_leaf=10
)
#=== set up a propensity model ===#
propensity_model = RandomForestClassifier(
  n_estimators=100,
  max_depth=6,
  min_samples_leaf=10
)

#=== set up an X-learner ===#
X_learner = XLearner(models=models, propensity_model=propensity_model)

#=== train ===#
X_learner.fit(Y, T, X=X)
<econml.metalearners._metalearners.XLearner object at 0x2fc3dd460>

Estimate \(\theta(X)\) using the effect method,

X_te = X_learner.effect(X_test)

#=== see the first 10 ===#
print(X_te[:10])
[ 0.88547092  2.02973059  0.55806609 -0.98178738  0.37334261  0.97004328
  0.79535256  0.38061293 -0.31194151  1.20117012]

22.3 R-learner

This section shows how to run various estimators that fall under the R-learner, implemented as the _Rlearner class in econml. As we saw in Chapter 12, the R-learner is a form of DML, and econml offers many estimators under _Rlearner.

  • DML
    • LinearDML
    • SparseLinearDML
    • KernelDML
  • NonParamDML
  • CausalForestDML

All the estimators under _Rlearner require that estimators for \(E[Y|X]\) and \(E[T|X]\) be specified. This can be done via model_y for \(E[Y|X]\) and model_t for \(E[T|X]\). However, some estimators additionally require that you specify the final (second-stage) model via model_final, while others do not.

22.3.1 DML

In this example, let’s use gradient boosting for model_y and model_t (a regressor for the outcome and a classifier for the binary treatment) and lasso with cross-validation for model_final. GradientBoostingRegressor() and GradientBoostingClassifier() were imported above; let’s also import LassoCV() from the scikit-learn package.
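
Below is a minimal sketch following the same pattern as the other DML estimators in this chapter (passing fit_intercept=False to the final lasso and discrete_treatment=True for the binary treatment follows the usage shown in the econml documentation):

from sklearn.linear_model import LassoCV

#=== set up a DML estimator ===#
est = DML(
  model_y = GradientBoostingRegressor(),
  model_t = GradientBoostingClassifier(),
  model_final = LassoCV(fit_intercept=False),
  discrete_treatment = True
)

#=== train ===#
est.fit(Y, T, X = X, W = X)

#=== predict treatment effects for the test data ===#
dml_te = est.effect(X_test)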

22.3.2 LinearDML

LinearDML is a DML estimator that uses an unregularized linear model in the second stage. That is, it assumes that \(\theta(X)\) in Equation 12.1 can be written as follows:

\[ \begin{aligned} \theta(X) = \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_K x_K \end{aligned} \]

It solves the following minimization problem,

\[ \begin{aligned} \text{argmin}_{\beta_1, \dots, \beta_K} \sum_{i=1}^N \left(\tilde{Y}_i - \beta_1 x_{1,i}\cdot \tilde{T}_i - \beta_2 x_{2,i}\cdot \tilde{T}_i - \dots - \beta_K x_{K,i}\cdot \tilde{T}_i\right)^2 \end{aligned} \]

This can be solved by simply regressing \(\tilde{Y}_i\) on \(x_1\cdot \tilde{T}\) through \(x_K\cdot \tilde{T}\). Once \(\beta_1, \dots,\beta_K\) are estimated, then \(\hat{\theta}(X)\) is

\[ \begin{aligned} \hat{\theta}(X) = \hat{\beta}_1 x_1 + \hat{\beta}_2 x_2 + \dots + \hat{\beta}_K x_K \end{aligned} \]

So, it is easy to interpret how \(X\) affects treatment effects using LinearDML. This estimator should be used only when the number of heterogeneity drivers, \(X\), is small.

Since LinearDML runs a linear-in-parameters model without regularization, you do not need to specify an estimator for the final stage. We use GradientBoostingRegressor() for model_y and GradientBoostingClassifier() for model_t. Let’s set up our LinearDML,

est = LinearDML(
  model_y = GradientBoostingRegressor(),
  model_t = GradientBoostingClassifier()
)

We now invoke the fit method. Here, \(W=X\).

est.fit(Y, T, X = X, W = X)
<econml.dml.dml.LinearDML object at 0x2fc410cd0>

Predict \(\theta(X)\) for X_test.

ldml_te = est.effect(X_test)

print(ldml_te[:5])
[-3.72248997 -3.01852346 -3.47501606 -3.59183133 -3.48650655]

22.3.3 NonParamDML

As the name suggests, NonParamDML runs a non-parametric regression (e.g., random forest) at the second stage. Unlike LinearDML, we need to specify model_final. Internally, it solves the following minimization problem:

\[ \begin{aligned} \min_{\theta(\cdot)} \sum_{i=1}^N \tilde{T}_i^2\left(\frac{\tilde{Y}_i}{\tilde{T}_i} - \theta(X_i)\right)^2 \end{aligned} \]

That is, the estimator specified for model_final regresses \(\frac{\tilde{Y}_i}{\tilde{T}_i}\) on \(X_i\) with sample weights of \(\tilde{T}_i^2\).

Let’s use GradientBoostingRegressor() as the final model.

est = NonParamDML(
  model_y = GradientBoostingRegressor(),
  model_t = GradientBoostingClassifier(),
  model_final = GradientBoostingRegressor()
)

We now invoke the fit method. Here, \(W=X\).

est.fit(Y, T, X = X, W = X)
<econml.dml.dml.NonParamDML object at 0x2fc459fd0>

Predict \(\theta(X)\) for X_test.

npdml_te = est.effect(X_test)

print(npdml_te[:5])
[0.66405196 2.14662201 0.45083237 1.34571212 0.88009555]

22.3.4 CausalForestDML
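
CausalForestDML uses a causal forest as the final model, so model_final need not be specified. Below is a minimal sketch that follows the same pattern as the other DML estimators above (the parameter values are illustrative, not tuned):

#=== set up a causal forest DML estimator ===#
est = CausalForestDML(
  model_y = GradientBoostingRegressor(),
  model_t = GradientBoostingClassifier(),
  discrete_treatment = True,
  n_estimators = 1000
)

#=== train ===#
est.fit(Y, T, X = X, W = X)

#=== predict treatment effects for the test data ===#
cf_te = est.effect(X_test)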

22.4 Orthogonal Forest
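
econml implements orthogonal random forests as DMLOrthoForest and DROrthoForest in the econml.orf module. Below is a minimal sketch, assuming DMLOrthoForest’s capitalized model_Y and model_T arguments as documented in econml (the nuisance models and parameter values are illustrative, not tuned):

from econml.orf import DMLOrthoForest
from sklearn.linear_model import Lasso, LogisticRegression

#=== set up an orthogonal forest estimator ===#
est = DMLOrthoForest(
  n_trees = 500,
  min_leaf_size = 10,
  model_Y = Lasso(alpha=0.01),     # estimates E[Y|X] locally
  model_T = LogisticRegression(),  # estimates E[T|X] locally
  discrete_treatment = True
)

#=== train ===#
est.fit(Y, T, X = X, W = X)

#=== predict treatment effects for the test data ===#
orf_te = est.effect(X_test)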

Battocchi, Keith, Maggie Hei, and Eleanor Dillon. 2019. “EconML: A Python Package for ML-Based Heterogeneous Treatment Effects Estimation.” https://github.com/microsoft/EconML.
Chen, Huigang, Totte Harinen, Jeong-Yoon Lee, Mike Yung, and Zhenyu Zhao. 2020. “CausalML: Python Package for Causal Machine Learning.” https://arxiv.org/abs/2002.11631.