22  Treatment Effect Estimation

This chapter presents CATE estimation using the econml package (Battocchi, Hei, and Dillon 2019). The causalml package developed at Uber (Chen et al. 2020) is currently less complete than econml, so we do not cover it.

from econml.dml import DML, LinearDML, SparseLinearDML, NonParamDML, CausalForestDML
from econml.metalearners import TLearner, SLearner, XLearner, DomainAdaptationLearner
from econml.sklearn_extensions.model_selection import GridSearchCVList

from sklearn.ensemble import RandomForestRegressor, RandomForestClassifier
from sklearn.ensemble import GradientBoostingRegressor, GradientBoostingClassifier
from sklearn.linear_model import Lasso
from sklearn.model_selection import RepeatedKFold, train_test_split
from numpy.random import binomial, multivariate_normal, normal, uniform
import numpy as np

#=== ignore warnings ===#
import warnings
warnings.filterwarnings("ignore")

Let’s also generate a synthetic dataset using a custom data generating process (DGP) defined below.

# Define DGP
def generate_data(n, d, controls_outcome, treatment_effect, propensity):
    """Generates population data for given untreated_outcome, treatment_effect and propensity functions.
    
    Parameters
    ----------
        n (int): population size
        d (int): number of covariates
        controls_outcome (func): untreated outcome conditional on covariates
        treatment_effect (func): treatment effect conditional on covariates
        propensity (func): probability of treatment conditional on covariates
    """
    # Generate covariates
    X = multivariate_normal(np.zeros(d), np.diag(np.ones(d)), n)
    # Generate treatment
    T = np.apply_along_axis(lambda x: binomial(1, propensity(x), 1)[0], 1, X)
    # Calculate outcome
    Y0 = np.apply_along_axis(lambda x: controls_outcome(x), 1, X)
    treat_effect = np.apply_along_axis(lambda x: treatment_effect(x), 1, X)
    Y = Y0 + treat_effect * T
    return (Y, T, X)

# controls outcome, treatment effect, propensity definitions
def generate_controls_outcome(d):
    beta = uniform(-3, 3, d)
    return lambda x: np.dot(x, beta) + normal(0, 1)

treatment_effect = lambda x: (1 if x[1] > 0.1 else 0)*8
propensity = lambda x: (0.8 if (x[2]>-0.5 and x[2]<0.5) else 0.2)

# DGP constants and test data
d = 5
n = 1000
n_test = 250
controls_outcome = generate_controls_outcome(d)
X_test = multivariate_normal(np.zeros(d), np.diag(np.ones(d)), n_test)
delta = 6/n_test
X_test[:, 1] = np.arange(-3, 3, delta)
Y, T, X = generate_data(n, d, controls_outcome, treatment_effect, propensity)
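
Since the DGP is known here, we can also compute the true treatment effects for the test data directly from the treatment_effect function defined above; they serve as a benchmark for the CATE estimates below (true_te is a name introduced here for illustration).

#=== true treatment effects for the test data ===#
true_te = np.apply_along_axis(treatment_effect, 1, X_test)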

22.1 Average Treatment Effect

The average treatment effect (ATE) can be estimated with DML, implemented in the doubleml package.

import doubleml
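
As a minimal sketch of ATE estimation with doubleml (assuming its DoubleMLData and DoubleMLPLR classes; the argument names ml_l and ml_m follow recent versions of the package, and the random forest nuisance models are illustrative), a partially linear regression model can be fit to the data generated above:

from doubleml import DoubleMLData, DoubleMLPLR
from sklearn.ensemble import RandomForestRegressor, RandomForestClassifier

#=== wrap the generated arrays in a DoubleMLData object ===#
dml_data = DoubleMLData.from_arrays(X, Y, T)

#=== partially linear regression with random forest nuisance models ===#
dml_plr = DoubleMLPLR(
  dml_data,
  ml_l = RandomForestRegressor(),  # estimates E[Y|X]
  ml_m = RandomForestClassifier()  # estimates E[T|X]
)

#=== fit and report the estimated ATE ===#
dml_plr.fit()
print(dml_plr.summary)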

22.2 S-, T-, and X-learner

This section shows how to train the S-, T-, and X-learners. See Chapter 12 for how these learners work; understanding them will help you see what you need to specify for each learner.

22.2.1 S-learner

To train an S-learner, you need to specify only one estimator, which estimates \(E[Y|T, X, W]\). This can be done using overall_model in SLearner.

#=== specify the overall model ===#
overall_model = GradientBoostingRegressor(
  n_estimators=100,
  max_depth=6,
  min_samples_leaf=10
)

#=== set up an S-learner ===#
S_learner = SLearner(overall_model=overall_model)

#=== train ===#
S_learner.fit(Y, T, X=X)
<econml.metalearners._metalearners.SLearner object at 0x2fc2c2f10>

Estimate \(\theta(X)\) using the effect method,

S_te = S_learner.effect(X_test)

#=== see the first 10 ===#
print(S_te[:10])
[-0.96345437  0.49912816  0.32140746  0.32493562  0.52117931  0.39061349
  1.12985202  0.61109233 -0.05638891  0.74819205]

22.2.2 T-learner

To train a T-learner, you need to specify only one estimator, which estimates \(E[Y|T=1, X, W]\) and \(E[Y|T=0, X, W]\) separately. This can be done using models in TLearner.

#=== set up an estimator ===#
models = GradientBoostingRegressor(
  n_estimators=100, 
  max_depth=6, 
  min_samples_leaf=10
)

#=== set up a T-learner ===#
T_learner = TLearner(models=models)

#=== train ===#
T_learner.fit(Y, T, X=X)
<econml.metalearners._metalearners.TLearner object at 0x2fc3bedc0>

Estimate \(\theta(X)\) using the effect method,

T_te = T_learner.effect(X_test)

#=== see the first 10 ===#
print(T_te[:10])
[1.52115791 2.06799171 0.64311159 1.39796878 1.1590299  1.47958283
 2.18855353 1.82882139 1.65287604 2.99781446]

22.2.3 X-learner

To train an X-learner, you need to specify two estimators:

  1. An estimator that estimates \(E[Y|T=1, X, W]\) and \(E[Y|T=0, X, W]\) separately, just like the T-learner.
  2. An estimator that estimates \(E[T|X, W]\) for propensity-score weighting.

This can be done using models (for the first) and propensity_model (for the second) in XLearner.

#=== set up an estimator for 1 ===#
models = GradientBoostingRegressor(
  n_estimators=100, 
  max_depth=6, 
  min_samples_leaf=10
)
#=== set up a propensity model ===#
propensity_model = RandomForestClassifier(
  n_estimators=100,
  max_depth=6,
  min_samples_leaf=10
)

#=== set up an X-learner ===#
X_learner = XLearner(models=models, propensity_model=propensity_model)

#=== train ===#
X_learner.fit(Y, T, X=X)
<econml.metalearners._metalearners.XLearner object at 0x2fc3dd460>

Estimate \(\theta(X)\) using the effect method,

X_te = X_learner.effect(X_test)

#=== see the first 10 ===#
print(X_te[:10])
[ 0.88547092  2.02973059  0.55806609 -0.98178738  0.37334261  0.97004328
  0.79535256  0.38061293 -0.31194151  1.20117012]

22.3 R-learner

This section shows how to run various estimators that fall under the R-learner, implemented as the _Rlearner class in econml. As we saw in Chapter 12, the R-learner is a form of DML, and econml offers many estimators under _Rlearner.

  • DML
    • LinearDML
    • SparseLinearDML
    • KernelDML
  • NonParamDML
  • CausalForestDML

All the estimators under _Rlearner require that estimators for \(E[Y|X]\) and \(E[T|X]\) be specified. This can be done via model_y for \(E[Y|X]\) and model_t for \(E[T|X]\). However, some estimators additionally require that you specify the final (second-stage) model via model_final, while others do not.

22.3.1 DML

In this example, let’s use gradient boosting for model_y and model_t (a regressor for the outcome and a classifier for the binary treatment) and lasso with cross-validation for model_final. GradientBoostingRegressor() and GradientBoostingClassifier() were imported above; let’s also import LassoCV() from the scikit-learn package.
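
Below is a minimal sketch following the same pattern as the other DML estimators in this chapter (passing fit_intercept=False to the final lasso and discrete_treatment=True for the binary treatment follows the usage shown in the econml documentation):

from sklearn.linear_model import LassoCV

#=== set up a DML estimator ===#
est = DML(
  model_y = GradientBoostingRegressor(),
  model_t = GradientBoostingClassifier(),
  model_final = LassoCV(fit_intercept=False),
  discrete_treatment = True
)

#=== train ===#
est.fit(Y, T, X = X, W = X)

#=== predict treatment effects for the test data ===#
dml_te = est.effect(X_test)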

22.3.2 LinearDML

LinearDML is a DML estimator that uses an unregularized linear model in the second stage. That is, it assumes that \(\theta(X)\) in Equation 12.1 can be written as follows:

\[ \begin{aligned} \theta(X) = \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_K x_K \end{aligned} \]

It solves the following minimization problem,

\[ \begin{aligned} \text{argmin}_{\beta_1, \dots, \beta_K} \sum_{i=1}^N \left(\tilde{Y}_i - \beta_1 x_{1,i}\cdot \tilde{T}_i - \beta_2 x_{2,i}\cdot \tilde{T}_i - \dots - \beta_K x_{K,i}\cdot \tilde{T}_i\right)^2 \end{aligned} \]

This can be solved by simply regressing \(\tilde{Y}_i\) on \(x_1\cdot \tilde{T}\) through \(x_K\cdot \tilde{T}\). Once \(\beta_1, \dots,\beta_K\) are estimated, then \(\hat{\theta}(X)\) is

\[ \begin{aligned} \hat{\theta}(X) = \hat{\beta}_1 x_1 + \hat{\beta}_2 x_2 + \dots + \hat{\beta}_K x_K \end{aligned} \]

So, it is easy to interpret how \(X\) affects treatment effects using LinearDML. This estimator should be used only when the number of heterogeneity drivers, \(X\), is small.

Since LinearDML runs a linear-in-parameters model without regularization, you do not need to specify an estimator for the final stage. We use GradientBoostingRegressor() for model_y and GradientBoostingClassifier() for model_t. Let’s set up our LinearDML,

est = LinearDML(
  model_y = GradientBoostingRegressor(),
  model_t = GradientBoostingClassifier()
)

We now invoke the fit method. Here, \(W=X\).

est.fit(Y, T, X = X, W = X)
<econml.dml.dml.LinearDML object at 0x2fc410cd0>

Predict \(\theta(X)\) for X_test.

ldml_te = est.effect(X_test)

print(ldml_te[:5])
[-3.72248997 -3.01852346 -3.47501606 -3.59183133 -3.48650655]

22.3.3 NonParamDML

As the name suggests, NonParamDML runs a non-parametric regression (e.g., random forest) at the second stage. Unlike LinearDML, we need to specify model_final. Internally, it solves the following minimization problem:

\[ \begin{aligned} \min_{\theta(\cdot)} \sum_{i=1}^N \tilde{T}_i^2\left(\frac{\tilde{Y}_i}{\tilde{T}_i} - \theta(X_i)\right)^2 \end{aligned} \]

That is, the estimator specified for model_final regresses \(\frac{\tilde{Y}_i}{\tilde{T}_i}\) on \(X_i\) with sample weights of \(\tilde{T}_i^2\).

Let’s use GradientBoostingRegressor() as the final model.

est = NonParamDML(
  model_y = GradientBoostingRegressor(),
  model_t = GradientBoostingClassifier(),
  model_final = GradientBoostingRegressor()
)

We now invoke the fit method. Here, \(W=X\).

est.fit(Y, T, X = X, W = X)
<econml.dml.dml.NonParamDML object at 0x2fc459fd0>

Predict \(\theta(X)\) for X_test.

npdml_te = est.effect(X_test)

print(npdml_te[:5])
[0.66405196 2.14662201 0.45083237 1.34571212 0.88009555]

22.3.4 CausalForestDML
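
CausalForestDML uses a causal forest as the final model, so model_final need not be specified. Below is a minimal sketch that follows the same pattern as the other DML estimators above (the parameter values are illustrative, not tuned):

#=== set up a causal forest DML estimator ===#
est = CausalForestDML(
  model_y = GradientBoostingRegressor(),
  model_t = GradientBoostingClassifier(),
  discrete_treatment = True,
  n_estimators = 1000
)

#=== train ===#
est.fit(Y, T, X = X, W = X)

#=== predict treatment effects for the test data ===#
cf_te = est.effect(X_test)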

22.4 Orthogonal Forest
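
econml implements orthogonal random forests as DMLOrthoForest and DROrthoForest in the econml.orf module. Below is a minimal sketch, assuming DMLOrthoForest’s capitalized model_Y and model_T arguments as documented in econml (the nuisance models and parameter values are illustrative, not tuned):

from econml.orf import DMLOrthoForest
from sklearn.linear_model import Lasso, LogisticRegression

#=== set up an orthogonal forest estimator ===#
est = DMLOrthoForest(
  n_trees = 500,
  min_leaf_size = 10,
  model_Y = Lasso(alpha=0.01),     # estimates E[Y|X] locally
  model_T = LogisticRegression(),  # estimates E[T|X] locally
  discrete_treatment = True
)

#=== train ===#
est.fit(Y, T, X = X, W = X)

#=== predict treatment effects for the test data ===#
orf_te = est.effect(X_test)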

Battocchi, Keith, Maggie Hei, and Eleanor Dillon. 2019. “EconML: A Python Package for ML-Based Heterogeneous Treatment Effects Estimation.” https://github.com/microsoft/EconML.
Chen, Huigang, Totte Harinen, Jeong-Yoon Lee, Mike Yung, and Zhenyu Zhao. 2020. “CausalML: Python Package for Causal Machine Learning.” https://arxiv.org/abs/2002.11631.