Causal Machine Learning (CML) Methods

What is CML?

Unlike prediction-oriented machine learning (POML) methods, the focus of causal machine learning (CML) methods is to identify the treatment effect of a treatment (or small number of distinct treatments).

\[ TE(X) = \theta(X)\cdot T \]

$\theta(X)$ is the impact of the treatment when $T$ is binary and marginal impact of the treatment when $T$ is continuous. $\theta(X)$ is a function of attributes ($X$), meaning that the impact of the treatment can vary (heterogeneous) based on the value of the attributes.

$\theta(X) = \theta$, where $\theta$ is a constant, is a special case where the treatment effect is not a function of any observed features.

$T$ may be continuous or discrete.

CML considers the following model (following the documentation of the econml Python package)

\[ \begin{aligned} Y & = \theta(X)\cdot T + g(X, W) + \varepsilon \\ T & = f(X, W) + \eta \end{aligned} \]

$W$ are the collection of attributes that affect $Y$ along with $X$ (represented by $g(X, W)$), but not as drivers of the heterogeneity in the impact of the treatment. $X$ not just affects $Y$ as drivers of the heterogeneity in the impact of the treatment ($\theta(X)\cdot T$), but also directly along with $W$.

Both $X$ and $W$ are potential confounders. While we do control for them (eliminating their influence) by partialing out $f(X, W)$ and $g(X, W)$, the sole focus is on the estimation of $\theta(X)$. This is in stark contrast to the focus of the ML methods we have seen in earlier sections, which primarily focuses on the accurate prediction of the level of the dependent variable, rather than how the level of the dependent variable changes when treated like CML methods.

In this chapter, we first cover double-debiased machine learning (DML) method by Chernozhukov et al. (2018), which many prominent CML methods follow. We then move on to discuss R-leaner, followed by causal forest and orthogonal forest.

Some notes on CML

Important

Point 1. The use of causal machine learning model does not guarantee you the estimate you got is indeed a causal effect.
Point 2. Careful examinations of the underlying assumptions is necessary just like traditional causal inference methods.
Point 3. CML is not a panacea at all. How can it be better than the traditional approaches?
- They may be able to capture complex non-linear interactions of variables without specifying how they interact
- They are robust to specification errors
- They can be less efficient to correctly specified (or well approximated) parametric models

Point 1 is not just for CML, but for any statistical models in general. Any statistical model is just a mathematical manipulation of numbers. Model themselves have no ability to identify causal effects.

More on this later….

Chernozhukov, Victor, Denis Chetverikov, Mert Demirer, Esther Duflo, Christian Hansen, Whitney Newey, and James Robins. 2018. “Double/debiased machine learning for treatment and structural parameters.” The Econometrics Journal 21 (1): C1–68. https://doi.org/10.1111/ectj.12097.

# Causal Machine Learning (CML) Methods {.unnumbered} ## What is CML? Unlike prediction-oriented machine learning (POML) methods, the focus of causal machine learning (CML) methods is to identify the treatment effect of a treatment (or small number of distinct treatments). $$ TE(X) = \theta(X)\cdot T $$ $\theta(X)$ is the impact of the treatment when $T$ is binary and marginal impact of the treatment when $T$ is continuous. $\theta(X)$ is a function of attributes ($X$), meaning that the impact of the treatment can vary (heterogeneous) based on the value of the attributes. ::: {.column-margin} $\theta(X) = \theta$, where $\theta$ is a constant, is a special case where the treatment effect is not a function of any observed features. ::: ::: {.column-margin} $T$ may be continuous or discrete. ::: CML considers the following model (following [the documentation of the `econml` Python package](https://econml.azurewebsites.net/spec/estimation/dml.html)) $$ \begin{aligned} Y & = \theta(X)\cdot T + g(X, W) + \varepsilon \\ T & = f(X, W) + \eta \end{aligned} $$ $W$ are the collection of attributes that affect $Y$ along with $X$ (represented by $g(X, W)$), but not as drivers of the heterogeneity in the impact of the treatment. $X$ not just affects $Y$ as drivers of the heterogeneity in the impact of the treatment ($\theta(X)\cdot T$), but also directly along with $W$. Both $X$ and $W$ are potential confounders. While we do control for them (eliminating their influence) by partialing out $f(X, W)$ and $g(X, W)$, the sole focus is on the estimation of $\theta(X)$. This is in stark contrast to the focus of the ML methods we have seen in earlier sections, which primarily focuses on the accurate prediction of the level of the dependent variable, rather than how the level of the dependent variable changes when treated like CML methods. In this chapter, we first cover double-debiased machine learning (DML) method by @Chernozhukov2018, which many prominent CML methods follow. We then move on to discuss R-leaner, followed by causal forest and orthogonal forest. ## Some notes on CML :::{.callout-important} + Point 1. The use of causal machine learning model does not guarantee you the estimate you got is indeed a causal effect. + Point 2. Careful examinations of the underlying assumptions is necessary just like traditional causal inference methods. + Point 3. CML is not a panacea at all. How can it be better than the traditional approaches? * They may be able to capture complex non-linear interactions of variables without specifying how they interact * They are robust to specification errors * They can be less efficient to correctly specified (or well approximated) parametric models ::: Point 1 is not just for CML, but for any statistical models in general. Any statistical model is just a mathematical manipulation of numbers. Model themselves have no ability to identify causal effects. More on this later....