Causal Machine Learning (CML) Methods
What is CML?
Unlike prediction-oriented machine learning (POML) methods, causal machine learning (CML) methods focus on identifying the effect of a treatment (or of a small number of distinct treatments).
\[ TE(X) = \theta(X)\cdot T \]
\(\theta(X)\) is the impact of the treatment when \(T\) is binary, and the marginal impact of the treatment when \(T\) is continuous. \(\theta(X)\) is a function of the attributes (\(X\)), meaning that the impact of the treatment can vary with the values of the attributes (that is, it can be heterogeneous).
\(\theta(X) = \theta\), where \(\theta\) is a constant, is a special case where the treatment effect is not a function of any observed features.
\(T\) may be continuous or discrete.
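To make the role of \(\theta(X)\) concrete, here is a minimal sketch of \(TE(X) = \theta(X)\cdot T\). The linear form of `theta` and the attribute values are hypothetical and chosen purely for illustration; they do not come from any dataset or package.

```python
import numpy as np


def theta(x):
    # Hypothetical effect function (an assumption for illustration):
    # the impact of the treatment grows linearly with the attribute x.
    return 1.0 + 0.5 * x


def treatment_effect(x, t):
    # TE(X) = theta(X) * T
    return theta(x) * t


# Three individuals with different attribute values.
x = np.array([0.0, 2.0, 4.0])

# Binary treatment: effect of being treated (T = 1) relative to T = 0.
te_binary = treatment_effect(x, t=1)

# Continuous treatment: effect of raising T by 0.1 units.
te_continuous = treatment_effect(x, t=0.1)

print(te_binary)
print(te_continuous)
```

Replacing `theta` with a function that returns a constant gives the homogeneous special case \(\theta(X) = \theta\), in which every individual has the same treatment effect.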
CML considers the following model (following the documentation of the econml Python package):
\[ \begin{aligned} Y & = \theta(X)\cdot T + g(X, W) + \varepsilon \\ T & = f(X, W) + \eta \end{aligned} \]
\(W\) is the collection of attributes that affect \(Y\) along with \(X\) (represented by \(g(X, W)\)), but that do not drive the heterogeneity in the impact of the treatment. \(X\) affects \(Y\) not only as a driver of the heterogeneity in the impact of the treatment (\(\theta(X)\cdot T\)), but also directly, along with \(W\).
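The distinct roles of \(X\) and \(W\) may be easier to see in simulated data. The sketch below draws one dataset from the two-equation model above. The functional forms of `theta`, `g`, and `f`, the distributions, and the sample size are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5_000

# X drives the heterogeneity in the treatment effect and also affects Y directly.
X = rng.uniform(0, 1, size=n)
# W affects both Y and T, but does not enter theta.
W = rng.normal(size=n)


def theta(x):
    # Hypothetical heterogeneous (marginal) treatment effect.
    return 1.0 + 2.0 * x


def g(x, w):
    # Hypothetical direct effect of X and W on Y.
    return 0.5 * x + w


def f(x, w):
    # Hypothetical effect of X and W on the treatment.
    return 0.3 * x + 0.8 * w


eta = rng.normal(size=n)  # error term in the treatment equation
eps = rng.normal(size=n)  # error term in the outcome equation

T = f(X, W) + eta                 # T = f(X, W) + eta
Y = theta(X) * T + g(X, W) + eps  # Y = theta(X) * T + g(X, W) + eps

# T is correlated with W, which also affects Y, so W acts as a confounder.
corr_T_W = np.corrcoef(T, W)[0, 1]
print(f"corr(T, W) = {corr_T_W:.2f}")
```

Because `W` enters both `f` and `g`, it moves the treatment and the outcome at the same time, while only `X` enters `theta`.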
Both \(X\) and \(W\) are potential confounders. While we do control for them (eliminating their influence) by partialing out \(f(X, W)\) and \(g(X, W)\), the sole focus is on the estimation of \(\theta(X)\). This is in stark contrast to the ML methods we have seen in earlier sections, which focus primarily on accurately predicting the level of the dependent variable rather than, as CML methods do, on how that level changes in response to the treatment.
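One way to see what partialing out buys us: if we assume \(E[\varepsilon|X, W] = 0\) and \(E[\eta|X, W] = 0\), subtracting conditional expectations from the outcome equation gives \(Y - E[Y|X, W] = \theta(X)\cdot(T - E[T|X, W]) + \varepsilon\), so regressing the residualized \(Y\) on the residualized \(T\) isolates \(\theta\). The sketch below illustrates this in the simplest setting: a constant \(\theta\) and linear \(f\) and \(g\), with ordinary least squares used for residualizing. These are simplifying assumptions for illustration only. It is not the DML procedure itself, which uses flexible ML learners and cross-fitting, and all coefficients are made up.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 20_000
theta_true = 2.0  # constant treatment effect (assumed for this illustration)

X = rng.normal(size=n)
W = rng.normal(size=n)
# T = f(X, W) + eta, with a hypothetical linear f
T = 0.5 * X + 0.8 * W + rng.normal(size=n)
# Y = theta * T + g(X, W) + eps, with a hypothetical linear g
Y = theta_true * T + 1.0 * X + 1.5 * W + rng.normal(size=n)


def residualize(v, controls):
    # Residuals from an OLS regression of v on the controls plus an intercept.
    Z = np.column_stack([np.ones(len(v)), controls])
    coef, *_ = np.linalg.lstsq(Z, v, rcond=None)
    return v - Z @ coef


controls = np.column_stack([X, W])
Y_res = residualize(Y, controls)  # Y with the influence of (X, W) removed
T_res = residualize(T, controls)  # T with the influence of (X, W) removed

# Slope of the residual-on-residual regression.
theta_hat = (T_res @ Y_res) / (T_res @ T_res)

# Naive slope of Y on T alone, ignoring the confounders.
T_centered = T - T.mean()
theta_naive = (T_centered @ (Y - Y.mean())) / (T_centered @ T_centered)

print(f"partialed-out estimate: {theta_hat:.3f}")
print(f"naive estimate:         {theta_naive:.3f}")
```

In this simulated setup, the partialed-out estimate should land close to `theta_true`, while the naive estimate is pushed upward because `X` and `W` raise both `T` and `Y`.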
In this chapter, we first cover the double-debiased machine learning (DML) method of Chernozhukov et al. (2018), whose approach many prominent CML methods follow. We then discuss the R-learner, followed by causal forest and orthogonal forest.
Some notes on CML
Point 1 is not just for CML, but for statistical models in general. Any statistical model is just a mathematical manipulation of numbers. Models themselves have no ability to identify causal effects.