The method of moments is a class of statistical methods that derives estimators from moment conditions. We will go over the basic concepts of the method of moments here because it will help us understand how double machine learning methods work.
Let’s first review what moments are.
The statistics that we use all the time, the expected value and variance of \(x\), are the first raw moment and the second central moment of \(x\), respectively.
\[ \begin{aligned} \mu_1 & = E[x] \\ \mu_2' & = E[(x-\mu_1)^2] \end{aligned} \]
Sample analogs of moments are defined by replacing the expectation with a sample average:
\[ \begin{aligned} \hat{\mu}_1 & = \frac{1}{N}\sum_{i=1}^N x_i \\ \hat{\mu}_2' & = \frac{1}{N}\sum_{i=1}^N (x_i-\hat{\mu}_1)^2 \end{aligned} \]
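For concreteness, here is a small R sketch (the simulated data and variable names are illustrative) that computes the first raw sample moment and the second central sample moment:

```r
set.seed(123) # for reproducibility
x <- rnorm(1000, mean = 2, sd = 3)

# first raw sample moment: the sample mean
mu_1_hat <- mean(x)

# second central sample moment: the (biased) sample variance
mu_2_prime_hat <- mean((x - mu_1_hat)^2)
```

With a large sample, `mu_1_hat` should be close to the true mean of 2 and `mu_2_prime_hat` close to the true variance of 9.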
In general, the method of moments works like this:
Step 1: Write down moment conditions that relate the statistic of interest (\(\theta\)) to population moments.
Step 2: Replace the population moments with their sample analogs.
Step 3: Solve the resulting equations with respect to \(\theta\).
It is best to see some examples to understand this better.
We would like to estimate the expected value of a random variable \(x\).
Step 1: In this example, the statistic of interest (\(\theta\)) is the expected value of a random variable \(x\). The moment condition is simply
\[ \begin{aligned} \theta = E[x] \end{aligned} \]
Step 2: The sample analog of \(E[x]\) is \(\frac{1}{N}\sum_{i=1}^N x_i\). So,
\[ \begin{aligned} \theta = \frac{1}{N}\sum_{i=1}^N x_i \end{aligned} \]
Step 3: Well, the equation is already solved with respect to \(\theta\).
\[ \begin{aligned} \hat{\theta} = \frac{1}{N}\sum_{i=1}^N x_i \end{aligned} \]
The method of moments estimator of the expected value of a random variable is the sample mean (\(\frac{1}{N}\sum_{i=1}^N x_i\)).
N <- 1000
x <- rnorm(N)
(theta_hat_mm <- mean(x))
[1] -0.02858194
Now, let’s look at an estimation task that is more familiar and relevant to our work: estimating the coefficients of a linear model.
Consider the following linear model.
\[ \begin{aligned} y = \alpha + \beta x + \mu \end{aligned} \tag{B.1}\]
By the assumption of zero conditional mean (\(E[\mu|X] = 0\)), where \(X = \{1, x\}\) (the intercept and \(x\)), moment conditions hold for this problem. To see why \(E[\mu \cdot x] = 0\):
\[ \begin{aligned} E[\mu \cdot x] & = E_x[E_{\mu}[\mu \cdot x|x]] \\ & = E_x[xE_{\mu}[\mu|x]] \\ & = E_x[x\cdot 0] \;\; \mbox{(by the assumption)} \\ & = 0 \end{aligned} \]
\[ \begin{aligned} E[\mu\cdot 1] & = 0\\ E[\mu\cdot x] & = 0 \end{aligned} \tag{B.2}\]
From Equation B.1, we can see that \(\mu = y - \alpha - \beta x\). Substituting this into Equation B.2,
\[ \begin{aligned} E[(y - \alpha - \beta x)\cdot 1] & = 0\\ E[(y - \alpha - \beta x)\cdot x] & = 0 \end{aligned} \tag{B.3}\]
Now, the sample analogs of these moment conditions are,
\[ \begin{aligned} \sum_{i=1}^N(y_i - \alpha - \beta x_i) & = 0\\ \sum_{i=1}^N(y_i - \alpha - \beta x_i)\cdot x_i & = 0 \end{aligned} \]
Do these look familiar to you? They should be because they are identical to the first order conditions of OLS. Solving the equations,
\[ \begin{aligned} \hat{\alpha}_{mm} & = \bar{y} - \hat{\beta}_{mm}\bar{x} \\ \hat{\beta}_{mm} & = \frac{\sum_{i=1}^N (y_i-\bar{y})(x_i - \bar{x})}{\sum_{i=1}^N (x_i-\bar{x})^2} \end{aligned} \]
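We can confirm numerically that the method of moments estimates coincide with OLS. Below is a small simulation sketch (the data-generating process and variable names are illustrative):

```r
set.seed(456) # for reproducibility
N <- 1000
x <- rnorm(N)
y <- 1 + 2 * x + rnorm(N) # true alpha = 1, beta = 2

# method of moments estimates from the sample moment conditions
beta_hat_mm  <- sum((y - mean(y)) * (x - mean(x))) / sum((x - mean(x))^2)
alpha_hat_mm <- mean(y) - beta_hat_mm * mean(x)

# these are identical to the OLS estimates
ols <- lm(y ~ x)
```

Comparing `alpha_hat_mm` and `beta_hat_mm` with `coef(ols)` shows they agree up to floating-point error.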
Now, consider the following model,
\[ \begin{aligned} y = \alpha + \beta x + \mu \end{aligned} \]
where \(E[\mu|x] = f(x)\). So, \(x\) is endogenous. Fortunately, we have found an external instrument \(z\) such that \(E[\mu|z] = 0\) and \(z\) has explanatory power on \(x\) (\(z\) is not a weak instrument). According to these assumptions, we can write the following moment conditions:
\[ \begin{aligned} E[\mu\cdot 1] & = 0 \;\; \mbox{(w.r.t. the intercept)}\\ E[\mu\cdot z] & = 0 \end{aligned} \tag{B.4}\]
The key difference from the previous case is that we are not using \(E[\mu\cdot x] = 0\) because we believe that this moment condition is not satisfied. Instead, we are using \(E[\mu\cdot z] = 0\) because we believe this condition is satisfied.
Substituting \(\mu = y - \alpha - \beta x\) into Equation B.4,
\[ \begin{aligned} E[(y-\alpha - \beta x)\cdot 1] & = 0 \\ E[(y-\alpha - \beta x)\cdot z] & = 0 \end{aligned} \]
The sample analogs of these conditions are,
\[ \begin{aligned} \sum_{i=1}^N(y_i - \alpha - \beta x_i) & = 0\\ \sum_{i=1}^N(y_i - \alpha - \beta x_i)\cdot z_i & = 0 \end{aligned} \]
Now, we can solve these conditions with respect to \(\alpha\) and \(\beta\); the solutions are the instrumental variables (IV) estimators.
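Solving the two sample moment conditions (just as in the OLS case, but with \(z\) in place of \(x\) in the second condition) gives \(\hat{\beta}_{iv} = \frac{\sum_{i=1}^N (z_i-\bar{z})(y_i - \bar{y})}{\sum_{i=1}^N (z_i-\bar{z})(x_i - \bar{x})}\) and \(\hat{\alpha}_{iv} = \bar{y} - \hat{\beta}_{iv}\bar{x}\). A simulation sketch (the data-generating process and variable names are illustrative) shows that the IV estimator recovers the true coefficient while OLS does not:

```r
set.seed(789) # for reproducibility
N <- 1000
z <- rnorm(N)                     # instrument
u <- rnorm(N)                     # error term, correlated with x but not z
x <- 0.5 * z + 0.5 * u + rnorm(N) # endogenous regressor
y <- 1 + 2 * x + u                # true alpha = 1, beta = 2

# IV estimates from the sample moment conditions
beta_hat_iv <- sum((z - mean(z)) * (y - mean(y))) /
  sum((z - mean(z)) * (x - mean(x)))
alpha_hat_iv <- mean(y) - beta_hat_iv * mean(x)

# OLS is biased here because x is correlated with the error u
beta_hat_ols <- coef(lm(y ~ x))[["x"]]
```

With this design, `beta_hat_iv` should be close to the true value of 2, while `beta_hat_ols` is pulled away from it by the correlation between \(x\) and \(u\).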