Appendix B — Primer on the method of moments

The method of moments is a class of statistical methods that derive estimators from moment conditions. We go over the basic concepts of the method of moments here because they will help us understand how double machine learning methods work.

B.1 Moments

Let’s first review what moments are.

Definition

The \(n\)th raw moment of a random variable \(x\) (denoted \(\mu_n\)) is defined as

\[ \begin{aligned} \mu_n = E[x^n] \end{aligned} \]

The \(n\)th central moment of a random variable \(x\) (denoted \(\mu_n'\)) is defined as

\[ \begin{aligned} \mu_n' = E[(x-\mu_1)^n] \end{aligned} \]

where \(\mu_1 = E[x]\) is the mean of \(x\).

The statistics we use all the time, the expected value and variance of \(x\), are the first raw moment and the second central moment of \(x\), respectively.

\[ \begin{aligned} \mu_1 & = E[x] \\ \mu_2' & = E[(x-\mu_1)^2] \end{aligned} \]

Sample analogs of moments are defined as follows:

Sample analogs of moments

Suppose you have \(N\) realized values of a random variable \(x\) (\(x_1, x_2, \dots, x_N\)). Then the sample analogs of the \(n\)th raw and central moments are

\[ \begin{aligned} \mu_n = E[x^n] & \Rightarrow \frac{1}{N}\sum_{i=1}^N x_i^n \\ \mu_n' = E[(x-\mu_1)^n] & \Rightarrow \frac{1}{N} \sum_{i=1}^N (x_i-\bar{x})^n \end{aligned} \]

respectively, where \(\bar{x} = \frac{1}{N}\sum_{i=1}^N x_i\) is the sample mean.
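
As a quick illustration, here is a minimal R sketch (with simulated data; the variable names and parameter values are purely illustrative) that computes the sample analogs of the first raw moment and the second central moment:

N <- 1000
x <- rnorm(N, mean = 2, sd = 3) # a random variable with E[x] = 2 and Var(x) = 9

mu_1_hat <- mean(x) # sample analog of the 1st raw moment, E[x]
mu_2_c_hat <- mean((x - mu_1_hat)^2) # sample analog of the 2nd central moment, E[(x - mu_1)^2]

mu_1_hat # should be close to 2
mu_2_c_hat # should be close to 9 (divides by N, unlike var(), which divides by N - 1)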

B.2 Method of moments estimator

In general, the method of moments works like this.

Method of moments in general
    1. For the given statistic of interest (say, \(\theta\)), write equations that define \(\theta\) using moments, either implicitly or explicitly.
    2. Replace the moment conditions with their sample analogs.
    3. Solve for \(\theta\).

It is best to see some examples to understand this better.

B.2.1 Simple example

We would like to estimate the expected value of a random variable \(x\).

Step 1: In this example, the statistic of interest (\(\theta\)) is the expected value of the random variable \(x\). The moment condition is simply

\[ \begin{aligned} \theta = E[x] \end{aligned} \]

Step 2: The sample analog of \(E[x]\) is \(\frac{1}{N}\sum_{i=1}^N x_i\). So,

\[ \begin{aligned} \theta = \frac{1}{N}\sum_{i=1}^N x_i \end{aligned} \]

Step 3: Well, the equation is already solved with respect to \(\theta\).

\[ \begin{aligned} \hat{\theta} = \frac{1}{N}\sum_{i=1}^N x_i \end{aligned} \]

The method of moments estimator of the expected value of a random variable is the sample mean (\(\frac{1}{N}\sum_{i=1}^N x_i\)).

N <- 1000 # number of observations
x <- rnorm(N) # N draws of x from the standard normal distribution, so E[x] = 0
(
theta_hat_mm <- mean(x) # method of moments estimate: the sample mean
)
[1] -0.02858194

B.2.2 Method of moments to estimate a linear-in-parameter model

Now, let’s look at an estimation task that is more familiar and relevant to our work: estimating the coefficients of a linear model.

Consider the following linear model.

\[ \begin{aligned} y = \alpha + \beta x + \mu \end{aligned} \tag{B.1}\]

By the assumption of zero conditional mean (\(E[\mu|x] = 0\)), we can derive the moment conditions for this problem. For the condition with respect to \(x\), the law of iterated expectations gives

\[ \begin{aligned} E[\mu \cdot x] & = E_x[E_{\mu}[\mu \cdot x|x]] \\ & = E_x[xE_{\mu}[\mu|x]] \\ & = E_x[x\cdot 0] \;\; \mbox{(by the assumption)} \\ & = 0 \end{aligned} \]

The condition with respect to the intercept, \(E[\mu\cdot 1] = E[\mu] = 0\), follows in the same way. So, the moment conditions for this problem are

\[ \begin{aligned} E[\mu\cdot 1] & = 0\\ E[\mu\cdot x] & = 0 \end{aligned} \tag{B.2}\]

From Equation B.1, we can see that \(\mu = y - \alpha - \beta x\). Substituting this into Equation B.2,

\[ \begin{aligned} E[(y - \alpha - \beta x)\cdot 1] & = 0\\ E[(y - \alpha - \beta x)\cdot x] & = 0 \end{aligned} \tag{B.3}\]

Terminology alert: Score function

A score function is the function of the parameters to be estimated that appears inside the expectation \(E[\cdot]\) of a moment condition. \(\Psi(\cdot)\) is often used to represent a score function.

So, the score functions in Equation B.3 are

  • \(\Psi_1(\alpha, \beta) = y - \alpha - \beta x\)
  • \(\Psi_2(\alpha, \beta) = (y - \alpha - \beta x)\cdot x\)

for the first and second moment conditions.

Now, the sample analogs of these moment conditions are,

\[ \begin{aligned} \sum_{i=1}^N(y_i - \alpha - \beta x_i) & = 0\\ \sum_{i=1}^N(y_i - \alpha - \beta x_i)\cdot x_i & = 0 \end{aligned} \]

Do these look familiar? They should, because they are identical to the first-order conditions of OLS. Solving the equations,

\[ \begin{aligned} \hat{\alpha}_{mm} & = \bar{y} - \hat{\beta}_{mm}\bar{x} \\ \hat{\beta}_{mm} & = \frac{\sum_{i=1}^N (y_i-\bar{y})(x_i - \bar{x})}{\sum_{i=1}^N (x_i-\bar{x})^2} \end{aligned} \]
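
To confirm that these formulas indeed match OLS, here is a minimal R sketch with simulated data (the true parameter values and variable names are purely illustrative). It also checks that the sample averages of the score functions are numerically zero at the estimates.

N <- 1000
x <- rnorm(N)
y <- 1 + 2 * x + rnorm(N) # true alpha = 1, beta = 2 (illustrative values)

# method of moments (= OLS) estimates from the solved moment conditions
beta_hat_mm <- sum((y - mean(y)) * (x - mean(x))) / sum((x - mean(x))^2)
alpha_hat_mm <- mean(y) - beta_hat_mm * mean(x)

c(alpha_hat_mm, beta_hat_mm) # identical to coef(lm(y ~ x)); close to (1, 2)

# the sample analogs of the moment conditions hold at the estimates
mean(y - alpha_hat_mm - beta_hat_mm * x) # average of Psi_1: essentially 0
mean((y - alpha_hat_mm - beta_hat_mm * x) * x) # average of Psi_2: essentially 0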

B.2.3 Instrumental variable approach as a method of moments estimator

Now, consider the following model,

\[ \begin{aligned} y = \alpha + \beta x + \mu \end{aligned} \]

where \(E[\mu|x] = f(x) \neq 0\), so \(x\) is endogenous. Fortunately, we have found an external instrument \(z\) such that \(E[\mu|z] = 0\) and \(z\) has explanatory power for \(x\) (\(z\) is not a weak instrument). Under these assumptions, we can write the following moment conditions:

\[ \begin{aligned} E[\mu\cdot 1] & = 0 \;\; \mbox{(w.r.t. the intercept)}\\ E[\mu\cdot z] & = 0 \end{aligned} \tag{B.4}\]

The key difference from the previous case is that we are not using \(E[\mu\cdot x] = 0\) because we believe that this moment condition is not satisfied. Instead, we are using \(E[\mu\cdot z] = 0\) because we believe this condition is satisfied.

Substituting \(\mu = y - \alpha - \beta x\) into Equation B.4,

\[ \begin{aligned} E[(y-\alpha - \beta x)\cdot 1] & = 0 \\ E[(y-\alpha - \beta x)\cdot z] & = 0 \end{aligned} \]

The sample analogs of these conditions are,

\[ \begin{aligned} \sum_{i=1}^N(y_i - \alpha - \beta x_i) & = 0\\ \sum_{i=1}^N(y_i - \alpha - \beta x_i)\cdot z_i & = 0 \end{aligned} \]

Now, we can solve these conditions with respect to \(\alpha\) and \(\beta\); the solutions are the instrumental variable (IV) estimators. In particular, the first condition gives \(\hat{\alpha}_{iv} = \bar{y} - \hat{\beta}_{iv}\bar{x}\), and substituting it into the second gives \(\hat{\beta}_{iv} = \frac{\sum_{i=1}^N (z_i - \bar{z})(y_i - \bar{y})}{\sum_{i=1}^N (z_i - \bar{z})(x_i - \bar{x})}\).
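
Here is a minimal R sketch (simulated data; the data-generating process, variable names, and parameter values are purely illustrative) that implements these estimators and shows they recover the true \(\beta\) while OLS does not:

N <- 1000
z <- rnorm(N) # instrument
v <- rnorm(N) # unobserved factor driving the endogeneity
x <- 1 + z + v # x is correlated with both z and v
mu <- v + rnorm(N) # error is correlated with x (through v) but not with z
y <- 1 + 2 * x + mu # true alpha = 1, beta = 2 (illustrative values)

# IV estimates from the solved moment conditions
beta_hat_iv <- sum((z - mean(z)) * (y - mean(y))) / sum((z - mean(z)) * (x - mean(x)))
alpha_hat_iv <- mean(y) - beta_hat_iv * mean(x)

c(alpha_hat_iv, beta_hat_iv) # close to (1, 2)
coef(lm(y ~ x)) # OLS is biased because x is endogenous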