03: Monte Carlo Simulation

Monte Carlo Simulation: Introduction

Monte Carlo simulation is a way to test econometric theories numerically using simulated data.


How is it used in econometrics?

  • confirm econometric theory numerically
    • OLS estimators are unbiased if \(E[u|x]=0\), along with other conditions (theory)
    • I know the above theory is right, but let's check whether it holds numerically
  • explore cases where you suspect something in your data may cause problems, but there is no established econometric theory about what will happen (I have used MC simulation a lot for this purpose)
  • assist students in understanding econometric theories by providing actual numbers instead of a series of Greek letters

Suppose you are interested in checking what happens to the OLS estimators if \(E[u|x]=0\) (the error term and \(x\) are not correlated) is violated.

Question

Can you use real data to do this?



Answer: No, because you never observe the error term or the true values of the \(\beta\)s.

Instead, you generate the data yourself (you have control over how the data are generated):

  • You know the true parameters, unlike with the real data generating process
  • You can change only the part of the data generating process or econometric method that you want to change, holding everything else fixed

Generating data

Pseudo random number generators (Pseudo RNG)

Algorithms for generating a sequence of numbers whose properties approximate the properties of sequences of random numbers


Examples

Draw from a uniform distribution:
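A minimal example in R (the number of draws and the interval bounds below are illustrative choices):

runif(5)                       # 5 draws from the uniform distribution on [0, 1]
runif(5, min = 0, max = 10)    # 5 draws from the uniform distribution on [0, 10]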


Numbers drawn using pseudo random number generators are not truly random:

  • The numbers you get are pre-determined by the algorithm
  • The numbers you get can be controlled by setting a seed

Demonstration
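A minimal sketch (the seed value 1234 and the three draws are arbitrary choices):

set.seed(1234)   # fix the seed
runif(3)         # three draws
set.seed(1234)   # reset the same seed
runif(3)         # the same three draws again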


Question

What benefits does setting a seed have?

\(x \sim N(0, 1)\)

\(x \sim N(2, 2)\)
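These can be drawn with rnorm(); note that rnorm() is parameterized by the standard deviation, so if the second parameter in \(N(2, 2)\) denotes the variance, the sd argument should be sqrt(2) (the sample size of 1000 is an arbitrary choice):

x1 <- rnorm(1000)                          # 1000 draws from N(0, 1)
x2 <- rnorm(1000, mean = 2, sd = sqrt(2))  # 1000 draws from N(2, 2), treating 2 as the variance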

R functions for often-used distributions

  • Normal
  • Uniform
  • Beta
  • Chi-square
  • F
  • Logistic
  • Log-normal
  • many others

For each distribution, R provides four kinds of functions (shown here for the Normal distribution):

  • dnorm: density function
  • pnorm: distribution function
  • qnorm: quantile function
  • rnorm: random draw

dnorm(x) gives you the height of the density function at \(x\).

dnorm(-1) and dnorm(2)
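Evaluating these for the standard Normal (the default for dnorm()):

dnorm(-1)   # 0.2419707: the height of the standard normal density at -1
dnorm(2)    # 0.05399097: the height of the standard normal density at 2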

pnorm(x) gives you the probability that a single random draw is less than \(x\).

What is the probability that a single random draw from a Normal distribution with mean = 1 and sd = 2 is less than 1?


Work here


Answer

pnorm(1, mean = 1, sd = 2) 
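This equals 0.5, since 1 is the mean of this distribution.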

qnorm(x), where \(0 < x < 1\), gives you the number \(\pi\) such that a single random draw is less than \(\pi\) with probability \(x\).

We call the output of qnorm(x) the \(x\) quantile (the \(100 \times x\)% quantile) of the standard Normal distribution (because the defaults for qnorm() are mean = 0 and sd = 1).

What is the 88% quantile of the Normal distribution with mean = 0 and sd = 9?

Code
qnorm(0.88, mean = 0, sd = 9)



Monte Carlo Simulation: Steps

  1. specify the data generating process
  2. generate data based on the data generating process
  3. get an estimate based on the generated data (e.g. OLS, mean)
  4. repeat steps 1-3 many, many times
  5. compare your estimates with the true parameter

Question

Why repeat steps 1-3 many, many times?

Monte Carlo Simulation: Example 1

Question

Is the sample mean really an unbiased estimator of the expected value?


That is, is \(E[\frac{1}{n}\sum_{i=1}^n x_i] = E[x]\), where \(x_i\) is an independent random draw from the same distribution?

  • repeat the above steps many times
  • We use a loop to do the same (or a similar) thing over and over again

R code
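A minimal sketch of the loop described verbally below (B is set to 1000 here):

B <- 1000           # number of iterations
for (i in 1:B) {
  print(i)          # print the current value of i
}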


Verbally

For each \(i\) in \(1:B\) \((1, 2, \dots, 1000)\), do print(i).

  • i takes the value of 1, and then print(1)
  • i takes the value of 2, and then print(2)
  • …
  • i takes the value of 999, and then print(999)
  • i takes the value of 1000, and then print(1000)

Compare your estimates with the true parameter
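Putting the steps together for Example 1, a sketch (the distribution \(N(5, 1)\), the sample size of 100, and B = 1000 are all illustrative choices):

B <- 1000                   # number of Monte Carlo iterations
n <- 100                    # sample size in each iteration
estimates <- rep(0, B)      # storage for the sample means

for (i in 1:B) {
  x <- rnorm(n, mean = 5)   # independent draws from N(5, 1), so E[x] = 5
  estimates[i] <- mean(x)   # the sample mean for this iteration
}

mean(estimates)             # should be very close to the true mean of 5

The average of the B sample means being close to 5 is the numerical counterpart of unbiasedness.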

Monte Carlo Simulation: Example 2

Question

What happens to \(\beta_1\) if \(E[u|x]\ne 0\) when estimating \(y=\beta_0+\beta_1 x + u\)?
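A sketch of one way to set this up (the DGP below, in which \(x\) and \(u\) share a common component \(v\) so that \(E[u|x] \ne 0\), and the true values \(\beta_0 = 1\), \(\beta_1 = 1\), are assumptions for illustration):

B <- 1000
n <- 100
beta1_hat <- rep(0, B)

for (i in 1:B) {
  v <- rnorm(n)                        # common component shared by x and u
  x <- rnorm(n) + v                    # x is correlated with u through v
  u <- rnorm(n) + v                    # so E[u|x] != 0
  y <- 1 + 1 * x + u                   # true beta_0 = 1, beta_1 = 1
  beta1_hat[i] <- coef(lm(y ~ x))[2]   # OLS estimate of beta_1
}

mean(beta1_hat)                        # well above 1: OLS is biased when E[u|x] != 0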

Monte Carlo Simulation: Example 3 (optional)

Model

\[y = \beta_0 + \beta_1 x + u\]
  • \(x\sim N(0,1)\)
  • \(u\sim N(0,1)\)
  • \(E[u|x]=0\)


Variance of the OLS estimator

True Variance of \(\hat{\beta_1}\): \(V(\hat{\beta_1}) = \frac{\sigma^2}{\sum_{i=1}^n (x_i-\bar{x})^2} = \frac{\sigma^2}{SST_X}\)

Its estimator: \(\widehat{V(\hat{\beta_1})} =\frac{\hat{\sigma}^2}{SST_X} = \frac{\sum_{i=1}^n \hat{u}_i^2}{n-2} \times \frac{1}{SST_X}\)


Question

Does the estimator really work? (Is it unbiased?)

True Variance

  • \(SST_X = 112.07\)
  • \(\sigma^2 = 4\)

\[V(\hat{\beta_1}) = 4/112.07 = 0.0357\]


Check

Your Estimates of Variance of \(\hat{\beta_1}\)?
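A sketch of how to check this (here \(x\) is drawn once and held fixed across iterations so that \(SST_X\) stays constant, and the errors are drawn with sd = 2 so that \(\sigma^2 = 4\) as above; the sample size, B, and the true values \(\beta_0 = 0\), \(\beta_1 = 1\) are illustrative assumptions):

B <- 1000
n <- 100
x <- rnorm(n)                    # fixed regressor, so SST_X is the same in every iteration
SST_X <- sum((x - mean(x))^2)

beta1_hat <- rep(0, B)           # OLS estimates of beta_1
V_hat <- rep(0, B)               # estimated variances of beta_1_hat

for (i in 1:B) {
  u <- rnorm(n, sd = 2)          # error term with sigma^2 = 4
  y <- 0 + 1 * x + u             # true beta_0 = 0, beta_1 = 1
  reg <- lm(y ~ x)
  beta1_hat[i] <- coef(reg)[2]
  V_hat[i] <- vcov(reg)[2, 2]    # estimated variance of the slope estimate
}

var(beta1_hat)                   # simulated variance of beta_1_hat: about 4 / SST_X
mean(V_hat)                      # average estimated variance: should be close to it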

Exercise (optional)

Using MC simulations, find out how the variation in \(x\) affects the OLS estimators (a sketch follows the model setup below).


Model setup

\[\begin{align} y = \beta_0 + \beta_1 x_1 + u_1 \\ y = \beta_0 + \beta_1 x_2 + u_2 \end{align}\]
  • \(x_1\sim N(0,1)\) and \(x_2\sim N(0,9)\)
  • \(u_1\sim N(0,1)\) and \(u_2\sim N(0,1)\)
  • \(E[u_1|x_1]=0\) and \(E[u_2|x_2]=0\)
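A possible starting point for this exercise (a sketch; \(N(0, 9)\) is treated as having variance 9, so sd = 3, and the true values \(\beta_0 = 0\), \(\beta_1 = 1\) are illustrative):

B <- 1000
n <- 100
beta1_lowvar <- rep(0, B)     # estimates using x1 ~ N(0, 1)
beta1_highvar <- rep(0, B)    # estimates using x2 ~ N(0, 9)

for (i in 1:B) {
  x1 <- rnorm(n)                            # regressor with low variation
  x2 <- rnorm(n, sd = 3)                    # regressor with high variation (variance 9)
  u1 <- rnorm(n)
  u2 <- rnorm(n)
  y1 <- 0 + 1 * x1 + u1
  y2 <- 0 + 1 * x2 + u2
  beta1_lowvar[i] <- coef(lm(y1 ~ x1))[2]
  beta1_highvar[i] <- coef(lm(y2 ~ x2))[2]
}

var(beta1_lowvar)    # larger: less variation in x gives a less precise OLS estimator
var(beta1_highvar)   # smaller: more variation in x gives a more precise OLS estimator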