class: center, middle, inverse, title-slide # Monte Carlo Simulation ### AECN 896-002 --- class: middle <style type="text/css"> @media print { .has-continuation { display: block !important; } } .remark-slide-content.hljs-github h1 { margin-top: 5px; margin-bottom: 25px; } .remark-slide-content.hljs-github { padding-top: 10px; padding-left: 30px; padding-right: 30px; } .panel-tabs { /* color: #062A00; */ color: #841F27; margin-top: 0px; margin-bottom: 0px; margin-left: 0px; padding-bottom: 0px; } .panel-tab { margin-top: 0px; margin-bottom: 0px; margin-left: 3px; margin-right: 3px; padding-top: 0px; padding-bottom: 0px; } .panelset .panel-tabs .panel-tab { min-height: 40px; } .remark-slide th { border-bottom: 1px solid #ddd; } .remark-slide thead { border-bottom: 0px; } .gt_footnote { padding: 2px; } .remark-slide table { border-collapse: collapse; } .remark-slide tbody { border-bottom: 2px solid #666; } .important { background-color: lightpink; border: 2px solid blue; font-weight: bold; } .remark-code { display: block; overflow-x: auto; padding: .5em; background: #ffe7e7; } .hljs-github .hljs { background: #f2f2fd; } .remark-inline-code { padding-top: 0px; padding-bottom: 0px; background-color: #e6e6e6; } .r.hljs.remark-code.remark-inline-code{ font-size: 0.9em } .left-full { width: 80%; height: 92%; float: left; } .left-code { width: 38%; height: 92%; float: left; } .right-plot { width: 60%; float: right; padding-left: 1%; } .left5 { width: 49%; height: 92%; float: left; } .right5 { width: 49%; float: right; padding-left: 1%; } .left3 { width: 29%; height: 92%; float: left; } .right7 { width: 69%; float: right; padding-left: 1%; } .left4 { width: 38%; height: 92%; float: left; } .right6 { width: 60%; float: right; padding-left: 1%; } ul li{ margin: 7px; } ul, li{ margin-left: 15px; padding-left: 0px; } ol li{ margin: 7px; } ol, li{ margin-left: 15px; padding-left: 0px; } </style> <style type="text/css"> .content-box { box-sizing: border-box; background-color: 
#e2e2e2; } .content-box-blue, .content-box-gray, .content-box-grey, .content-box-army, .content-box-green, .content-box-purple, .content-box-red, .content-box-yellow { box-sizing: border-box; border-radius: 5px; margin: 0 0 10px; overflow: hidden; padding: 0px 5px 0px 5px; width: 100%; } .content-box-blue { background-color: #F0F8FF; } .content-box-gray { background-color: #e2e2e2; } .content-box-grey { background-color: #F5F5F5; } .content-box-army { background-color: #737a36; } .content-box-green { background-color: #d9edc2; } .content-box-purple { background-color: #e2e2f9; } .content-box-red { background-color: #ffcccc; } .content-box-yellow { background-color: #fef5c4; } .content-box-blue .remark-inline-code, .content-box-blue .remark-inline-code, .content-box-gray .remark-inline-code, .content-box-grey .remark-inline-code, .content-box-army .remark-inline-code, .content-box-green .remark-inline-code, .content-box-purple .remark-inline-code, .content-box-red .remark-inline-code, .content-box-yellow .remark-inline-code { background: none; } .full-width { display: flex; width: 100%; flex: 1 1 auto; } </style> <style type="text/css"> blockquote, .blockquote { display: block; margin-top: 0.1em; margin-bottom: 0.2em; margin-left: 5px; margin-right: 5px; border-left: solid 10px #0148A4; border-top: solid 2px #0148A4; border-bottom: solid 2px #0148A4; border-right: solid 2px #0148A4; box-shadow: 0 0 6px rgba(0,0,0,0.5); /* background-color: #e64626; */ color: #e64626; padding: 0.5em; -moz-border-radius: 5px; -webkit-border-radius: 5px; } .blockquote p { margin-top: 0px; margin-bottom: 5px; } .blockquote > h1:first-of-type { margin-top: 0px; margin-bottom: 5px; } .blockquote > h2:first-of-type { margin-top: 0px; margin-bottom: 5px; } .blockquote > h3:first-of-type { margin-top: 0px; margin-bottom: 5px; } .blockquote > h4:first-of-type { margin-top: 0px; margin-bottom: 5px; } .text-shadow { text-shadow: 0 0 4px #424242; } </style> <style type="text/css"> 
/****************** * Slide scrolling * (non-functional) * not sure if it is a good idea anyway slides > slide { overflow: scroll; padding: 5px 40px; } .scrollable-slide .remark-slide { height: 400px; overflow: scroll !important; } ******************/ .scroll-box-8 { height:8em; overflow-y: scroll; } .scroll-box-10 { height:10em; overflow-y: scroll; } .scroll-box-12 { height:12em; overflow-y: scroll; } .scroll-box-14 { height:14em; overflow-y: scroll; } .scroll-box-16 { height:16em; overflow-y: scroll; } .scroll-box-18 { height:18em; overflow-y: scroll; } .scroll-box-20 { height:20em; overflow-y: scroll; } .scroll-box-24 { height:24em; overflow-y: scroll; } .scroll-box-30 { height:30em; overflow-y: scroll; } .scroll-output { height: 90%; overflow-y: scroll; } </style> $$ \def\sumten{\sum_{i=1}^{10}} $$ $$ \def\sumn{\sum_{i=1}^{n}} $$ # Outline 1. [Introduction](#mvr) 2. [MC Simulations](#mcs) --- class: inverse, center, middle name: mvr # Monte Carlo Simulation: Introduction <html><div style='float:left'></div><hr color='#EB811B' size=1px width=1000px></html> --- class: middle .content-box-green[**Monte Carlo Simulation**] A way to test econometric theories via simulation --- class: middle .content-box-green[**How is it used in econometrics?**] + confirm econometric theory numerically - OLS estimators are unbiased if `\(E[u|x]=0\)` along with other conditions (theory) - I know the above theory is right, but let's check if it is true numerically + You kind of sense that something in your data may cause problems, but there is no proven econometric theory about what will happen (I used MC simulation for this purpose a lot) + assist students in understanding econometric theories by providing actual numbers instead of a series of Greek letters --- class: middle .content-box-green[**Question**] Suppose you are interested in checking what happens to OLS estimators if `\(E[u|x]=0\)` (the error term and `\(x\)` are not correlated) is violated. 
Can you use the real data to do this? --- class: middle .content-box-green[**Key part of MC simulation**] <span style = "color: blue;"> You </span> generate data (you have control over how data are generated) + You know the true parameters, unlike with the real data generating process + You can change just the part of the data generating process or the econometric method that you want to change, holding everything else fixed --- class: middle # Generating data .content-box-green[**Pseudo random number generators**] Algorithms for generating a sequence of numbers whose properties <span style = "color: blue;"> approximate </span> the properties of sequences of random numbers --- class: middle .content-box-green[**Examples in R: Uniform Distribution**] ```r runif(5) # default is min=0 and max=1 ``` ``` ## [1] 0.16009897 0.67268728 0.81840298 0.60820923 0.01744986 ``` --- class: middle ```r x <- runif(10000) hist(x) ``` <img src="data:image/png;base64,#MC_x_files/figure-html/unnamed-chunk-3-1.png" width="60%" style="display: block; margin: auto;" /> --- class: middle .content-box-green[**Pseudo random number generator**] + Pseudo random number generators are not really random number generators + The numbers you will get are pre-determined + Which numbers you will get can be controlled by setting a <span style = "color: red;"> seed </span> .content-box-green[**An example**] ```r set.seed(2387438) runif(5) ``` ``` ## [1] 0.0474233 0.7116970 0.4066674 0.2422949 0.3567480 ``` .content-box-green[**Question**] What benefits does setting a seed have? 
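One benefit can be shown directly in code (a minimal sketch; the seed value below is arbitrary): setting the same seed before each run reproduces exactly the same draws, which makes your simulation results replicable.

```r
set.seed(123) # an arbitrary seed
draws_1 <- runif(5)

set.seed(123) # the same seed again
draws_2 <- runif(5)

#--- the two sets of draws are identical ---#
identical(draws_1, draws_2) # TRUE
```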
--- class: middle .content-box-green[**Examples in R: Normal Distribution**] .left5[ `\(x \sim N(0, 1)\)` ```r # default is mean = 0,sd = 1 x <- rnorm(10000) hist(x) ``` <img src="data:image/png;base64,#MC_x_files/figure-html/norm_1-1.png" width="100%" style="display: block; margin: auto;" /> ] .right5[ `\(x \sim N(2, 2)\)` ```r # mean = 2, sd = 2 x <- rnorm(10000, mean = 2, sd = 2) hist(x) ``` <img src="data:image/png;base64,#MC_x_files/figure-html/norm_2-1.png" width="100%" style="display: block; margin: auto;" /> ] --- class: middle .content-box-green[**Other distributions**] + Beta + Chi-square + F + Logistic + Log-normal + many others --- class: middle .content-box-green[**d, p, q, r**] For each distribution, you have four different kinds of functions: + <span style = "color: red;"> `d`</span>`norm`: density function + <span style = "color: red;"> `p`</span>`norm`: distribution function + <span style = "color: red;"> `q`</span>`norm`: quantile function + <span style = "color: red;"> `r`</span>`norm`: random draw --- class: middle .content-box-green[**dnorm**] `dnorm(x)` gives you the height of the density function at `\(x\)`. --- class: middle `dnorm(-1)` and `dnorm(2)` <img src="data:image/png;base64,#MC_x_files/figure-html/pnorm2-1.png" width="60%" style="display: block; margin: auto;" /> --- class: middle .content-box-green[**pnorm**] `pnorm(x)` gives you the probability that a single random draw is <span style = "color: red;"> less </span> than `\(x\)`. --- class: middle `pnorm(-1)` <img src="data:image/png;base64,#MC_x_files/figure-html/pnorm1_1-1.png" width="60%" style="display: block; margin: auto;" /> --- class: middle `pnorm(2)` <img src="data:image/png;base64,#MC_x_files/figure-html/pnorm1_2-1.png" width="60%" style="display: block; margin: auto;" /> --- class: middle .content-box-green[**Practice**] What is the probability that a single random draw from a Normal distribution with `mean = 1` and `sd = 2` is less than 1? 
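As a quick check of the practice problem (a sketch using `pnorm()` with the distribution's own parameters): 1 is exactly the mean of this distribution, and the Normal density is symmetric around its mean, so the answer should be 0.5.

```r
#--- P(X < 1) for X with mean = 1 and sd = 2 ---#
pnorm(1, mean = 1, sd = 2) # 0.5, since 1 is the mean
```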
--- class: middle `qnorm(x)`, where `\(0 < x < 1\)`, gives you the number `\(q\)` such that a single random draw is less than `\(q\)` with probability `\(x\)`. We call the output of `qnorm(x)` the `\(x \times 100\%\)` quantile of the standard Normal distribution (because the default is `mean = 0` and `sd = 1` for `qnorm()`). --- class: middle `qnorm(0.95)` <img src="data:image/png;base64,#MC_x_files/figure-html/qnorm1-1.png" width="60%" style="display: block; margin: auto;" /> --- class: middle .content-box-green[**Practice**] What is the 88% quantile of a Normal distribution with `mean = 0` and `sd = 9`? --- class: inverse, center, middle name: mcs # Monte Carlo Simulation: Steps and Examples <html><div style='float:left'></div><hr color='#EB811B' size=1px width=1000px></html> --- class: middle .content-box-green[**Monte Carlo Simulation: Steps**] + specify the data generating process + generate data based on the data generating process + get an estimate based on the generated data (e.g. OLS, mean) + repeat the above steps many times + compare your estimates with the true parameter .content-box-green[**Question**] Why repeat steps `\(1-3\)` many times? --- class: middle # Monte Carlo Simulation: Example 1 Is the sample mean really an unbiased estimator of the expected value? 
That is, is `\(E[\frac{1}{n}\sum_{i=1}^n x_i] = E[x]\)`, where `\(x_i\)` is an independent random draw from the same distribution? --- class: middle .content-box-green[**Sample Mean: Steps 1-3**] ```r #--- steps 1 and 2: ---# # specify the data generating process and generate data x <- runif(100) # Here, E[x]=0.5 #--- step 3 ---# # calculate sample mean mean_x <- mean(x) mean_x ``` ``` ## [1] 0.507078 ``` --- class: middle .content-box-green[**Sample Mean: Step 4**] + repeat the above steps many times + We use a <span style = "color: blue;"> loop </span> to do the same (similar) thing over and over again --- class: middle .content-box-green[**Loop: for loop**] ```r #--- the number of iterations ---# B <- 1000 #--- repeat steps 1-3 B times ---# for (i in 1:B) { print(i) # print i } ``` .content-box-green[**Verbally**] For each `\(i\)` in `\(1:B\)` `\((1, 2, \dots, 1000)\)`, do `print(i)`. + `i` takes the value of 1, and then `print(1)` + `i` takes the value of 2, and then `print(2)` + ... 
+ `i` takes the value of 999, and then `print(999)` + `i` takes the value of 1000, and then `print(1000)` --- class: middle .content-box-green[**Step 4**] ```r #--- the number of iterations ---# B <- 1000 #--- create storage for the estimates ---# estimate_storage_mean <- rep(0, B) #--- repeat steps 1-3 B times ---# for (i in 1:B) { #--- steps 1 and 2: ---# # specify the data generating process and generate data x <- runif(100) # Here, E[x]=0.5 #--- step 3 ---# # calculate sample mean mean_x <- mean(x) estimate_storage_mean[i] <- mean_x } ``` --- class: middle Compare your estimates with the true parameter ```r mean(estimate_storage_mean) ``` ``` ## [1] 0.500199 ``` ```r hist(estimate_storage_mean) ``` <img src="data:image/png;base64,#MC_x_files/figure-html/step_5-1.png" width="60%" style="display: block; margin: auto;" /> --- class: middle # Monte Carlo Simulation: Example 2 .content-box-green[**Question**] What happens to the OLS estimator of `\(\beta_1\)` if `\(E[u|x]\ne 0\)` when estimating `\(y=\beta_0+\beta_1 x + u\)`? 
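In the simulation on the next slide, `\(x\)` and `\(u\)` share a common component `\(\mu \sim N(0,1)\)`, so `\(Cov(x, u) = Var(\mu) = 1\)` and `\(Var(x) = 2\)`. For that data generating process, the size of the bias can be worked out by hand (a quick derivation added here, using the usual omitted-variable algebra):

$$ \mathrm{plim} \; \hat{\beta}_1 = \beta_1 + \frac{Cov(x, u)}{Var(x)} = 1 + \frac{1}{2} = 1.5 $$

so the distribution of estimates should center around 1.5 rather than the true value of 1.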
--- class: middle ```r #--- load the fixest package for feols() ---# library(fixest) #--- Preparation ---# B <- 1000 # the number of iterations N <- 100 # sample size estimate_storage <- rep(0, B) # estimates storage #--- repeat steps 1-3 B times ---# for (i in 1:B) { #--- steps 1 and 2: ---# mu <- rnorm(N) # the common term shared by both x and u x <- rnorm(N) + mu # independent variable u <- rnorm(N) + mu # error y <- 1 + x + u # dependent variable data <- data.frame(y = y, x = x) #--- OLS ---# reg <- feols(y ~ x, data = data) # OLS estimate_storage[i] <- reg$coefficient["x"] } ``` --- class: middle ```r hist(estimate_storage) ``` <img src="data:image/png;base64,#MC_x_files/figure-html/ex2_res-1.png" width="60%" style="display: block; margin: auto;" /> --- class: middle # Example 3: Variance of OLS Estimators .content-box-green[**Model**] `\begin{aligned} y = \beta_0 + \beta_1 x + u \end{aligned}` + `\(x\sim N(0,1)\)` + `\(u\sim N(0,4)\)` + `\(E[u|x]=0\)` .content-box-green[**Variance of the OLS estimator**] True Variance of `\(\hat{\beta_1}\)`: `\(V(\hat{\beta_1}) = \frac{\sigma^2}{\sum_{i=1}^n (x_i-\bar{x})^2} = \frac{\sigma^2}{SST_X}\)` Its estimator: `\(\widehat{V(\hat{\beta_1})} =\frac{\hat{\sigma}^2}{SST_X} = \frac{\sum_{i=1}^n \hat{u}_i^2}{n-2} \times \frac{1}{SST_X}\)` .content-box-green[**Question**] Does the estimator really work? (Is it unbiased?) 
--- class: middle ```r set.seed(903478) #--- Preparation ---# B <- 10000 # the number of iterations N <- 100 # sample size beta_storage <- rep(0, B) # estimates storage for beta V_beta_storage <- rep(0, B) # estimates storage for V(beta) x <- rnorm(N) # x values are the same for every iteration SST_X <- sum((x - mean(x))^2) #--- repeat steps 1-3 B times ---# for (i in 1:B) { #--- steps 1 and 2: ---# u <- 2 * rnorm(N) # error y <- 1 + x + u # dependent variable data <- data.frame(y = y, x = x) #--- OLS ---# reg <- feols(y ~ x, data = data) # OLS beta_storage[i] <- reg$coefficient["x"] #* store estimated variance of beta_1_hat V_beta_storage[i] <- vcov(reg)["x", "x"] } ``` --- class: middle .content-box-green[**True Variance**] + `\(SST_X = 112.07\)` + `\(\sigma^2 = 4\)` `$$V(\hat{\beta_1}) = 4/112.07 = 0.0357$$` .content-box-green[**Check**] Your estimates of the variance of `\(\hat{\beta_1}\)`? ```r # === mean ===# mean(V_beta_storage) ``` ``` ## [1] 0.03579118 ``` --- class: middle ```r #--- load ggplot2 for plotting ---# library(ggplot2) ggplot(data = data.frame(x = V_beta_storage)) + geom_density(aes(x = x)) + geom_vline(xintercept = round(4 / SST_X, digits = 4)) ``` <img src="data:image/png;base64,#MC_x_files/figure-html/unnamed-chunk-7-1.png" width="60%" style="display: block; margin: auto;" /> --- class: middle # Exercise .content-box-green[**Problem**] Using MC simulations, find out how the variation in `\(x\)` affects the OLS estimators --- class: middle .content-box-green[**Model setup**] `\begin{aligned} y = \beta_0 + \beta_1 x_1 + u \\ y = \beta_0 + \beta_1 x_2 + u \end{aligned}` + `\(x_1\sim N(0,1)\)` and `\(x_2\sim N(0,9)\)` + `\(u\sim N(0,1)\)` + `\(E[u|x_1]=0\)` and `\(E[u|x_2]=0\)` --- class: middle # Solution ```r #--- load data.table for data.table() ---# library(data.table) #--- Preparation ---# B <- 1000 # the number of iterations N <- 100 # sample size estimate_storage <- matrix(0, B, 2) # estimates storage for (i in 1:B) { #--- generate data ---# x_1 <- rnorm(N, sd = 1) # indep var 1 x_2 <- rnorm(N, sd = 3) # indep var 2 u <- rnorm(N) # error y_1 <- 1 + x_1 + u # 
dependent variable 1 y_2 <- 1 + x_2 + u # dependent variable 2 data <- data.table(y_1 = y_1, y_2 = y_2, x_1 = x_1, x_2 = x_2) #--- OLS ---# reg_1 <- feols(y_1 ~ x_1, data = data) # OLS reg_2 <- feols(y_2 ~ x_2, data = data) # OLS #--- store coef estimates ---# estimate_storage[i, 1] <- reg_1$coefficient["x_1"] # equation 1 estimate_storage[i, 2] <- reg_2$coefficient["x_2"] # equation 2 } ``` ```r #--- assign new names ---# beta_1s <- estimate_storage[, 1] beta_2s <- estimate_storage[, 2] #--- mean ---# mean(beta_1s) ``` ``` ## [1] 0.9961388 ``` ```r mean(beta_2s) ``` ``` ## [1] 0.9994399 ``` ```r #--- sd ---# sd(beta_1s) ``` ``` ## [1] 0.1041601 ``` ```r sd(beta_2s) ``` ``` ## [1] 0.03322701 ``` --- class: middle # Visualization ```r plot_data_1 <- data.table(x = beta_1s, type = "Equation 1") plot_data_2 <- data.table(x = beta_2s, type = "Equation 2") plot_data <- rbind(plot_data_1, plot_data_2) ggplot(data = plot_data) + geom_density(aes(x = x, fill = type), alpha = 0.5) + scale_fill_discrete(name = "") + xlab("Coefficient Estimate") + theme( legend.position = "bottom" ) ``` <img src="data:image/png;base64,#MC_x_files/figure-html/compare_viz_1-1.png" width="60%" style="display: block; margin: auto;" />
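As a closing check (an addition here, using the variance formula from Example 3): with `\(\sigma^2 = 1\)` and `\(SST_X \approx N \cdot Var(x)\)`, the theoretical standard deviations of the two slope estimators are about 0.1 and 0.033, which is what the simulated standard deviations above show.

```r
N <- 100 # sample size used in the exercise

#--- approximate theoretical sd of the slope estimator: sqrt(sigma^2 / (N * Var(x))) ---#
sd_theory_1 <- sqrt(1 / (N * 1)) # x_1 ~ N(0, 1)
sd_theory_2 <- sqrt(1 / (N * 9)) # x_2 ~ N(0, 9)

sd_theory_1 # 0.1
sd_theory_2 # about 0.0333
```

So a ninefold increase in the variance of `\(x\)` cuts the standard deviation of `\(\hat{\beta}_1\)` to one third, matching the two densities in the plot above.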