Univariate
The most important assumption, \(E[u|x] = 0\) (zero conditional mean), is almost always violated (unless your data come from a randomized experiment) because all the other factors that affect \(y\) are sitting in the error term, and they can be correlated with \(x\).
Multivariate
More independent variables mean fewer factors left in the error term, which makes the endogeneity problem less severe.
Uni-variate vs. bi-variate
\[\begin{align} \mbox{Uni-variate}\;\; wage = & \beta_0 + \beta_1 educ + u_1 (=u_2+\beta_2 exper)\\ \mbox{Bi-variate}\;\; wage = & \beta_0 + \beta_1 educ + \beta_2 exper + u_2 \end{align}\]What’s different?
uni-variate: \(\widehat{\beta}_1\) is biased unless experience is uncorrelated with education, because experience is in the error term
bi-variate: able to measure the effect of education on wage, holding experience fixed, because experience is modeled explicitly (we say \(exper\) is controlled for)
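To see this concretely, here is a small simulation sketch (the data and coefficients are made up for illustration): when \(exper\) is correlated with \(educ\) and is left in the error term, the uni-variate estimate of the return to education is biased, while the bi-variate estimate is not.

```r
# Simulated example (made-up numbers): exper is correlated with educ,
# so omitting it biases the uni-variate estimate of the education coefficient.
set.seed(456)
n     <- 5000
educ  <- rnorm(n, mean = 13, sd = 2)
exper <- 20 - 0.8 * educ + rnorm(n)              # exper correlated with educ
wage  <- 1 + 0.5 * educ + 0.2 * exper + rnorm(n) # true educ coefficient is 0.5

coef(lm(wage ~ educ))["educ"]          # uni-variate: biased away from 0.5
coef(lm(wage ~ educ + exper))["educ"]  # bi-variate: close to 0.5
```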
The impact of per-student spending (expend) on standardized test scores (avgscore) at the high school level
More generally,
\[\begin{align} y=\beta_0+\beta_1 x_1 + \beta_2 x_2 + u \end{align}\]
Uni-variate
\(y = \beta_0 + \beta_1x + u\),
\(E[u|x]=0\)
Bi-variate
\(y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + u\),
\(E[u|x_1,x_2]=0\)
In the following wage model,
\[\begin{align*} wage = & \beta_0 + \beta_1 educ + \beta_2 exper + u \end{align*}\]Mean independence condition is
\[\begin{align} E[u|educ,exper]=0 \end{align}\]Verbally:
This condition would be satisfied if workers' innate ability (which sits in the error term) is, on average, unrelated to their education level and experience.
Model
\[\begin{align} y=\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_k x_k + u \end{align}\]Mean independence assumption?
\(\beta_{OLS}\) (OLS estimators of \(\beta\)s) is unbiased if,
\[\begin{align} E[u|x_1,x_2,\dots,x_k]=0 \end{align}\]Verbally: this condition would be satisfied if the error term is, on average, unrelated to each of the independent variables, \(x_1,x_2,\dots,x_k\).
When you are asked to present regression results in assignments or your final paper, use the msummary() function from the modelsummary package.
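For instance, a table like the one below can be produced with code along the following lines. This is a minimal sketch, not the original code: the data frame `df` and the simulated values are placeholders (so the numbers will not match the table), and the Std.Errors row in the table suggests the model was estimated with feols() from the fixest package, though msummary() works the same way with lm() objects.

```r
library(fixest)        # feols()
library(modelsummary)  # msummary()

# Placeholder data so the example runs; replace with your own data frame/variables
set.seed(1)
df <- data.frame(dist = runif(50, 0, 60))
df$y <- 8 + 0.2 * df$dist + rnorm(50, sd = 3)

reg <- feols(y ~ dist, data = df)  # estimate the model by OLS

msummary(reg, stars = TRUE)        # regression table with significance stars
```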
|             | (1)      |
|-------------|----------|
| (Intercept) | 8.284*** |
|             | (0.874)  |
| dist        | 0.166*** |
|             | (0.017)  |
| Num.Obs.    | 50       |
| R2          | 0.651    |
| RMSE        | 3.09     |
| Std.Errors  | IID      |

+ p < 0.1, * p < 0.05, ** p < 0.01, *** p < 0.001
OLS
Find the combination of \(\beta\)s that minimizes the sum of squared residuals
So, denoting the collection of \(\widehat{\beta}\)s as \(\widehat{\theta} (=\{\widehat{\beta}_0,\widehat{\beta}_1,\dots,\widehat{\beta}_k\})\),
\[\begin{align} Min_{\theta} \sum_{i=1}^n \Big[ y_i-(\widehat{\beta}_0+\widehat{\beta}_1 x_{1,i} + \widehat{\beta}_2 x_{2,i} + \dots + \widehat{\beta}_k x_{k,i}) \Big]^2 \end{align}\]Find the FOCs by partially differentiating the objective function (the sum of squared residuals) with respect to each element of \(\widehat{\theta} (=\{\widehat{\beta}_0,\widehat{\beta}_1,\dots,\widehat{\beta}_k\})\),
\[\begin{align} \sum_{i=1}^n \Big[ y_i-(\widehat{\beta}_0+\widehat{\beta}_1 x_{1,i} + \widehat{\beta}_2 x_{2,i} + \dots + \widehat{\beta}_k x_{k,i}) \Big] = & 0 \;\; (\widehat{\beta}_0) \\ \sum_{i=1}^n x_{1,i}\Big[ y_i-(\widehat{\beta}_0+\widehat{\beta}_1 x_{1,i} + \widehat{\beta}_2 x_{2,i} + \dots + \widehat{\beta}_k x_{k,i}) \Big]= & 0 \;\; (\widehat{\beta}_1) \\ \sum_{i=1}^n x_{2,i}\Big[ y_i-(\widehat{\beta}_0+\widehat{\beta}_1 x_{1,i} + \widehat{\beta}_2 x_{2,i} + \dots + \widehat{\beta}_k x_{k,i}) \Big]= & 0 \;\; (\widehat{\beta}_2) \\ \vdots \\ \sum_{i=1}^n x_{k,i}\Big[ y_i-(\widehat{\beta}_0+\widehat{\beta}_1 x_{1,i} + \widehat{\beta}_2 x_{2,i} + \dots + \widehat{\beta}_k x_{k,i}) \Big]= & 0 \;\; (\widehat{\beta}_k) \\ \end{align}\]Or more succinctly,
\[\begin{align} \sum_{i=1}^n \widehat{u}_i = & 0 \;\; (\widehat{\beta}_0) \\ \sum_{i=1}^n x_{1,i}\widehat{u}_i = & 0 \;\; (\widehat{\beta}_1) \\ \sum_{i=1}^n x_{2,i}\widehat{u}_i = & 0 \;\; (\widehat{\beta}_2) \\ \vdots \\ \sum_{i=1}^n x_{k,i}\widehat{u}_i = & 0 \;\; (\widehat{\beta}_k) \\ \end{align}\]
Important
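As a quick numerical check of these conditions, the sketch below (simulated data, made-up variable names) fits a model with lm() and verifies that the residuals sum to zero and are orthogonal to each regressor.

```r
# Verify the OLS first-order conditions numerically with simulated data
set.seed(789)
n  <- 200
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- 1 + 2 * x1 - 1 * x2 + rnorm(n)

u_hat <- resid(lm(y ~ x1 + x2))  # OLS residuals

sum(u_hat)       # ~ 0  (FOC for the intercept)
sum(x1 * u_hat)  # ~ 0  (FOC for beta_1-hat)
sum(x2 * u_hat)  # ~ 0  (FOC for beta_2-hat)
```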
OLS estimators of multivariate models are unbiased if the following conditions are satisfied.
Condition 1
Your model is correct (Assumption \(MLR.1\))
Condition 2
Random sampling (Assumption \(MLR.2\))
Condition 3
No perfect collinearity (Assumption \(MLR.3\))
Condition 4
Zero Conditional Mean (Assumption \(MLR.4\)): \(E[u|x_1,x_2,\dots,x_k]=0\)
No Perfect Collinearity (\(MLR.3\))
No independent variable can be an exact linear function of the other independent variables
Example (silly)
\[\begin{align} wage = \beta_0 + \beta_1 educ + \beta_2 (3\times educ) + u \end{align}\]Here \(3\times educ\) is an exact linear function of \(educ\), so \(\beta_1\) and \(\beta_2\) cannot be estimated separately. (More on this later when we talk about dummy variables.)
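A quick illustration in R (simulated data, made-up names): if you include both \(educ\) and \(3\times educ\), lm() cannot estimate both coefficients and reports NA for the redundant one.

```r
# Perfect collinearity: educ3 is an exact linear function of educ,
# so lm() drops it and reports NA for its coefficient.
set.seed(101)
educ  <- rnorm(100, mean = 13, sd = 2)
wage  <- 1 + 0.5 * educ + rnorm(100)
educ3 <- 3 * educ

coef(lm(wage ~ educ + educ3))   # coefficient on educ3 is NA
```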
Endogeneity: Definition
\[ E[u|x_1,x_2,\dots,x_k] = f(x_1,x_2,\dots,x_k) \ne 0 \]
What could cause an endogeneity problem?
Condition 5
The error term is homoskedastic (Assumption \(MLR.5\))
\[\begin{align} Var(u|x_1,\dots,x_k)=\sigma^2 \end{align}\]Under conditions \(MLR.1\) through \(MLR.5\), conditional on the sample values of the independent variables,
Variance of \(\widehat{\beta}_{OLS}\)
\[\begin{align} Var(\widehat{\beta}_j)=\frac{\sigma^2}{SST_j(1-R_j^2)} \;\; (j=1,\dots,k) \end{align}\]
where \(SST_j=\sum_{i=1}^n (x_{j,i}-\bar{x}_j)^2\) is the total sample variation in \(x_j\), and \(R_j^2\) is the R-squared from regressing \(x_j\) on all the other independent variables (with an intercept).
Just like in uni-variate regression, you need to estimate \(\sigma^2\) if you want to estimate the variance (and standard deviation) of the OLS estimators.
uni-variate regression
\[\begin{align} \widehat{\sigma}^2=\sum_{i=1}^n \frac{\widehat{u}_i^2}{n-2} \end{align}\]
multi-variate regression
A model with \(k\) independent variables with intercept.
\[\begin{align} \widehat{\sigma}^2=\sum_{i=1}^n \frac{\widehat{u}_i^2}{n-(k+1)} \end{align}\]You solved \(k+1\) simultaneous equations (the FOCs) to get \(\widehat{\beta}_j\) \((j=0,\dots,k)\). So, once you know the value of \(n-k-1\) of the residuals, the remaining \(k+1\) are pinned down; only \(n-k-1\) degrees of freedom are left.
Using the estimator of \(\sigma^2\) in place of \(\sigma^2\), we have the estimator of the variance of the OLS estimator.
Estimator of the variance of the OLS estimator
\[\begin{align} \widehat{Var}(\widehat{\beta}_j)=\frac{\widehat{\sigma}^2}{SST_j(1-R_j^2)} \;\; (j=1,\dots,k) \end{align}\]
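The sketch below (simulated data, made-up names) computes \(\widehat{\sigma}^2\) and the standard error of \(\widehat{\beta}_1\) "by hand" using these formulas and checks them against the lm() output.

```r
# Compute sigma^2-hat and se(beta_1-hat) by hand and compare with lm() output
set.seed(321)
n  <- 500
x1 <- rnorm(n)
x2 <- rnorm(n)
x3 <- rnorm(n)
y  <- 1 + 2 * x1 - 1 * x2 + 0.5 * x3 + rnorm(n)

fit   <- lm(y ~ x1 + x2 + x3)
u_hat <- resid(fit)
k     <- 3  # number of independent variables

sigma2_hat <- sum(u_hat^2) / (n - k - 1)             # sigma^2-hat
c(by_hand = sigma2_hat, lm = summary(fit)$sigma^2)   # should match

sst1 <- sum((x1 - mean(x1))^2)               # total sample variation in x1
r2_1 <- summary(lm(x1 ~ x2 + x3))$r.squared  # R^2 from regressing x1 on x2, x3
se_beta1 <- sqrt(sigma2_hat / (sst1 * (1 - r2_1)))
c(by_hand = se_beta1,
  lm = summary(fit)$coefficients["x1", "Std. Error"])  # should match
```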
Consider the following simple model,
\[\begin{align} y_i = \beta_0 + \beta_1 x_{1,i} + \beta_2 x_{2,i} + \beta_3 x_{3,i} + u_i \end{align}\]Suppose you are interested in estimating only \(\beta_1\).
Let’s consider the following two methods,
Method 1: Regular OLS
Regress \(y\) on \(x_1\), \(x_2\), and \(x_3\) with an intercept to estimate \(\beta_0\), \(\beta_1\), \(\beta_2\), \(\beta_3\) at the same time (just like you normally do)
Method 2: 3-step
Frisch-Waugh-Lovell theorem
Methods 1 and 2 produce the same coefficient estimate on \(x_1\)
\[\widehat{\beta}_1 = \widehat{\alpha_1}\]
Step 1
Regress \(y\) on \(x_2\) and \(x_3\) with an intercept and get residuals, which we call \(\widehat{u}_y\)
Step 2
Regress \(x_1\) on \(x_2\) and \(x_3\) with an intercept and get residuals, which we call \(\widehat{u}_{x_1}\)
Step 3
Regress \(\widehat{u}_y\) on \(\widehat{u}_{x_1}\); the coefficient on \(\widehat{u}_{x_1}\) (call it \(\widehat{\alpha_1}\)) is an estimate of \(\beta_1\) that is identical to the one you get from regressing \(y\) on \(x_1\), \(x_2\), and \(x_3\)
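Here is a sketch of the three steps in R (simulated data, made-up names), confirming that the Method 2 coefficient matches the regular OLS coefficient on \(x_1\).

```r
# Frisch-Waugh-Lovell with simulated data: Method 1 vs. Method 2
set.seed(123)
n  <- 1000
x2 <- rnorm(n)
x3 <- rnorm(n)
x1 <- 0.5 * x2 - 0.3 * x3 + rnorm(n)        # x1 is correlated with x2 and x3
y  <- 1 + 2 * x1 - 1 * x2 + 0.5 * x3 + rnorm(n)

# Method 1: regular OLS
beta1_ols <- coef(lm(y ~ x1 + x2 + x3))["x1"]

# Method 2: the 3-step procedure
u_y  <- resid(lm(y ~ x2 + x3))              # Step 1: residualize y
u_x1 <- resid(lm(x1 ~ x2 + x3))             # Step 2: residualize x1
beta1_fwl <- coef(lm(u_y ~ u_x1))["u_x1"]   # Step 3: residuals on residuals

all.equal(unname(beta1_ols), unname(beta1_fwl))  # TRUE
```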
Regressing \(y\) on all the explanatory variables (\(x_1\), \(x_2\), and \(x_3\)) in a multivariate regression is as if you are looking at the impact of a single explanatory variable with the effects of all the other variables partialled out.
In other words, including variables beyond your variable of interest lets you control for (remove the effect of) those other variables, so that you do not confuse the impact of the variable of interest with the impact of the others.