What is an estimator?
A function of the data that produces an estimate (an actual number) of a parameter of interest once you plug in actual values of the data
OLS estimators: \(\widehat{\beta}_1=\frac{\sum_{i=1}^n(x_i-\bar{x})(y_i-\bar{y})}{\sum_{i=1}^n (x_i-\bar{x})^2}\)
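To make this concrete, here is a minimal R sketch (on simulated data, with made-up parameter values) that computes \(\widehat{\beta}_1\) directly from the formula and checks it against `lm()`:

```r
set.seed(123)

# simulated data (for illustration only)
n <- 100
x <- rnorm(n)
y <- 1 + 2 * x + rnorm(n)

# OLS slope estimator computed from the formula above
b1_hat <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)

# compare with R's built-in OLS routine
b1_hat
coef(lm(y ~ x))["x"]
```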
What is small sample property?
Properties that hold regardless of the number of observations (small or large), considered prior to obtaining actual estimates (before getting data)
Put more simply: what can you expect from the estimators before you actually get data and obtain estimates?
Difference between small sample property and the algebraic properties we looked at earlier?
OLS is just a way of using available information to obtain estimates. Does it have desirable properties? Why are we using it?
As it turns out, OLS is a very good way of using available information!!
What does unbiased even mean?
Let’s first look at a simple problem of estimating the expected value of a single variable (\(x\)) as a start.
R code: Sample Mean
Direction
Try running the code multiple times and get a feel for how the estimates tend to behave.
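The sample-mean code referred to above is not reproduced here; a minimal sketch along the same lines (simulating the sample mean as an estimator of \(E[x]\), with an arbitrary true mean of 5) might look like this:

```r
set.seed(456)

# true expected value of x is 5 in this simulated example
one_estimate <- function(n = 100) {
  x <- rnorm(n, mean = 5, sd = 2)
  mean(x)  # the sample mean is one realized estimate of E[x]
}

# each run gives a different estimate,
# but the estimates center around the true value (5)
replicate(5, one_estimate())
```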
Under certain conditions, OLS estimators are unbiased. That is,
\[ \def\sumn{\sum_{i=1}^{n}} E[\widehat{\beta}_1]=E\Big[\frac{\sumn (x_i-\bar{x})(y_i-\bar{y})}{\sumn (x_i-\bar{x})^2}\Big]=\beta_1 \]
(We do not talk about unbiasedness of \(\widehat{\beta}_0\) because we are almost never interested in the intercept. Given the limited time we have, it is not worthwhile talking about it)
Linear in Parameters
In the population model, the dependent variable, \(y\), is related to the independent variable, \(x\), and the error (or disturbance), \(u\), as
\[ y=\beta_0+\beta_1 x+u \]
Note: This definition is from the textbook by Wooldridge
Random sampling
We have a random sample of size \(n\), \(\{(x_i,y_i):i=1,2,\dots,n\}\), following the population model.
Non-random sampling
Variation in covariates
The sample outcomes on \(x\), namely \(\{x_i:i=1,\dots,n\}\), are not all the same value.
Zero conditional mean
The error term \(u\) has an expected value of zero given any value of the explanatory variable. In other words,
\[ E[u|x]=0 \]
Along with random sampling condition, this implies that
\[ E[u_i|x_i]=0 \]
Roughly speaking
The independent variable \(x\) is not correlated with \(u\).
\[ \def\sumn{\sum_{i=1}^{n}} \begin{aligned} \widehat{\beta}_1 = & \frac{\sumn (x_i-\bar{x})(y_i-\bar{y})}{\sumn (x_i-\bar{x})^2} \\\\ = & \frac{\sumn (x_i-\bar{x})y_i}{\sumn (x_i-\bar{x})^2} \;\; \Big[\mbox{because }\sumn (x_i-\bar{x})\bar{y}=0\Big]\\\\ = & \frac{\sumn (x_i-\bar{x})y_i}{SST_x} \;\;\Big[\mbox{where,}\;\; SST_x=\sumn (x_i-\bar{x})^2\Big] \\\\ = & \frac{\sumn (x_i-\bar{x})(\beta_0+\beta_1 x_i+u_i)}{SST_x} \\\\ = & \frac{\sumn (x_i-\bar{x})\beta_0 +\sumn \beta_1(x_i-\bar{x})x_i+\sumn(x_i-\bar{x})u_i}{SST_x} \end{aligned} \]
\[ \begin{aligned} \widehat{\beta}_1 = & \frac{\sumn (x_i-\bar{x})\beta_0 + \beta_1 \sumn (x_i-\bar{x})x_i+\sumn (x_i-\bar{x})u_i}{SST_x} \end{aligned} \]
\[ \begin{aligned} \mbox{Since } & \sumn (x_i-\bar{x})=0\;\; \mbox{and}\\ & \sumn (x_i-\bar{x})x_i=\sumn (x_i-\bar{x})^2=SST_x, \end{aligned} \]
\[ \begin{aligned} \widehat{\beta}_1 = \frac{\beta_1 SST_x+\sumn (x_i-\bar{x})u_i}{SST_x} = \beta_1+(1/SST_x)\sumn (x_i-\bar{x})u_i \end{aligned} \]
\[\widehat{\beta}_1 = \beta_1+(1/SST_x)\sumn (x_i-\bar{x})u_i\]
Taking the expectation of \(\widehat{\beta}_1\) conditional on \(\mathbf{x}=\{x_1,\dots,x_n\}\),
\[ \begin{align} \Rightarrow E[\widehat{\beta}_1|\mathbf{x}] = & E[\beta_1|\mathbf{x}]+E[(1/SST_x)\sumn (x_i-\bar{x})u_i|\mathbf{x}] \\\\ = & \beta_1 + (1/SST_x)\sumn (x_i-\bar{x}) E[u_i|\mathbf{x}] \end{align} \]
So, if condition 4 \((E[u_i|\mathbf{x}]=0)\) is satisfied,
\[ \begin{aligned} E[\widehat{\beta}_1|\mathbf{x}] = & \beta_1 \\\\ \Rightarrow E_{\mathbf{x}}\big[E[\widehat{\beta}_1|\mathbf{x}]\big] = & E[\widehat{\beta}_1] = \beta_1, \end{aligned} \]
where the second line follows from the law of iterated expectations.
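As a rough check of this result, here is a minimal Monte Carlo sketch (all parameter values are made up for illustration): the average of \(\widehat{\beta}_1\) across many simulated samples should be close to the true \(\beta_1\).

```r
set.seed(789)

# true parameters (chosen for illustration)
beta0 <- 1
beta1 <- 2
n <- 100

sim_b1 <- function() {
  x <- runif(n, 0, 10)
  u <- rnorm(n)                  # E[u|x] = 0 by construction
  y <- beta0 + beta1 * x + u
  coef(lm(y ~ x))["x"]           # one realized estimate of beta_1
}

b1_draws <- replicate(2000, sim_b1())
mean(b1_draws)  # should be close to beta1 = 2
```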
Good empiricists
can judge whether the above conditions are satisfied in the particular context they are working on
can correct (if possible) for the problems associated with violations of any of the above conditions
know the context well enough to make appropriate judgments
Reconsider the following example
\[ price=\beta_0+\beta_1\times lotsize + u \]
Questions
The unbiasedness property of OLS estimators says nothing about the estimate that we obtain for a given sample.
It is always possible that we could obtain an unlucky sample that would give us a point estimate far from \(\beta_1\), and we can never know for sure whether this is the case.
OLS estimators are random variables because \(y\), \(x\), and \(u\) are random variables (this just means that you do not know the estimates until you get samples).
Variance of OLS estimators is a measure of how much spread in estimates (realized values) you will get.
We let \(Var(\widehat{\beta}_{OLS})\) denote the variance of the OLS estimators of \(\beta_0\) and \(\beta_1\).
Consider two estimators of \(E[x]\):
\[\begin{align} \theta_{smart} = & \frac{1}{n} \sum_{i=1}^{n} x_i \;\;(n=1000) \\\\ \theta_{naive} = & \frac{1}{10} \sum_{i=1}^{10} x_i \end{align}\]
Variance of the estimators
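A quick simulation sketch comparing the two estimators (interpreting \(\theta_{naive}\) as the sample mean of only the first 10 observations, which is an assumption here): both are centered on \(E[x]\), but \(\theta_{naive}\) is far more variable.

```r
set.seed(321)

# each estimator is applied to a fresh sample of 1000 draws of x (E[x] = 5)
one_draw <- function() {
  x <- rnorm(1000, mean = 5, sd = 2)
  c(smart = mean(x),         # uses all 1000 observations
    naive = mean(x[1:10]))   # uses only the first 10 observations
}

draws <- replicate(2000, one_draw())

apply(draws, 1, mean)  # both close to 5 (unbiased)
apply(draws, 1, var)   # the naive estimator is much more spread out
```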
(True) Variance of the OLS Estimator
If \(Var(u|x)=\sigma^2\) and the four conditions (which we used to prove the unbiasedness of the OLS estimator) are satisfied,
\[ \begin{align} Var(\widehat{\beta}_1) = \frac{\sigma^2}{\sumn (x_i-\bar{x})^2}=\frac{\sigma^2}{SST_x} \end{align} \]
(TRUE) Standard Error of the OLS Estimator
The standard error of the OLS estimator is just the square root of the variance of the OLS estimator. We use \(se(\widehat{\beta}_1)\) to denote it.
\[ \begin{aligned} se(\widehat{\beta}_1) = \sqrt{Var(\widehat{\beta}_1)} = \frac{\sigma}{\sqrt{SST_x}} \end{aligned} \]
Variance of the OLS estimators
\[Var(\widehat{\beta}_1|x) = \sigma^2/SST_x\]
What can you learn from this equation?
the variance of OLS estimators is smaller (larger) if the variance of error term is smaller (larger)
the greater (smaller) the variation in the covariate \(x\), the smaller (larger) the variance of OLS estimators
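A minimal simulation sketch (with made-up numbers) illustrating both points: a larger error variance inflates the spread of \(\widehat{\beta}_1\), while more variation in \(x\) shrinks it.

```r
set.seed(654)

# spread of the OLS slope estimates for a given error sd and spread of x
sim_sd_b1 <- function(sigma_u, sd_x, n = 100, reps = 1000) {
  b1 <- replicate(reps, {
    x <- rnorm(n, sd = sd_x)
    y <- 1 + 2 * x + rnorm(n, sd = sigma_u)
    coef(lm(y ~ x))["x"]
  })
  sd(b1)
}

sim_sd_b1(sigma_u = 1, sd_x = 1)  # baseline
sim_sd_b1(sigma_u = 3, sd_x = 1)  # larger error variance -> larger spread
sim_sd_b1(sigma_u = 1, sd_x = 3)  # more variation in x   -> smaller spread
```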
Homoskedasticity
The error \(u\) has the same variance given any value of the covariate \(x\) \((Var(u|x)=\sigma^2)\)
Heteroskedasticity
The variance of the error \(u\) differs depending on the value of \(x\) \((Var(u|x)=f(x))\)
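To make the distinction concrete, here is a small sketch that generates one homoskedastic and one heteroskedastic error term (the particular form of \(f(x)\) here is an arbitrary choice for illustration):

```r
set.seed(987)

n <- 500
x <- runif(n, 0, 10)

# homoskedastic: Var(u|x) = 4 for every value of x
u_homo <- rnorm(n, sd = 2)

# heteroskedastic: Var(u|x) grows with x, e.g. Var(u|x) = x^2
u_hetero <- rnorm(n, sd = x)

# the spread of the heteroskedastic errors widens as x increases
plot(x, u_hetero)
points(x, u_homo, col = "blue")
```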
Gauss-Markov Theorem
Under conditions \(SLR.1\) through \(SLR.4\) and the homoskedasticity assumption (\(SLR.5\)), OLS estimators are the best linear unbiased estimators (BLUEs)
In other words,
No other unbiased linear estimators have smaller variance than the OLS estimators (desirable efficiency property of OLS)
We do NOT need the homoskedasticity condition to prove that OLS estimators are unbiased
In most applications, the homoskedasticity condition is not satisfied, which has important implications for how we estimate \(Var(\widehat{\beta}_1)\) and conduct tests.
( A lot more on this issue later )
Once you estimate \(Var(\widehat{\beta}_1|x)\), you can test the statistical significance of \(\widehat{\beta}_1\) (More on this later)
We know that \(Var(\widehat{\beta}_1|x) = \sigma^2/SST_x\).
You can calculate \(SST_x\) because \(x\) is observable. So, as long as we know \(\sigma^2\), which is \(Var(u)\) (the variance of the error term), we know \(Var(\widehat{\beta}_1|x)\).
Since \(Var(u_i)=\sigma^2=E[u_i^2] \;\; \Big( Var(u_i)\equiv E[u_i^2]-E[u_i]^2 \Big)\), \(\frac{1}{n}\sum_{i=1}^n u_i^2\) is an unbiased estimator of \(Var(u_i)\)
Unfortunately, we don’t observe \(u_i\) (error)
But,
We observe \(\widehat{u_i}\) (residuals)!! Can we use residuals instead?
We know \(E[\widehat{u}_i-u_i]=0\) (see a mathematical proof here), so, why don’t we use \(\widehat{u}_i\) (observable) in place of \(u_i\) (unobservable)?
Proposed Estimator of \(\sigma^2\)
\(\frac{1}{n}\sum_{i=1}^n \widehat{u}_i^2\)
Unfortunately, \(\frac{1}{n}\sum_{i=1}^n \hat{u}_i^2\) is a biased estimator of \(\sigma^2\)
FOCs of the minimization problem OLS solves
\[\begin{align} \sum_{i=1}^n \widehat{u}_i=0\;\; \mbox{and}\;\; \sum_{i=1}^n x_i\widehat{u}_i=0\notag \end{align}\]
These two restrictions mean that the residuals have only \(n-2\) degrees of freedom, which is why we divide by \(n-2\) rather than \(n\) to obtain an unbiased estimator.
Unbiased estimator of \(\sigma^2\)
\(\widehat{\sigma}^2=\frac{1}{n-2}\sum_{i=1}^n \widehat{u}_i^2\) \(\;\;\;\;\;\;\)(\(E[\frac{1}{n-2}\sum_{i=1}^n \widehat{u}_i^2]=\sigma^2\))
Hereafter we use \(\widehat{Var(\widehat{\beta}_1)}\) to denote the estimator of the variance of the OLS estimator \(\widehat{\beta}_1\), and it is defined as
\[ \widehat{Var(\widehat{\beta}_1)} = \widehat{\sigma}^2/SST_x \]
Since \(se(\widehat{\beta}_1)=\sigma/\sqrt{SST_x}\), the natural estimator of \(se(\widehat{\beta_1})\) ( standard error of \(\widehat{\beta}_1\) ) is
\[ \widehat{se(\widehat{\beta}_1)} =\sqrt{\widehat{\sigma}^2}/\sqrt{SST_x}=\widehat{\sigma}/\sqrt{SST_x} \]
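As a sanity check, here is a minimal R sketch (on simulated data) that builds \(\widehat{\sigma}^2\) and \(\widehat{se(\widehat{\beta}_1)}\) from the residuals and compares the result with the standard error reported by `summary(lm())`:

```r
set.seed(111)

n <- 100
x <- runif(n, 0, 10)
y <- 1 + 2 * x + rnorm(n, sd = 3)

fit   <- lm(y ~ x)
u_hat <- resid(fit)

sigma2_hat <- sum(u_hat^2) / (n - 2)    # unbiased estimator of sigma^2
SST_x      <- sum((x - mean(x))^2)
se_b1_hat  <- sqrt(sigma2_hat / SST_x)  # estimated standard error of beta_1 hat

# should match the "Std. Error" for x reported by summary()
se_b1_hat
summary(fit)$coefficients["x", "Std. Error"]
```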
Note
Later, we use \(\widehat{se(\hat{\beta_1})}\) for testing.
Error and Residual
\[\begin{align} y_i = \beta_0+\beta_1 x_i + u_i \\ y_i = \hat{\beta}_0+\hat{\beta}_1 x_i + \hat{u}_i \end{align}\]
Residuals as unbiased estimators of error
\[\begin{align} \hat{u}_i & = y_i -\hat{\beta}_0-\hat{\beta}_1 x_i \\ \hat{u}_i & = \beta_0+\beta_1 x_i + u_i -\hat{\beta}_0-\hat{\beta}_1 x_i \\ \Rightarrow \hat{u}_i -u_i & = (\beta_0-\hat{\beta}_0)+(\beta_1-\hat{\beta}_1) x_i \\ \Rightarrow E[\hat{u}_i -u_i] & = E[(\beta_0-\hat{\beta}_0)+(\beta_1-\hat{\beta}_1) x_i]=0, \end{align}\]
where the last equality follows from the unbiasedness of \(\hat{\beta}_0\) and \(\hat{\beta}_1\).