01-3: Univariate Regression: OLS Small Sample Properties

Small Sample Properties of OLS


Small sample properties of OLS estimators

What is an estimator?

  • A function of data that produces an estimate (an actual number) of a parameter of interest once you plug in actual data values

  • OLS estimator of the slope: \(\widehat{\beta}_1=\frac{\sum_{i=1}^n(x_i-\bar{x})(y_i-\bar{y})}{\sum_{i=1}^n (x_i-\bar{x})^2}\) (a quick numerical check follows below)
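As an illustration (a minimal sketch with simulated data; the parameter values \(\beta_0=1\) and \(\beta_1=2\) are made up), the formula above gives exactly the same slope estimate as R’s `lm()`:

```r
# Hand-computed beta1_hat vs. lm() (simulated data; true beta0 = 1, beta1 = 2)
set.seed(1)
x <- rnorm(100)
y <- 1 + 2 * x + rnorm(100)

# the slope formula from above
b1_formula <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)

# the same estimate via lm()
b1_lm <- unname(coef(lm(y ~ x))["x"])

c(b1_formula, b1_lm)  # identical
```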


What is a small sample property?

Properties that hold for any sample size (small or large), evaluated prior to obtaining actual estimates (i.e., before getting data)

  • Put more simply: what can you expect from the estimators before you actually get data and obtain estimates?

  • Difference between small sample properties and the algebraic properties we looked at earlier?

OLS is just a way of using available information to obtain estimates. Does it have desirable properties? Why are we using it?

  • Unbiasedness
  • Efficiency

As it turns out, OLS is a very good way of using available information!!

Unbiasedness of OLS estimator

What does unbiased even mean?

Let’s start with the simple problem of estimating the expected value of a single random variable (\(x\)).

  • A good estimator of the expected value of a random variable is the sample mean: \(\frac{1}{n}\sum_{i=1}^n x_i\)

R code: Sample Mean
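A minimal sketch of such code (the distribution and its mean, \(E[x]=5\), are illustrative choices, not values from the lecture):

```r
# Draw a random sample and estimate E[x] with the sample mean
n <- 1000
x <- rnorm(n, mean = 5, sd = 2)  # true E[x] = 5 (illustrative values)
mean(x)  # each run gives a different estimate, centered around 5
```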


Direction

Try running the code multiple times to get a feel for how the estimates vary.

Under certain conditions, the OLS estimators are unbiased. That is,

\[ \def\sumn{\sum_{i=1}^{n}} E[\widehat{\beta}_1]=E\Big[\frac{\sumn (x_i-\bar{x})(y_i-\bar{y})}{\sumn (x_i-\bar{x})^2}\Big]=\beta_1 \]

(We do not talk about the unbiasedness of \(\widehat{\beta}_0\) because we are almost never interested in the intercept; given the limited time we have, it is not worth covering.)

Condition 1 (SLR.1): Linear in Parameters

In the population model, the dependent variable, \(y\), is related to the independent variable, \(x\), and the error (or disturbance), \(u\), as

\[ y=\beta_0+\beta_1 x+u \]

Note: This definition is from the textbook by Wooldridge

Condition 2 (SLR.2): Random sampling

We have a random sample of size \(n\), \(\{(x_i,y_i):i=1,2,\dots,n\}\), following the population model.

Non-random sampling

  • Example: You observe income-education data only for those who have income higher than \(\$25K\)
  • Benevolent and malevolent kinds:
    • exogenous sampling
    • endogenous sampling
  • We discuss this in more detail later

Condition 3 (SLR.3): Variation in covariates

The sample outcomes on \(x\), namely \(\{x_i: i=1,\dots,n\}\), are not all the same value.

Condition 4 (SLR.4): Zero conditional mean

The error term \(u\) has an expected value of zero given any value of the explanatory variable. In other words,

\[ E[u|x]=0 \]

Along with random sampling condition, this implies that

\[ E[u_i|x_i]=0 \]

Roughly speaking

The independent variable \(x\) is not correlated with \(u\).

\[ \def\sumn{\sum_{i=1}^{n}} \begin{aligned} \widehat{\beta}_1 = & \frac{\sumn (x_i-\bar{x})(y_i-\bar{y})}{\sumn (x_i-\bar{x})^2} \\\\ = & \frac{\sumn (x_i-\bar{x})y_i}{\sumn (x_i-\bar{x})^2} \;\; \Big[\mbox{because }\sumn (x_i-\bar{x})\bar{y}=0\Big]\\\\ = & \frac{\sumn (x_i-\bar{x})y_i}{SST_x} \;\;\Big[\mbox{where,}\;\; SST_x=\sumn (x_i-\bar{x})^2\Big] \\\\ = & \frac{\sumn (x_i-\bar{x})(\beta_0+\beta_1 x_i+u_i)}{SST_x} \\\\ = & \frac{\sumn (x_i-\bar{x})\beta_0 +\sumn \beta_1(x_i-\bar{x})x_i+\sumn(x_i-\bar{x})u_i}{SST_x} \end{aligned} \]

\[ \begin{aligned} \widehat{\beta}_1 = & \frac{\sumn (x_i-\bar{x})\beta_0 + \beta_1 \sumn (x_i-\bar{x})x_i+\sumn (x_i-\bar{x})u_i}{SST_x} \end{aligned} \]

\[ \begin{aligned} \mbox{Since } & \sumn (x_i-\bar{x})=0\;\; \mbox{and}\\ & \sumn (x_i-\bar{x})x_i=\sumn (x_i-\bar{x})^2=SST_x, \end{aligned} \]

\[ \begin{aligned} \widehat{\beta}_1 = \frac{\beta_1 SST_x+\sumn (x_i-\bar{x})u_i}{SST_x} = \beta_1+(1/SST_x)\sumn (x_i-\bar{x})u_i \end{aligned} \]

\[\widehat{\beta}_1 = \beta_1+(1/SST_x)\sumn (x_i-\bar{x})u_i\]

Taking the expectation of \(\widehat{\beta}_1\) conditional on \(\mathbf{x}=\{x_1,\dots,x_n\}\),

\[ \begin{align} \Rightarrow E[\widehat{\beta}_1|\mathbf{x}] = & E[\beta_1|\mathbf{x}]+E[(1/SST_x)\sumn (x_i-\bar{x})u_i|\mathbf{x}] \\\\ = & \beta_1 + (1/SST_x)\sumn (x_i-\bar{x}) E[u_i|\mathbf{x}] \end{align} \]

So, if condition 4 \((E[u_i|\mathbf{x}]=0)\) is satisfied,

\[ \begin{align} E[\widehat{\beta}_1|\mathbf{x}] = & \beta_1 \\\\ E_{\mathbf{x}}\Big[E[\widehat{\beta}_1|\mathbf{x}]\Big] = & E[\widehat{\beta}_1] = \beta_1 \end{align} \]

where the second line follows from the law of iterated expectations.
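You can see this in a small Monte Carlo simulation (a sketch with made-up values \(\beta_0=1\) and \(\beta_1=2\); the data-generating process satisfies conditions 1 through 4 by construction):

```r
# Simulate many samples, estimate beta1 each time, and average the estimates
beta0 <- 1
beta1 <- 2
n <- 100
b1_hat <- replicate(1000, {
  x <- rnorm(n)
  u <- rnorm(n)               # E[u|x] = 0 holds by construction
  y <- beta0 + beta1 * x + u
  coef(lm(y ~ x))["x"]        # OLS estimate of beta1 for this sample
})
mean(b1_hat)                  # close to the true beta1 = 2
```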

Unbiasedness of OLS in practice

Good empiricists

  • have the ability to judge whether the above conditions are satisfied in the particular context they are working on

  • have the ability to correct (if possible) for the problems associated with violations of any of the above conditions

  • know the context well, so they can make appropriate judgments

Reconsider the following example

\[ price=\beta_0+\beta_1\times lotsize + u \]

  • \(price\): house price (USD)
  • \(lotsize\): lot size
  • \(u\): error term (everything else)

Questions

  • What’s in \(u\)?
  • Do you think \(E[u|x]=0\) is satisfied? In other words (roughly speaking), is \(u\) uncorrelated with \(x\)?
  • The unbiasedness property of OLS estimators says nothing about the estimate that we obtain from any given sample

  • It is always possible that we could obtain an unlucky sample that would give us a point estimate far from \(\beta_1\), and we can never know for sure whether this is the case.

Variance of OLS estimator

  • OLS estimators are random variables because \(y\), \(x\), and \(u\) are random variables (this just means that you do not know the estimates until you get samples).

  • Variance of OLS estimators is a measure of how much spread in estimates (realized values) you will get.

  • We let \(Var(\widehat{\beta}_{OLS})\) denote the variance of the OLS estimators of \(\beta_0\) and \(\beta_1\).

Consider two estimators of \(E[x]\):

\[\begin{align} \theta_{smart} = & \frac{1}{n} \sum_{i=1}^{n} x_i \;\;(n=1000) \\\\ \theta_{naive} = & \frac{1}{10} \sum_{i=1}^{10} x_i \;\;(\mbox{uses only the first 10 observations}) \end{align}\]

Variance of the estimators
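A simulation sketch of the comparison (the distribution of \(x\) is an illustrative choice): both estimators are unbiased, but \(\theta_{naive}\) throws away most of the data and is therefore far more variable.

```r
# Compare the spread of the two estimators across repeated samples
draws <- replicate(5000, {
  x <- rnorm(1000, mean = 5, sd = 2)         # a sample of n = 1000
  c(smart = mean(x), naive = mean(x[1:10]))  # use all obs vs. only 10
})
apply(draws, 1, var)  # theta_naive's variance is about 100 times larger
```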

(True) Variance of the OLS Estimator

If \(Var(u|x)=\sigma^2\) and the four conditions we used to prove unbiasedness of the OLS estimator (SLR.1 through SLR.4) are satisfied,

\[ \begin{align} Var(\widehat{\beta}_1|x) = \frac{\sigma^2}{\sum_{i=1}^{n} (x_i-\bar{x})^2}=\frac{\sigma^2}{SST_x} \end{align} \]


(TRUE) Standard Error of the OLS Estimator

The standard error of the OLS estimator is just the square root of the variance of the OLS estimator. We use \(se(\widehat{\beta}_1)\) to denote it.

\[ \begin{aligned} se(\widehat{\beta}_1) = \sqrt{Var(\widehat{\beta}_1)} = \frac{\sigma}{\sqrt{SST_x}} \end{aligned} \]

Variance of the OLS estimators

\[Var(\widehat{\beta}_1|x) = \sigma^2/SST_x\]


What can you learn from this equation?

  • the variance of OLS estimators is smaller (larger) if the variance of error term is smaller (larger)

  • the greater (smaller) the variation in the covariate \(x\), the smaller (larger) the variance of OLS estimators

    • if you are running experiments, spread the values of \(x\) as much as possible (see the simulation sketch after this list)
    • you will rarely have this luxury
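A simulation sketch of this point (illustrative parameter values; the spread of \(x\) is controlled through its standard deviation):

```r
# Var(beta1_hat) shrinks as the variation in x grows
sim_b1 <- function(x_sd) {
  replicate(2000, {
    x <- rnorm(100, sd = x_sd)
    y <- 1 + 2 * x + rnorm(100)
    coef(lm(y ~ x))["x"]
  })
}
var(sim_b1(x_sd = 1))  # baseline spread of estimates
var(sim_b1(x_sd = 5))  # SST_x is ~25 times larger, variance ~25 times smaller
```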

Efficiency of OLS Estimators

Homoskedasticity (SLR.5)

The error \(u\) has the same variance given any value of the covariate \(x\) \((Var(u|x)=\sigma^2)\)


Heteroskedasticity

The variance of the error \(u\) differs depending on the value of \(x\) \((Var(u|x)=f(x))\)
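A simulated contrast between the two cases (my own illustration; \(Var(u|x)=x^2\) is just one possible \(f(x)\)):

```r
# Homoskedastic vs. heteroskedastic errors
n <- 500
x <- runif(n, 1, 10)
u_homo   <- rnorm(n, sd = 2)  # Var(u|x) = 4, the same for every x
u_hetero <- rnorm(n, sd = x)  # Var(u|x) = x^2, growing with x
plot(x, u_hetero)             # the spread of u visibly widens as x increases
```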

Gauss-Markov Theorem

Under conditions \(SLR.1\) through \(SLR.4\) and the homoskedasticity assumption (\(SLR.5\)), OLS estimators are the best linear unbiased estimators (BLUEs)


In other words,

No other unbiased linear estimators have smaller variance than the OLS estimators (desirable efficiency property of OLS)

  • We do NOT need the homoskedasticity condition to prove that OLS estimators are unbiased

  • In most applications, the homoskedasticity condition is not satisfied, which has important implications for:

    • estimation of variance (standard error) of OLS estimators
    • significance test

(A lot more on this issue later)

Estimating the variance of error

Once you estimate \(Var(\widehat{\beta}_1|x)\), you can test the statistical significance of \(\widehat{\beta}_1\) (More on this later)

  • We know that \(Var(\widehat{\beta}_1|x) = \sigma^2/SST_x\).

  • You can calculate \(SST_x\) because \(x\) is observable. So, as long as we know \(\sigma^2=Var(u)\) (the variance of the error term), we know \(Var(\widehat{\beta}_1|x)\).

  • Since \(E[u_i]=0\), \(Var(u_i)\equiv E[u_i^2]-E[u_i]^2=E[u_i^2]=\sigma^2\), so \(\frac{1}{n}\sum_{i=1}^n u_i^2\) would be an unbiased estimator of \(Var(u_i)\)

  • Unfortunately, we don’t observe \(u_i\) (error)

But,

We observe \(\widehat{u}_i\) (residuals)!! Can we use residuals instead?

We know \(E[\widehat{u}_i-u_i]=0\) (see the proof in the Error and Residual section at the end), so why don’t we use \(\widehat{u}_i\) (observable) in place of \(u_i\) (unobservable)?


Proposed Estimator of \(\sigma^2\)

\(\frac{1}{n}\sum_{i=1}^n \widehat{u}_i^2\)


Unfortunately, \(\frac{1}{n}\sum_{i=1}^n \hat{u}_i^2\) is a biased estimator of \(\sigma^2\)

FOCs of the minimization problem OLS solves

\[\begin{align} \sum_{i=1}^n \widehat{u}_i=0\;\; \mbox{and}\;\; \sum_{i=1}^n x_i\widehat{u}_i=0\notag \end{align}\]
  • this means that once you know the values of \(n-2\) of the residuals, you can solve the above two equations for the remaining two
  • so, it’s almost as if you have \(n-2\) values of residuals instead of \(n\) (a quick numerical check of the FOCs follows below)
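A quick numerical check of the two FOCs (simulated data for illustration):

```r
# The residuals from any fitted simple regression satisfy both FOCs
x <- rnorm(100)
y <- 1 + 2 * x + rnorm(100)
fit <- lm(y ~ x)
sum(residuals(fit))      # numerically zero: first FOC
sum(x * residuals(fit))  # numerically zero: second FOC
```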

Unbiased estimator of \(\sigma^2\)

\(\widehat{\sigma}^2=\frac{1}{n-2}\sum_{i=1}^n \widehat{u}_i^2\) \(\;\;\;\;\;\;\)(\(E[\frac{1}{n-2}\sum_{i=1}^n \widehat{u}_i^2]=\sigma^2\))
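As a sanity check (simulated data; the error standard deviation \(\sigma=3\) is a made-up value), \(\widehat{\sigma}\) computed by hand matches the residual standard error that R’s `summary()` reports:

```r
x <- rnorm(100)
y <- 1 + 2 * x + rnorm(100, sd = 3)  # true sigma = 3
fit <- lm(y ~ x)
sigma2_hat <- sum(residuals(fit)^2) / (length(x) - 2)  # divide by n - 2
sqrt(sigma2_hat)    # essentially identical to ...
summary(fit)$sigma  # ... the residual standard error lm reports
```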


Hereafter we use \(\widehat{Var(\widehat{\beta}_1)}\) to denote the estimator of the variance of the OLS estimator \(\widehat{\beta}_1\), and it is defined as

\[ \widehat{Var(\widehat{\beta}_1)} = \widehat{\sigma}^2/SST_x \]


Since \(se(\widehat{\beta}_1)=\sigma/\sqrt{SST_x}\), the natural estimator of \(se(\widehat{\beta}_1)\) (the standard error of \(\widehat{\beta}_1\)) is

\[ \widehat{se(\widehat{\beta}_1)} =\sqrt{\widehat{\sigma}^2}/\sqrt{SST_x}=\widehat{\sigma}/\sqrt{SST_x} \]
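In the same kind of simulated example as above, the hand-computed standard error matches the one in `lm()`’s coefficient table:

```r
x <- rnorm(100)
y <- 1 + 2 * x + rnorm(100, sd = 3)
fit <- lm(y ~ x)
sigma2_hat <- sum(residuals(fit)^2) / (length(x) - 2)
SST_x <- sum((x - mean(x))^2)
sqrt(sigma2_hat / SST_x)                      # hand-computed se(beta1_hat)
summary(fit)$coefficients["x", "Std. Error"]  # matches lm's reported value
```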

Note

Later, we use \(\widehat{se(\hat{\beta_1})}\) for testing.

Error and Residual

\[\begin{align} y_i & = \beta_0+\beta_1 x_i + u_i \;\;(\mbox{population model: } u_i \mbox{ is the error}) \\ y_i & = \hat{\beta}_0+\hat{\beta}_1 x_i + \hat{u}_i \;\;(\mbox{fitted model: } \hat{u}_i \mbox{ is the residual}) \end{align}\]

Residuals as unbiased estimators of error

\[\begin{align} \hat{u}_i & = y_i -\hat{\beta}_0-\hat{\beta}_1 x_i \\ \hat{u}_i & = \beta_0+\beta_1 x_i + u_i -\hat{\beta}_0-\hat{\beta}_1 x_i \\ \Rightarrow \hat{u}_i -u_i & = (\beta_0-\hat{\beta}_0)+(\beta_1-\hat{\beta}_1) x_i \\ \Rightarrow E[\hat{u}_i -u_i] & = E[(\beta_0-\hat{\beta}_0)+(\beta_1-\hat{\beta}_1) x_i]=0 \end{align}\]

where the last equality follows from the unbiasedness of \(\hat{\beta}_0\) and \(\hat{\beta}_1\).
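A simulation sketch of this result (illustrative values \(\beta_0=1\) and \(\beta_1=2\)):

```r
# Average u_hat_i - u_i across many simulated samples
diffs <- replicate(2000, {
  x <- rnorm(50)
  u <- rnorm(50)
  y <- 1 + 2 * x + u
  mean(residuals(lm(y ~ x)) - u)  # average gap within one sample
})
mean(diffs)  # close to zero, consistent with E[u_hat_i - u_i] = 0
```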