01-3: Univariate Regression: OLS Small Sample Properties

Small Sample Properties of OLS


Small sample properties of OLS estimators

What is an estimator?

  • A function of data that produces an estimate (an actual number) of a parameter of interest once you plug in actual data values

  • OLS estimator of the slope: \(\widehat{\beta}_1=\frac{\sum_{i=1}^n(x_i-\bar{x})(y_i-\bar{y})}{\sum_{i=1}^n (x_i-\bar{x})^2}\) (a quick numerical check follows below)
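As an illustration (a minimal sketch with simulated data; the parameter values \(\beta_0=1\) and \(\beta_1=2\) are made up), the formula above gives exactly the same slope estimate as R’s `lm()`:

```r
# Hand-computed beta1_hat vs. lm() (simulated data; true beta0 = 1, beta1 = 2)
set.seed(1)
x <- rnorm(100)
y <- 1 + 2 * x + rnorm(100)

# the slope formula from above
b1_formula <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)

# the same estimate via lm()
b1_lm <- unname(coef(lm(y ~ x))["x"])

c(b1_formula, b1_lm)  # identical
```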


What is a small sample property?

Properties that hold for any sample size (small or large), evaluated prior to obtaining actual estimates (i.e., before getting data)

  • Put more simply: what can you expect from the estimators before you actually get data and obtain estimates?

  • Difference between small sample properties and the algebraic properties we looked at earlier?

OLS is just a way of using available information to obtain estimates. Does it have desirable properties? Why are we using it?

  • Unbiasedness
  • Efficiency

As it turns out, OLS is a very good way of using available information!!

Unbiasedness of OLS estimator

What does unbiased even mean?

Let’s start with the simple problem of estimating the expected value of a single random variable (\(x\)).

  • A good estimator of the expected value of a random variable is the sample mean: \(\frac{1}{n}\sum_{i=1}^n x_i\)

R code: Sample Mean
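A minimal sketch of such code (the distribution and its mean, \(E[x]=5\), are illustrative choices, not values from the lecture):

```r
# Draw a random sample and estimate E[x] with the sample mean
n <- 1000
x <- rnorm(n, mean = 5, sd = 2)  # true E[x] = 5 (illustrative values)
mean(x)  # each run gives a different estimate, centered around 5
```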


Direction

Try running the code multiple times to get a feel for how the estimates vary.

Under certain conditions, the OLS estimators are unbiased. That is,

\[ \def\sumn{\sum_{i=1}^{n}} E[\widehat{\beta}_1]=E\Big[\frac{\sumn (x_i-\bar{x})(y_i-\bar{y})}{\sumn (x_i-\bar{x})^2}\Big]=\beta_1 \]

(We do not talk about the unbiasedness of \(\widehat{\beta}_0\) because we are almost never interested in the intercept; given the limited time we have, it is not worth covering.)

Condition 1 (SLR.1): Linear in Parameters

In the population model, the dependent variable, \(y\), is related to the independent variable, \(x\), and the error (or disturbance), \(u\), as

\[ y=\beta_0+\beta_1 x+u \]

Note: This definition is from the textbook by Wooldridge

Condition 2 (SLR.2): Random sampling

We have a random sample of size \(n\), \(\{(x_i,y_i):i=1,2,\dots,n\}\), following the population model.

Non-random sampling

  • Example: You observe income-education data only for those who have income higher than \(\$25K\)
  • Benevolent and malevolent kinds:
    • exogenous sampling
    • endogenous sampling
  • We discuss this in more detail later

Condition 3 (SLR.3): Variation in covariates

The sample outcomes on \(x\), namely \(\{x_i: i=1,\dots,n\}\), are not all the same value.

Condition 4 (SLR.4): Zero conditional mean

The error term \(u\) has an expected value of zero given any value of the explanatory variable. In other words,

\[ E[u|x]=0 \]

Along with random sampling condition, this implies that

\[ E[u_i|x_i]=0 \]

Roughly speaking

The independent variable \(x\) is not correlated with \(u\).

\[ \def\sumn{\sum_{i=1}^{n}} \begin{aligned} \widehat{\beta}_1 = & \frac{\sumn (x_i-\bar{x})(y_i-\bar{y})}{\sumn (x_i-\bar{x})^2} \\\\ = & \frac{\sumn (x_i-\bar{x})y_i}{\sumn (x_i-\bar{x})^2} \;\; \Big[\mbox{because }\sumn (x_i-\bar{x})\bar{y}=0\Big]\\\\ = & \frac{\sumn (x_i-\bar{x})y_i}{SST_x} \;\;\Big[\mbox{where,}\;\; SST_x=\sumn (x_i-\bar{x})^2\Big] \\\\ = & \frac{\sumn (x_i-\bar{x})(\beta_0+\beta_1 x_i+u_i)}{SST_x} \\\\ = & \frac{\sumn (x_i-\bar{x})\beta_0 +\sumn \beta_1(x_i-\bar{x})x_i+\sumn(x_i-\bar{x})u_i}{SST_x} \end{aligned} \]

\[ \begin{aligned} \widehat{\beta}_1 = & \frac{\sumn (x_i-\bar{x})\beta_0 + \beta_1 \sumn (x_i-\bar{x})x_i+\sumn (x_i-\bar{x})u_i}{SST_x} \end{aligned} \]

\[ \begin{aligned} \mbox{Since } & \sumn (x_i-\bar{x})=0\;\; \mbox{and}\\ & \sumn (x_i-\bar{x})x_i=\sumn (x_i-\bar{x})^2=SST_x, \end{aligned} \]

\[ \begin{aligned} \widehat{\beta}_1 = \frac{\beta_1 SST_x+\sumn (x_i-\bar{x})u_i}{SST_x} = \beta_1+(1/SST_x)\sumn (x_i-\bar{x})u_i \end{aligned} \]

\[\widehat{\beta}_1 = \beta_1+(1/SST_x)\sumn (x_i-\bar{x})u_i\]

Taking the expectation of \(\widehat{\beta}_1\) conditional on \(\mathbf{x}=\{x_1,\dots,x_n\}\),

\[ \begin{align} \Rightarrow E[\widehat{\beta}_1|\mathbf{x}] = & E[\beta_1|\mathbf{x}]+E[(1/SST_x)\sumn (x_i-\bar{x})u_i|\mathbf{x}] \\\\ = & \beta_1 + (1/SST_x)\sumn (x_i-\bar{x}) E[u_i|\mathbf{x}] \end{align} \]

So, if condition 4 \((E[u_i|\mathbf{x}]=0)\) is satisfied,

\[ \begin{align} E[\widehat{\beta}_1|\mathbf{x}] = & \beta_1 \\\\ E_{\mathbf{x}}\Big[E[\widehat{\beta}_1|\mathbf{x}]\Big] = & E[\widehat{\beta}_1] = \beta_1 \end{align} \]

where the second line follows from the law of iterated expectations.
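You can see this in a small Monte Carlo simulation (a sketch with made-up values \(\beta_0=1\) and \(\beta_1=2\); the data-generating process satisfies conditions 1 through 4 by construction):

```r
# Simulate many samples, estimate beta1 each time, and average the estimates
beta0 <- 1
beta1 <- 2
n <- 100
b1_hat <- replicate(1000, {
  x <- rnorm(n)
  u <- rnorm(n)               # E[u|x] = 0 holds by construction
  y <- beta0 + beta1 * x + u
  coef(lm(y ~ x))["x"]        # OLS estimate of beta1 for this sample
})
mean(b1_hat)                  # close to the true beta1 = 2
```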

Unbiasedness of OLS in practice

Good empiricists

  • have the ability to judge whether the above conditions are satisfied in the particular context they are working on

  • have the ability to correct (if possible) for the problems associated with violations of any of the above conditions

  • know the context well, so they can make appropriate judgments

Reconsider the following example

\[ price=\beta_0+\beta_1\times lotsize + u \]

  • \(price\): house price (USD)
  • \(lotsize\): lot size
  • \(u\): error term (everything else)

Questions

  • What’s in \(u\)?
  • Do you think \(E[u|x]=0\) is satisfied? In other words (roughly speaking), is \(u\) uncorrelated with \(x\)?
  • The unbiasedness property of OLS estimators says nothing about the estimate that we obtain from any given sample

  • It is always possible that we could obtain an unlucky sample that would give us a point estimate far from \(\beta_1\), and we can never know for sure whether this is the case.

Variance of OLS estimator

  • OLS estimators are random variables because \(y\), \(x\), and \(u\) are random variables (this just means that you do not know the estimates until you get samples).

  • Variance of OLS estimators is a measure of how much spread in estimates (realized values) you will get.

  • We let \(Var(\widehat{\beta}_{OLS})\) denote the variance of the OLS estimators of \(\beta_0\) and \(\beta_1\).

Consider two estimators of \(E[x]\):

\[\begin{align} \theta_{smart} = & \frac{1}{n} \sum_{i=1}^{n} x_i \;\;(n=1000) \\\\ \theta_{naive} = & \frac{1}{10} \sum_{i=1}^{10} x_i \;\;(\mbox{uses only the first 10 observations}) \end{align}\]

Variance of the estimators
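A simulation sketch of the comparison (the distribution of \(x\) is an illustrative choice): both estimators are unbiased, but \(\theta_{naive}\) throws away most of the data and is therefore far more variable.

```r
# Compare the spread of the two estimators across repeated samples
draws <- replicate(5000, {
  x <- rnorm(1000, mean = 5, sd = 2)         # a sample of n = 1000
  c(smart = mean(x), naive = mean(x[1:10]))  # use all obs vs. only 10
})
apply(draws, 1, var)  # theta_naive's variance is about 100 times larger
```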

(True) Variance of the OLS Estimator

If \(Var(u|x)=\sigma^2\) and the four conditions we used to prove unbiasedness of the OLS estimator (SLR.1 through SLR.4) are satisfied,

\[ \begin{align} Var(\widehat{\beta}_1|x) = \frac{\sigma^2}{\sum_{i=1}^{n} (x_i-\bar{x})^2}=\frac{\sigma^2}{SST_x} \end{align} \]


(TRUE) Standard Error of the OLS Estimator

The standard error of the OLS estimator is just the square root of the variance of the OLS estimator. We use \(se(\widehat{\beta}_1)\) to denote it.

\[ \begin{aligned} se(\widehat{\beta}_1) = \sqrt{Var(\widehat{\beta}_1)} = \frac{\sigma}{\sqrt{SST_x}} \end{aligned} \]

Variance of the OLS estimators

\[Var(\widehat{\beta}_1|x) = \sigma^2/SST_x\]


What can you learn from this equation?

  • the variance of OLS estimators is smaller (larger) if the variance of error term is smaller (larger)

  • the greater (smaller) the variation in the covariate \(x\), the smaller (larger) the variance of OLS estimators

    • if you are running experiments, spread the values of \(x\) as much as possible (see the simulation sketch after this list)
    • you will rarely have this luxury
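A simulation sketch of this point (illustrative parameter values; the spread of \(x\) is controlled through its standard deviation):

```r
# Var(beta1_hat) shrinks as the variation in x grows
sim_b1 <- function(x_sd) {
  replicate(2000, {
    x <- rnorm(100, sd = x_sd)
    y <- 1 + 2 * x + rnorm(100)
    coef(lm(y ~ x))["x"]
  })
}
var(sim_b1(x_sd = 1))  # baseline spread of estimates
var(sim_b1(x_sd = 5))  # SST_x is ~25 times larger, variance ~25 times smaller
```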

Efficiency of OLS Estimators

Homoskedasticity (SLR.5)

The error \(u\) has the same variance given any value of the covariate \(x\) \((Var(u|x)=\sigma^2)\)


Heteroskedasticity

The variance of the error \(u\) differs depending on the value of \(x\) \((Var(u|x)=f(x))\)
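A simulated contrast between the two cases (my own illustration; \(Var(u|x)=x^2\) is just one possible \(f(x)\)):

```r
# Homoskedastic vs. heteroskedastic errors
n <- 500
x <- runif(n, 1, 10)
u_homo   <- rnorm(n, sd = 2)  # Var(u|x) = 4, the same for every x
u_hetero <- rnorm(n, sd = x)  # Var(u|x) = x^2, growing with x
plot(x, u_hetero)             # the spread of u visibly widens as x increases
```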

Gauss-Markov Theorem

Under conditions \(SLR.1\) through \(SLR.4\) and the homoskedasticity assumption (\(SLR.5\)), OLS estimators are the best linear unbiased estimators (BLUEs)


In other words,

No other unbiased linear estimators have smaller variance than the OLS estimators (desirable efficiency property of OLS)

  • We do NOT need the homoskedasticity condition to prove that OLS estimators are unbiased

  • In most applications, the homoskedasticity condition is not satisfied, which has important implications for:

    • estimation of variance (standard error) of OLS estimators
    • significance test

(A lot more on this issue later)

Estimating the variance of error

Once you estimate \(Var(\widehat{\beta}_1|x)\), you can test the statistical significance of \(\widehat{\beta}_1\) (More on this later)

  • We know that \(Var(\widehat{\beta}_1|x) = \sigma^2/SST_x\).

  • You can calculate \(SST_x\) because \(x\) is observable. So, as long as we know \(\sigma^2=Var(u)\) (the variance of the error term), we know \(Var(\widehat{\beta}_1|x)\).

  • Since \(E[u_i]=0\), \(Var(u_i)\equiv E[u_i^2]-E[u_i]^2=E[u_i^2]=\sigma^2\), so \(\frac{1}{n}\sum_{i=1}^n u_i^2\) would be an unbiased estimator of \(Var(u_i)\)

  • Unfortunately, we don’t observe \(u_i\) (error)

But,

We observe \(\widehat{u}_i\) (residuals)!! Can we use residuals instead?

We know \(E[\widehat{u}_i-u_i]=0\) (see the proof in the Error and Residual section at the end), so why don’t we use \(\widehat{u}_i\) (observable) in place of \(u_i\) (unobservable)?


Proposed Estimator of \(\sigma^2\)

\(\frac{1}{n}\sum_{i=1}^n \widehat{u}_i^2\)


Unfortunately, \(\frac{1}{n}\sum_{i=1}^n \hat{u}_i^2\) is a biased estimator of \(\sigma^2\)

FOCs of the minimization problem OLS solves

\[\begin{align} \sum_{i=1}^n \widehat{u}_i=0\;\; \mbox{and}\;\; \sum_{i=1}^n x_i\widehat{u}_i=0\notag \end{align}\]
  • this means that once you know the values of \(n-2\) of the residuals, you can solve the above two equations for the remaining two
  • so, it’s almost as if you have \(n-2\) values of residuals instead of \(n\) (a quick numerical check of the FOCs follows below)
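A quick numerical check of the two FOCs (simulated data for illustration):

```r
# The residuals from any fitted simple regression satisfy both FOCs
x <- rnorm(100)
y <- 1 + 2 * x + rnorm(100)
fit <- lm(y ~ x)
sum(residuals(fit))      # numerically zero: first FOC
sum(x * residuals(fit))  # numerically zero: second FOC
```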

Unbiased estimator of \(\sigma^2\)

\(\widehat{\sigma}^2=\frac{1}{n-2}\sum_{i=1}^n \widehat{u}_i^2\) \(\;\;\;\;\;\;\)(\(E[\frac{1}{n-2}\sum_{i=1}^n \widehat{u}_i^2]=\sigma^2\))
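As a sanity check (simulated data; the error standard deviation \(\sigma=3\) is a made-up value), \(\widehat{\sigma}\) computed by hand matches the residual standard error that R’s `summary()` reports:

```r
x <- rnorm(100)
y <- 1 + 2 * x + rnorm(100, sd = 3)  # true sigma = 3
fit <- lm(y ~ x)
sigma2_hat <- sum(residuals(fit)^2) / (length(x) - 2)  # divide by n - 2
sqrt(sigma2_hat)    # essentially identical to ...
summary(fit)$sigma  # ... the residual standard error lm reports
```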


Hereafter we use \(\widehat{Var(\widehat{\beta}_1)}\) to denote the estimator of the variance of the OLS estimator \(\widehat{\beta}_1\), and it is defined as

\[ \widehat{Var(\widehat{\beta}_1)} = \widehat{\sigma}^2/SST_x \]


Since \(se(\widehat{\beta}_1)=\sigma/\sqrt{SST_x}\), the natural estimator of \(se(\widehat{\beta}_1)\) (the standard error of \(\widehat{\beta}_1\)) is

\[ \widehat{se(\widehat{\beta}_1)} =\sqrt{\widehat{\sigma}^2}/\sqrt{SST_x}=\widehat{\sigma}/\sqrt{SST_x} \]
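In the same kind of simulated example as above, the hand-computed standard error matches the one in `lm()`’s coefficient table:

```r
x <- rnorm(100)
y <- 1 + 2 * x + rnorm(100, sd = 3)
fit <- lm(y ~ x)
sigma2_hat <- sum(residuals(fit)^2) / (length(x) - 2)
SST_x <- sum((x - mean(x))^2)
sqrt(sigma2_hat / SST_x)                      # hand-computed se(beta1_hat)
summary(fit)$coefficients["x", "Std. Error"]  # matches lm's reported value
```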

Note

Later, we use \(\widehat{se(\hat{\beta_1})}\) for testing.

Error and Residual

\[\begin{align} y_i & = \beta_0+\beta_1 x_i + u_i \;\;(\mbox{population model: } u_i \mbox{ is the error}) \\ y_i & = \hat{\beta}_0+\hat{\beta}_1 x_i + \hat{u}_i \;\;(\mbox{fitted model: } \hat{u}_i \mbox{ is the residual}) \end{align}\]

Residuals as unbiased estimators of error

\[\begin{align} \hat{u}_i & = y_i -\hat{\beta}_0-\hat{\beta}_1 x_i \\ \hat{u}_i & = \beta_0+\beta_1 x_i + u_i -\hat{\beta}_0-\hat{\beta}_1 x_i \\ \Rightarrow \hat{u}_i -u_i & = (\beta_0-\hat{\beta}_0)+(\beta_1-\hat{\beta}_1) x_i \\ \Rightarrow E[\hat{u}_i -u_i] & = E[(\beta_0-\hat{\beta}_0)+(\beta_1-\hat{\beta}_1) x_i]=0 \end{align}\]

where the last equality follows from the unbiasedness of \(\hat{\beta}_0\) and \(\hat{\beta}_1\).
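A simulation sketch of this result (illustrative values \(\beta_0=1\) and \(\beta_1=2\)):

```r
# Average u_hat_i - u_i across many simulated samples
diffs <- replicate(2000, {
  x <- rnorm(50)
  u <- rnorm(50)
  y <- 1 + 2 * x + u
  mean(residuals(lm(y ~ x)) - u)  # average gap within one sample
})
mean(diffs)  # close to zero, consistent with E[u_hat_i - u_i] = 0
```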