Econ 329 - Statistical Properties of the OLS estimator

Sanjaya DeSilva

September 22, 2008


1 Overview
Recall that the true regression model is

Y_i = \beta_0 + \beta_1 X_i + u_i    (1)

Applying the OLS method to a sample of data, we estimate the sample regression function

Y_i = b_0 + b_1 X_i + e_i    (2)

where the OLS estimators are

b_1 = \frac{\sum_{i=1}^{n} x_i y_i}{\sum_{i=1}^{n} x_i^2}, \qquad b_0 = \bar{Y} - b_1 \bar{X}

with x_i = X_i - \bar{X} and y_i = Y_i - \bar{Y} denoting deviations from the sample means.
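As an illustration (not part of the original notes), the following minimal Python/NumPy sketch computes b_1 and b_0 for a simulated sample; the data-generating values (beta0 = 1, beta1 = 0.5, sigma = 2) are assumptions chosen only for the example.

    import numpy as np

    rng = np.random.default_rng(0)            # hypothetical example data
    n = 50
    X = rng.uniform(0, 10, size=n)
    u = rng.normal(0, 2.0, size=n)            # error term with E(u) = 0
    Y = 1.0 + 0.5 * X + u                     # assumed beta0 = 1.0, beta1 = 0.5

    x = X - X.mean()                          # deviations from the sample means
    y = Y - Y.mean()

    b1 = np.sum(x * y) / np.sum(x ** 2)       # slope: sum(x_i y_i) / sum(x_i^2)
    b0 = Y.mean() - b1 * X.mean()             # intercept: Ybar - b1 * Xbar
    print(b1, b0)                             # close to 0.5 and 1.0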

2 Unbiasedness
The OLS estimate b1 is simply a sample estimate of the population parameter
β1 . For every random sample we draw from the population, we will get a
different b1 . What then is the relationship between the b1 we obtain from a
random sample and the underlying β1 of the population?
To see this, start by rewriting the OLS estimator as follows:

b_1 = \frac{\sum_{i=1}^{n} x_i y_i}{\sum_{i=1}^{n} x_i^2}    (3)

    = \frac{1}{\sum_{i=1}^{n} x_i^2} \sum_{i=1}^{n} x_i (Y_i - \bar{Y})    (4)

    = \frac{1}{\sum_{i=1}^{n} x_i^2} \left( \sum_{i=1}^{n} x_i Y_i - \sum_{i=1}^{n} x_i \bar{Y} \right)    (5)

    = \frac{1}{\sum_{i=1}^{n} x_i^2} \left( \sum_{i=1}^{n} x_i Y_i - \bar{Y} \sum_{i=1}^{n} x_i \right)    (6)

    = \frac{1}{\sum_{i=1}^{n} x_i^2} \sum_{i=1}^{n} x_i Y_i    (7)

where the last step uses the fact that the deviations x_i sum to zero.

For Y_i, we can substitute the expression for the true regression line in order to obtain a relationship between b_1 and \beta_1:

b_1 = \frac{1}{\sum_{i=1}^{n} x_i^2} \sum_{i=1}^{n} x_i (\beta_0 + \beta_1 X_i + u_i)    (8)

    = \frac{1}{\sum_{i=1}^{n} x_i^2} \left( \beta_0 \sum_{i=1}^{n} x_i + \beta_1 \sum_{i=1}^{n} x_i X_i + \sum_{i=1}^{n} x_i u_i \right)    (9)

    = \beta_1 + \sum_{i=1}^{n} k_i u_i    (10)

where

k_i = \frac{x_i}{\sum_{i=1}^{n} x_i^2}    (11)

and the last simplification uses \sum_{i=1}^{n} x_i = 0 and \sum_{i=1}^{n} x_i X_i = \sum_{i=1}^{n} x_i^2.
From this expression, we see that b_1 and \beta_1 are in fact different. However, we can demonstrate that, under certain assumptions, the average b_1 in repeated sampling would equal \beta_1. To see this, take the expectation of both sides of the above expression:

E(b_1) = \beta_1 + E\left( \sum_{i=1}^{n} k_i u_i \right)    (12)

If we assume that X_i, and therefore k_i, is non-stochastic, we can rewrite this as

E(b_1) = \beta_1 + \sum_{i=1}^{n} k_i E(u_i)    (13)

If we also assume that E(u_i) = 0, we get

E(b_1) = \beta_1    (14)

When the expectation of the sample estimate equals the true parameter, we say that the estimator is unbiased. To recap, we find that if X_i is non-stochastic and E(u_i) = 0, the OLS estimator is unbiased.

However, note that these two conditions are not necessary for unbiasedness. Suppose k_i is stochastic. Then b_1 is still unbiased if

E\left( \sum_{i=1}^{n} k_i u_i \right) = \sum_{i=1}^{n} E(k_i u_i) = 0    (15)

That is, if X and u are uncorrelated, the OLS estimator is unbiased.
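A minimal simulation sketch (not from the original notes) of the repeated-sampling argument: with X held fixed and E(u_i) = 0, the average b_1 over many samples should be close to the true beta1. All numbers are illustrative assumptions.

    import numpy as np

    rng = np.random.default_rng(1)
    beta0, beta1, sigma, n = 1.0, 0.5, 2.0, 50
    X = rng.uniform(0, 10, size=n)            # non-stochastic: held fixed below
    x = X - X.mean()

    estimates = []
    for _ in range(5000):                     # repeated random samples
        u = rng.normal(0, sigma, size=n)      # E(u_i) = 0
        Y = beta0 + beta1 * X + u
        estimates.append(np.sum(x * (Y - Y.mean())) / np.sum(x ** 2))

    print(np.mean(estimates))                 # approximately beta1 = 0.5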

3 Variance of the Coefficient Estimate


The variance of the b_1 sampling distribution is, by definition,

Var(b_1) = E[b_1 - E(b_1)]^2    (16)

We showed in the previous section that, under certain classical assumptions, E(b_1) = \beta_1. Then,

Var(b_1) = E[b_1 - \beta_1]^2 = E\left[ \sum_{i=1}^{n} k_i u_i \right]^2    (17)

Expanding terms, we get

Var(b_1) = E\left[ \sum_{i=1}^{n} k_i^2 u_i^2 + \sum_{i=1}^{n} \sum_{j \neq i} 2 k_i k_j u_i u_j \right]    (18)

         = \sum_{i=1}^{n} k_i^2 E(u_i^2) + \sum_{i=1}^{n} \sum_{j \neq i} 2 k_i k_j E(u_i u_j)    (19)

If we make the following two additional assumptions:

1. The variance of the error term is constant, i.e. Var(u_i) = E[u_i^2] = \sigma^2

2. The error terms of different observations are not correlated with each other, i.e. the covariance between all pairs of error terms is zero: E(u_i u_j) = 0 for all i \neq j

then the expression for the variance of b_1 reduces to the following elegant form (using \sum k_i^2 = \sum x_i^2 / (\sum x_i^2)^2 = 1 / \sum x_i^2):

Var(b_1) = \sum_{i=1}^{n} k_i^2 \sigma^2 = \frac{\sigma^2}{\sum_{i=1}^{n} x_i^2}    (20)
Note that the variance of the slope coefficient depends on two things. The variance of the slope coefficient increases as

1. the variance of the error term increases, and as

2. the sum of squared variation in the independent variable decreases, i.e. as the X variable is clustered more closely around its mean.
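The same kind of simulation (again an illustrative sketch, not from the notes, with assumed parameter values) can be used to check the variance formula: the empirical variance of b_1 across repeated samples should be close to sigma^2 / sum(x_i^2).

    import numpy as np

    rng = np.random.default_rng(2)
    beta0, beta1, sigma, n = 1.0, 0.5, 2.0, 50
    X = rng.uniform(0, 10, size=n)            # held fixed across samples
    x = X - X.mean()

    b1_draws = []
    for _ in range(20000):
        u = rng.normal(0, sigma, size=n)
        Y = beta0 + beta1 * X + u
        b1_draws.append(np.sum(x * (Y - Y.mean())) / np.sum(x ** 2))

    print(np.var(b1_draws))                   # empirical sampling variance of b1
    print(sigma ** 2 / np.sum(x ** 2))        # theoretical variance from (20)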

3.1 Estimate of The Variance of the Error Term

Even though the above expression is elegant, it cannot be computed directly because we do not know the variance of the underlying error term. We get around this problem by estimating the variance of the error term, \sigma^2, using the residuals obtained from OLS. It can be shown that, under certain classical assumptions,

\hat{\sigma}^2 = \frac{\sum_{i=1}^{n} e_i^2}{n-2}    (21)

is an unbiased estimator of \sigma^2, i.e.

E[\hat{\sigma}^2] = E\left[ \frac{\sum_{i=1}^{n} e_i^2}{n-2} \right] = \sigma^2    (22)
For the formal proof, see the Gujarati Appendix. Note that this proof also depends crucially on the classical assumptions.
Note that the numerator of this unbiased estimator is the sum of squared residuals (SSR). The estimator itself is often called the Mean Square Residual. The square root of the estimator is called the standard error of the regression (SER) and is typically used as an estimate of the standard deviation of the error term.
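A short sketch (illustrative only; the data-generating values are assumptions) of how \hat{\sigma}^2, the SER, and the estimated standard error of b_1 are computed from the OLS residuals.

    import numpy as np

    rng = np.random.default_rng(3)
    n = 50
    X = rng.uniform(0, 10, size=n)
    Y = 1.0 + 0.5 * X + rng.normal(0, 2.0, size=n)   # assumed true sigma = 2

    x, y = X - X.mean(), Y - Y.mean()
    b1 = np.sum(x * y) / np.sum(x ** 2)
    b0 = Y.mean() - b1 * X.mean()

    e = Y - (b0 + b1 * X)                          # OLS residuals
    sigma2_hat = np.sum(e ** 2) / (n - 2)          # mean square residual, unbiased for sigma^2
    ser = np.sqrt(sigma2_hat)                      # standard error of the regression (SER)
    se_b1 = np.sqrt(sigma2_hat / np.sum(x ** 2))   # estimated standard error of b1
    print(sigma2_hat, ser, se_b1)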

4 The Efficiency of the OLS estimator


Under the classical assumptions, the OLS estimator b_1 can be written as a linear function of Y:

b_1 = \sum k_i Y_i    (23)

where

k_i = \frac{x_i}{\sum x_i^2}    (24)
Our goal now is to show that this OLS estimator has a lower variance than any other linear unbiased estimator, i.e. the OLS estimator is efficient or best. To do so, consider any other linear unbiased estimator

b_1^* = \sum w_i Y_i    (25)

where the weights w_i are some other function of the X values.

The expected value of this estimator is

E(b_1^*) = \sum w_i E(Y_i) = \beta_0 \sum w_i + \beta_1 \sum w_i X_i    (26)

Because b_1^* is unbiased,

E(b_1^*) = \beta_1    (27)

For this to be the case, it follows that

\sum w_i = 0    (28)

\sum w_i X_i = 1    (29)

It follows from these two identities that

\sum w_i x_i = \sum w_i (X_i - \bar{X}) = \sum w_i X_i - \bar{X} \sum w_i = 1    (30)

The variance of b_1^* is

Var(b_1^*) = Var\left( \sum w_i Y_i \right) = \sum w_i^2 Var(Y_i) = \sigma^2 \sum w_i^2    (31)

If we rewrite the variance as

Var(b_1^*) = \sigma^2 \sum (w_i - k_i + k_i)^2    (32)

and expand this expression,

Var(b_1^*) = \sigma^2 \left[ \sum (w_i - k_i)^2 + \sum k_i^2 + 2 \sum k_i (w_i - k_i) \right]    (33)

Note that

\sum k_i w_i = \frac{\sum x_i w_i}{\sum x_i^2} = \frac{1}{\sum x_i^2}    (34)

under the unbiasedness conditions derived earlier.

In addition,

\sum k_i^2 = \frac{\sum x_i^2}{\left( \sum x_i^2 \right)^2} = \frac{1}{\sum x_i^2}    (35)

so that \sum k_i (w_i - k_i) = \sum k_i w_i - \sum k_i^2 = 0. Therefore, the variance of b_1^* simplifies to

Var(b_1^*) = \sigma^2 \left[ \sum (w_i - k_i)^2 + \sum k_i^2 \right]    (36)

Since the first term is a sum of squares, this expression is minimized when

w_i = k_i    (37)

and the minimum variance is

Var(b_1^*) = \sigma^2 \sum k_i^2    (38)

This completes the proof that, under the classical assumptions, the OLS estimator has the least variance among all linear unbiased estimators.
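A numerical illustration of this result (a sketch with assumed numbers, not part of the notes): any other set of weights w_i satisfying sum(w_i) = 0 and sum(w_i X_i) = 1 gives a linear unbiased estimator whose variance sigma^2 sum(w_i^2) is at least as large as the OLS variance sigma^2 sum(k_i^2). Here the alternative weights are built from an arbitrary auxiliary variable z, a purely hypothetical choice.

    import numpy as np

    rng = np.random.default_rng(4)
    sigma, n = 2.0, 50
    X = rng.uniform(0, 10, size=n)
    x = X - X.mean()
    k = x / np.sum(x ** 2)                    # OLS weights k_i

    z = rng.uniform(0, 10, size=n)            # arbitrary auxiliary variable
    zd = z - z.mean()
    w = zd / np.sum(zd * x)                   # satisfies sum(w) = 0 and sum(w * X) = 1

    print(sigma ** 2 * np.sum(k ** 2))        # OLS variance (the smaller of the two)
    print(sigma ** 2 * np.sum(w ** 2))        # variance of the alternative estimator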

4.1 Consistency

We established that the OLS estimator is unbiased and efficient under the classical assumptions. We can also show easily that the OLS estimator is consistent under the same assumptions.
An unbiased estimator is consistent if its variance approaches zero as the sample size increases. To see this, start with the expression for the variance,

Var(b_1) = \frac{\sigma^2}{\sum x_i^2}    (39)

Divide both the numerator and the denominator by n:

Var(b_1) = \frac{\sigma^2 / n}{\sum x_i^2 / n}    (40)

As n \to \infty, the numerator approaches zero whereas the denominator (the average squared variation in X) remains positive. Therefore,

\lim_{n \to \infty} Var(b_1) = 0    (41)
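A quick sketch (illustrative assumptions only) of this point: the theoretical variance sigma^2 / sum(x_i^2) shrinks toward zero as the sample size grows.

    import numpy as np

    rng = np.random.default_rng(5)
    sigma = 2.0
    for n in (10, 100, 1000, 10000):
        X = rng.uniform(0, 10, size=n)
        x = X - X.mean()
        print(n, sigma ** 2 / np.sum(x ** 2))  # variance of b1 falls roughly like 1/n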

5 Gauss-Markov Theorem and Classical Assumptions
To recap, we have demonstrated that the OLS estimator

b_1 = \frac{\sum x_i y_i}{\sum x_i^2} = \sum k_i Y_i    (42)

has the following properties:

1. Unbiased, i.e.

E(b_1) = \beta_1    (43)

2. Its variance is

Var(b_1) = \frac{\sigma^2}{\sum x_i^2} = \sigma^2 \sum k_i^2    (44)

3. Best or efficient, i.e. it has lower variance than any other linear unbiased estimator:

Var(b_1) < Var(b_1^*)    (45)

where b_1^* = \sum w_i Y_i and w_i is any other function of x_i.

4. Consistent, i.e.

\lim_{n \to \infty} Var(b_1) = 0    (46)

if the following classical assumptions are satisfied:

1. The underlying regression model is linear in parameters, has an additive error, and is correctly specified, i.e.

Y_i = \beta_0 + \beta_1 f(X_i) + u_i    (47)

2. The X variable is non-stochastic, i.e. fixed in repeated sampling.

3. The expected value of the error term is zero, i.e.

E(u_i) = 0    (48)

Note that the intercept term \beta_0 ensures that this condition is met. Consider

Y_i = \beta_0 + \beta_1 X_i + u_i    (49)

E(u_i) = k    (50)

This is equivalent to a model with a redefined intercept and error term, u_i^* = u_i - k:

Y_i = \beta_0^* + \beta_1 X_i + u_i^*    (51)

\beta_0^* = \beta_0 + k    (52)

E(u_i^*) = 0    (53)

Note also that the first three conditions are sufficient for OLS to be
unbiased.

4. The explanatory variable X_i is uncorrelated with the error term u_i, i.e.

Cor(X_i, u_i) = E[x_i u_i] = 0    (54)

Note that this assumption is necessary for OLS to be unbiased. Even if X_i is stochastic, we can obtain unbiased coefficients as long as X_i is uncorrelated with the error term. Such correlation typically occurs if X_i is endogenous, i.e. determined by other variables in the model. If both X_i and Y_i are determined by the same unobserved variables, this assumption is violated. If X_i and Y_i are determined by each other, i.e. in simultaneous equations, this assumption is also violated (see the simulation sketch after this list). For example, if

Y_i = \beta_0 + \beta_1 X_i + u_i    (55)

X_i = \delta_0 + \delta_1 Y_i + \epsilon_i    (56)

then Cor(X_i, u_i) \neq 0 if \delta_1 \neq 0 and/or Cor(u_i, \epsilon_i) \neq 0.

5. The error term is homoskedastic, i.e. its conditional variance is constant:

Var(u_i | X_i) = E(u_i^2 | X_i) = \sigma^2    (57)

6. The error term is serially uncorrelated, i.e. the error term of one observation is not correlated with the error term of any other observation:

Cov(u_i, u_j | X_i, X_j) = E(u_i u_j | X_i, X_j) = 0 \quad \forall i \neq j    (58)

The assumptions of serially uncorrelated and homoskedastic errors allow us to obtain an unbiased estimator of the variance of the error term, and a simple OLS formula for the variance of the coefficient estimate. In addition, we need these two assumptions to demonstrate that OLS is efficient. In fact, we will see later that GLS methods are efficient when these assumptions are violated.
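As referenced under assumption 4, the following sketch (a hypothetical simultaneous system with assumed parameter values, not from the original notes) illustrates how OLS becomes biased when X is partly determined by Y, so that Cor(X, u) is no longer zero.

    import numpy as np

    rng = np.random.default_rng(7)
    beta0, beta1, delta0, delta1, n = 1.0, 0.5, 2.0, 0.8, 200

    b1_draws = []
    for _ in range(5000):
        u = rng.normal(0, 1.0, size=n)
        eps = rng.normal(0, 1.0, size=n)
        # reduced form of the simultaneous system
        #   Y = beta0 + beta1 * X + u,  X = delta0 + delta1 * Y + eps
        X = (delta0 + delta1 * beta0 + delta1 * u + eps) / (1 - beta1 * delta1)
        Y = beta0 + beta1 * X + u
        x = X - X.mean()
        b1_draws.append(np.sum(x * (Y - Y.mean())) / np.sum(x ** 2))

    print(np.mean(b1_draws))                  # noticeably above the true beta1 = 0.5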

There are a few other assumptions that are necessary to obtain OLS coefficients and standard errors:

1. At least one degree of freedom, i.e. the number of observations must exceed the number of parameters (n > k + 1), where k is the number of X variables. In the simple regression with one X variable, this means there should be at least three observations.

2. No X variable should be a deterministic linear function of the other X variables, i.e. no perfect multicollinearity. This condition applies only to multiple regressions, where there is more than one X variable, and is discussed later.

3. There should be some variation in the X variable. If the X variable does not vary, it is impossible to estimate the slope of a regression line.
