
ECON 120B Econometrics

Lecture 2 Part 3: Properties of the OLS Estimates

Xinwei Ma

Department of Economics
UC San Diego

Spring 2021



Outline

Finite-Sample Properties of β̂0 and β̂1

Expected Values of β̂0 and β̂1



Finite-Sample Properties of β̂0 and β̂1

■ We started from the zero conditional mean assumption

  E[u|X] = 0,

  and obtained the two population conditions

  0 = E[Y − β0 − β1 X],    0 = E[X(Y − β0 − β1 X)].

■ In order to estimate β0 and β1, we rely on a sample, {(Xi, Yi) : i = 1, 2, …, n}, and replace
  the infeasible population conditions by their sample analogues (throughout, Σᵢ denotes the
  sum over i = 1, …, n):

  0 = (1/n) Σᵢ Yi − β̂0 − β̂1 (1/n) Σᵢ Xi,    0 = (1/n) Σᵢ Xi Yi − β̂0 (1/n) Σᵢ Xi − β̂1 (1/n) Σᵢ Xi².

■ Finally, the slope estimate β̂1 is solved as

  β̂1 = [ (1/n) Σᵢ Xi Yi − ( (1/n) Σᵢ Xi ) ( (1/n) Σᵢ Yi ) ] / [ (1/n) Σᵢ Xi² − ( (1/n) Σᵢ Xi )² ]
      = [ (1/n) Σᵢ (Xi − X̄)(Yi − Ȳ) ] / [ (1/n) Σᵢ (Xi − X̄)² ],

  where X̄ = (1/n) Σᵢ Xi and Ȳ = (1/n) Σᵢ Yi denote the sample means.

■ Once we have β̂1, we compute β̂0 = Ȳ − β̂1 X̄. This is the OLS intercept estimate.

■ We also define the residuals, ûi = Yi − β̂0 − β̂1 Xi.
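■ ASIDE. These formulas are easy to check numerically. The following minimal Python sketch (not part of the original slides; made-up simulated data and parameter values) computes β̂1, β̂0, and the residuals exactly as defined above, cross-checked against numpy's built-in least-squares fit:

import numpy as np

rng = np.random.default_rng(0)

# Hypothetical simulated sample: any (Xi, Yi) data would work here.
n = 100
X = rng.normal(10, 2, size=n)
Y = 3 + 0.5 * X + rng.normal(0, 1, size=n)

# Slope: sample covariance of (X, Y) over the sample variance of X.
beta1_hat = np.mean((X - X.mean()) * (Y - Y.mean())) / np.mean((X - X.mean()) ** 2)

# Intercept: forces the regression line through (X-bar, Y-bar).
beta0_hat = Y.mean() - beta1_hat * X.mean()

# Residuals.
u_hat = Y - beta0_hat - beta1_hat * X

# Cross-check against numpy's least-squares polynomial fit of degree 1.
slope_np, intercept_np = np.polyfit(X, Y, deg=1)
print(beta1_hat, slope_np)      # these two agree
print(beta0_hat, intercept_np)  # and so do these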


Finite-Sample Properties of β̂0 and β̂1
■ The OLS residuals always add up to zero: Σᵢ ûi = 0.

■ Because Yi = Ŷi + ûi by definition, the sample average of the actual Yi is the same as the
  sample average of the fitted values Ŷi: Ȳ = (1/n) Σᵢ Ŷi.

■ The sample covariance (and therefore the sample correlation) between the explanatory
  variable and the residuals is always zero:

  Σᵢ Xi ûi = 0.

■ Because Ŷi = β̂0 + β̂1 Xi, the fitted value and the residual are uncorrelated in the sample,
  too:

  Σᵢ Ŷi ûi = 0.

■ The point (X̄, Ȳ) is always on the OLS regression line. That is, if we plug in the average
  for X, we predict the sample average for Y: Ȳ = β̂0 + β̂1 X̄.

■ These properties hold by construction: β̂0 and β̂1 were chosen to make them true.
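■ Each of these properties can be verified mechanically. A minimal Python sketch (illustrative, not from the slides; simulated data):

import numpy as np

rng = np.random.default_rng(1)
n = 100
X = rng.normal(10, 2, size=n)
Y = 3 + 0.5 * X + rng.normal(0, 1, size=n)

# OLS estimates, fitted values, and residuals as defined earlier.
b1 = np.mean((X - X.mean()) * (Y - Y.mean())) / np.mean((X - X.mean()) ** 2)
b0 = Y.mean() - b1 * X.mean()
Y_fit = b0 + b1 * X
u_hat = Y - Y_fit

print(np.isclose(u_hat.sum(), 0))            # residuals add up to zero
print(np.isclose(Y.mean(), Y_fit.mean()))    # actual and fitted averages match
print(np.isclose((X * u_hat).sum(), 0))      # regressor uncorrelated with residuals
print(np.isclose((Y_fit * u_hat).sum(), 0))  # fitted values uncorrelated with residuals
print(np.isclose(Y.mean(), b0 + b1 * X.mean()))  # (X-bar, Y-bar) lies on the line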
Finite-Sample Properties of β̂0 and β̂1

■ For each observation, write Yi = Ŷi + ûi.

■ Define the total sum of squares (SST), explained sum of squares (SSE) – Stata calls this
  the “model sum of squares” – and residual sum of squares (or sum of squared residuals) as

  SST = Σᵢ (Yi − Ȳ)²,    SSE = Σᵢ (Ŷi − Ȳ)²,    SSR = Σᵢ ûi².

■ Each of these is a sample variance when divided by n (or n − 1): SST/n is the sample
  variance of Yi, SSE/n is the sample variance of Ŷi, and SSR/n is the sample variance of ûi.

■ By writing

  SST = Σᵢ (Yi − Ȳ)² = Σᵢ [(Yi − Ŷi) + (Ŷi − Ȳ)]² = Σᵢ [ûi + (Ŷi − Ȳ)]²

  and using the fact that the fitted value and the residual are uncorrelated, we can show

  SST = SSE + SSR.
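■ The decomposition can be confirmed numerically, e.g. with this Python sketch (illustrative simulated data):

import numpy as np

rng = np.random.default_rng(2)
n = 100
X = rng.normal(size=n)
Y = 1 + 2 * X + rng.normal(size=n)

b1 = np.mean((X - X.mean()) * (Y - Y.mean())) / np.mean((X - X.mean()) ** 2)
b0 = Y.mean() - b1 * X.mean()
Y_fit = b0 + b1 * X
u_hat = Y - Y_fit

SST = ((Y - Y.mean()) ** 2).sum()
SSE = ((Y_fit - Y.mean()) ** 2).sum()
SSR = (u_hat ** 2).sum()
print(np.isclose(SST, SSE + SSR))  # True: the cross term vanishes by construction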



Finite-Sample Properties of β̂0 and β̂1

■ Recall:

  SST = SSE + SSR.

■ Assuming SST > 0 (so that Yi is not constant), we can define the fraction of the total
  variation in Yi that is explained by Xi (or the OLS regression line) as

  R² = SSE/SST = 1 − SSR/SST.

  This is called the R-squared of the regression.

■ It can be shown to equal the square of the correlation between Yi and Ŷi. Therefore,
  0 ≤ R² ≤ 1.

■ R² = 0 means no linear relationship between Yi and Xi. R² = 1 means a perfect linear
  relationship.

■ As R² increases, the Yi are closer and closer to falling on the OLS regression line.

■ Do not overemphasize the importance of R². It is a useful summary measure but tells us
  nothing about causality. Having a “high” R-squared is neither necessary nor sufficient
  to infer causality.
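■ The three characterizations of R² (SSE/SST, 1 − SSR/SST, and the squared correlation between Yi and Ŷi) can be compared directly. A Python sketch (illustrative simulated data):

import numpy as np

rng = np.random.default_rng(3)
n = 200
X = rng.normal(size=n)
Y = 1 + 2 * X + rng.normal(size=n)

b1, b0 = np.polyfit(X, Y, deg=1)  # slope first, then intercept
Y_fit = b0 + b1 * X

SST = ((Y - Y.mean()) ** 2).sum()
SSE = ((Y_fit - Y.mean()) ** 2).sum()
SSR = ((Y - Y_fit) ** 2).sum()

print(SSE / SST)                         # R-squared, first form
print(1 - SSR / SST)                     # R-squared, second form
print(np.corrcoef(Y, Y_fit)[0, 1] ** 2)  # squared correlation of Y and Y-hat

All three printed values coincide.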



Finite-Sample Properties of β̂0 and β̂1

■ EXAMPLE. Class size and test score

  • Test scores and class sizes in 1990 in 420 California school districts that serve
    kindergarten through eighth grade.

■ Stata output
. reg testScore stuTeacherRatio

Source | SS df MS Number of obs = 420


-------------+---------------------------------- F(1, 418) = 22.56
Model | 7789.39296 1 7789.39296 Prob > F = 0.0000
Residual | 144312.057 418 345.244156 R-squared = 0.0512
-------------+---------------------------------- Adj R-squared = 0.0489
Total | 152101.45 419 363.010621 Root MSE = 18.581

---------------------------------------------------------------------------------
testScore | Coef. Std. Err. t P>|t| [95% Conf. Interval]
----------------+----------------------------------------------------------------
stuTeacherRatio | -2.279063 .4798082 -4.75 0.000 -3.2222 -1.335925
_cons | 698.9222 9.467187 73.83 0.000 680.3129 717.5314
---------------------------------------------------------------------------------
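■ The summary numbers in this output fit together exactly as in the previous slides: R-squared = Model SS / Total SS = 7789.39296 / 152101.45 ≈ 0.0512, or equivalently 1 − 144312.057 / 152101.45 ≈ 0.0512; the Residual MS is 144312.057 / 418 ≈ 345.244, and the Root MSE is √345.244156 ≈ 18.581, all matching the reported values.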



Finite-Sample Properties of β̂0 and β̂1

■ EXAMPLE. Class size and test score

  • Test scores and class sizes in 1990 in 420 California school districts that serve
    kindergarten through eighth grade.

■ Plot the residuals

  [Figure: two panels. Left: scatter plot of average test score (roughly 600–720) against
  the student-teacher ratio (roughly 10–30), with the OLS fitted line overlaid. Right: the
  residuals (roughly −50 to 50) plotted against the student-teacher ratio, centered around
  zero.]


Finite-Sample Properties of β̂0 and β̂1

■ DIGRESSION. Where does the name “ordinary least squares” come from?

■ For any candidates β̃0 and β̃1, define a fitted value for each data point i as

  Ỹi = β̃0 + β̃1 Xi,    i = 1, 2, …, n.

  It is the value we predict for Yi given the value Xi and the candidates β̃0 and β̃1.

■ The mistake we make is the residual:

  ũi = Yi − Ỹi = Yi − β̃0 − β̃1 Xi,

  and we have n residuals.

■ Suppose we measure the size of the mistake, for each i, by squaring the residual: ũi².
  Then we add them all up:

  Σᵢ ũi² = Σᵢ (Yi − β̃0 − β̃1 Xi)².

  This quantity is called the sum of squared residuals.

■ Choosing β̃0 and β̃1 to minimize the sum of squared residuals, it can be shown (using
  calculus or other arguments) that the solutions are the slope and intercept estimates we
  obtained before. That is, β̃0 = β̂0 and β̃1 = β̂1.
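■ The minimization can also be done numerically, which makes the name concrete. A Python sketch (illustrative simulated data; scipy's general-purpose minimizer stands in for the calculus argument):

import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(4)
n = 100
X = rng.normal(size=n)
Y = 1 + 2 * X + rng.normal(size=n)

# Sum of squared residuals as a function of the candidates (b0, b1).
def ssr(b):
    b0, b1 = b
    return ((Y - b0 - b1 * X) ** 2).sum()

# Minimize over the candidates, starting from an arbitrary guess.
res = minimize(ssr, x0=[0.0, 0.0])

# The closed-form OLS estimates from the earlier slide.
b1_ols = np.mean((X - X.mean()) * (Y - Y.mean())) / np.mean((X - X.mean()) ** 2)
b0_ols = Y.mean() - b1_ols * X.mean()

print(res.x)           # numerical minimizer (b0, b1)
print(b0_ols, b1_ols)  # closed form; agrees up to solver tolerance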
Outline

Finite-Sample Properties of β̂0 and β̂1

Expected Values of β̂0 and β̂1



Expected Values of β̂0 and β̂1

■ We motivated simple regression using a population model.

■ The analysis so far has been purely algebraic, based on a sample of data.

■ We now study the statistical properties of the OLS estimator, referring to a population
  model and assuming random sampling.

■ Mathematical statistics: How do our estimators behave across different samples of data?
  On average, would we get the right answer if we could repeatedly sample?

■ We need to find the expected value of the OLS estimators — in effect, the average
  outcome across all possible random samples — and determine if we are right on average.

■ This leads to the notion of unbiasedness, illustrated in the simulation sketch below.
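■ A small Monte Carlo makes “averaging across random samples” concrete. This Python sketch (illustrative; true parameters chosen arbitrarily) redraws the sample many times and averages the OLS estimates:

import numpy as np

rng = np.random.default_rng(5)
beta0, beta1, n, reps = 1.0, 2.0, 50, 10_000

estimates = np.empty((reps, 2))
for r in range(reps):
    # A fresh random sample from the same population each repetition.
    X = rng.normal(size=n)
    u = rng.normal(size=n)  # E[u|X] = 0 holds by construction
    Y = beta0 + beta1 * X + u
    b1 = np.mean((X - X.mean()) * (Y - Y.mean())) / np.mean((X - X.mean()) ** 2)
    b0 = Y.mean() - b1 * X.mean()
    estimates[r] = (b0, b1)

# The averages across samples are close to (1.0, 2.0): "right on average."
print(estimates.mean(axis=0))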



Expected Values of β̂0 and β̂1

■ Assumption 1: Linear Model

  The population model can be written as

  Y = β0 + β1 X + u,

  where β0 and β1 are the (unknown) population parameters.

  • We view X and u as outcomes of random variables; thus, Y is of course random.

  • Stating this assumption formally shows that our goal is to estimate β0 and β1.

■ Assumption 2: Random Sampling

  We have a random sample, {(Xi, Yi) : i = 1, …, n}, following the population model.

  • The observations are independently and identically distributed (iid).

  • We know how to use these data to estimate β0 and β1 by OLS.

  • Because each i is a draw from the population, we can write

    Yi = β0 + β1 Xi + ui.

  • Notice that ui here is the unobserved error for observation i. It is not the residual that
    we compute from the data! (The sketch below illustrates the distinction.)
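■ The distinction between the error ui and the residual ûi is easy to see in a simulation, where (unlike with real data) the errors are known. A Python sketch (illustrative):

import numpy as np

rng = np.random.default_rng(6)
n = 100
X = rng.normal(size=n)
u = rng.normal(size=n)  # population errors: known here only because we simulate
Y = 1 + 2 * X + u

b1 = np.mean((X - X.mean()) * (Y - Y.mean())) / np.mean((X - X.mean()) ** 2)
b0 = Y.mean() - b1 * X.mean()
u_hat = Y - b0 - b1 * X  # residuals: always computable from the data

# The residual replaces the unknown (beta0, beta1) with the estimates (b0, b1),
# so u_hat is close to u but not equal to it.
print(np.max(np.abs(u - u_hat)))  # small, but not zero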



Expected Values of β̂0 and β̂1

■ Assumption 3: Zero Conditional Mean

  In the population, the error term has zero mean given any value of the explanatory
  variable: E[u|X] = 0.

  • This is the key assumption for showing that OLS is unbiased.

  • We emphasized its importance if we would like to draw causal conclusions from data.

■ Assumption 4: Sample Variation in the Regressor

  The variance of X is strictly positive: V[X] > 0.

  • This implies that, with high probability, we will not observe all Xi taking the same value.

  • In practice, this is hardly an assumption at all, unless the sample size is extremely small.

■ Assumption 5: Finite Moments

  Both the regressor and the error term have finite fourth moments: E[X⁴] < ∞ and
  E[u⁴] < ∞.

  • This is a technical condition which allows us to compute the variance of the OLS estimates.

  • This condition can be understood as requiring “no outliers.”


Expected Values of β̂0 and β̂1

■ How do we show β̂1 is unbiased for β1? What we need to show is

  E[β̂1] = β1,

  where the expected value means averaging across random samples.

■ We prove this in four steps.

■ Step 1: Write down a formula for β̂1. We have

  β̂1 = [ (1/n) Σᵢ (Xi − X̄)(Yi − Ȳ) ] / [ (1/n) Σᵢ (Xi − X̄)² ]
      = [ (1/n) Σᵢ (Xi − X̄)Yi ] / [ (1/n) Σᵢ (Xi − X̄)² ].

  The second equality comes from

  (1/n) Σᵢ (Xi − X̄)(Yi − Ȳ) = (1/n) Σᵢ (Xi − X̄)Yi − Ȳ · [ (1/n) Σᵢ (Xi − X̄) ],

  where the last bracketed sum is zero.
Expected Values of β̂0 and β̂1

■ Step 2: Replace each Yi with Yi = β0 + β1 Xi + ui. Then the numerator in β̂1 becomes

  (1/n) Σᵢ (Xi − X̄)Yi = (1/n) Σᵢ (Xi − X̄)(β1 Xi + ui)
                      = β1 [ (1/n) Σᵢ (Xi − X̄)Xi ] + (1/n) Σᵢ (Xi − X̄)ui
                      = β1 [ (1/n) Σᵢ (Xi − X̄)² ] + (1/n) Σᵢ (Xi − X̄)ui.

  (The β0 term drops out in the first equality because Σᵢ (Xi − X̄) = 0.) The third equality
  comes from

  (1/n) Σᵢ (Xi − X̄)² = (1/n) Σᵢ (Xi − X̄)Xi − X̄ · [ (1/n) Σᵢ (Xi − X̄) ],

  where the last bracketed sum is again zero. Therefore,

  β̂1 = β1 + [ (1/n) Σᵢ (Xi − X̄)ui ] / [ (1/n) Σᵢ (Xi − X̄)² ].

  This identity can be checked numerically; see the sketch below.
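■ The Step 2 decomposition is an exact algebraic identity, so it holds in any simulated sample. A Python sketch (illustrative):

import numpy as np

rng = np.random.default_rng(7)
n, beta0, beta1 = 100, 1.0, 2.0
X = rng.normal(size=n)
u = rng.normal(size=n)
Y = beta0 + beta1 * X + u

# Closed-form OLS slope.
b1_hat = np.mean((X - X.mean()) * (Y - Y.mean())) / np.mean((X - X.mean()) ** 2)

# True slope plus the sampling-error term from Step 2.
noise = np.mean((X - X.mean()) * u) / np.mean((X - X.mean()) ** 2)
print(np.isclose(b1_hat, beta1 + noise))  # True: the identity holds in every sample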
Expected Values of β̂0 and β̂1

■ Step 3: Compute E[β̂1] using the law of iterated expectation:

  E[β̂1] = E[ E[β̂1 | X1, X2, …, Xn] ],

  where the inner expectation is conditional on all the regressors, X1, X2, …, Xn. Formally,
  we have

  E[β̂1 | X1, X2, …, Xn] = E[ β1 + ( (1/n) Σᵢ (Xi − X̄)ui ) / ( (1/n) Σᵢ (Xi − X̄)² ) | X1, X2, …, Xn ]
                        = β1 + E[ (1/n) Σᵢ (Xi − X̄)ui | X1, X2, …, Xn ] / ( (1/n) Σᵢ (Xi − X̄)² )
                        = β1 + ( (1/n) Σᵢ (Xi − X̄) E[ui | X1, X2, …, Xn] ) / ( (1/n) Σᵢ (Xi − X̄)² ).

  The last two equalities hold because, conditional on X1, …, Xn, the terms (Xi − X̄) and the
  denominator are fixed constants that can be pulled out of the expectation.



Expected Values of β̂0 and β̂1

■ Step 4: Compute the conditional expectation of the error terms:

  E[ui | X1, X2, …, Xn] = E[ui | Xi]    (random sampling/iid)
                        = 0             (zero conditional mean).

  Therefore,

  E[β̂1 | X1, X2, …, Xn] = β1 + ( (1/n) Σᵢ (Xi − X̄) E[ui | X1, X2, …, Xn] ) / ( (1/n) Σᵢ (Xi − X̄)² ) = β1.

  And by the law of iterated expectation,

  E[β̂1] = β1.



Expected Values of β̂0 and β̂1

■ We showed that the slope estimate, β̂1, is unbiased:

  E[β̂1] = β1.

■ The proof for β̂0 is similar, and is a good exercise! (Try not to look at the next slide.)

■ Unbiasedness is a property of the procedure. After estimating an equation like

  testScore-hat = 698.9 − 2.28 stuTeacherRatio,

  it is tempting to say −2.28 is an “unbiased estimate” of the effect of class size on test
  score. Technically, this statement is incorrect. It is the procedure used to get β̂0 = 698.9
  and β̂1 = −2.28 that is unbiased: if we were able to regenerate the data and recompute β̂0
  and β̂1 many times, the estimates would be correct on average.

■ Among our assumptions, the focus should mainly be on the zero conditional mean
  assumption. What are the omitted factors? Are they likely to be correlated with X? If
  so, this assumption fails and our estimates will be biased! (See the simulation sketch below.)
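■ The consequence of violating the zero conditional mean assumption can be simulated. In this Python sketch (illustrative), the error contains an omitted factor correlated with X, and the slope estimates are no longer centered at the truth:

import numpy as np

rng = np.random.default_rng(8)
beta1, n, reps = 2.0, 50, 10_000

b1_draws = np.empty(reps)
for r in range(reps):
    X = rng.normal(size=n)
    # E[u|X] != 0: the error loads on X (an omitted factor correlated with X).
    u = 0.8 * X + rng.normal(size=n)
    Y = 1.0 + beta1 * X + u
    b1_draws[r] = np.mean((X - X.mean()) * (Y - Y.mean())) / np.mean((X - X.mean()) ** 2)

print(b1_draws.mean())  # roughly 2.8, not 2.0: the estimator is biased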



Expected Values of β̂0 and β̂1

■ To show that β̂0 is unbiased for β0, we recall its formula:

  β̂0 = Ȳ − β̂1 X̄.

■ We will also use the earlier conclusion that the slope estimate β̂1 is unbiased:
  E[β̂1 | X1, X2, …, Xn] = β1.

■ Averaging Yi = β0 + β1 Xi + ui over i gives Ȳ = β0 + β1 X̄ + ū, where ū = (1/n) Σᵢ ui.
  The conditional expectation of β̂0 is

  E[β̂0 | X1, X2, …, Xn] = E[ Ȳ − β̂1 X̄ | X1, X2, …, Xn ]
                        = E[ β0 + β1 X̄ + ū − β̂1 X̄ | X1, X2, …, Xn ]
                        = E[ β0 − (β̂1 − β1) X̄ + ū | X1, X2, …, Xn ]
                        = β0 − E[ β̂1 − β1 | X1, X2, …, Xn ] X̄ + E[ ū | X1, X2, …, Xn ]
                        = β0,

  where the second term is zero because β̂1 is conditionally unbiased, and
  E[ū | X1, X2, …, Xn] = (1/n) Σᵢ E[ui | X1, X2, …, Xn] = 0.

■ By the law of iterated expectation, E[β̂0] = β0.


The lectures and course materials, including slides, tests, outlines, and similar
materials, are protected by U.S. copyright law and by University policy. You may take
notes and make copies of course materials for your own use. You may also share those
materials with another student who is enrolled in or auditing this course.

You may not reproduce, distribute or display (post/upload) lecture notes or recordings
or course materials in any other way – whether or not a fee is charged – without my
written consent. You also may not allow others to do so.

If you do so, you may be subject to student conduct proceedings under the UC San
Diego Student Code of Conduct.

x1ma@[Link]
