Date: 2021-07-11

Endogeneity Issues

ECO00005M

Applied Microeconometrics

Professor Cheti Nicoletti

cheti.nicoletti@york.ac.uk

1

What are we going to learn about endogeneity

• Definition of the endogeneity issue.
• Consequences of this issue for the OLS estimation.
• Potential causes of endogeneity.
• Methods to solve endogeneity issues.
• Instrumental variable estimation (two-stage least squares, 2SLS) and the conditions that the instruments should satisfy.
• Formulas for the computation of instrumental variable estimation and two-stage least squares estimation.
• How to choose between OLS and 2SLS estimation.

2

References for endogeneity

• BASIC STARTING KNOWLEDGE
Wooldridge J.M., Introductory Econometrics: A Modern Approach, Sixth Edition,
Chapter 15 Instrumental Variables Estimation and Two Stage Least Squares

• MORE ADVANCED KNOWLEDGE
Wooldridge J.M., Econometric Analysis of Cross Section and Panel Data, Second Edition,
Chapter 5 Instrumental Variables Estimation of Single-Equation Linear Models,
Chapter 6 Additional Single-Equation Topics *

3

Definition of endogeneity

• Suppose we have a linear regression model:

y = β0 + β1x1 + ⋯ + βk xk + u

• Definition: Exogeneity and Endogeneity of Independent Variables.
  • xj is exogenous if it is uncorrelated with u.
  • xj is endogenous if it is correlated with u.
• OLS (Ordinary least squares) estimation of the linear regression model requires exogeneity for consistency.
• Homework: Assuming that u is identically and independently distributed as N(0,1) but correlated with one of the explanatory variables, show why endogeneity implies that the OLS estimation is biased.

4

Causes of endogeneity

• Endogeneity can be caused by many things:
  • An important variable that is not observed and omitted
  • Functional form misspecification
  • Reverse causality
  • Simultaneity
  • Measurement error in the regressors
  • ...
• Endogeneity is present in most applications in applied economic research.

5

Omitted Variables

• Let us begin with the case where the true regression model has only two explanatory variables (k = 2):

y = β0 + β1x1 + β2x2 + u,  where E(u|x1, x2) = 0

• But we omit the variable x2 and estimate the model

y = β0 + β1x1 + v,  where v = β2x2 + u

• Now, if cov(x1, x2) ≠ 0 and β2 ≠ 0, we do not have E(v|x1) = 0 and we would have an omitted variable bias.
• Solution: Instrumental Variable, Proxy Variable

6

Consequence of the omission of a relevant variable

What happens if we omit x2, i.e. a variable that actually belongs to the true model?

True model: y = β0 + β1x1 + β2x2 + u

• Consider the linear projection of x2 on x1:

x2 = δ0 + δ1x1 + r

• Then by definition E(r) = 0 and cov(r, x1) = 0.
• Let us plug the equation for x2 into the true model:

y = β0 + β1x1 + β2(δ0 + δ1x1 + r) + u
y = (β0 + β2δ0) + (β1 + β2δ1)x1 + (β2r + u)
y = α0 + α1x1 + e,  so  α1 = β1 + β2δ1

• Asymptotic bias: β2δ1

7
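The asymptotic bias β2δ1 can be checked by simulation. Below is a minimal numpy sketch (not from the slides; all parameter values are invented for illustration): the short regression of y on x1 alone converges to β1 + β2δ1 rather than β1.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# True model: y = b0 + b1*x1 + b2*x2 + u, with x2 = d0 + d1*x1 + r
# (all coefficient values below are invented for this illustration)
b0, b1, b2 = 1.0, 2.0, 3.0
d0, d1 = 0.5, 0.8
x1 = rng.normal(size=n)
x2 = d0 + d1 * x1 + rng.normal(size=n)   # x2 correlated with x1
u = rng.normal(size=n)
y = b0 + b1 * x1 + b2 * x2 + u

# Short regression of y on x1 only (x2 omitted)
X = np.column_stack([np.ones(n), x1])
a0_hat, a1_hat = np.linalg.lstsq(X, y, rcond=None)[0]

print(a1_hat)        # close to b1 + b2*d1 = 2 + 3*0.8 = 4.4, not b1 = 2
print(b1 + b2 * d1)
```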

Omission of a relevant variable when the true regression model has k independent variables

y = β0 + β1x1 + ⋯ + βk xk + u

• If the true parameter for the omitted variable is zero, then the estimates are still unbiased.
• If the omitted variable is uncorrelated with all other independent variables, the estimators are still unbiased.
• If the omitted variable is correlated with at least one independent variable, this can cause a bias for all estimates.

Homework: Compute the asymptotic bias caused by the omission of xk.

Notice that
• If we include irrelevant variables in the model, the estimators are still unbiased, but the variance of the estimation increases.

8

Using a Proxy Variable

for Unobserved Explanatory Variables

• A more difficult problem arises when a model excludes a key variable, usually because of data unavailability.
• Example: Return to Education
  • The population (true) model is:

log(wage) = β0 + β1educ + β2exper + β3abil + u

  • Suppose we do not observe ability (abil). Ignoring abil would generally give biased and inconsistent estimates of the return to education.
  • We expect an upward bias for the estimated return to education. Why?
  • How can we solve or at least mitigate this omitted variable problem?

9

• One possibility is to use a proxy variable for the omitted variable.
  • Something that is related to the unobserved variable.
  • In the wage equation one could use the intelligence quotient, or IQ, as a proxy for ability. IQ and ability do not need to be the same, but they need to be correlated.
• Suppose we have the model

y = β0 + β1x1 + β2x2 + β3x3* + u

with x3* being unobserved. We have a proxy variable x3.
• What do we require of x3?
  • x3 must be relevant in explaining x3*, i.e. in the regression

x3* = δ0 + δ3x3 + v3

we have δ3 ≠ 0.
  • If δ3 = 0, the proxy is not good.

10

• Replace x3* with x3, i.e. just regress y on x1, x2 and x3. This is called the plug-in solution to the omitted variables problem.
• Since x3 and x3* are not the same: when does this procedure give consistent estimators for β1 and β2?
• The assumptions are with respect to u and v3:

Assumption 1. In addition to assuming that u and x1, x2 and x3* are uncorrelated, we need that u and x3 be uncorrelated. This means that x3 is irrelevant in the population model once x1, x2 and x3* are included.

Assumption 2. The error v3 is uncorrelated with x1, x2 and x3. This means that x3 is a good proxy for x3*:

E(x3*|x1, x2, x3) = E(x3*|x3)

11

• From the latter assumption it follows that

E(x3*|x1, x2, x3) = δ0 + δ3x3

• In terms of our wage equation this means:

E(abil|educ, exper, IQ) = E(abil|IQ) = δ0 + δ3IQ

thus the mean value of ability only changes with IQ.
• More formally, what are the implications of the two assumptions?

12

• By combining

y = β0 + β1x1 + β2x2 + β3x3* + u
x3* = δ0 + δ3x3 + v3

we obtain:

y = β0 + β1x1 + β2x2 + β3(δ0 + δ3x3 + v3) + u
y = (β0 + β3δ0) + β1x1 + β2x2 + β3δ3x3 + (β3v3 + u)

• Now let us denote e = β3v3 + u as the composite error.
• Note that u and v3 both have zero mean and each is uncorrelated with x1, x2 and x3 (see assumptions 1 and 2 in slide 11). Then e has zero mean and is uncorrelated with x1, x2 and x3.
• For this reason, we can write

y = α0 + β1x1 + β2x2 + α3x3 + e,  where α0 = β0 + β3δ0 and α3 = β3δ3.

• The OLS estimation of the above equation is consistent for α0, β1, β2 and α3.
• We do not get unbiased estimators for β0 and β3.
• Empirically α3 may even be of more interest than β3.
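The plug-in solution can also be checked by simulation. In this numpy sketch (all parameter values invented; not part of the course's Stata material), regressing y on x1, x2 and the proxy x3 recovers β1 consistently, while the coefficient on x3 estimates α3 = β3δ3 rather than β3; omitting the unobserved variable altogether biases the x1 coefficient.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000

# Invented parameters for illustration
b0, b1, b2, b3 = 1.0, 0.5, -0.7, 2.0
d0, d3 = 0.2, 0.6

x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
x3 = 0.5 * x1 + rng.normal(size=n)           # proxy, correlated with x1
x3_star = d0 + d3 * x3 + rng.normal(size=n)  # unobserved variable; v3 independent of x1, x2, x3
u = rng.normal(size=n)
y = b0 + b1 * x1 + b2 * x2 + b3 * x3_star + u

# Plug-in solution: regress y on x1, x2 and the proxy x3
X = np.column_stack([np.ones(n), x1, x2, x3])
coef = np.linalg.lstsq(X, y, rcond=None)[0]
print(coef[1])        # close to b1 = 0.5 (consistent)
print(coef[3])        # close to alpha3 = b3*d3 = 1.2, not b3

# Omitting x3_star altogether biases the x1 coefficient
X_short = np.column_stack([np.ones(n), x1, x2])
coef_short = np.linalg.lstsq(X_short, y, rcond=None)[0]
print(coef_short[1])  # close to b1 + b3*d3*0.5 = 1.1
```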

Functional form misspecification

• Special case: omission of a relevant variable x1².
• Suppose

y = β0 + β1x1 + β2x1² + u,  with E(u|x1, x1²) = 0

• But we estimate y = β0 + β1x1 + v, where v = β2x1² + u.
• Now, since cov(x1, x1²) ≠ 0 and if β2 ≠ 0, we do not have E(v|x1) = 0 and we would have a bias due to functional form misspecification.
• Solution: Test for functional form (RESET), non-parametric and semiparametric methods, more flexible parametric specifications of the model.

14

Simultaneity

• If an explanatory variable is determined simultaneously with the dependent variable, then it is correlated with the error term.
• In this case OLS is biased and inconsistent.
• As an example we consider two equations (structural equations) without an intercept:

y1 = α1y2 + β1z1 + u1
y2 = α2y1 + β2z2 + u2

where the variables z1 and z2 are exogenous.
• We focus on estimation of the first equation.

15

y1 = α1y2 + β1z1 + u1
y2 = α2y1 + β2z2 + u2

• To show that the dependent variables are generally correlated with the error terms (e.g. y2 with u1), we replace y1 in the second equation with the right hand side of the first equation:

y2 = α2(α1y2 + β1z1 + u1) + β2z2 + u2
(1 − α2α1)y2 = α2β1z1 + β2z2 + α2u1 + u2

• In order to solve for y2 we have to assume α2α1 ≠ 1.
• It depends on the application whether this is restrictive.

16

(1 − α2α1)y2 = α2β1z1 + β2z2 + α2u1 + u2

can be rewritten as:

y2 = π21z1 + π22z2 + v2

where

π21 = α2β1/(1 − α2α1)
π22 = β2/(1 − α2α1)
v2 = (α2u1 + u2)/(1 − α2α1)

This is a reduced form equation for y2.
• π21 and π22 are reduced form parameters.
• v2 is linear in u1 and u2. For this reason, it is uncorrelated with z1 and z2. We can apply OLS to estimate π21 and π22.
• There is an equivalent reduced form equation for y1.

17

y2 = π21z1 + π22z2 + v2

• We can use this equation to show that OLS estimation of the structural equation will generally result in biased and inconsistent estimates for α1 and β1:

y1 = α1y2 + β1z1 + u1

• From the reduced form equation, we see that y2 and u1 are correlated if v2 and u1 are correlated. Since v2 linearly depends on u1, it is generally correlated with u1.
• When is it not correlated?
  • If α2 = 0 and if u1 and u2 are uncorrelated.
  • In this case y2 is not simultaneously determined with y1.

18

• When y2 is correlated with u1 because of simultaneity, OLS suffers from simultaneity bias and is inconsistent.
• Obtaining the direction of the bias is generally complicated. Simple expressions for the bias can be derived under additional assumptions, but this is not covered here.
• Solution: Instrumental Variable estimation

19

Measurement error in an explanatory variable

• We consider the simple regression model:

y = β0 + β1x1* + u

and assume that it satisfies the Gauss–Markov assumptions.
• We do not observe x1* but x1 (e.g. actual and reported income).
• The measurement error in the population is: e1 = x1 − x1*.
• We assume: E(e1) = 0.
• Moreover, we assume that u is uncorrelated with x1* and x1:

E(y|x1, x1*) = E(y|x1*)

20

• The model can be written as: y = β0 + β1x1 + (u − β1e1).
• The classical errors-in-variables (CEV) assumption is that the measurement error is uncorrelated with the unobserved explanatory variable: cov(e1, x1*) = 0.
  • This means that the observed measure x1 consists of two uncorrelated components: x1 = x1* + e1.
  • (We still assume that u is uncorrelated with x1 and x1*.)
  • The above assumption implies that x1 and e1 must be correlated:

cov(x1, e1) = E(x1e1) = E(x1*e1) + E(e1²) = 0 + σe1²

  • This correlation causes problems for the OLS estimation.

21

• This implies for our model y = β0 + β1x1 + (u − β1e1) that, since u and x1 are uncorrelated, the covariance between x1 and the composite error (u − β1e1) is:

cov(x1, u − β1e1) = −β1cov(x1, e1) = −β1σe1²

• Note also that Var(x1) = Var(x1*) + Var(e1) = σx1*² + σe1².
• Then one can show:

plim β̂1 = β1 + cov(x1, u − β1e1)/Var(x1)
         = β1 − β1σe1²/(σx1*² + σe1²)
         = β1[1 − σe1²/(σx1*² + σe1²)]
         = β1 · σx1*²/(σx1*² + σe1²)

• This equation is very interesting: plim β̂1 is always closer to zero than β1: attenuation bias.

22

• OLS is biased in the classical errors-in-variables model:
  • If β1 is positive, OLS will underestimate β1, and vice versa.
  • Things are more complicated in the multiple regression model, but again OLS will be biased and inconsistent.
• Solution: Instrumental Variable estimation, ...

23
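The attenuation factor σx1*²/(σx1*² + σe1²) is easy to verify numerically. A small numpy sketch (invented parameter values, with the CEV assumptions imposed by construction):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500_000

# Invented parameters: true slope b1 = 2, Var(x1*) = 4, Var(e1) = 1
b0, b1 = 1.0, 2.0
var_xstar, var_e = 4.0, 1.0

x_star = rng.normal(scale=np.sqrt(var_xstar), size=n)
e1 = rng.normal(scale=np.sqrt(var_e), size=n)   # CEV: e1 independent of x_star
x1 = x_star + e1                                # observed, mismeasured regressor
u = rng.normal(size=n)
y = b0 + b1 * x_star + u

# OLS of y on the mismeasured x1
X = np.column_stack([np.ones(n), x1])
_, b1_hat = np.linalg.lstsq(X, y, rcond=None)[0]

attenuation = var_xstar / (var_xstar + var_e)   # = 0.8
print(b1_hat)   # close to b1 * 0.8 = 1.6, biased toward zero
```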

Instrumental Variable Estimation

• Suppose we have an endogenous independent variable.
• How can we obtain a good estimate for the coefficient on the endogenous variable?
• This can be achieved if there is an instrumental variable available.

24

• First, we look at the simple regression model:

y = β0 + β1x + u

• Now take another variable z with cov(x, z) ≠ 0. Then,

cov(z, y) = β1cov(z, x) + cov(z, u)

β1 = [cov(z, y) − cov(z, u)] / cov(z, x)

• Under the additional assumption cov(z, u) = 0, we have

β1 = cov(z, y) / cov(z, x)

25

• A natural estimator for β1 is therefore cov(z, y)/cov(z, x) with the population covariances replaced by their sample analogues:

β̂1 = Σi (zi − z̄)(yi − ȳ) / Σi (zi − z̄)(xi − x̄)

• This estimator is consistent for β1, but it is inconsistent if cov(z, u) ≠ 0.
• The estimator can be biased in small samples even if cov(z, u) = 0.
• If x is exogenous, it can be used as an instrument and then the IV estimator is identical to OLS.
• The natural estimator for β0 is simply:

β̂0 = ȳ − β̂1x̄

26
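The sample-analogue IV estimator can be sketched in a few lines of numpy (a simulation with invented parameters, not course material). OLS is inconsistent here because x is built to be correlated with u, while the IV estimator recovers β1.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200_000

b0, b1 = 1.0, 0.5                      # invented true parameters
z = rng.normal(size=n)                 # instrument: relevant and exogenous
v = rng.normal(size=n)
u = 0.8 * v + rng.normal(size=n)       # error correlated with x through v
x = 1.0 * z + v                        # endogenous regressor
y = b0 + b1 * x + u

# IV estimator: sample analogue of cov(z, y) / cov(z, x)
b1_iv = np.sum((z - z.mean()) * (y - y.mean())) / np.sum((z - z.mean()) * (x - x.mean()))
b0_iv = y.mean() - b1_iv * x.mean()

# OLS slope for comparison (inconsistent here)
b1_ols = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)

print(b1_iv)    # close to 0.5
print(b1_ols)   # close to 0.5 + cov(x, u)/Var(x) = 0.5 + 0.8/2 = 0.9
```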

• A variable z is a candidate for an instrument for a variable x if it satisfies the following conditions:

cov(x, z) ≠ 0  and  cov(z, u) = 0

• Some remarks on the choice of an instrument:
  • It is often difficult to find a good instrument.
  • A proxy variable does not make a good instrument, as it is supposed to be correlated with the error term.
  • Example: Consider the regression of log wage on education and ability with error term u. Ability is not observed, but IQ is observed and highly correlated with ability. IQ is a potential candidate for a proxy for ability but clearly violates the condition cov(IQ, u) = 0. (HOMEWORK: Explain why IQ is a potential candidate for a proxy for ability but clearly violates the condition cov(IQ, u) = 0. Explain why IQ cannot be used as an instrument for education.)
  • Instruments for education: One may use family background variables such as the number of siblings: negatively correlated with education but maybe uncorrelated with ability. The latter, however, is unclear, as ability is not observed.

27

Inference with the IV estimator

• The IV estimator has an asymptotic normal distribution.
• When we impose a homoscedasticity assumption conditional on the instrument,

E(u²|z) = Var(u) = σ²,

one can derive the asymptotic variance of β̂1, which is

Avar(β̂1) = σ² / (n σx² ρx,z²)

• This provides us with a standard error for the IV estimator.
• As with the OLS estimator, the asymptotic variance of the IV estimator decreases to zero at the rate 1/n.
• If ρx,z² is small, the variance of the IV estimator is large.
• The asymptotic variance of the IV estimator is always larger, and sometimes much larger, than the asymptotic variance of the OLS estimator.

28

Estimating Avar(β̂1)

• The population variance of the error term, σ², can be estimated just as in the case of the OLS regression:

σ̂² = [1/(n − 2)] Σi ûi²

where the ûi are now the residuals from the IV regression.
• The population variance of x can be estimated by the sample variance SSTx/n.
• The square of the population correlation between x and z can be estimated by the R-squared of the regression of x on z: Rx,z².
• Then a consistent estimator is:

Âvar(β̂1) = σ̂² / (SSTx · Rx,z²)

• Example: Return to Education for Married Women
  • Data: MROZ.dta
  • Simple log-level regression model:

log(wage) = β0 + β1educ + u

  • We obtain OLS estimates:

log(wage)^ = −0.185 + 0.109 educ
             (0.205)  (0.014)
n = 428, R² = 0.118

  • We use father's education as an instrument for education.
  • We cannot empirically check whether ability and father's education are uncorrelated. However, we can test whether education and father's education are correlated.

30

• Example (cont.):
  • When we regress educ on fatheduc, we obtain

educ^ = 10.240 + 0.269 fatheduc
        (0.105)  (0.011)
n = 428, R² = 0.173

This suggests that there is significant positive correlation, and about 17% of the variation in educ is explained by father's education.
  • When we use father's education as instrument for educ, we obtain:

log(wage)^ = 0.441 + 0.059 educ
             (0.446)  (0.035)
n = 428, R² = 0.093

  • The IV estimate of the return to education is about one half of the OLS estimate, suggesting that there is omitted ability bias.
  • The IV standard error is much larger than the OLS standard error, and the IV 95% confidence interval contains the OLS estimate.
  • While this empirical example suggests that the differences between the IV and OLS estimates are practically large, they are not statistically significant.

31

• There are similar IV applications with other data sets which yield larger IV estimates than OLS estimates.
• Larger IV estimates than OLS estimates may suggest measurement error issues that cause an underestimation by OLS, or might suggest that the IV is invalid because it is correlated with the error term.
• Since even a little correlation between z and u can cause serious problems for the IV estimator, this is an important issue.
• IV estimation can also be applied in the case of a binary endogenous regressor or a binary instrumental variable.

32

IV Estimation with a poor Instrumental Variable

• IV estimates can have large standard errors if x and z are only weakly correlated. (Don't use IV in this case.)
• IV estimates can have a large asymptotic bias if z and x are only weakly correlated:

plim β̂1 = β1 + [corr(z, u)/corr(z, x)] · (σu/σx)

This implies that the bias can be large if the population correlation between z and x is small, even if the population correlation between z and u is small. (HOMEWORK: prove the equality above. HINTS: plim β̂1 = cov(z, y)/cov(z, x) and y = β0 + β1x + u.)
• For this reason IV can be worse in terms of consistency than OLS even if corr(z, u) is small (provided that corr(z, x) is also small).
• One can show that IV is only superior in terms of asymptotic bias if

corr(z, u)/corr(z, x) < corr(x, u)

33

R-Squared and IV Estimation

R² = 1 − SSR/SST

• SSR (the sum of squared IV residuals) can be larger than SST (the total sum of squares). For this reason the R-squared can become negative, and it is smaller than for OLS.
• It is not clear whether the R-squared should be reported after IV estimation.

34

IV Estimation of the Multiple Regression Model

• The idea of IV estimation can be easily extended to the multiple regression case.
• For this purpose we change the notation a bit.
• The model is now: y = xβ + u, with E(u) = 0, where
  • β is a K×1 vector,
  • x = (1, x2, x3, …, x_{K−1}, x_K) is a 1×K vector,
and x_K is endogenous, i.e. cov(x_K, u) ≠ 0.
• We need an instrument for x_K to obtain consistent estimates.
• We need another exogenous variable z1 with cov(z1, u) = 0.
• Let z = (1, x2, x3, …, x_{K−1}, z1); then E(z′u) = 0 and z is exogenous.

35

• The instrument z1 must be relevant to explain the endogenous variable x_K once we control for all the remaining exogenous explanatory variables, i.e.

x_K = δ1 + δ2x2 + δ3x3 + ⋯ + δ_{K−1}x_{K−1} + θ1z1 + r,

where θ1 ≠ 0 and by definition r is uncorrelated with all exogenous variables and E(r) = 0.
• This implies that E(z′x) has full rank (rank condition).
• We can test θ1 ≠ 0 using a t-test.

y = xβ + u
z′y = z′xβ + z′u
E(z′y) = E(z′x)β + E(z′u)
E(z′y) = E(z′x)β
β = [E(z′x)]⁻¹ E(z′y)

• E(z′y) and E(z′x) can be consistently estimated using the corresponding sample moments.

36

• Given a random sample i = 1, …, N, the instrumental variables estimator of β is

β̂ = (N⁻¹ Σi zi′xi)⁻¹ (N⁻¹ Σi zi′yi) = (Z′X)⁻¹ Z′Y

where
X is the N×K matrix whose i-th row is xi = (1, x2,i, x3,i, …, x_{K−1,i}, x_{K,i}),
Z is the N×K matrix whose i-th row is zi = (1, x2,i, x3,i, …, x_{K−1,i}, z1,i),
and Y = (y1, …, yN)′ is the N×1 vector of outcomes.

37
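The matrix formula β̂ = (Z′X)⁻¹Z′Y can be sketched directly in numpy. In this hedged example (one exogenous regressor x2, one endogenous regressor xK instrumented by z1; all parameter values invented), Z equals X with the endogenous column replaced by the instrument.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 100_000

# Invented true coefficients: intercept, x2, xK
b = np.array([1.0, 0.5, 2.0])
x2 = rng.normal(size=n)                 # exogenous regressor
z1 = rng.normal(size=n)                 # instrument for xK
v = rng.normal(size=n)
u = 0.7 * v + rng.normal(size=n)        # error correlated with xK through v
xK = 0.3 * x2 + 0.8 * z1 + v            # endogenous regressor
y = b[0] + b[1] * x2 + b[2] * xK + u

X = np.column_stack([np.ones(n), x2, xK])   # N x K regressor matrix
Z = np.column_stack([np.ones(n), x2, z1])   # N x K instrument matrix

# beta_hat = (Z'X)^{-1} Z'Y
beta_iv = np.linalg.solve(Z.T @ X, Z.T @ y)
print(beta_iv)   # close to [1.0, 0.5, 2.0]
```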

• Example: College Proximity as IV for Education
  • Data: Card.dta (see do file in the VLE)
  • log(wage) is the dependent variable, with several controls (exper expersq black smsa south smsa66 reg662-reg669) plus the endogenous education.
  • Instrument for education: a dummy for whether someone grew up near a four-year college (nearc4).
  • We assume that nearc4 is uncorrelated with the error. Moreover, to be a valid instrument it has to be partially correlated with educ.
  • We can test this by estimating in Stata the equation:

regress educ nearc4 exper expersq black smsa south smsa66 reg662-reg669

  • The t-statistic on nearc4 is 3.64, and therefore, if nearc4 is uncorrelated with the error term, we can use it as IV for educ.

38

. regress educ nearc4 exper expersq black smsa south smsa66 reg662-reg669

      Source        SS           df       MS      Number of obs =  3,010
                                                  F(15, 2994)   = 182.13
       Model   10287.6179        15  685.841194   Prob > F      = 0.0000
    Residual   11274.4622     2,994  3.76568542   R-squared     = 0.4771
                                                  Adj R-squared = 0.4745
       Total   21562.0801     3,009  7.16586243   Root MSE      = 1.9405

        educ       Coef.   Std. Err.      t    P>|t|    [95% Conf. Interval]
      nearc4    .3198989   .0878638    3.64   0.000     .1476194    .4921785
       exper   -.4125334   .0336996  -12.24   0.000    -.4786101   -.3464566
     expersq    .0008686   .0016504    0.53   0.599    -.0023674    .0041046
       black   -.9355287   .0937348   -9.98   0.000    -1.11932    -.7517377
        smsa    .4021825   .1048112    3.84   0.000     .1966732    .6076918
       south   -.0516126   .1354284   -0.38   0.703    -.3171548    .2139296
      smsa66    .0254805   .1057692    0.24   0.810    -.1819071    .2328682
      reg662   -.0786363   .1871154   -0.42   0.674    -.4455241    .2882514
      reg663    -.027939   .1833745   -0.15   0.879    -.3874918    .3316139
      reg664     .117182   .2172531    0.54   0.590    -.3087984    .5431624
      reg665   -.2726165   .2184204   -1.25   0.212    -.7008858    .1556528
      reg666   -.3028147   .2370712   -1.28   0.202    -.7676536    .1620242
      reg667   -.2168177   .2343879   -0.93   0.355    -.6763953    .2427598
      reg668    .5238914   .2674749    1.96   0.050    -.0005618    1.048344
      reg669     .210271   .2024568    1.04   0.299    -.1866975    .6072395
       _cons    16.63825   .2406297   69.14   0.000     16.16644    17.11007

39

• The following table reports OLS and IV estimates.
• The IV estimate is almost twice as large as the OLS estimate.
• The SE of the IV estimate is 18 times larger. This is the price we have to pay if we use an instrument to obtain a consistent estimator.

Dependent variable: log(wage)

Independent variable   (1) OLS      (2) IV
Educ                    0.075       0.132
                       (0.003)     (0.055)
Exper                   0.085       0.108
                       (0.007)     (0.024)
Exper^2               -0.0023     -0.0023
                      (0.0003)    (0.0003)
…other controls          …           …
Observations            3,010       3,010
R-squared               0.300       0.238

40

Two Stage Least Squares (2SLS)

• Sometimes there are multiple valid IVs for an endogenous explanatory variable.
• Suppose the variables z1, z2, …, zM satisfy

cov(zh, u) = 0 for h = 1, …, M

• We could simply use each of them as an instrument and obtain multiple IV estimators.
• The idea is to use all the IVs together to obtain a more efficient estimator:
  • Let z = (1, x2, …, x_{K−1}, z1, z2, …, zM) be 1×L with L = K − 1 + M.
  • As each element of z is uncorrelated with u, any linear combination is also uncorrelated with u.

41

• The linear combination of z which is most highly correlated with x_K is the linear projection of x_K on z:

x_K = δ1 + δ2x2 + δ3x3 + ⋯ + δ_{K−1}x_{K−1} + θ1z1 + ⋯ + θM zM + r,

where by definition r is uncorrelated with all the exogenous variables xj and zh, and E(r) = 0.
• x_K is correlated with u if it is endogenous, but

δ1 + δ2x2 + δ3x3 + ⋯ + δ_{K−1}x_{K−1} + θ1z1 + ⋯ + θM zM

is not correlated with u.
• This means that we can replace the endogenous variable with its prediction (using OLS estimation of the linear model for x_K):

x̂_K = δ̂1 + δ̂2x2 + δ̂3x3 + ⋯ + δ̂_{K−1}x_{K−1} + θ̂1z1 + ⋯ + θ̂M zM,

which is exogenous.
• We require that at least one θh is non-zero. Use an F-test to test the null hypothesis that all instruments have zero effects.

42

• Now, let x̂i = (1, x2,i, …, x_{K−1,i}, x̂_{K,i}) and use it as the vector of instruments for xi:

β̂ = (Σi x̂i′xi)⁻¹ Σi x̂i′yi

• It can be shown that Σi x̂i′xi = Σi x̂i′x̂i and thus β̂ = (X̂′X̂)⁻¹X̂′Y.
• This estimator is consistent under the conditions:

E(z′u) = 0,  rank E(z′z) = L,  rank E(z′x) = K,  L ≥ K

• The last condition says that we need at least as many instruments as explanatory variables in the model (order condition).

43

44

• Under homoscedasticity it is possible to show that the variance matrix of β̂ can be estimated by σ̂²(X̂′X̂)⁻¹, with

σ̂² = [1/(N − K)] Σi (yi − xiβ̂)²

• Econometric packages have 2SLS implemented. There is no need to perform the two stages manually. If you compute it manually, the OLS standard errors and statistics for the second stage are not valid (as there is a composite error in the second regression).

• The IV estimator with multiple instruments is also called the two stage least squares (2SLS) estimator:
  • One can show that the IV estimates are identical to the OLS estimates from the regression of y on 1, x2, …, x_{K−1} and x̂_K. This is the second stage.
  • The first stage is the regression of x_K on 1, x2, …, x_{K−1} and z1, …, zM.
• Multicollinearity is a bigger problem for 2SLS than for OLS. This is for two reasons:
  • x̂_K has less variation than x_K.
  • x̂_K has more correlation with the other regressors than x_K.

45
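The two-stage construction and its equivalence to the one-shot formula β̂ = (X̂′X)⁻¹X̂′Y can be illustrated with numpy (a sketch with invented parameters and two instruments). As the slides warn, the second-stage OLS standard errors would not be valid; only the point estimates coincide.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 100_000

b = np.array([1.0, 0.5, 2.0])                    # invented: intercept, x2, xK
x2 = rng.normal(size=n)
z1 = rng.normal(size=n)
z2 = rng.normal(size=n)                          # two instruments for xK
v = rng.normal(size=n)
u = 0.7 * v + rng.normal(size=n)                 # u correlated with xK through v
xK = 0.3 * x2 + 0.5 * z1 + 0.5 * z2 + v          # endogenous regressor
y = b[0] + b[1] * x2 + b[2] * xK + u

X = np.column_stack([np.ones(n), x2, xK])        # N x K regressors
Z = np.column_stack([np.ones(n), x2, z1, z2])    # N x L instruments, L > K

# First stage: OLS of xK on all exogenous variables, keep fitted values
delta = np.linalg.lstsq(Z, xK, rcond=None)[0]
X_hat = np.column_stack([np.ones(n), x2, Z @ delta])

# Second stage: OLS of y on X_hat gives the 2SLS estimator
beta_2sls = np.linalg.lstsq(X_hat, y, rcond=None)[0]

# One-shot formula using X_hat as instruments: (X_hat' X)^{-1} X_hat' y
beta_direct = np.linalg.solve(X_hat.T @ X, X_hat.T @ y)

print(beta_2sls)    # close to [1.0, 0.5, 2.0]; matches beta_direct numerically
```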

Some remarks:
• If the R-squared of the regression of x̂_K on the exogenous variables (1, x2, …, x_{K−1}) is very large, the standard error of 2SLS explodes. This can be verified with the data at hand.
• 2SLS can also be used in models with more than one endogenous variable.
  • We need more candidate instruments to achieve identification.
  • The sufficient condition for identification is the rank condition.
• Since the R-squared after 2SLS cannot be compared to the OLS R-squared, we must be careful.
• The standard errors and tests for a 2SLS estimation produced manually will be wrong.
• It is possible to derive a statistic with an approximate F-distribution in large samples. Use econometric packages to run the 2SLS with correct standard errors.

46

Testing for Endogeneity

• It is useful to have a test for endogeneity of an explanatory variable to show whether 2SLS is even necessary.
• Suppose we have the structural equation

y1 = β1y2 + β2z1 + β3z2 + u1

where y2 is endogenous and z3 and z4 are two exogenous variables (i.e. uncorrelated with u1) that are relevant to explain y2.
• Hausman (1978) suggests a test which directly compares the OLS and 2SLS estimates and determines whether the differences are statistically significant (Hausman Test).

47

• The idea behind the test is as follows:
• We have:

y1 = β0 + β1y2 + β2z1 + β3z2 + u1 (main regression)

and

y2 = π0 + π1z1 + π2z2 + π3z3 + π4z4 + v2 (first stage equation)

  • Each zj is uncorrelated with u1.
  • y2 is uncorrelated with u1 if and only if v2 is uncorrelated with u1. This is what we want to test.
  • Write u1 = δ0 + δ1v2 + e1; then by definition e1 is uncorrelated with v2.
  • If δ1 = 0, then u1 and v2 are uncorrelated.
  • To test for such correlation (endogeneity of y2) we can test H0: δ1 = 0 in the following model:

y1 = β0 + β1y2 + β2z1 + β3z2 + δ1v̂2 + e

  • where v̂2 are the residuals from the first stage (reduced form) equation.

y1 = β0 + β1y2 + β2z1 + β3z2 + δ1v̂2 + e

• We then test H0: δ1 = 0 using a t-test.
• If we reject it at a small significance level, we conclude that y2 is endogenous because v2 and u1 are correlated.

Practical guideline for the Hausman test:
1. Estimate the first stage equation for y2 and compute the residuals v̂2.
2. Add v̂2 as an explanatory variable in the main regression and estimate it by OLS. You may want to use a heteroscedasticity-robust version of the t-test for testing whether the coefficient on v̂2 is significant. If it is statistically significantly different from zero, we conclude that y2 is indeed endogenous.

A disadvantage of this test is that its reliability requires the use of a valid instrument.

49
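The two-step guideline can be sketched in numpy (a simulation with invented parameters; the course itself uses Stata). The first-stage residual v̂2 is added to the main regression and its t-statistic examined; here endogeneity is built in, so the coefficient on v̂2 is far from zero. Homoscedastic standard errors are used for brevity; the slides suggest a robust version.

```python
import numpy as np

rng = np.random.default_rng(6)
n = 100_000

# Invented data-generating process with built-in endogeneity
z1, z2, z3, z4 = (rng.normal(size=n) for _ in range(4))
v2 = rng.normal(size=n)
u1 = 0.6 * v2 + rng.normal(size=n)      # u1 correlated with v2: y2 is endogenous
y2 = 0.5 * z1 + 0.5 * z2 + 0.8 * z3 + 0.8 * z4 + v2
y1 = 1.0 + 0.5 * y2 + 0.3 * z1 - 0.2 * z2 + u1

# Step 1: first stage for y2 on all exogenous variables, keep residuals
Z = np.column_stack([np.ones(n), z1, z2, z3, z4])
v2_hat = y2 - Z @ np.linalg.lstsq(Z, y2, rcond=None)[0]

# Step 2: add v2_hat to the main regression and t-test its coefficient
W = np.column_stack([np.ones(n), y2, z1, z2, v2_hat])
coef, res_ss = np.linalg.lstsq(W, y1, rcond=None)[:2]
sigma2 = res_ss[0] / (n - W.shape[1])
se = np.sqrt(sigma2 * np.linalg.inv(W.T @ W).diagonal())
t_v2hat = coef[-1] / se[-1]

print(coef[-1])   # close to 0.6: evidence of endogeneity
print(t_v2hat)    # far above conventional critical values
```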

More remarks on IV estimation:
• If we have more instruments for one endogenous explanatory variable, we can also test whether at least some of them are uncorrelated with u1 (validity of the instruments).
  • We have to assume that at least one of the IVs is exogenous. Then we can test the overidentifying restrictions that are used in 2SLS. No details are presented here.
• Heteroscedasticity in the context of 2SLS raises the same issues as with OLS.
  • There are standard errors and test statistics available which are robust with respect to heteroscedasticity.
  • There are also tests for heteroscedasticity available.
• 2SLS can also be applied to pooled cross sections and panel data (e.g. first differencing). This does not raise new difficulties.

50

Summary

• We have seen the method of instrumental variables as a way to consistently estimate the parameters in a linear model when there are endogenous explanatory variables.
• When instruments are poor, IV estimates can be worse than OLS.
• 2SLS is routinely used in economics and social sciences alike.
• The Hausman test can be used to test for endogeneity.
• IV estimation can be used for cross section, pooled cross section and panel data.

51

學霸聯盟

ECO00005M

Applied Microeconometrics

Professor Cheti Nicoletti

cheti.nicoletti@york.ac.uk

1

What are we going to learn about endogeneity

?Definition of the endogeneity issue.

?Consequences of this issue for the OLS estimation.

?Potential causes of endongeneity.

?Methods to solve endogeneity issues.

? Instrumental variable estimation (two-stage least

squares, 2SLS) and the conditions that the instruments

should satisfy.

?Formulas for the computation of instrumental variable

estimation and two-stage least squares estimation

?How to choose between OLS and 2SLS estimation.

2

References for endogeneity

?BASIC STARTING KNOWLEDGE

Wooldrige J.M. Introductory Econometrics: A Modern

Approach, Sixth Edition,

Chapter 15 Instrumental Variables Estimation and Two Stage Least Squares

?MORE ADVANCED KNOWLEDGE

Wooldrige J.M. Econometric Analysis of Cross Section and

Panel Data, Second Edition,

Chapter 5 Instrumental Variables Estimation of Single-Equation Linear Models,

Chapter 6 Additional Single-Equation Topics *

3

Definition of endogeneity

?Suppose we have a linear regression model:

?Definition: Exogeneity and Endogeneity of Independent

Variables.

? is exogenous if it is uncorrelated with u.

? is endogenous if it is correlated with u.

?OLS (Ordinary least squares) estimation of the linear

regression model requires exogeneity for consistency.

?Homework: Assuming that be identically and

independently distributed as N(0,1) but correlated with

one of the explanatory variables, show why endogeneity

implies that the OLS estimation is biased.

4

0 1 1y k kx x u? ? ?= + +?+ +

Causes of endogeneity

? Endogeneity can be caused by many things.

? An important variable that is not observed and omitted

? Functional form specification

? Reverse causality

? Simultaneity

? Measurement error in the regressors

? ...

? Endogeneity is present in most applications in applied

economic research.

5

Omitted Variables

?Let us begin with the case whether the true regression

model has only two explanatory variables k=2

y = 0 + 11 + 22 + where E 1, 2 = 0

?But we omit the variable 2 and estimate the model

y = 0 + 11 + ? where ? = 22 +

?Now, if (1, 2) ≠ 0 and 2 ≠ 0, we do not have

E ? 1 = 0 and we would have an omitted variable bias.

?Solution: Instrumental Variable, Proxy Variable

6

Consequence of the omission of a relevant variables

What happens if we omit 2 i.e. a variable that actually

belongs to the true model?

True model y = 0 + 11 + 22 +

?Consider the linear projection of 2 on 1

2 = 0 + 11 +

?Then by definition E() = 0 and cov , 1 = 0

?Let plug in the true model the equation for 2

y = 0 + 11 + 2(0 + 11 + ) +

y = 0 + 20 + (1+21)1 + 2 +

y = 0 + 11 + so ?1 = 1 + 21

?Asymptotic bias (21)

7

Omission of a relevant variable when the true

regression model has of k independent variables

? if the true parameter for the omitted variable is zero, then

estimates are still unbiased.

? If the omitted variable is uncorrelated with all other independent

variables, the estimators are still unbiased

? If the omitted variable is correlated with at least one independent

variable, this can cause a bias for all estimates.

Homework: Compute the asymptotic bias caused by the omission

of xk.

Notice that

? If we include irrelevant variables in the model, the estimators are

still unbiased, but the variance of the estimation increases.

8

0 1 1y k kx x u? ? ?= + +?+ +

Using a Proxy Variable

for Unobserved Explanatory Variables

?A more difficult problem arises when a model excludes a

key variable, usually because of data unavailability.

?Example: Return to Education

? the population (true) model is:

? Suppose we do not observe the ability (abil). Ignoring abil would

generally give biased and inconsistent estimates of the return to

education.

? We expect an upward bias for the estimated return to education.

Why?

? How can we solve or at least mitigate this omitted variable

problem?

9

( ) 0 1 2 3log wage educ exper abil u? ? ? ?= + + + +

?One possibility is to use a proxy variable for the omitted

variable.

? Something that is related to the unobserved variable.

? In the wage equation one could use the intelligence

quotient, or IQ as a proxy for ability. IQ and ability do not

need to be the same, but they need to be correlated.

?Suppose we have the model

y = 0 + 11 + 22 + 33

? +

with 3

? being unobserved. We have a proxy variable 3

?What do we require of 3?

? 3 must be relevant in explaining 3

?, i.e. in regression

3

?= 0 + 33 + 3 we have 3 ≠ 0

? If 3 = 0 , the proxy is not good.

10

? Replace 3

? with 3, i.e. just regress y on 1, 2 and 3.

This is called the plug-in solution to the omitted

variables problem.

? Since 3 and 3

? are not the same: when does this

procedure give consistent estimators for 1 and 2?

? The assumptions are with respect to u and 3:

Assumption 1. In addition to assuming that u and 1, 2 and 3

? are

uncorrelated, we need that u and 3 be uncorrelated. This means

that 3 is irrelevant in the population model once 1, 2 and 3

? are

included.

Assumption 2. The error 3 is uncorrelated with 1, 2 and 3.

This means that 3 is a good proxy for 3

?: (3

? 1, 2, 3 =

(3

? 3

11

? From the latter assumption follows, that

(3

? 1, 2, 3 =0 + 33

? In terms of our wage equation this means:

, , = = 0 + 3

thus the mean value of ability only changes with IQ.

? More formally, what are the implications of the two

assumptions?

12

• By combining
    y = β0 + β1 x1 + β2 x2 + β3 x3* + u
    x3* = δ0 + δ3 x3 + v3
  we obtain:
    y = β0 + β1 x1 + β2 x2 + β3 (δ0 + δ3 x3 + v3) + u
    y = (β0 + β3 δ0) + β1 x1 + β2 x2 + (β3 δ3) x3 + (β3 v3 + u)
• Now let us denote e = β3 v3 + u as the composite error.
• Note that u and v3 both have zero mean and each is uncorrelated with x1, x2
  and x3 (see Assumptions 1 and 2 on slide 11). Then e has zero mean and is
  uncorrelated with x1, x2 and x3.
• For this reason, we can write
    y = α0 + β1 x1 + β2 x2 + α3 x3 + e
• The OLS estimation of the above equation is consistent for α0, β1, β2
  and α3.
• We do not get unbiased estimators for β0 and β3.
• Empirically α3 may even be of more interest than β3.

13
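The plug-in logic can be checked with a small simulation — a hedged numpy sketch rather than anything from the slides (the course examples use Stata, and all variable names and parameter values here are illustrative assumptions). Regressing y on x1, x2 and the proxy x3 recovers β1 and β2, while the coefficient on the proxy estimates β3·δ3:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
# Illustrative parameter values (assumptions, not from the slides)
b0, b1, b2, b3 = 1.0, 0.5, -0.3, 0.8   # structural coefficients
d0, d3 = 0.2, 0.7                      # proxy equation x3* = d0 + d3*x3 + v3

x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
x3 = rng.normal(size=n)                # observed proxy
v3 = rng.normal(size=n)                # uncorrelated with x1, x2, x3
x3_star = d0 + d3 * x3 + v3            # unobserved variable
u = rng.normal(size=n)
y = b0 + b1 * x1 + b2 * x2 + b3 * x3_star + u

# Plug-in solution: regress y on x1, x2 and the proxy x3
X = np.column_stack([np.ones(n), x1, x2, x3])
coef = np.linalg.lstsq(X, y, rcond=None)[0]
print(coef)   # coef[1], coef[2] close to b1, b2; coef[3] close to b3*d3
```

In this sketch β1 and β2 come out consistently, but the coefficient on the proxy converges to β3δ3, not β3, matching the slide's point about α3.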

Functional form misspecification

• Special case: omission of a relevant variable x1².
• Suppose y = β0 + β1 x1 + β2 x1² + u with E(u | x1, x1²) = 0.
• But we estimate y = β0 + β1 x1 + v, where v = β2 x1² + u.
• Now, since cov(x1, x1²) ≠ 0, if β2 ≠ 0 we do not have E(v | x1) = 0, and we
  would have a bias due to functional form misspecification.
• Solution: test for functional form (RESET), non-parametric and
  semiparametric methods, more flexible parametric specifications of the
  model.

14

Simultaneity

• If an explanatory variable is determined simultaneously with the dependent
  variable, then it is correlated with the error term.
• In this case OLS is biased and inconsistent.
• As an example we consider two equations (structural equations) without an
  intercept:
    y1 = α1 y2 + β1 z1 + u1
    y2 = α2 y1 + β2 z2 + u2
  where the variables z1 and z2 are exogenous.
• We focus on estimation of the first equation.

15

y1 = α1 y2 + β1 z1 + u1
y2 = α2 y1 + β2 z2 + u2

• To show that the dependent variables are generally correlated with the
  error terms (e.g. y2 with u1), we replace y1 in the second equation with
  the right hand side of the first equation:
    y2 = α2 (α1 y2 + β1 z1 + u1) + β2 z2 + u2
    (1 − α2 α1) y2 = α2 β1 z1 + β2 z2 + α2 u1 + u2
• In order to solve for y2 we have to assume α2 α1 ≠ 1.
• It depends on the application whether this is restrictive.

16

(1 − α2 α1) y2 = α2 β1 z1 + β2 z2 + α2 u1 + u2

can be rewritten as:
    y2 = π21 z1 + π22 z2 + v2
where
    π21 = α2 β1 / (1 − α2 α1)
    π22 = β2 / (1 − α2 α1)
    v2 = (α2 u1 + u2) / (1 − α2 α1)

This is the reduced form equation for y2.
• π21 and π22 are reduced form parameters.
• v2 is linear in u1 and u2. For this reason, it is uncorrelated with z1 and
  z2. We can apply OLS to estimate π21 and π22.
• There is an equivalent reduced form equation for y1.

17

y2 = π21 z1 + π22 z2 + v2

• We can use this equation to show that OLS estimation of the structural
  equation will generally result in biased and inconsistent estimates for
  α1 and β1:
    y1 = α1 y2 + β1 z1 + u1
• From the reduced form equation, we see that y2 and u1 are correlated if v2
  and u1 are correlated. Since v2 linearly depends on u1, it is generally
  correlated with u1.
• When is it not correlated?
  • If α2 = 0 and if u1 and u2 are uncorrelated.
  • In this case y2 is not simultaneously determined with y1.

18

• When y2 is correlated with u1 because of simultaneity, the OLS estimator
  suffers from simultaneity bias and is inconsistent.
• Obtaining the direction of the bias is generally complicated. Simple
  expressions for the bias can be derived under additional assumptions, but
  this is not covered here.
• Solution: instrumental variable estimation.

19
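The simultaneity bias can be seen in a short simulation — an illustrative numpy sketch, not course material (all coefficient values are assumptions, chosen so that α2α1 ≠ 1). OLS on the first structural equation is biased because y2 is correlated with u1, while using the exogenous z2 as an instrument for y2 recovers α1:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000
# Illustrative structural parameters (assumed)
a1, b1 = 0.5, 1.0   # y1 = a1*y2 + b1*z1 + u1
a2, b2 = 0.3, 1.0   # y2 = a2*y1 + b2*z2 + u2

z1, z2 = rng.normal(size=n), rng.normal(size=n)
u1, u2 = rng.normal(size=n), rng.normal(size=n)

# Solve the system for y2 via its reduced form, then compute y1
det = 1 - a2 * a1
y2 = (a2 * b1 * z1 + b2 * z2 + a2 * u1 + u2) / det
y1 = a1 * y2 + b1 * z1 + u1

X = np.column_stack([y2, z1])
# OLS: inconsistent because cov(y2, u1) = a2*var(u1)/det > 0 here
ols = np.linalg.lstsq(X, y1, rcond=None)[0]
# IV: instrument y2 with z2 (uncorrelated with u1, correlated with y2)
Z = np.column_stack([z2, z1])
iv = np.linalg.solve(Z.T @ X, Z.T @ y1)
print(ols[0], iv[0])   # OLS estimate of a1 is biased upward; IV is close to a1
```

With these parameter values cov(y2, u1) is positive, so the OLS estimate of α1 sits well above the true 0.5 while the IV estimate does not.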

Measurement error in an explanatory variable

• We consider the simple regression model:
    y = β0 + β1 x1* + u
  and assume that it satisfies the Gauss-Markov assumptions.
• We do not observe x1* but x1 (e.g. actual and reported income).
• The measurement error in the population is: e1 = x1 − x1*.
• We assume: E(e1) = 0.
• Moreover, we assume that u is uncorrelated with x1* and x1:
    E(y | x1, x1*) = E(y | x1*)

20

• The model can be written as: y = β0 + β1 x1 + (u − β1 e1).
• The classical errors-in-variables (CEV) assumption is that the measurement
  error is uncorrelated with the unobserved explanatory variable:
    cov(e1, x1*) = 0
  • This means that the observed measure x1 consists of two uncorrelated
    components: x1 = x1* + e1.
  • (We still assume that u is uncorrelated with x1 and x1*.)
  • The above assumption implies that x1 and e1 must be correlated:
      cov(x1, e1) = E(x1 e1) = E(x1* e1) + E(e1²) = 0 + σ²_e1 = σ²_e1
  • This correlation causes problems for the OLS estimation.

21

• This implies for our model y = β0 + β1 x1 + (u − β1 e1) that, since u and
  x1 are uncorrelated, the covariance between x1 and the composite error
  (u − β1 e1) is:
    cov(x1, u − β1 e1) = −β1 cov(x1, e1) = −β1 σ²_e1
• Note also that var(x1) = var(x1*) + var(e1) = σ²_x1* + σ²_e1.
• Then one can show:
    plim β̂1 = β1 + cov(x1, u − β1 e1) / var(x1)
            = β1 − β1 σ²_e1 / (σ²_x1* + σ²_e1)
            = β1 [1 − σ²_e1 / (σ²_x1* + σ²_e1)]
            = β1 σ²_x1* / (σ²_x1* + σ²_e1)
• This equation is very interesting: plim β̂1 is always closer to zero than
  β1: attenuation bias.

22

• OLS is biased in the classical errors-in-variables model:
  • If β1 is positive, it will underestimate β1, and vice versa.
  • Things are more complicated in the multiple regression model, but again
    OLS will be biased and inconsistent.
• Solution: instrumental variable estimation, ...

23
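The attenuation formula can be verified numerically — a hedged numpy sketch with assumed parameter values, not part of the course material. With var(x1*) = var(e1), the formula predicts plim β̂1 = β1 · 1/2:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500_000
b0, b1 = 1.0, 2.0          # illustrative true parameters (assumed)
sd_xstar, sd_e = 1.0, 1.0  # var(x1*) = var(e1) = 1

x_star = rng.normal(scale=sd_xstar, size=n)   # true regressor
e1 = rng.normal(scale=sd_e, size=n)           # CEV measurement error
x1 = x_star + e1                              # observed, mismeasured regressor
u = rng.normal(size=n)
y = b0 + b1 * x_star + u

X = np.column_stack([np.ones(n), x1])
b1_hat = np.linalg.lstsq(X, y, rcond=None)[0][1]

# Attenuation: plim b1_hat = b1 * var(x*)/(var(x*) + var(e1)) = 2 * 1/2 = 1
print(b1_hat)
```

The estimate lands near 1 rather than the true 2, exactly the attenuation factor σ²_x1*/(σ²_x1* + σ²_e1) = 1/2 from the slide.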

Instrumental Variable Estimation

• Suppose we have an endogenous independent variable.
• How can we obtain a good estimate for the coefficient on the endogenous
  variable?
• This can be achieved if there is an instrumental variable available.

24

• First, we look at the simple regression model:
    y = β0 + β1 x + u
• Now take another variable z with cov(x, z) ≠ 0. Then,
    cov(z, y) = β1 cov(z, x) + cov(z, u)
    β1 = [cov(z, y) − cov(z, u)] / cov(z, x)
• Under the additional assumption cov(z, u) = 0, we have
    β1 = cov(z, y) / cov(z, x)

25

• A natural estimator for β1 is therefore cov(z, y)/cov(z, x) with the
  population covariances replaced by their sample analogues:
    β̂1 = Σi (zi − z̄)(yi − ȳ) / Σi (zi − z̄)(xi − x̄)
• This estimator is consistent for β1, but it is inconsistent if
  cov(z, u) ≠ 0.
• The estimator can be biased in small samples even if cov(z, u) = 0.
• If x is exogenous, it can be used as an instrument and then the IV
  estimator is identical to OLS.
• The natural estimator for β0 is simply:
    β̂0 = ȳ − β̂1 x̄

26
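The sample-analogue estimator above is a few lines of code — a minimal numpy sketch (the course itself uses Stata; the data-generating process and all values below are illustrative assumptions, with endogeneity induced through a common unobservable):

```python
import numpy as np

def iv_simple(y, x, z):
    """IV estimates of (b0, b1) in y = b0 + b1*x + u, using instrument z."""
    b1 = np.cov(z, y)[0, 1] / np.cov(z, x)[0, 1]   # sample-covariance ratio
    b0 = y.mean() - b1 * x.mean()
    return b0, b1

# Illustrative check: x is endogenous through an unobservable a;
# z shifts x but is unrelated to a and to the error u.
rng = np.random.default_rng(3)
n = 200_000
a = rng.normal(size=n)                  # unobserved confounder
z = rng.normal(size=n)
x = 0.8 * z + a + rng.normal(size=n)    # cov(x, z) != 0
u = a + rng.normal(size=n)              # cov(x, u) != 0, cov(z, u) = 0
y = 1.0 + 0.5 * x + u

b0, b1 = iv_simple(y, x, z)
print(b0, b1)   # close to the true (1.0, 0.5)
```

OLS on the same data would overstate the slope because cov(x, u) > 0; the IV ratio does not.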

• A variable z is a candidate for an instrument for a variable x if it
  satisfies the following conditions:
    cov(x, z) ≠ 0 and cov(z, u) = 0
• Some remarks on the choice of an instrument:
  • It is often difficult to find a good instrument.
  • A proxy variable does not make a good instrument, as it is supposed to be
    correlated with the error term.
  • Example: Consider the regression of log wage on education and ability
    with error term u. Ability is not observed, but IQ is observed and highly
    correlated with ability. IQ is a potential candidate for a proxy for
    ability but clearly violates the condition cov(IQ, u) = 0. (HOMEWORK:
    Explain why IQ is a potential candidate for a proxy for ability but
    clearly violates the condition cov(IQ, u) = 0. Explain why IQ cannot be
    used as an instrument for education.)
  • Instruments for education: one may use family background variables such
    as the number of siblings: negatively correlated with education but maybe
    uncorrelated with ability. The latter, however, is unclear, as ability is
    not observed.

27

Inference with the IV estimator

• The IV estimator has an asymptotic normal distribution.
• When we impose a homoscedasticity assumption conditional on the
  instrument,
    E(u² | z) = σ² = var(u),
  one can derive the asymptotic variance of β̂1, which is
    avar(β̂1) = σ² / (n σ²_x ρ²_xz)
• This provides us a standard error for the IV estimator.
• As with the OLS estimator, the asymptotic variance of the IV estimator
  decreases to zero at the rate 1/n.
• If ρ²_xz is small, the variance of the IV estimator is large.
• The asymptotic variance of the IV estimator is always larger, and
  sometimes much larger, than the asymptotic variance of the OLS estimator.

28

Estimating avar(β̂1) = σ² / (n σ²_x ρ²_xz)

• The population variance of the error term, σ², can be estimated just like
  in the case of the OLS regression:
    σ̂² = (1/(n − 2)) Σi ûi²
  where the ûi are now the residuals from the IV regression.
• The population variance of x can be estimated by the sample variance
  SST_x / n.
• The square of the population correlation between x and z can be estimated
  by the R-squared of the regression of x on z: R²_xz.
• Then a consistent estimator is:
    v̂ar(β̂1) = σ̂² / (SST_x · R²_xz)

29
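The variance estimator can be assembled directly from its three ingredients — a hedged numpy sketch with simulated data (all values are assumptions; for simplicity x is exogenous here, since the point is only the standard-error formula):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 10_000
z = rng.normal(size=n)
x = 0.6 * z + rng.normal(size=n)          # instrument relevance
u = rng.normal(size=n)
y = 1.0 + 0.5 * x + u

# IV slope and intercept from sample covariances
b1 = np.cov(z, y)[0, 1] / np.cov(z, x)[0, 1]
b0 = y.mean() - b1 * x.mean()

u_hat = y - b0 - b1 * x                    # IV residuals
sigma2_hat = (u_hat @ u_hat) / (n - 2)     # error variance estimate
sst_x = ((x - x.mean()) ** 2).sum()        # total sum of squares of x
r2_xz = np.corrcoef(x, z)[0, 1] ** 2       # R^2 of regressing x on z

se_b1 = np.sqrt(sigma2_hat / (sst_x * r2_xz))
print(b1, se_b1)
```

Note how R²_xz enters the denominator: a weaker instrument (smaller correlation between x and z) mechanically inflates the IV standard error.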

• Example: Return to education for married women
  • Data: MROZ.dta
  • Simple log-level regression model:
      log(wage) = β0 + β1 educ + u
  • We obtain OLS estimates:
      log(wage)^ = −0.185 + 0.109 educ
                   (0.205)  (0.014)
      n = 428, R² = 0.118
  • We use father's education as an instrument for education.
  • We cannot empirically check whether ability and father's education are
    uncorrelated. However, we can test whether education and father's
    education are correlated.

30

• Example, cont.
  • When we regress educ on fatheduc, we obtain
      educ^ = 10.240 + 0.269 fatheduc
              (0.105)  (0.011)
      n = 428, R² = 0.173
    This suggests that there is significant positive correlation, and about
    17% of the variation in educ is explained by father's education.
  • When we use father's education as an instrument for educ, we obtain:
      log(wage)^ = 0.441 + 0.059 educ
                   (0.446)  (0.035)
      n = 428, R² = 0.093
  • The IV estimate of the return to education is about one half of the OLS
    estimate, suggesting that there is omitted ability bias.
  • The IV standard error is much larger than the OLS standard error, and the
    IV 95% confidence interval contains the OLS estimate.
  • While this empirical example suggests that the differences between the IV
    and OLS estimates are practically large, they are not statistically
    significant.

31

• There are similar IV applications with other data sets which yield larger
  IV estimates than OLS estimates.
• Larger IV estimates than OLS estimates may suggest measurement error
  issues that cause an underestimation by OLS, or might suggest that the IV
  is invalid because it is correlated with the error term.
• Since even a little correlation between z and u can cause serious problems
  for the IV estimator, this is an important issue.
• IV estimation can also be applied in the case of a binary endogenous
  regressor or a binary instrumental variable.

32

IV Estimation with a poor instrumental variable

• IV estimates can have large standard errors if x and z are only weakly
  correlated. (Don't use IV in this case.)
• IV estimates can have a large asymptotic bias even if z and u are only
  weakly correlated:
    plim β̂1 = β1 + [corr(z, u) / corr(z, x)] · (σ_u / σ_x)
  This implies that the bias can be large if the population correlation
  between z and x is small, even if the population correlation between z and
  u is small. (HOMEWORK: prove the equality above. HINTS:
  plim β̂1 = cov(z, y)/cov(z, x) and y = β0 + β1 x + u.)
• For this reason IV can be worse in terms of consistency than OLS even if
  corr(z, u) is small (provided that corr(z, x) is also small).
• One can show that IV is only superior in terms of asymptotic bias if
    corr(z, u) / corr(z, x) < corr(x, u)

33

R-Squared and IV Estimation

    R² = 1 − SSR/SST

• SSR (the sum of squared IV residuals) can be larger than SST (the total
  sum of squares). For this reason the R-squared can become negative, and it
  is smaller than for OLS.
• It is not clear whether the R-squared should be reported after IV
  estimation.

34

IV Estimation of the Multiple Regression Model

• The idea of IV estimation can be easily extended to the multiple
  regression case.
• For this purpose we change the notation a bit.
• The model is now: y = xβ + u with E(u) = 0, where
  • β is a K×1 vector
  • x = (1, x2, x3, ..., x_{K−1}, x_K) is a 1×K vector
  and x_K is endogenous, i.e. cov(x_K, u) ≠ 0.
• We need an instrument for x_K to obtain consistent estimates.
• We need another exogenous variable z1 with cov(z1, u) = 0.
• Let z = (1, x2, x3, ..., x_{K−1}, z1); then E(z′u) = 0 and z is exogenous.

35

• The instrument z1 must be relevant to explain the endogenous variable x_K
  once we control for all the remaining exogenous explanatory variables,
  i.e.
    x_K = δ1 + δ2 x2 + δ3 x3 + ... + δ_{K−1} x_{K−1} + θ1 z1 + r,
  where θ1 ≠ 0 and by definition r is uncorrelated with all the exogenous
  variables and E(r) = 0.
• This implies that E(z′x) has full rank (rank condition).
• We can test θ1 ≠ 0 using a t-test.

    y = xβ + u
    z′y = z′xβ + z′u
    E(z′y) = E(z′x)β + E(z′u)
    E(z′y) = E(z′x)β
    β = [E(z′x)]⁻¹ E(z′y)

• E(z′y) and E(z′x) can be consistently estimated using the corresponding
  sample moments.

36

• Given a random sample i = 1, ..., N, the instrumental variables estimator
  of β is
    β̂_IV = [(1/N) Σi zi′xi]⁻¹ [(1/N) Σi zi′yi] = (Z′X)⁻¹ Z′Y
  where
    X is the N×K matrix with i-th row xi = (1, x2,i, x3,i, ..., x_{K−1},i, x_K,i)
    Z is the N×K matrix with i-th row zi = (1, x2,i, x3,i, ..., x_{K−1},i, z1,i)
    Y is the N×1 vector with i-th element yi

37
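The matrix formula β̂_IV = (Z′X)⁻¹Z′Y translates directly into code — an illustrative numpy sketch with simulated data (the data-generating process, names and values are all assumptions; K = 3 with one endogenous regressor and one instrument):

```python
import numpy as np

rng = np.random.default_rng(5)
n = 100_000
true_beta = np.array([1.0, 0.5, -0.2])     # intercept, x2, xK (assumed)

x2 = rng.normal(size=n)                    # exogenous regressor
a = rng.normal(size=n)                     # unobserved confounder
z1 = rng.normal(size=n)                    # instrument for xK
xK = 0.7 * z1 + a + rng.normal(size=n)     # endogenous: depends on a
u = a + rng.normal(size=n)
y = true_beta[0] + true_beta[1] * x2 + true_beta[2] * xK + u

X = np.column_stack([np.ones(n), x2, xK])  # regressor matrix (N x K)
Z = np.column_stack([np.ones(n), x2, z1])  # instrument matrix (N x K)

beta_iv = np.linalg.solve(Z.T @ X, Z.T @ y)      # (Z'X)^{-1} Z'Y
beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]  # for comparison
print(beta_iv, beta_ols)   # IV close to true_beta; OLS biased on xK
```

Note that Z replaces only the endogenous column of X with the instrument; the exogenous regressors (including the constant) serve as their own instruments.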

• Example: College proximity as an IV for education
  • Data: Card.dta (see the do file in the VLE)
  • log(wage) is the dependent variable, with several controls (exper expersq
    black smsa south smsa66 reg662-reg669) plus the endogenous education.
  • Instrument for education: a dummy for whether someone grew up near a
    four-year college (nearc4).
  • We assume that nearc4 is uncorrelated with the error. Moreover, to be a
    valid instrument it has to be partially correlated with educ.
  • We can test this by estimating in Stata the equation:
    regress educ nearc4 exper expersq black smsa south smsa66 reg662-reg669
  • The t-statistic is 3.64, and therefore, if nearc4 is uncorrelated with
    the error term, we can use it as an IV for educ.

38

. regress educ nearc4 exper expersq black smsa south smsa66 reg662-reg669

      Source |       SS           df       MS      Number of obs   =     3,010
-------------+----------------------------------   F(15, 2994)     =    182.13
       Model |  10287.6179        15  685.841194   Prob > F        =    0.0000
    Residual |  11274.4622     2,994  3.76568542   R-squared       =    0.4771
-------------+----------------------------------   Adj R-squared   =    0.4745
       Total |  21562.0801     3,009  7.16586243   Root MSE        =    1.9405

------------------------------------------------------------------------------
        educ |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      nearc4 |   .3198989   .0878638     3.64   0.000     .1476194    .4921785
       exper |  -.4125334   .0336996   -12.24   0.000    -.4786101   -.3464566
     expersq |   .0008686   .0016504     0.53   0.599    -.0023674    .0041046
       black |  -.9355287   .0937348    -9.98   0.000     -1.11932   -.7517377
        smsa |   .4021825   .1048112     3.84   0.000     .1966732    .6076918
       south |  -.0516126   .1354284    -0.38   0.703    -.3171548    .2139296
      smsa66 |   .0254805   .1057692     0.24   0.810    -.1819071    .2328682
      reg662 |  -.0786363   .1871154    -0.42   0.674    -.4455241    .2882514
      reg663 |   -.027939   .1833745    -0.15   0.879    -.3874918    .3316139
      reg664 |    .117182   .2172531     0.54   0.590    -.3087984    .5431624
      reg665 |  -.2726165   .2184204    -1.25   0.212    -.7008858    .1556528
      reg666 |  -.3028147   .2370712    -1.28   0.202    -.7676536    .1620242
      reg667 |  -.2168177   .2343879    -0.93   0.355    -.6763953    .2427598
      reg668 |   .5238914   .2674749     1.96   0.050    -.0005618    1.048344
      reg669 |    .210271   .2024568     1.04   0.299    -.1866975    .6072395
       _cons |   16.63825   .2406297    69.14   0.000     16.16644    17.11007
------------------------------------------------------------------------------

39

• The following table reports OLS and IV estimates.
• The IV estimate is almost twice as large as the OLS estimate.
• The SE of the IV estimate is 18 times larger. This is the price we have to
  pay if we use an instrument to obtain a consistent estimator.

Dependent variable: log(wage)

Independent variable    (1) OLS      (2) IV
educ                    0.075        0.132
                        (0.003)      (0.055)
exper                   0.085        0.108
                        (0.007)      (0.024)
exper²                  -0.0023      -0.0023
                        (0.0003)     (0.0003)
...other controls       ...          ...
Observations            3,010        3,010
R-squared               0.300        0.238

40

Two Stage Least Squares (2SLS)

• Sometimes there are multiple valid IVs for an endogenous explanatory
  variable.
• Suppose the variables z1, z2, ..., z_M satisfy
    cov(z_h, u) = 0 for h = 1, ..., M
• We could simply use each of them as an instrument and obtain multiple IV
  estimators.
• The idea is to use all the IVs together to obtain a more efficient
  estimator:
  • Let z = (1, x2, ..., x_{K−1}, z1, z2, ..., z_M), a 1×L vector with
    L = K − 1 + M.
  • As each element of z is uncorrelated with u, any linear combination is
    also uncorrelated with u.

41

• The linear combination of z which is most highly correlated with x_K is
  the linear projection of x_K on z:
    x_K = δ1 + δ2 x2 + δ3 x3 + ... + δ_{K−1} x_{K−1} + θ1 z1 + ... + θ_M z_M + r
  where by definition r is uncorrelated with all the exogenous variables and
  E(r) = 0.
• x_K is correlated with u if it is endogenous, but
    [δ1 + δ2 x2 + δ3 x3 + ... + δ_{K−1} x_{K−1} + θ1 z1 + ... + θ_M z_M]
  is not correlated with u.
• This means that we can replace the endogenous variable with its prediction
  (using OLS estimation of the linear model for x_K):
    x̂_K = δ̂1 + δ̂2 x2 + δ̂3 x3 + ... + δ̂_{K−1} x_{K−1} + θ̂1 z1 + ... + θ̂_M z_M
  which is exogenous.
• We require that at least one θ_h is non-zero. Use an F-test to test the
  null hypothesis that all the instruments have zero effects.

42

• Now, let x̂ = (1, x2, ..., x_{K−1}, x̂_K) and use it as the vector of
  instruments for x:
    β̂_2SLS = (X̂′X)⁻¹ X̂′Y
• It can be shown that X̂′X = X̂′X̂, and thus
    β̂_2SLS = (X̂′X̂)⁻¹ X̂′Y
• This estimator is consistent under the conditions:
    E(z′u) = 0, rank E(z′x) = K, L ≥ K
• The last condition says that we need at least as many instruments as
  explanatory variables in the model (order condition).

43

• Under homoscedasticity, E(u² | z) = σ², it is possible to show that
    avar(β̂_2SLS) = σ² [E(x*′x*)]⁻¹,
  where x* is the linear projection of x on z.
• The variance matrix can be estimated by σ̂² (X̂′X̂)⁻¹, with
    σ̂² = (1/(N − K)) Σi ûi²  and  ûi = yi − xi β̂_2SLS.

44

• Econometric packages have 2SLS implemented. There is no need to perform
  the two stages manually. If you compute it manually, the OLS standard
  errors and statistics for the second stage are not valid (as there is a
  composite error in the second regression).
• The IV estimator with multiple instruments is also called the two stage
  least squares (2SLS) estimator:
  • One can show that the 2SLS estimates are identical to the OLS estimates
    from the regression of y on x̂. This is the second stage.
  • The first stage is the regression of x_K on z.
• Multicollinearity is a bigger problem for 2SLS than for OLS. This is for
  two reasons:
  • x̂_K has less variation than x_K.
  • x̂_K has more correlation with the other regressors than x_K.

45

Some Remarks:

• If the R-squared of x̂_K on the exogenous variables [1, x2, ..., x_{K−1}]
  is very large, the standard error of 2SLS explodes. This can be verified
  with the data at hand.
• 2SLS can also be used in models with more than one endogenous variable.
  • We need more candidate instruments to achieve identification.
  • The sufficient condition for identification is the rank condition.
• Since the R-squared after 2SLS cannot be compared to the OLS R-squared, we
  must be careful.
• The standard errors and tests for a 2SLS estimation produced manually will
  be wrong.
• It is possible to derive a statistic with an approximate F-distribution in
  large samples. Use econometric packages to run 2SLS with correct standard
  errors.

46

Testing for Endogeneity

• It is useful to have a test for the endogeneity of an explanatory
  variable, to show whether 2SLS is even necessary.
• Suppose we have the structural equation
    y1 = β0 + β1 y2 + β2 z1 + β3 z2 + u1
  where y2 is endogenous and z3 and z4 are two exogenous variables (i.e.
  uncorrelated with u1) but relevant to explain y2.
• Hausman (1978) suggests a test which directly compares the OLS and 2SLS
  estimates and determines whether the differences are statistically
  significant (Hausman test).

47

• The idea behind the test is as follows:
  • We have:
      y1 = β0 + β1 y2 + β2 z1 + β3 z2 + u1   (main regression)
    and
      y2 = π0 + π1 z1 + π2 z2 + π3 z3 + π4 z4 + v2   (first stage equation)
  • Each z_j is uncorrelated with u1.
  • y2 is uncorrelated with u1 if and only if v2 is uncorrelated with u1.
    This is what we want to test.
  • Write u1 = δ0 + δ1 v2 + e1; then by definition e1 is uncorrelated
    with v2.
  • If δ1 = 0 then u1 and v2 are uncorrelated.
  • To test for such correlation (endogeneity of y2) we can test
    H0: δ1 = 0 in the following model:
      y1 = β0 + β1 y2 + β2 z1 + β3 z2 + δ1 v̂2 + error
  • where v̂2 are the residuals from the reduced form equation.

48

y1 = β0 + β1 y2 + β2 z1 + β3 z2 + δ1 v̂2 + error

• We then test H0: δ1 = 0 using a t-test.
• If we reject it at a small significance level, we conclude that y2 is
  endogenous because v2 and u1 are correlated.

Practical guideline for the Hausman test:
1. Estimate the first stage equation for y2 and compute the residuals v̂2.
2. Add v̂2 as an explanatory variable in the main regression and estimate it
   by OLS. You may want to use a heteroscedasticity-robust version of the
   t-test for testing whether the coefficient on v̂2 is significant. If it is
   statistically significantly different from zero, we conclude that y2 is
   indeed endogenous.

A disadvantage of this test is that its reliability requires the use of a
valid instrument.

49
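The two-step guideline can be sketched with simulated data — a hedged numpy illustration, not course material (variable names and values are assumptions, and for brevity the t-statistic below uses conventional OLS standard errors rather than the robust version recommended above):

```python
import numpy as np

rng = np.random.default_rng(6)
n = 50_000
z1, z2, z3, z4 = (rng.normal(size=n) for _ in range(4))
v2 = rng.normal(size=n)
y2 = 0.5 * z1 + 0.5 * z3 + 0.5 * z4 + v2        # first stage (z3, z4 excluded IVs)
u1 = 0.6 * v2 + rng.normal(size=n)               # endogeneity: u1 correlated with v2
y1 = 1.0 + 0.5 * y2 + 0.3 * z1 + 0.2 * z2 + u1

def ols(X, y):
    """OLS coefficients, residuals and conventional standard errors."""
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    e = y - X @ b
    s2 = (e @ e) / (len(y) - X.shape[1])
    se = np.sqrt(s2 * np.diag(np.linalg.inv(X.T @ X)))
    return b, e, se

# Step 1: first stage for y2, keep the residuals v2_hat
Z = np.column_stack([np.ones(n), z1, z2, z3, z4])
_, v2_hat, _ = ols(Z, y2)

# Step 2: add v2_hat to the main regression and t-test its coefficient
X = np.column_stack([np.ones(n), y2, z1, z2, v2_hat])
b, _, se = ols(X, y1)
t_v2hat = b[4] / se[4]
print(t_v2hat)   # a large |t| rejects H0: delta1 = 0, so y2 is endogenous
```

Since the data were generated with u1 depending on v2, the t-statistic on v̂2 is far from zero and the test correctly flags y2 as endogenous; as a by-product, the coefficient on y2 in step 2 is the 2SLS estimate.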

More remarks on IV estimation:

• If we have more instruments than needed for one endogenous explanatory
  variable, we can also test whether at least some of them are uncorrelated
  with u1 (validity of the instruments).
  • We have to assume that at least one of the IVs is exogenous. Then we can
    test the overidentifying restrictions that are used in 2SLS. No details
    are presented here.
• Heteroscedasticity in the context of 2SLS raises the same issues as with
  OLS.
  • There are standard errors and test statistics available which are robust
    to heteroscedasticity.
  • There are also tests for heteroscedasticity available.
• 2SLS can also be applied to pooled cross sections and panel data (e.g.
  with first differencing). This does not raise new difficulties.

50

Summary

• We have seen the method of instrumental variables as a way to consistently
  estimate the parameters in a linear model when there are endogenous
  explanatory variables.
• When instruments are poor, IV estimates can be worse than OLS.
• 2SLS is routinely used in economics and the social sciences alike.
• The Hausman test can be used to test for endogeneity.
• IV estimation can be used for cross section, pooled cross section and
  panel data.

51
