Answers to the exam 9/6–10
The exam 10/6–09, with short answers.
The March exam is ALREADY corrected!
I will hand in the minutes and the exams tomorrow, Thursday 2/3. The results were good: 88% passed, 15% A's and 26% B's.
Answers to the exam
I'm sorry that this comes a bit late.
Kristoffer Högberg's notes
This is a self-extracting archive. Download it, put it in a suitable folder and click (or double-click) the file; all the notes will then be unpacked as pdf files (and a small text file).
First I talked about omitted and irrelevant variables; sections 5.8 and 5.9 in Hansen. I also covered the situation when X1 and X2 are uncorrelated. Then I talked about model selection, and described BIC (section 5.10).
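As a small illustration of the omitted-variables point (my own sketch, not from Hansen or the lecture): omitting a regressor biases the remaining coefficient only when the two regressors are correlated.

```python
import numpy as np

# Hypothetical simulation: true model y = x1 + x2 + e.
rng = np.random.default_rng(0)
n = 100_000
x1 = rng.normal(size=n)
x2 = 0.8 * x1 + rng.normal(size=n)   # x2 correlated with x1
y = x1 + x2 + rng.normal(size=n)

# Short regression omitting x2: the slope converges to 1 + 0.8*1 = 1.8.
b_short = (x1 @ y) / (x1 @ x1)

# If instead x2 is uncorrelated with x1, omitting it leaves the slope unbiased.
x2u = rng.normal(size=n)
yu = x1 + x2u + rng.normal(size=n)
b_short_u = (x1 @ yu) / (x1 @ x1)
```

The first estimate lands near 1.8 (the true coefficient plus the bias term), the second near the true value 1.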
Finally, I talked about the Tobit model (censored data, section 12.3) and read aloud some portions from Kennedy's book.
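A minimal sketch of mine (not from Hansen or Kennedy) of why censored data calls for something like the Tobit model: OLS on a censored outcome is biased toward zero.

```python
import numpy as np

# Simulate a latent outcome and censor it below at zero.
rng = np.random.default_rng(1)
n = 50_000
x = rng.normal(size=n)
ystar = x + rng.normal(size=n)      # latent outcome, true slope 1
y = np.maximum(ystar, 0.0)          # we only observe the censored version

X = np.column_stack([np.ones(n), x])
b_ols = np.linalg.lstsq(X, y, rcond=None)[0]
# b_ols[1] comes out near 0.5 here, far below the true slope of 1;
# the Tobit MLE models the censoring explicitly and recovers the slope.
```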
I made a (final?) update of my comments. I added a section on "Self Selection Bias". The latest version is thus dated 24/2–09.
Last Friday I talked about Non-Linear Least Squares (NLLS); Ch. 5.4 in Hansen. Today we talked about Stefan Lundgren's and my article on estimating the demand elasticity for telephone calling time, as a case study. I also started to talk a little about model selection: Ch. 5.8 in Hansen; I will continue with that on Wednesday.
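A numpy sketch of NLLS (my own illustration; Hansen's Ch. 5.4 gives the general theory), fitting y = a·exp(b·x) by Gauss-Newton with step halving so the sum of squares never increases.

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.uniform(0, 2, size=500)
y = 2.0 * np.exp(0.5 * x) + 0.1 * rng.normal(size=500)

def sse(th):
    return np.sum((y - th[0] * np.exp(th[1] * x)) ** 2)

theta = np.array([1.0, 0.0])        # starting values for (a, b)
for _ in range(100):
    a, b = theta
    fx = a * np.exp(b * x)
    # Jacobian of the regression function in (a, b)
    J = np.column_stack([np.exp(b * x), a * x * np.exp(b * x)])
    step = np.linalg.lstsq(J, y - fx, rcond=None)[0]
    t = 1.0
    while sse(theta + t * step) > sse(theta) and t > 1e-8:
        t /= 2                       # step halving keeps SSE decreasing
    theta = theta + t * step
```

The estimates converge close to the true values (2, 0.5).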
Now I have updated my comments again, hopefully for the last time. I have added a section about model selection (Ch. 5.8, 5.9) and revised the section about Self Selection Bias, which was a bit confusing (to put it mildly).
It is about time you register for the exam! You do that on "Mina sidor".
I solved problem 8, Ch. 5.12 in Hansen. Then I talked about "logit" regression. Hansen has a very brief description of this in Ch. 12.1. I think it is at the very core of econometrics, so I put much more emphasis on it. There is a description in my comments (updated 15/2).
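A minimal logit estimator (my own sketch; Hansen's Ch. 12.1 gives the model, the algorithm is the standard Newton-Raphson iteration on the log-likelihood).

```python
import numpy as np

# Simulate a binary outcome from a logit model.
rng = np.random.default_rng(3)
n = 20_000
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
beta_true = np.array([-0.5, 1.0])
p = 1 / (1 + np.exp(-X @ beta_true))
yb = (rng.uniform(size=n) < p).astype(float)

beta = np.zeros(2)
for _ in range(25):                   # Newton steps
    p_hat = 1 / (1 + np.exp(-X @ beta))
    grad = X.T @ (yb - p_hat)         # score vector
    W = p_hat * (1 - p_hat)
    H = X.T @ (X * W[:, None])        # negative Hessian (information)
    beta = beta + np.linalg.solve(H, grad)
```

The MLE lands close to the true coefficients (-0.5, 1.0).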
The lecture on March 4 is canceled
I'm going to I:s conference instead. We have plenty of time for this course, so no harm done.
I have had some health problems, so I didn't feel that fit today. Anyway, I talked about Least Absolute Deviations and Quantile Regression (ch. 5.5, 5.6). I pointed out that besides the fact that it might be desirable to estimate the median, or some other quantile, rather than the mean (as in OLS), LAD and QR have some good features:
I recommend bootstrap in these cases for hypothesis testing and calculating confidence intervals, rather than the awkward method described in Hansen; see my comments.
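One of LAD's good features, robustness to outliers, can be seen in a small simulation of mine (not from Hansen); here LAD is computed by iteratively reweighted least squares, which is only one of several ways to compute it.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 5_000
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(size=n)
y[: n // 10] += 30.0                 # contaminate 10% of the observations

X = np.column_stack([np.ones(n), x])
b_ols = np.linalg.lstsq(X, y, rcond=None)[0]   # intercept pulled up to ~4

b_lad = b_ols.copy()
for _ in range(100):                 # IRLS: weight each residual by 1/|r|
    r = y - X @ b_lad
    w = 1.0 / np.maximum(np.abs(r), 1e-6)
    Xw = X * w[:, None]
    b_lad = np.linalg.solve(X.T @ Xw, Xw.T @ y)
# The LAD intercept stays near the conditional median, close to 1.
```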
I then mentioned the lemma in my comments (currently on p.2)—please read the proof there— and derived the expressions for Generalised Least Squares (ch. 5.1 in Hansen; see also my comments).
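A sketch of mine of the GLS expression in the simplest case: with Ω diagonal and known, GLS is just weighted least squares with weights 1/σᵢ².

```python
import numpy as np

# Heteroskedastic errors with a *known* variance function.
rng = np.random.default_rng(5)
n = 10_000
x = rng.normal(size=n)
sigma = np.exp(0.5 * x)                     # known error std dev
y = 1.0 + 2.0 * x + sigma * rng.normal(size=n)

X = np.column_stack([np.ones(n), x])
w = 1.0 / sigma**2                          # diagonal of Omega^{-1}
Xw = X * w[:, None]
b_gls = np.linalg.solve(X.T @ Xw, Xw.T @ y)  # (X'Ω⁻¹X)⁻¹ X'Ω⁻¹ y
```

The estimate is close to the true (1, 2), and by the Gauss-Markov argument it is more efficient than OLS here.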
Updated my comments with a few more lines on "NLLS with Instrumental Variables".
I have now finished bootstrap. I advised a method to estimate a confidence region and do hypothesis testing in the setting of "Percentile Intervals" when the parameter under study is multi-dimensional. I have updated my comments accordingly.
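For the one-dimensional case, a percentile interval looks as follows (my own sketch; the multi-dimensional construction in my comments generalizes this).

```python
import numpy as np

rng = np.random.default_rng(6)
n = 200
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
b_hat = np.linalg.lstsq(X, y, rcond=None)[0]

B = 2_000
boot = np.empty(B)
for b in range(B):
    idx = rng.integers(0, n, size=n)        # resample (x_i, y_i) pairs
    boot[b] = np.linalg.lstsq(X[idx], y[idx], rcond=None)[0][1]
lo, hi = np.percentile(boot, [2.5, 97.5])   # 95% percentile interval for the slope
```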
NOTE: we agreed that the assignments should be presented on Monday, March 2nd.
I have now finished Instrumental Variable Estimation (IV, 2SLS). We did some exercises in Hansen, notably 9.8:3, 9.8:6a,b, 4.18:1, 4.18:3, 4.18:7 (I said 4:18:3, then "corrected" to 4.18:5, but the exercise I did was in fact 4.18:7.)
I then talked about bootstrap and covered 6.1–6.5. I will continue with bootstrap on Friday.
One and a half credits are given for the compulsory assignment. You can do it either alone or in collaboration with one other student, but you may not be more than two! You will present it at a later lecture; we will agree on when.
The task is as follows: Here are data on wages and personal characteristics for 2215 persons in the USA in 1976, downloaded from Bruce Hansen's web page, and here is a description of these data. You should do the following:
Last Wednesday and Friday I talked about the instrumental variable method (IV) and two stage least squares (2SLS); Chapter 9 in Hansen. Note that the IV and 2SLS estimators are consistent but biased; they are only asymptotically unbiased. (The bias can be estimated by bootstrap methods; I will talk about this later.) We also talked about identification and derived the rank condition for identification. When there is only one endogenous regressor, a test for identification is as follows:
Regress the endogenous regressor on all exogenous variables (exogenous regressors plus all instruments), then test whether the coefficients on all the instruments are zero (a Wald test, if there is more than one instrument). If this null cannot be rejected, there is a problem with identification. I have now included this test in my comments.
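A sketch of mine of both points: a simple IV/2SLS estimate, and the first-stage Wald test for identification with one endogenous regressor.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 20_000
u = rng.normal(size=n)                       # unobserved confounder
z1 = rng.normal(size=n)
z2 = rng.normal(size=n)                      # two instruments
x = 0.5 * z1 + 0.3 * z2 + u + rng.normal(size=n)   # endogenous regressor
y = 2.0 * x + u + rng.normal(size=n)         # structural slope is 2

b_ols = (x @ y) / (x @ x)                    # inconsistent: picks up Cov(x, u)
Zi = np.column_stack([z1, z2])
xhat = Zi @ np.linalg.lstsq(Zi, x, rcond=None)[0]  # first-stage fitted values
b_2sls = (xhat @ y) / (xhat @ x)             # consistent

# First stage: regress x on a constant and the instruments, then Wald-test
# whether the instrument coefficients are jointly zero.
Z = np.column_stack([np.ones(n), z1, z2])
g = np.linalg.lstsq(Z, x, rcond=None)[0]
r = x - Z @ g
s2 = r @ r / (n - 3)
V = s2 * np.linalg.inv(Z.T @ Z)
R = np.array([[0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])  # picks out the instruments
Rg = R @ g
W = Rg @ np.linalg.solve(R @ V @ R.T, Rg)
# W is chi-square(2) under the null; here it is enormous, so we clearly
# reject the null and identification is not a problem.
```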
Last Friday I talked about prediction. There is a very short section about this in Hansen ch. 5.3. I described the trick with observation-specific dummies as described in exercise 33. I also talked about outliers, which are closely related to the influential observations Hansen briefly mentions in ch. 3.12. I described a method to identify outliers (and influential observations) using observation-specific dummies. I also described "cross validation" as a means of model selection, for your general knowledge. We also looked at exercises 2–6 of ch. 3.14. (We will eventually also solve 6, 9 and 13.)
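The dummy trick for outlier detection can be sketched as follows (my own illustration): the coefficient on a dummy for a suspect observation estimates its prediction error, and its t-ratio tests whether it is an outlier.

```python
import numpy as np

rng = np.random.default_rng(8)
n = 100
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(size=n)
y[0] += 10.0                                  # plant an outlier

d = np.zeros(n)
d[0] = 1.0                                    # dummy for observation 0
X = np.column_stack([np.ones(n), x, d])
b = np.linalg.lstsq(X, y, rcond=None)[0]
e_hat = y - X @ b
s2 = e_hat @ e_hat / (n - 3)
se = np.sqrt(s2 * np.linalg.inv(X.T @ X)[2, 2])
t = b[2] / se        # a large |t| flags observation 0 as an outlier
```

Here b[2] estimates the planted shift of 10, and t is far beyond any conventional critical value.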
Yesterday I talked about the Instrumental Variable Method. First we identified the problem of "endogeneity"; I classified three types:
and I gave examples of these, many of which appear in the exercises. I started to describe the method of instrumental variable estimation, but will do this in much more detail tomorrow.
This Monday I talked about "residual regression", i.e., the Frisch-Waugh theorem; ch. 3.7 in Hansen, and I gave a proof based on the "normal equations" X'ê = 0. I also talked about "goodness of fit": R² and adjusted R² (last part of ch. 3.3) and solved problem 3.14:6.
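A quick numerical check of the Frisch-Waugh theorem (my own sketch): the coefficient on x2 in the full regression equals the slope from regressing the residuals of y on the residuals of x2, after both are purged of x1.

```python
import numpy as np

rng = np.random.default_rng(9)
n = 500
x1 = rng.normal(size=n)
x2 = 0.5 * x1 + rng.normal(size=n)
y = 1.0 + 2.0 * x1 + 3.0 * x2 + rng.normal(size=n)

X = np.column_stack([np.ones(n), x1, x2])
b_full = np.linalg.lstsq(X, y, rcond=None)[0]

X1 = np.column_stack([np.ones(n), x1])
ry = y - X1 @ np.linalg.lstsq(X1, y, rcond=None)[0]     # y purged of x1
rx2 = x2 - X1 @ np.linalg.lstsq(X1, x2, rcond=None)[0]  # x2 purged of x1
b_resid = (rx2 @ ry) / (rx2 @ rx2)
# b_resid equals b_full[2] up to floating-point error
```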
Yesterday I talked about confidence intervals and hypothesis testing; ch. 4.7–4.9. In particular, I made a very thorough derivation of the Wald test, and I solved exercises 15 and 23.
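The Wald test of a single linear restriction Rβ = r can be sketched as follows (my own illustration, here testing β₁ + β₂ = r).

```python
import numpy as np

rng = np.random.default_rng(10)
n = 10_000
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 1.0 + 1.0 * x1 + 2.0 * x2 + rng.normal(size=n)   # so beta1 + beta2 = 3

X = np.column_stack([np.ones(n), x1, x2])
b = np.linalg.lstsq(X, y, rcond=None)[0]
e_hat = y - X @ b
s2 = e_hat @ e_hat / (n - 3)
V = s2 * np.linalg.inv(X.T @ X)
R = np.array([0.0, 1.0, 1.0])

W_true = (R @ b - 3.0) ** 2 / (R @ V @ R)    # true restriction: W is small
W_false = (R @ b - 2.5) ** 2 / (R @ V @ R)   # false restriction: W is large
# Compare W to the chi-square(1) critical value 3.84 at the 5% level.
```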
Wednesday and Friday. I wrote down in more rigour the assumptions we need for the classical regression model. They are in essence (2.8) and (2.9) in Hansen, plus a little more: that the observations are essentially independent. In matrix form:
Note that the last assumption is more general than the one employed by Hansen! Indeed, we do not adopt Assumption 3.1.1 in Hansen! This renders the statement "uᵢ = xᵢeᵢ which is iid ..." on page 34 untrue: they are not iid, so the proof of Theorem 4.3.1 is somewhat more complicated. We just accept this theorem at face value. The OLS estimate is determined by the normal equations X'ê = 0, which is the Method of Moments Estimator (MME) corresponding to the relation E[xe] = 0.
We have previously seen that the assumption E[e|X] = 0 may be violated in many contexts where the equation has a structural interpretation. In some cases one or more relevant variables have been left out; in other situations one might use a different estimator than OLS. More on this later in the course.
We have also seen previously that we get a more efficient estimate if the model is (nearly) homoskedastic, so we should try to formulate the model with this in mind, but in any case the covariance matrix of the estimated coefficients should always be estimated by White's method.
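White's estimator next to the classical one can be sketched as follows (my own illustration): with error variance rising in |x|, the robust standard error for the slope is clearly larger than the classical one.

```python
import numpy as np

rng = np.random.default_rng(11)
n = 5_000
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + np.abs(x) * rng.normal(size=n)  # heteroskedastic errors

X = np.column_stack([np.ones(n), x])
b = np.linalg.lstsq(X, y, rcond=None)[0]
e_hat = y - X @ b
XtX_inv = np.linalg.inv(X.T @ X)

V_classical = (e_hat @ e_hat / (n - 2)) * XtX_inv
meat = X.T @ (X * (e_hat**2)[:, None])
V_white = XtX_inv @ meat @ XtX_inv          # the "sandwich" estimator

se_c = np.sqrt(V_classical[1, 1])
se_w = np.sqrt(V_white[1, 1])
# se_w is substantially larger than se_c here; the classical formula
# understates the uncertainty under this heteroskedasticity.
```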
We looked at problems 3–10 of ch. 2 in Hansen. Then I talked about multicollinearity (ch. 3.11), and finally I commented on the model specification in ch. 4.16.
I have now covered essentially the following sections in Hansen: 1.1–1.3, 2.1, 3.2, 3.3, 3.5, 3.8, 3.9, 4.1. Today I explained somewhat sloppily how we should regard the covariates in an OLS regression (are they deterministic or random?). I will come back to this with more rigour on Thursday. You can also take a look at my comments on Hansen; I have just (19/1) updated them somewhat. I have spent some time on the "soft" parts of econometrics; more specifically, I have given examples of possible problems:
I will go on to talk also about
I will eventually update the exercises, but as you can see, we have already discussed some of them.
A possible remedy for the problem of endogeneity (simultaneity) and selection (self selection) bias is to employ an instrumental variable estimation, which we will deal with later.
I introduced the subject "econometrics". Then I went on to Ordinary Least Squares. For now we assume that the regressors are deterministic. That is not the typical situation in econometrics, but it is where we start. I defined the OLS estimate of the regression equation
yᵢ = xᵢ'β + eᵢ
as the value of (the vector) β which minimises the sum of squared residuals ∑ (yᵢ − xᵢ'β)².
I then went on to prove that this is equivalent to solving the normal equations
∑ xᵢêᵢ = 0
or, in matrix notation:
X'ê = 0
which leads to the following expression for β:
β = (X'X)⁻¹X'Y.
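The derivation above is easy to verify numerically (my own check): the closed-form solution satisfies the normal equations X'ê = 0.

```python
import numpy as np

rng = np.random.default_rng(12)
n = 300
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
beta_true = np.array([1.0, 2.0, -1.0])
y = X @ beta_true + rng.normal(size=n)

beta = np.linalg.solve(X.T @ X, X.T @ y)   # β = (X'X)⁻¹X'Y
e_hat = y - X @ beta
# X'ê is zero up to floating-point error, as the normal equations require
```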