Exercises
This list will be augmented with more exercises later.
1. Simplify the expression $[(A^t B^t)^{-1} C]^t B A$, where A and B are non-singular square matrices.
2. Same for the expression $(A^t B^{-1} A)^{-1} A^t B^{-1} C$.
3. Show that the matrix below is an orthogonal projection matrix, and determine
the dimension of the range (image).
4. Prove proposition A.2
5. Prove proposition A.3
6. Let A be a non-singular k×k matrix and e a k×1 matrix of independent, Normal N(0,1) random variables. Define u = Ae.
a) Prove that $\mathrm{Cov}(u) = AA^t$.
b) Prove that $u^t\,\mathrm{Cov}(u)^{-1}\,u$ is a $\chi^2(k)$-distributed random variable.
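As a numerical sanity check for exercise 6 (not a proof), the following minimal Python sketch draws e for k = 2 and evaluates the quadratic form. The matrix A, the seed, and the number of draws are hypothetical choices made only for illustration. The check relies on Cov(u) = AAt from part a).

```python
import random

# Hypothetical 2x2 example; any non-singular A would do.
A = [[2.0, 1.0],
     [0.5, 3.0]]

def quad_form(A, e):
    """Return u^t Cov(u)^{-1} u for u = A e, using Cov(u) = A A^t."""
    (a, b), (c, d) = A
    u = [a * e[0] + b * e[1], c * e[0] + d * e[1]]
    # S = A A^t (a symmetric 2x2 matrix)
    s11 = a * a + b * b
    s12 = a * c + b * d
    s22 = c * c + d * d
    det = s11 * s22 - s12 * s12
    # Explicit inverse of a symmetric 2x2 matrix
    i11, i12, i22 = s22 / det, -s12 / det, s11 / det
    return i11 * u[0] ** 2 + 2 * i12 * u[0] * u[1] + i22 * u[1] ** 2

random.seed(0)
draws = [[random.gauss(0, 1), random.gauss(0, 1)] for _ in range(10000)]
q = [quad_form(A, e) for e in draws]

# The quadratic form coincides with e1^2 + e2^2 for every single draw ...
max_gap = max(abs(qi - (e[0] ** 2 + e[1] ** 2)) for qi, e in zip(q, draws))
# ... so its sample mean should be close to k = 2, the mean of chi2(2).
mean_q = sum(q) / len(q)
print(max_gap, mean_q)
```

That the gap is zero up to rounding for each draw is the observation a proof of part b) can be built on.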
7. You have the following data from two normal distributions with the same standard deviation. All observations are independent of each other.
52.3 51.1 52.7 57.2 54.9 55.8 56.6 54.3 59.5 54.5
48.6 50.7 51.7 49.2 51.2 51.1 46.8 50.9 48.0 50.7
Run a linear regression in EXCEL to derive a confidence interval for the difference in mean values of the underlying distributions. Check your answer by doing the calculation on your pocket calculator (built-in procedure).
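For readers without EXCEL at hand, the same regression can be sketched in plain Python: regress the stacked observations on an intercept and a dummy for the first sample, so that the slope is the difference in means. This is only a sketch; the 95% level and the tabulated quantile 2.101 of the t distribution with 18 degrees of freedom are assumptions, since the exercise does not fix a confidence level.

```python
import math

g1 = [52.3, 51.1, 52.7, 57.2, 54.9, 55.8, 56.6, 54.3, 59.5, 54.5]
g2 = [48.6, 50.7, 51.7, 49.2, 51.2, 51.1, 46.8, 50.9, 48.0, 50.7]

# Stack the data: y is the response, d is a dummy (1 = first sample).
y = g2 + g1
d = [0.0] * len(g2) + [1.0] * len(g1)
n = len(y)

# Simple OLS of y on an intercept and d.
d_bar = sum(d) / n
y_bar = sum(y) / n
sxx = sum((di - d_bar) ** 2 for di in d)
sxy = sum((di - d_bar) * (yi - y_bar) for di, yi in zip(d, y))
slope = sxy / sxx                   # estimated difference in means
intercept = y_bar - slope * d_bar   # estimated mean of the second sample

# Residual variance with n - 2 degrees of freedom, and the slope's standard error.
rss = sum((yi - intercept - slope * di) ** 2 for di, yi in zip(d, y))
s2 = rss / (n - 2)
se_slope = math.sqrt(s2 / sxx)

# Assumption: 97.5% quantile of t(18), taken from a t-table (95% interval).
T_975_DF18 = 2.101
ci = (slope - T_975_DF18 * se_slope, slope + T_975_DF18 * se_slope)
print(slope, se_slope, ci)
```

The standard error computed this way is the same as the pooled two-sample one, which is why the pocket-calculator procedure should agree with the regression output.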
8. You have the same data as in the previous exercise, but now also these data from a third (normal) distribution (with the same standard deviation):
55.5 53.7 60.1 53.3 52.0 49.6
Run a linear regression in EXCEL to derive the p-value for the hypothesis that all mean values of the underlying distributions are the same.
9. This is not an exercise, but here are data on wages. You may play with them for your own interest and enjoyment.
10. Use the data given above to test the hypothesis that both the coefficient for “nonwhite” and that for “hispanic” are zero (i.e., no “racial” discrimination) in a wage equation.
11. Test the hypothesis that the two coefficients mentioned above are equal, and half of the value of the coefficient for “female”.
12. Use the data in exercise 9 to make a prediction of the hourly wage for a male who lives in the South, is white, not Hispanic, has 15 years of experience and 10 years of education, and is a union member.
Make a prediction interval for his hourly wage with risk level 5%.
The answer you should get is a predicted wage of 7.05 dollars, and a prediction interval of (3.0, 16.6) dollars (sic!). Notice how useless the equation is for prediction. Try to figure out to what extent the uncertainty in the prediction depends on imprecise estimates of the coefficients and to what extent it depends on the error term.
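One way to organise the last question, assuming the usual homoskedastic linear model for log(wage) (the asymmetric interval suggests that the regression is run in logs). The symbols here are assumptions and not necessarily the booklet's notation: X is the matrix of covariates, $x_0$ the covariate vector of the individual, $\hat y_0$ the predicted and $y_0$ the actual log wage, and $\sigma^2$ the error variance.

$$
\operatorname{Var}(y_0 - \hat y_0) \;=\; \underbrace{\sigma^2}_{\text{error term}} \;+\; \underbrace{\sigma^2\, x_0^t (X^t X)^{-1} x_0}_{\text{imprecise coefficient estimates}}
$$

Comparing the sizes of the two terms, with $\sigma^2$ replaced by its estimate, shows which source dominates.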
13. You run a regression
Y = β0 + β1(R&D) + resid
where Y is log(GDP/capita) [BNP per kapita] and (R&D) is the expenditure per capita on Research and Development [forskning och utveckling]. You run the regression on a cross section of countries and get a value for β1. You want to interpret this value in this way: “If we increase the expenditure on R&D by Δx per capita, then the GDP per capita will go up by a factor of exp(β1Δx).”
Discuss this interpretation!
14. You want to know how the number of rooms in an apartment influences its price, and consider the two model specifications:
price = β0 + (no. of rooms)β1 + (floor area)β2 + ... + resid and
price = β0 + (no. of rooms)β1 + ... + resid (i.e., no floor area in the equation)
Discuss the interpretation of β1 in the two models.
15. You want to investigate how much wage increase a person may expect if he takes a university exam, compared to if he does not. You run a regression on some data:
log(wage) = β0 + (univ.exam)β1 + (work_experience)β2 + (female)β3 + (immigrant)β4 + residual
Here all variables except (work_experience) are dummies. Do you see any problems with this? If we replace (work_experience) with (age), does that in any way change the interpretation of β1?
16. Prove the statements about the true and estimated variances on page 19 (near the end of the section) in my booklet.
17. This exercise is your first assignment project.
(I have taken the idea for this exercise from Bruce Hansen's text on Econometrics.)
The data file card_data.xls is taken from David Card, “Using Geographic Variation in College Proximity to Estimate the Return to Schooling”, in Aspects of Labour Market Behavior (1995). There are 2215 observations with 29 variables listed in card.xls. We want to test whether the returns to schooling are the same for whites and blacks. To this end, estimate the wage equation
log(Wage) = β0 + (Educ)β1 + (Exper)β2 + (Exper²)β3 + (South)β4 + (Black)β5 + (Black)*(Educ)β6 + e
where (Educ) = education in years, (Exper) = experience in years, and (South) and (Black) are regional and racial dummy variables.
Estimate the model by OLS. Report estimates and standard errors.
Now treat Education as endogenous, and the remaining variables as exogenous. Estimate the model by 2SLS, using the four instruments near4 (a dummy indicating that the individual lives near a 4-year college), near2 (a dummy indicating that the individual lives near a 2-year college), fatheduc (the education, in years, of the father) and motheduc (the education, in years, of the mother). Report estimates and standard errors.
Report your conclusion about returns to schooling for blacks and whites. Discuss the appropriateness of the choice of instruments.
Here is the description of the data.
Some further explanation of the data:
Nota Bene: variable 28 is strangely coded. It is not a simple dummy for “married”. I don't know how to interpret it. (It therefore doesn't enter the equation.)
NLS means “National Longitudinal Surveys”. These are data that the US Department of Labor, Bureau of Labor Statistics, collects. I don't know what “weight” means.
SMSA means “Standard Metropolitan Statistical Areas” (a standard Census Bureau designation of the region around a city in the United States)
18. Assume that you have the following demand – supply system for (retail) coffee:
Qd = α0 + (retail price)α1 + e1
Qs = β0 + (retail price)β1 + (market price of coffee beans)β2 + e2.
a) Show that (retail price) is endogenous (in both equations).
b) Show that (market price of coffee beans) is a possible instrumental variable for (retail price) in the demand equation.
19. Let
be the estimated
residuals of a regression of yi onto some covariates. Show that
.
20. Consider again the statements about the true and estimated variances on page 19 (near the bottom) in my booklet. Prove that White's robust standard errors give (in expectation, or asymptotically) the correct variance.
21.
22. Use the data in exercise 9. Run the regression of log(wage) on the other covariates, including experience². Note the reported standard error for the coefficient for female. (Answer: 0.0375.) Next compute White's robust estimate of this standard error (formula 3.3; answer: 0.0374). No problem with heteroskedasticity shows up here!
23. You want to estimate a prediction model for the duration of an unemployment spell for an individual, depending on his (her) characteristics, like education, age, experience, gender, immigrant status, income from spouse, eligibility for benefits, and some more. You pick out 675 people at random who were registered as unemployed at a certain date two years ago, note the date they got employed (we assume they all did) and record the length of their unemployment spells. You run the regression and are happy.
Do you see any problem with this approach?
Do you suggest a different approach?
24. We know that if we run an OLS regression (with intercept), then the sum of the residuals is equal to zero. (Prove that!) Is this also true if we run a 2SLS regression?
25. Suppose you have 22 annual observations on output Y, capital K and labour L and want to estimate a Cobb-Douglas production function
$Y = c \cdot K^{\alpha} L^{\beta}$.
You suspect that there are “constant returns to scale” (meaning that α + β = 1) but you want to test that hypothesis before imposing it as a restriction.
How do you test this hypothesis?
Assume that the p-value for the hypothesis comes out as p=0.624, so you accept it. How do you proceed to estimate the Cobb-Douglas production function?
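A possible starting point, sketched under the assumption of a multiplicative error term $e^{\varepsilon}$ (the exercise does not specify the error structure, and $\varepsilon$ is a symbol introduced here): taking logarithms turns the production function into a linear regression, and constant returns to scale becomes a linear restriction on its coefficients.

$$
\log Y = \log c + \alpha \log K + \beta \log L + \varepsilon, \qquad H_0:\ \alpha + \beta = 1 .
$$

Substituting $\beta = 1 - \alpha$ and moving $\log L$ to the left-hand side gives the restricted regression

$$
\log (Y/L) = \log c + \alpha \log (K/L) + \varepsilon .
$$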
26. A Korean friend has regressed the Korean won / US$ exchange rate on its lagged value, the Korean trade balance, the difference between the Korean and US inflation rates and the difference between the Korean and US real interest rates. To his surprise, the coefficient on the trade balance comes out negative, although all other coefficients get the expected sign.
Can you explain to him what is going on? What remedy do you suggest to him? (You might need some basic insight into macroeconomics for this exercise.)
27. This exercise is your second assignment project. You should write this as a little report on the experiment.
The table below is from an experiment where the humidity of paper, produced by a paper machine, was measured. Two measurements were made for each of two different levels of speed and three different mixtures of ingredients.
a) Construct an ANOVA table with sources speed, ingredients, interactions and error (i.e., four sources).
b) See if it is reasonable to reduce the model to have zero interactions. Motivate your choice!
c) See if there is any more or other reduction of the model that is motivated.
d) Write the ANOVA table for the final model.
e) Write down the final model with your estimated coefficients.
f) According to your final model, which combination of speed and ingredients gives the lowest humidity?
| | ingredient mixture 1 | ingredient mixture 2 | ingredient mixture 3 |
| --- | --- | --- | --- |
| speed 1 | 7.2, 7.2 | 7.8, 7.2 | 8.4, 7.8 |
| speed 2 | 6.4, 6.8 | 7.2, 7.4 | 7.8, 7.6 |
28. You want to see if males and females differ after three years of study at KTH's computer science programme. The issue is whether they differ, on average, in the number of credit points they have managed to collect.
For this purpose, you collect data on 50 male students and 50 female students, and run the regression
(credit points) = β0 + (female) β1 + error
(female) is a dummy for female.
a) The fraction of females at KTH's computer science programme is only 15%. You have chosen 50 males and 50 females, hence not a random sample of students. Explain why this does not cause a “selection bias” in this case.
b) Prove that if you had chosen a random sample of students, then the standard error of the estimated β1 would almost certainly (in the common sense) be greater by a factor of about 1.4.
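The factor 1.4 in part b) can be checked numerically. The sketch below assumes that the standard error of the estimated β1 is proportional to the square root of 1/(number of females) + 1/(number of males), and that a typical random sample of 100 students contains 15 females and 85 males; both are assumptions for illustration, and establishing the first one is part of the exercise.

```python
import math

def se_factor(n_female, n_male):
    """Standard error of the dummy coefficient, up to the common factor sigma."""
    return math.sqrt(1.0 / n_female + 1.0 / n_male)

balanced = se_factor(50, 50)   # the design actually used
typical = se_factor(15, 85)    # assumed typical random sample of 100 students
ratio = typical / balanced
print(round(ratio, 2))         # prints 1.4
```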
29. Prove the formula for the F-statistic on p.9 (testing all slope coefficients) in my booklet.
30. Assume that you run a regression of y onto two covariates x1 and x2:
y = β0 + x1 β1 + x2 β2 + error.
a) If the sample correlation between x1 and x2 is zero, then the estimated value of β1 will be the same as when you leave out x2 from the regression. Prove that!
b) Prove that this is not true if the sample correlation is different from zero.
c) If the sample correlation is equal to zero, is there any reason to include the x2 covariate in the regression? (We are only interested in the β1 coefficient.)
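Parts a) and b) can be illustrated (not proved) on a tiny made-up data set. The sketch below uses the closed-form OLS solution for two covariates plus an intercept; the data and the function names are hypothetical.

```python
def centered_sums(u, v):
    """Sum of products of u and v after subtracting their sample means."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    return sum((a - mu) * (b - mv) for a, b in zip(u, v))

def slope_with_x2(x1, x2, y):
    """OLS coefficient on x1 when x1, x2 and an intercept are included."""
    s11, s22 = centered_sums(x1, x1), centered_sums(x2, x2)
    s12 = centered_sums(x1, x2)
    s1y, s2y = centered_sums(x1, y), centered_sums(x2, y)
    return (s22 * s1y - s12 * s2y) / (s11 * s22 - s12 ** 2)

def slope_without_x2(x1, y):
    """OLS coefficient on x1 when only x1 and an intercept are included."""
    return centered_sums(x1, y) / centered_sums(x1, x1)

# Made-up data for illustration only.
y = [1.0, 2.0, 4.0, 3.0]
x1 = [-1.0, -1.0, 1.0, 1.0]
x2_uncorrelated = [-1.0, 1.0, -1.0, 1.0]   # sample correlation with x1 is zero
x2_correlated = [-1.0, 0.0, 0.0, 2.0]      # sample correlation with x1 is not zero

b_short = slope_without_x2(x1, y)
b_uncorrelated = slope_with_x2(x1, x2_uncorrelated, y)
b_correlated = slope_with_x2(x1, x2_correlated, y)
print(b_short, b_uncorrelated, b_correlated)
```

In this example the first two estimates coincide while the third differs, which is the pattern the exercise asks you to explain in general.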