Current Information
12/4/2012. The exam has been corrected
You can pick it up at the student office. However, the course secretary is on vacation, so the results will not be reported until next week.
The exam with answers.
Bonferroni revisited
After the last lecture, I again had a discussion with a student (not the same one as before) about Bonferroni's method. Since the logic is evidently tricky, I report the discussion here so that you can all benefit from it.
He maintained that the following procedure is valid:
Say that we test a joint hypothesis consisting of three separate hypotheses. We want to have an experimental error rate of 5%. We test the three hypotheses separately, and get the three p-values 23%, 2.4% and 2.1%. Then we can do this: we reject the two hypotheses that have p-values 2.4% and 2.1%, since 5%/2 = 2.5%, and these two p-values fall below 2.5%. We divide by 2, since we reject two hypotheses.
This is wrong. Consider the following example.
Assume that all three hypotheses are true, and that the individual tests are independent. We choose a 5% experimental error rate.
The probability that exactly one p-value falls below 5% is 3 · 0.05 · 0.95² ≈ 13.54%. So with probability 13.54% the following will happen:
The student rejects the joint hypothesis, arguing “I reject that single hypothesis whose p-value is less than 5%. The error risk is 5% divided by one, since I reject one hypothesis”.
He would also reject the joint hypothesis if exactly two p-values fall below 2.5%. The probability of that is 3 · 0.025² · 0.975 ≈ 0.18%. He would also reject it if all three p-values fall below 5%/3 ≈ 1.667%. The probability of that, (0.05/3)³ ≈ 0.0005%, is negligible.
These outcomes are mutually exclusive, so the probability that the student erroneously rejects the joint hypothesis, i.e., the experimental error rate, is 13.72%, not 5%.
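The numbers above can be checked with a short Python sketch. It assumes, as in the example, that all three null hypotheses are true and that the tests are independent with continuous test statistics, so each p-value is Uniform(0, 1). The function names are made up for illustration; `student_rejects` encodes my reading of the student's rule (reject if, for some k, exactly k p-values fall below 5%/k), and `bonferroni_rejects` is the correct rule, which compares every p-value with 5%/3.

```python
from math import comb
import random


def prob_exactly(k, n, threshold):
    """P(exactly k of n independent Uniform(0,1) p-values fall below threshold)."""
    return comb(n, k) * threshold**k * (1 - threshold) ** (n - k)


def exact_error_rate(n=3, alpha=0.05):
    """Error rate of the student's rule when all n nulls are true.

    The events 'exactly k p-values below alpha/k' are disjoint for different k,
    so their probabilities can simply be added.
    """
    return sum(prob_exactly(k, n, alpha / k) for k in range(1, n + 1))


def student_rejects(pvals, alpha=0.05):
    """The student's (invalid) rule: reject if exactly k p-values are below alpha/k for some k."""
    n = len(pvals)
    return any(sum(p < alpha / k for p in pvals) == k for k in range(1, n + 1))


def bonferroni_rejects(pvals, alpha=0.05):
    """Correct Bonferroni: compare every p-value with alpha/n, n = number of hypotheses."""
    n = len(pvals)
    return [p < alpha / n for p in pvals]


def simulated_error_rate(n=3, alpha=0.05, trials=100_000, seed=1):
    """Monte Carlo check of exact_error_rate: draw n uniform p-values per trial."""
    rng = random.Random(seed)
    hits = sum(student_rejects([rng.random() for _ in range(n)], alpha) for _ in range(trials))
    return hits / trials


print(round(exact_error_rate(), 4))              # 0.1372
print(student_rejects([0.23, 0.024, 0.021]))     # True: the student rejects two hypotheses
print(bonferroni_rejects([0.23, 0.024, 0.021]))  # [False, False, False]
```

With the p-values from the example (23%, 2.4% and 2.1%), the correct Bonferroni procedure rejects nothing, since none of them is below 5%/3 ≈ 1.667%.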
If you have not registered for the exam
Don't e-mail me about it; I cannot, and will not, do anything about it. Contact Viviana Wallin and ask her. There may be some spare seats in some of the rooms.
Recent lectures
These have covered the most recent exercises and random models. I demonstrated both the classical approach and the maximum likelihood (ML) approach in a simple case (the Mississippi River again).
All the exercises we have looked at are now collected in a PDF file.
The lecture on Monday 5/3 will be held as scheduled.
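To see how the two approaches differ, here is a minimal sketch for the simplest case, a balanced one-way random model y_ij = μ + a_i + e_ij with a groups and n observations per group. The balanced design, the function name and the test numbers are assumptions for illustration. The classical estimates equate the mean squares to their expected values; in the ML estimates the between-group sum of squares is divided by a instead of a − 1, and σ_a² is kept non-negative.

```python
from statistics import mean


def variance_components(groups):
    """Balanced one-way random model y_ij = mu + a_i + e_ij.

    groups: list of a lists, each with the same number n of observations.
    Returns (classical, ml), each a pair (sigma_a^2, sigma_e^2).
    """
    a, n = len(groups), len(groups[0])
    if any(len(g) != n for g in groups):
        raise ValueError("this sketch assumes a balanced design")
    grand = mean(y for g in groups for y in g)
    group_means = [mean(g) for g in groups]
    ssa = n * sum((m - grand) ** 2 for m in group_means)                      # between groups
    sse = sum((y - m) ** 2 for g, m in zip(groups, group_means) for y in g)   # within groups
    msa, mse = ssa / (a - 1), sse / (a * (n - 1))

    # Classical (ANOVA) estimates: E(MSA) = sigma_e^2 + n sigma_a^2, E(MSE) = sigma_e^2.
    # The estimate of sigma_a^2 may come out negative.
    classical = ((msa - mse) / n, mse)

    # ML estimates: SSA is divided by a rather than a - 1.
    sigma_a2_ml = (ssa / a - mse) / n
    if sigma_a2_ml >= 0:
        ml = (sigma_a2_ml, mse)
    else:
        # Boundary solution: sigma_a^2 = 0, and sigma_e^2 pools all the variation.
        ml = (0.0, (ssa + sse) / (a * n))
    return classical, ml
```

With few groups the two estimates of σ_a² can differ noticeably, since dividing by a − 1 or by a matters when a is small.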
I have recovered enough.
Wednesday 29/2/2012 (updated)
I talked about problems with ANOVA, and about “alternatives”. I want to change a few details on page 8, so here is an updated version of that page.
Here are four more exercises for your entertainment over the weekend. You can of course also do your second assignment project now (exercise 27).
Monday 27/2/2012
We finished exercise 31 and looked at exercise 32, which is a simple one-way ANOVA. Let us agree that in this course we remove a source if and only if its F-value is less than one. This is equivalent to the regression standard error being smaller after the model is reduced. I don't think this is always a good decision rule, but let us stick to it in this course.
Then we started to look at the same exercise without the assumptions of homoskedasticity and normally distributed errors. More next time.
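The equivalence between “F < 1” and a smaller regression standard error follows directly from the definitions. In generic notation (assumed here, not necessarily the booklet's): the full model has residual sum of squares SSE_f on df_f degrees of freedom, the reduced model, where a source with q degrees of freedom is removed, has SSE_r, and s² is a residual sum of squares divided by its degrees of freedom.

```latex
\begin{aligned}
F &= \frac{(SSE_r - SSE_f)/q}{SSE_f/df_f}, \qquad s_f^2 = \frac{SSE_f}{df_f}, \qquad s_r^2 = \frac{SSE_r}{df_f + q},\\
SSE_r &= SSE_f + qF\,s_f^2 = s_f^2\,(df_f + qF)
\;\Longrightarrow\; s_r^2 = s_f^2\,\frac{df_f + qF}{df_f + q},\\
s_r^2 < s_f^2 &\iff df_f + qF < df_f + q \iff F < 1.
\end{aligned}
```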
Thursday 23/2/2012
We talked about (and discussed) Bonferroni's method for simultaneous hypothesis tests and confidence intervals. We looked at exercise 33 and started on exercise 31.
One-way ANOVA can be handled by the TI-82 STATS: put the data in lists, say L1, L2 and L3 (if there are three treatments). Then do STAT – TESTS – ANOVA(L1, L2, L3), and Whalla! (English for Voilà.)
The probability that X > x for X ∈ F(r, n) is computed as DISTR – Fcdf(x, E99, r, n) (this corresponds to the EXCEL formula Fdist(x; r; n)). To find the upper 0.05 quantile of an F(r, n) distribution, i.e., the x with P(X > x) = 0.05, we use the SOLVER: MATH – Solver, enter 0 = Fcdf(X, E99, r, n) - 0.05 and start with X = 4. Exit with QUIT, and the quantile is in the X register.
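If you have neither a TI-82 nor EXCEL at hand, the same quantities can be computed in plain Python. This is a sketch, not a library: the tail probability P(X > x) of an F(r, n) variable is written as a regularized incomplete beta function, evaluated with the continued-fraction (modified Lentz) method known from Numerical Recipes, and the upper quantile is found by bisection, which plays the role of the Solver. All function names are my own.

```python
from math import exp, lgamma, log


def _betacf(a, b, x, max_iter=300, eps=3e-14):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny = 1e-300
    c = 1.0
    d = 1.0 - (a + b) * x / (a + 1.0)
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        even = m * (b - m) * x / ((a + m2 - 1.0) * (a + m2))
        odd = -(a + m) * (a + b + m) * x / ((a + m2) * (a + m2 + 1.0))
        for coef in (even, odd):
            d = 1.0 + coef * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + coef / c
            c = c if abs(c) > tiny else tiny
            h *= d * c
        if abs(d * c - 1.0) < eps:  # last factor close to 1: converged
            break
    return h


def betainc(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = exp(lgamma(a + b) - lgamma(a) - lgamma(b) + a * log(x) + b * log(1.0 - x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def f_sf(x, r, n):
    """P(X > x) for X ~ F(r, n), like Fcdf(x, E99, r, n) or Fdist(x; r; n)."""
    if x <= 0:
        return 1.0
    return betainc(n / 2.0, r / 2.0, n / (n + r * x))


def f_upper_quantile(alpha, r, n):
    """The x with P(X > x) = alpha, found by bisection."""
    lo, hi = 0.0, 1.0
    while f_sf(hi, r, n) > alpha:  # widen the bracket until it contains the root
        hi *= 2.0
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if f_sf(mid, r, n) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def one_way_anova(*groups):
    """One-way ANOVA F statistic and p-value, like STAT - TESTS - ANOVA(L1, L2, L3)."""
    all_y = [y for g in groups for y in g]
    k, n = len(groups), len(all_y)
    grand = sum(all_y) / n
    ss_between = sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in groups)
    ss_within = sum((y - sum(g) / len(g)) ** 2 for g in groups for y in g)
    f = (ss_between / (k - 1)) / (ss_within / (n - k))
    return f, f_sf(f, k - 1, n - k)


print(f_upper_quantile(0.05, 2, 10))  # about 4.103; for r = 2 the exact value is 5 * (20 ** 0.2 - 1)
```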
Wednesday 22/2/2012
I did exercise 25. We are now done with the 30 exercises, at least for the moment. Here are three exercises on ANOVA; there will be more eventually.
I talked about “one-way ANOVA” and started on “multiple comparisons”.
Thursday 16/2 and Monday 20/2/2012
We finished the exercises except number 25. We will look at that one on Wednesday, and then start on experimental design and ANOVA.
Wednesday 15/2/2012
I finished my presentation of the bootstrap.
Finally, I managed to mess up the solution of exercise 30. Here are the formulae we need for that exercise; they are sometimes illuminating in similar contexts too.
The lecture Wednesday 15/2/2012 will be given as scheduled
I have recovered fairly well. Hope to see you tomorrow!
We will look at the exercises, and for exercise 30 the following is useful:
Let X be the covariate matrix as in the booklet, and X0 the covariate matrix with no intercept, i.e., the first column of X is deleted. Then the following holds:
Here a is a number and v a row vector. The sample covariance of two vectors is defined as
These are thus the entries in the covariance matrix whose inverse appears in the first formula.
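For reference, here is one way to write the partitioned inverse for exercise 30, assuming the sample covariance is defined with divisor n − 1 (with divisor n, replace n − 1 by n throughout). Write x̄ for the column vector of covariate means and S for the sample covariance matrix of the columns of X0. The lower-right block of (XᵀX)⁻¹, which after multiplication by σ² gives the covariance matrix of the slope estimates, is then the inverse of S divided by n − 1.

```latex
\begin{aligned}
X &= \begin{pmatrix} \mathbf{1} & X_0 \end{pmatrix}, \qquad
X^\top X = \begin{pmatrix} n & n\bar{x}^\top \\ n\bar{x} & X_0^\top X_0 \end{pmatrix},\\
S_{jk} &= \frac{1}{n-1}\sum_{i=1}^{n} (x_{ij} - \bar{x}_j)(x_{ik} - \bar{x}_k), \qquad
X_0^\top X_0 - n\bar{x}\bar{x}^\top = (n-1)S,\\
(X^\top X)^{-1} &= \begin{pmatrix} a & v \\ v^\top & \frac{1}{n-1}S^{-1} \end{pmatrix}, \qquad
a = \frac{1}{n} + \frac{\bar{x}^\top S^{-1}\bar{x}}{n-1}, \qquad
v = -\frac{\bar{x}^\top S^{-1}}{n-1}.
\end{aligned}
```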
The lecture Monday 13/2/2012 is cancelled
I'm ill. I hope I can make it until Wednesday. Stay tuned!
Thursday 9/2/2012
I showed the solution to exercise 22. Then I solved exercise 6b and demonstrated the F-test with White's robust covariance matrix.
Then I talked about the bootstrap and explained the fundamental ideas behind it. I didn't finish, so I will say more about bootstrapping next time.
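The basic idea can be sketched in a few lines of Python. This is the simplest variant, resampling the observations with replacement and reading off a percentile interval; the function name, the defaults and the data are assumptions for illustration, and other bootstrap intervals exist.

```python
import random
from statistics import mean, stdev


def bootstrap(data, statistic=mean, n_boot=2000, level=0.95, seed=0):
    """Nonparametric bootstrap of `statistic`.

    Draw n_boot samples of the same size as `data` with replacement, recompute the
    statistic on each, and return the bootstrap standard error together with a
    percentile confidence interval.
    """
    rng = random.Random(seed)
    n = len(data)
    reps = sorted(statistic(rng.choices(data, k=n)) for _ in range(n_boot))
    lo = reps[round((1 - level) / 2 * n_boot)]
    hi = reps[round((1 + level) / 2 * n_boot) - 1]
    return stdev(reps), (lo, hi)


sample = [2.1, 3.4, 1.9, 5.6, 4.2, 3.3, 2.8, 4.9, 3.1, 3.7]  # hypothetical data
se, (lo, hi) = bootstrap(sample)
print(f"bootstrap SE {se:.3f}, 95% percentile interval ({lo:.2f}, {hi:.2f})")
```

Any function of the sample can be passed as `statistic`, which is the point of the method: no formula for its standard error is needed.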
Wednesday 8/2/2012
I talked a little more about the first assignment project and 2SLS. (I'm not entirely well at the moment, so I must have appeared somewhat lacking in concentration.)
Next I talked about heteroskedasticity and White's robust standard errors, expanding somewhat on the text in the booklet.
The first assignment project: you can do it alone or in pairs (but no more than two). You can discuss the project with anyone currently taking the course, and you may also get help from them if something doesn't work out as expected. But it goes without saying that “cribbing” (copying) is not allowed.
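To make the idea concrete, here is a sketch for the slope in a simple regression only; the matrix version used for the robust F-test follows the same pattern with (XᵀX)⁻¹ Xᵀ diag(e_i²) X (XᵀX)⁻¹. This is the basic (HC0) form without any small-sample correction, and the function name is hypothetical.

```python
from math import sqrt


def ols_simple_with_white_se(x, y):
    """Simple regression y = b0 + b1*x + e.

    Returns (b0, b1, se_classical, se_white) for the slope. White's estimator
    replaces the common error variance s^2 by each observation's squared residual.
    """
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    b0 = ybar - b1 * xbar
    resid = [yi - b0 - b1 * xi for xi, yi in zip(x, y)]
    s2 = sum(e * e for e in resid) / (n - 2)
    se_classical = sqrt(s2 / sxx)                         # assumes homoskedasticity
    se_white = sqrt(sum((xi - xbar) ** 2 * e * e for xi, e in zip(x, resid))) / sxx
    return b0, b1, se_classical, se_white
```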
Monday 6/2/2012
A couple more examples of models with “endogeneity” (one of them was sample selection bias in “duration models”).
Next I described the 2SLS method. Now you can make a start on the first assignment project. Don't forget to adjust the standard errors by the square root of the ratio of the appropriate sums of squares; see the booklet. The square root, since we are talking about standard errors (not variances).
I didn't have time to say this: you should test whether the given instrumental variables will work, as follows. When you run the first regression, do an F-test to see if you can reject the hypothesis that all four coefficients of the instrumental variables are zero. If you cannot reject that at a reasonable p-value, you are in trouble: the instrumental variables are not sufficiently well correlated with the endogenous one.
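The two stages and the standard-error correction can be sketched in Python for the simplest case, one endogenous covariate and one instrument. Both are simplifying assumptions (the assignment has four instruments, so its first stage is a multiple regression), and the helper names are hypothetical. The correction multiplies the stage-2 standard error by the square root of the ratio between the sum of squared residuals computed with the original covariate and the one the stage-2 regression reports.

```python
from math import sqrt


def simple_ols(x, y):
    """Intercept, slope and residuals of the simple regression of y on x."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    b1 = sum((a - xbar) * (b - ybar) for a, b in zip(x, y)) / sum((a - xbar) ** 2 for a in x)
    b0 = ybar - b1 * xbar
    return b0, b1, [b - b0 - b1 * a for a, b in zip(x, y)]


def two_stage_least_squares(y, x, z):
    """2SLS with one endogenous covariate x and one instrument z.

    Stage 1: regress x on z and keep the fitted values x_hat.
    Stage 2: regress y on x_hat. Its residuals use x_hat instead of x, so the
    reported standard error is rescaled by sqrt(SSR_correct / SSR_stage2).
    """
    n = len(y)
    a0, a1, _ = simple_ols(z, x)
    x_hat = [a0 + a1 * zi for zi in z]
    b0, b1, resid_stage2 = simple_ols(x_hat, y)
    ssr_stage2 = sum(e * e for e in resid_stage2)
    ssr_correct = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    xh_bar = sum(x_hat) / n
    se_stage2 = sqrt(ssr_stage2 / (n - 2) / sum((v - xh_bar) ** 2 for v in x_hat))
    se_corrected = se_stage2 * sqrt(ssr_correct / ssr_stage2)
    return b0, b1, se_stage2, se_corrected
```

Both stages have the same number of degrees of freedom, which is why the ratio of the sums of squares is all that is needed.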
Wednesday 1/2 and Thursday 2/2/2012
More discussion of “endogeneity”. In particular, we solved exercises 18 and 26 and two problems from last June's exam (I will publish them soon).
I then proved that non-systematic measurement errors of the covariates cause “endogeneity” (see the booklet.)
Finally, we solved exercise 28.
Monday 30/1/2012
I finished the description of logit and probit, showing that if we observe cohorts, rather than individual events, then we can use OLS to estimate these models.
Next I proved that the OLS estimators are BLUE (best linear unbiased estimators), as in the booklet but in much more detail.
Now we have finished chapter 1 of the econometrics booklet, so I started to talk a little about “endogeneity”, i.e., the situation where the error term is correlated with some of the covariates. I gave examples of “self-selection bias” and “simultaneity”; in particular, we “solved” problem 13.
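As a sketch of the cohort idea: if cohort i has covariate value x_i and a share p_i of “successes”, the logit model says log(p_i/(1 − p_i)) = β0 + β1·x_i and the probit model says Φ⁻¹(p_i) = β0 + β1·x_i, so both can be fitted by OLS on the transformed shares. This version is unweighted and requires 0 < p_i < 1 in every cohort; the function names are hypothetical, and a weighted regression would account for cohorts of different sizes.

```python
from math import log
from statistics import NormalDist


def ols_line(x, y):
    """Intercept and slope of the simple OLS regression of y on x."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    b1 = sum((a - xbar) * (b - ybar) for a, b in zip(x, y)) / sum((a - xbar) ** 2 for a in x)
    return ybar - b1 * xbar, b1


def grouped_logit(x, successes, trials):
    """Regress each cohort's empirical log-odds log(p / (1 - p)) on its covariate."""
    logits = [log(s / (m - s)) for s, m in zip(successes, trials)]
    return ols_line(x, logits)


def grouped_probit(x, successes, trials):
    """Probit analogue: regress Phi^{-1}(p) on x, Phi = standard normal cdf."""
    probits = [NormalDist().inv_cdf(s / m) for s, m in zip(successes, trials)]
    return ols_line(x, probits)
```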
Thursday 26/1/2012
First we discussed exercises 14 and 15. I also gave a summary of what I expect you to be familiar with at this stage of the course, and I derived the formula at the bottom of page 9.
Then I talked about logit (and probit) regression. I go a little deeper into the subject at the lectures than in the booklet. In particular, I described the “Random Utility Model”. I will talk a little more about this on Tuesday.
Wednesday 25/1/2012
I proved, in all nasty detail, that the “trick” to calculate prediction intervals, described on pp. 11–12, actually produces the correct prediction interval according to the formula just above the headline “A useful trick”.
Then we talked a little about model specification, including squares of covariates, interaction terms, and taking the log of the dependent variable. We also talked about the interpretation of the regression coefficients.
Monday 23/1/2012
I did exercises 11 and 12 in accordance with the booklet.
I also derived some preliminary matrix formulas for the variance of a linear combination of random variables. Next Wednesday I will continue, and prove mathematically that the “trick” for calculating the prediction interval actually produces the right thing.
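The matrix facts involved can be summarized as follows, in standard notation that may differ from the booklet's: Y is a random vector with covariance matrix Σ, a is a constant vector, x0 is the row of covariate values (with a leading 1) at which we predict, s is the regression standard error, and the error of the new observation y0 is assumed independent of the estimates.

```latex
\begin{aligned}
\operatorname{Var}(a^\top Y) &= a^\top \Sigma\, a,\\
\operatorname{Var}(\hat\beta) &= \sigma^2 (X^\top X)^{-1}, \qquad \hat y_0 = x_0 \hat\beta,\\
\operatorname{Var}(y_0 - \hat y_0) &= \sigma^2 + x_0 \operatorname{Var}(\hat\beta)\, x_0^\top
  = \sigma^2\left(1 + x_0 (X^\top X)^{-1} x_0^\top\right),\\
\text{prediction interval: } & \hat y_0 \pm s\,\sqrt{1 + x_0 (X^\top X)^{-1} x_0^\top}\,\sqrt{F_\alpha(1,\, n-k-1)}.
\end{aligned}
```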
Three typos in the exercises (fixed now)
Exercise 12: the numbers were slightly wrong; here is a (correct?) printout of the exercise
Exercise 20: “page 21” should read “page 19”
Exercise 22: “exercise 3” should read “exercise 9”
Exercises 9 and 10
printout of “gender discrimination”
printout of exercise 10.
Thursday 19/1/2012
First I showed that F, calculated as

$$F = \left(\frac{\hat\beta - \beta}{\mathrm{se}(\hat\beta)}\right)^2,$$

is an observation of an F-distributed random variable with 1 numerator df and n-k-1 denominator df (this is the df reported by EXCEL). The numerator inside the parentheses is the difference between the estimated coefficient and the true coefficient; the denominator is the standard error of the estimate.
Hence, if we plug in a value for β (e.g., zero), we can compute the p-value for the hypothesis that this is the true value as
p = fdist(F;1;df).
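If we want to compute a confidence interval, the formula is

$$\hat\beta - \mathrm{se}(\hat\beta)\sqrt{F_\alpha} \;\le\; \beta \;\le\; \hat\beta + \mathrm{se}(\hat\beta)\sqrt{F_\alpha},$$

where Fα is the upper α quantile of the above F-distribution (the value exceeded with probability α). In EXCEL it is computed as FInv(α;1;df).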
Usually one doesn't square, and uses the t-distribution instead, which is equivalent to the above approach. I prefer the F-distribution because it makes the procedures more unified: we use the F-distribution in all other contexts.
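The equivalence takes one line: squaring the t-statistic gives the F-statistic, so the upper α quantile of F(1, df) is the square of the two-sided t critical value. Here β̂ is the estimate, se(β̂) its standard error, F_α the upper α quantile of F(1, df), and t_{α/2}(df) the upper α/2 quantile of the t-distribution.

```latex
\begin{aligned}
F = \left(\frac{\hat\beta - \beta}{\mathrm{se}(\hat\beta)}\right)^2 \le F_\alpha
  &\iff \left|\hat\beta - \beta\right| \le \mathrm{se}(\hat\beta)\sqrt{F_\alpha},\\
T \sim t(df) \implies T^2 \sim F(1, df), \quad \text{so} \quad
  P\left(|T| > \sqrt{F_\alpha}\right) &= P\left(T^2 > F_\alpha\right) = \alpha
  \implies \sqrt{F_\alpha(1, df)} = t_{\alpha/2}(df).
\end{aligned}
```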
We looked at the wage equation from yesterday again.
Then I talked about the F-test for joint hypotheses as (1.6) in the booklet, and I also showed an example on “Other Linear Restrictions” (p. 10 in the booklet.)
“Hans Roslings statistik” (Hans Rosling's statistics) on SvT Play
Wednesday 18/1/2012
I started describing the regression equation, and the assumptions related to it, in more detail. I introduced the matrix notation and derived the expression for the point estimates of the coefficients. I also showed an example using the data in exercise 9. We found evidence of gender discrimination in wages in these data.
Tuesday 17/1/2012
I gave a brief introduction to multiple regression and how to do it with “LINEST” in EXCEL. I also introduced the χ²-distribution, the t-distribution and the F-distribution.
Now you should be able to do exercises 7 and 8.
Here is the display you should get from exercise 7. The number 5.103E-005 is the p-value for the hypothesis that the two data sets come from distributions with equal means.
Here is the display of exercise 8.