KTH Mathematics


Mathematical Statistics

SF2930 Regression Analysis - Course log and updates

Spring 2018


This page presents the latest information about what is covered in the lectures, together with any schedule changes. During the lectures the basic theory will be presented according to the plan. Note that not all topics will be covered during the lectures, so additional reading is required. Reading instructions are provided below after each lecture.


Lecture 4: summary and reading instructions.

We continue the discussion of test procedures in multiple linear regression. We start by reviewing the global test of model adequacy and then turn to the test procedures for individual regression coefficients, testing a subset of coefficients, and tests of the general linear hypothesis; see Sections 3.3.1-3.3.4. It is important to understand why the partial F-test, presented in Section 3.3.2, measures the contribution of a subset of regressors to the model given that the other regressors are included in the model. Check Appendix C.3.3-C.3.4 for the details and go through Example 3.5, where the partial F-test is illustrated. Go through Examples 3.6 and 3.7 of Section 3.3.4, which demonstrate the unified approach to testing linear hypotheses about regression coefficients.
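To make the extra-sum-of-squares idea concrete, here is a minimal Python sketch of the partial F-test on simulated data (the data and model are assumptions for illustration, not taken from MPV). We test H0: beta2 = beta3 = 0 by comparing the residual sums of squares of the full and reduced models:

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 50
X = rng.normal(size=(n, 3))                  # regressors x1, x2, x3
y = 1.0 + 2.0 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=n)

def rss(design, y):
    # residual sum of squares of the LS fit
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    res = y - design @ beta
    return res @ res

ones = np.ones((n, 1))
full = np.hstack([ones, X])                  # intercept + x1 + x2 + x3
reduced = np.hstack([ones, X[:, :1]])        # intercept + x1 only

r = 2                                        # number of coefficients tested
df = n - full.shape[1]                       # residual df of the full model
F = ((rss(reduced, y) - rss(full, y)) / r) / (rss(full, y) / df)
print(F, stats.f.sf(F, r, df))               # F statistic and its p-value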

We further discuss confidence intervals for the coefficients and for the mean response. Read Sections 3.4.1-3.5 on your own. It is important to understand the difference between a one-at-a-time confidence interval (marginal inference) for a single regression coefficient and a simultaneous (or joint) confidence set for the whole vector of coefficients. Go through Example 3.11 and think about the advantages and disadvantages of the two methods that have been presented: the joint confidence set given by (3.50) (the confidence ellipse, see Fig. 3.8) and the Bonferroni-type correction strategy.
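The Bonferroni idea is easy to try out numerically. The following Python sketch (simulated data, assumed purely for illustration) compares one-at-a-time t-intervals with Bonferroni-adjusted intervals for all p + 1 coefficients; the joint coverage of the marginal intervals falls below the nominal level, which is what the wider Bonferroni intervals repair:

import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n, p = 40, 3                                   # p regressors plus intercept
X = np.hstack([np.ones((n, 1)), rng.normal(size=(n, p))])
y = X @ np.array([1.0, 2.0, 0.0, -1.0]) + rng.normal(size=n)

beta, *_ = np.linalg.lstsq(X, y, rcond=None)
df = n - (p + 1)
ms_res = np.sum((y - X @ beta) ** 2) / df
se = np.sqrt(ms_res * np.diag(np.linalg.inv(X.T @ X)))

alpha = 0.05
t_marg = stats.t.ppf(1 - alpha / 2, df)              # one-at-a-time
t_bonf = stats.t.ppf(1 - alpha / (2 * (p + 1)), df)  # Bonferroni, p+1 intervals
for j in range(p + 1):
    print(f"beta_{j}: marginal ({beta[j] - t_marg * se[j]:.2f}, "
          f"{beta[j] + t_marg * se[j]:.2f}), Bonferroni "
          f"({beta[j] - t_bonf * se[j]:.2f}, {beta[j] + t_bonf * se[j]:.2f})")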

Standardization (centering and scaling) of the regression data, which leads to standardized regression coefficients, is presented in Section 3.9. Check on your own the two approaches to standardization and the interpretation of the standardized regression coefficients. One application of the standardization step is presented further in Section 3.10, where the problem of multicollinearity is introduced. Check why and how the standardization is applied there; we will discuss the problem of multicollinearity in detail during Lectures 8 and 9.
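One of the two approaches in Section 3.9, unit length scaling, is easy to verify numerically: after scaling, W'W is exactly the correlation matrix of the regressors, which is the form used in Section 3.10 to reveal multicollinearity. A minimal Python sketch with simulated (assumed) data:

import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(30, 3))
X[:, 2] = X[:, 0] + 0.05 * rng.normal(size=30)   # a nearly collinear column

Xc = X - X.mean(axis=0)                          # centering
W = Xc / np.sqrt(np.sum(Xc ** 2, axis=0))        # unit length scaling
print(np.round(W.T @ W, 3))   # unit diagonal; off-diagonals are correlations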

The phenomenon of hidden extrapolation when predicting a new observation with the fitted model will be discussed in detail at the beginning of Lecture 5. Go through Section 3.8; it is important to understand the structure of the regressor variable hull (RVH) and the role of the hat matrix H in specifying the location of the new data point in the x-space. Go through Example 3.13 and inspect the related figures.
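The diagnostic in Section 3.8 reduces to one comparison: a new point x0 is a candidate extrapolation if h00 = x0'(X'X)^{-1}x0 exceeds h_max, the largest diagonal element of H. A minimal Python sketch with simulated (assumed) data:

import numpy as np

rng = np.random.default_rng(3)
X = np.hstack([np.ones((25, 1)), rng.uniform(0, 1, size=(25, 2))])
XtX_inv = np.linalg.inv(X.T @ X)
h_max = np.max(np.diag(X @ XtX_inv @ X.T))       # boundary of the ellipsoid

for x0 in (np.array([1.0, 0.5, 0.5]),            # inside the data cloud
           np.array([1.0, 2.0, 2.0])):           # outside the observed range
    h00 = x0 @ XtX_inv @ x0
    print(f"h00 = {h00:.3f}, hidden extrapolation: {h00 > h_max}")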


Lecture 3: summary and reading instructions.

Tests and confidence regions for the various parameters in the simple linear model were discussed, with specific focus on the confidence interval for the mean response and the prediction interval. Be sure that you understand the difference between these types of intervals.

The multiple linear regression model was introduced, starting with the matrix notation and then turning to the LS normal equations, their solution, and the geometrical interpretation of the LS estimators. It is important to remember that, in general, any regression model that is linear in the coefficients (the betas) is a linear regression model, regardless of the shape of the surface it generates. Go through Section 3.2.1; be sure that you understand the structure of the matrix X'X as well as the structure and the role of the hat matrix H. Go through Example 3.1 and the graphical data presentation in Section 3.2.1.
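A minimal Python sketch (simulated data, assumed for illustration) of the objects just introduced: the normal equations X'X b = X'y, their solution, and the hat matrix H = X(X'X)^{-1}X', which projects y onto the fitted values:

import numpy as np

rng = np.random.default_rng(4)
n = 20
X = np.hstack([np.ones((n, 1)), rng.normal(size=(n, 2))])
y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(size=n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)   # solves the normal equations
H = X @ np.linalg.inv(X.T @ X) @ X.T           # hat matrix
print(np.allclose(H @ y, X @ beta_hat))        # y_hat = H y
print(np.allclose(H @ H, H))                   # H is idempotent (a projection)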

Go through Sections 3.2.3-3.2.6 and check the properties of the parameter estimators obtained by both the LS and the ML approaches. Check also Appendix C.4, where the optimality of the LS estimators is stated in the Gauss-Markov theorem.

We briefly discussed the global test in multiple linear regression. Go through Section 3.3.1, check the assumptions needed for constructing the tests of significance and the computational formulas for the ANOVA representation, and read about checking the model adequacy using the adjusted coefficient of determination. Think about why this adjustment is needed; I will present the details during the next lecture.
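The need for the adjustment can be seen numerically: R^2 never decreases when a regressor is added, even a pure-noise one, while the adjusted R^2 typically drops in that case. A minimal Python sketch with simulated (assumed) data:

import numpy as np

rng = np.random.default_rng(5)
n = 30
x1 = rng.normal(size=n)
junk = rng.normal(size=n)                 # a pure-noise regressor
y = 1.0 + 2.0 * x1 + rng.normal(size=n)

def r2_and_adj(cols):
    X = np.column_stack([np.ones(n)] + cols)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    ss_res = np.sum((y - X @ beta) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    r2 = 1 - ss_res / ss_tot
    r2_adj = 1 - (ss_res / (n - X.shape[1])) / (ss_tot / (n - 1))
    return round(r2, 4), round(r2_adj, 4)

print(r2_and_adj([x1]))          # model with x1 only
print(r2_and_adj([x1, junk]))    # R^2 goes up, adjusted R^2 goes down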

The exercises selected for the second exercise session on Monday the 24th of January are on the home page; see the link Exercises.


Lecture 2: summary and reading instructions.

Tests of significance and confidence intervals for the slope, the intercept, and the variance of the error term were discussed for the simple linear regression model. Go through the numerical examples and check the graphs in Sections 2.3.1-2.3.2 of MPV. The fundamental analysis-of-variance (ANOVA) identity was presented along with the test of significance of regression. It is very important to understand how the partition of the total variability in the response variable is obtained and how the ANOVA-based F-test is derived; this strategy will be used throughout the whole course, specifically in the multiple linear regression models that will be presented during the next two lectures. Go through Section 2.3.3 and check why the F-test is equivalent to the t-test when testing significance of regression in the simple regression model.
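Both facts are easy to check numerically. A minimal Python sketch with simulated (assumed) data verifies the ANOVA identity SS_T = SS_R + SS_Res and the equivalence F = t^2 from Section 2.3.3:

import numpy as np

rng = np.random.default_rng(6)
n = 25
x = rng.uniform(0, 10, size=n)
y = 3.0 + 1.5 * x + rng.normal(size=n)

Sxx = np.sum((x - x.mean()) ** 2)
b1 = np.sum((x - x.mean()) * (y - y.mean())) / Sxx
b0 = y.mean() - b1 * x.mean()
y_hat = b0 + b1 * x

ss_t = np.sum((y - y.mean()) ** 2)
ss_r = np.sum((y_hat - y.mean()) ** 2)
ss_res = np.sum((y - y_hat) ** 2)
print(np.isclose(ss_t, ss_r + ss_res))     # the ANOVA identity

ms_res = ss_res / (n - 2)
F = ss_r / ms_res                          # ANOVA F statistic (1 df)
t = b1 / np.sqrt(ms_res / Sxx)             # t statistic for the slope
print(np.isclose(F, t ** 2))               # F = t^2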

The concepts of a confidence interval for the mean response and a prediction interval for a future observation were presented. Go through Section 2.4.2 and check the numerical Examples 2.6 and 2.7; it is important to understand the principal difference between these two types of intervals and how they are supposed to be used in regression analysis.
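The difference shows up directly in the standard errors: the prediction interval carries an extra "1 +" term for the variance of the new error, and is therefore always wider. A minimal Python sketch with simulated (assumed) data:

import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
n = 25
x = rng.uniform(0, 10, size=n)
y = 3.0 + 1.5 * x + rng.normal(size=n)

Sxx = np.sum((x - x.mean()) ** 2)
b1 = np.sum((x - x.mean()) * (y - y.mean())) / Sxx
b0 = y.mean() - b1 * x.mean()
ms_res = np.sum((y - b0 - b1 * x) ** 2) / (n - 2)

x0 = 5.0
y0_hat = b0 + b1 * x0
t = stats.t.ppf(0.975, n - 2)
se_mean = np.sqrt(ms_res * (1 / n + (x0 - x.mean()) ** 2 / Sxx))
se_pred = np.sqrt(ms_res * (1 + 1 / n + (x0 - x.mean()) ** 2 / Sxx))
print(f"CI for mean response: ({y0_hat - t * se_mean:.2f}, {y0_hat + t * se_mean:.2f})")
print(f"prediction interval:  ({y0_hat - t * se_pred:.2f}, {y0_hat + t * se_pred:.2f})")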

Read on your own Section 2.9, where some abuses of regression modeling are discussed, and Section 2.10, where the no-intercept regression model is presented as a special type of modeling (the idea is to force the intercept to be zero). Check the numerical examples of Section 2.10 and think about the differences from the previously presented model (the one that includes an intercept term); focus specifically on the properties of the coefficient of determination.
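The point about the coefficient of determination can be made with a few lines of Python (simulated data, assumed for illustration): in the no-intercept model, R^2 is based on the uncorrected sum of squares sum(y_i^2) rather than sum((y_i - y_bar)^2), so it is not comparable to the ordinary R^2 and can look deceptively large:

import numpy as np

rng = np.random.default_rng(8)
n = 25
x = rng.uniform(0, 10, size=n)
y = 10.0 + 0.5 * x + rng.normal(size=n)      # the true model has an intercept

b1 = np.sum(x * y) / np.sum(x * x)           # no-intercept LS estimator
r2_no_int = 1 - np.sum((y - b1 * x) ** 2) / np.sum(y ** 2)

Sxx = np.sum((x - x.mean()) ** 2)
b1i = np.sum((x - x.mean()) * (y - y.mean())) / Sxx
b0i = y.mean() - b1i * x.mean()
r2_int = 1 - np.sum((y - b0i - b1i * x) ** 2) / np.sum((y - y.mean()) ** 2)

print(f"no-intercept R^2 = {r2_no_int:.3f}")  # large despite the worse fit
print(f"intercept R^2    = {r2_int:.3f}")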

Go through Section 2.11 and convince yourself that the ML estimators of the slope and intercept are identical to those obtained by the LS approach; this does not hold for the variance estimator, so check why.
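The difference for the variance estimator is the divisor: ML divides SS_Res by n, while the unbiased estimator divides by n - 2. A small Monte Carlo in Python (simulated data, assumed for illustration) makes the downward bias of the ML version visible:

import numpy as np

rng = np.random.default_rng(9)
n, sigma2, reps = 15, 4.0, 10000
x = np.linspace(0, 10, n)
Sxx = np.sum((x - x.mean()) ** 2)
ml, unbiased = [], []
for _ in range(reps):
    y = 1.0 + 2.0 * x + rng.normal(scale=np.sqrt(sigma2), size=n)
    b1 = np.sum((x - x.mean()) * (y - y.mean())) / Sxx
    b0 = y.mean() - b1 * x.mean()
    ss_res = np.sum((y - b0 - b1 * x) ** 2)
    ml.append(ss_res / n)                   # ML estimator of sigma^2
    unbiased.append(ss_res / (n - 2))       # MS_Res, the unbiased estimator
print(f"true sigma^2 = {sigma2}, ML mean = {np.mean(ml):.3f}, "
      f"unbiased mean = {np.mean(unbiased):.3f}")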

A short discussion of the case of a random regressor is presented in Section 2.12; check it, and I will briefly return to it during the next lecture.

Observe that the exercises selected for the first exercise session on Friday the 19th of January are on the home page; see the link Exercises.


Lecture 1: summary and reading instructions.

An introduction to regression analysis was presented; see the slides below. The simple linear regression model was discussed in detail, including the basic assumptions of equal variance of the error term, linearity, and independence. The LS fitting strategy was discussed along with the properties of the obtained estimators of the regression coefficients. Go through these properties once again: read Sections 2.2.2-2.2.3 of MPV, check the normal equations given by (2.5), p. 14, and their solutions, show that both LS estimators of the slope and intercept are unbiased, and find their variances. There are three sources

Go through Examples 2.1 and 2.2 to see the numerical calculations for the LS fit, read about the residual properties, and check the general properties 1-5 of the LS fit presented on p. 20 of MPV; a numerical check is sketched below.
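A minimal Python sketch (simulated data, assumed for illustration) that verifies several of the LS-fit properties from p. 20 of MPV: the residuals sum to zero, the observed and fitted values have the same total, and the residuals are orthogonal to both the regressor and the fitted values:

import numpy as np

rng = np.random.default_rng(10)
n = 20
x = rng.uniform(0, 5, size=n)
y = 2.0 + 3.0 * x + rng.normal(size=n)

b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
y_hat = b0 + b1 * x
e = y - y_hat

print(np.isclose(e.sum(), 0))             # residuals sum to zero
print(np.isclose(y.sum(), y_hat.sum()))   # observed and fitted totals agree
print(np.isclose((x * e).sum(), 0))       # residuals orthogonal to x
print(np.isclose((y_hat * e).sum(), 0))   # residuals orthogonal to y_hat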

Go through Sections 2.3.1-2.3.2 and check which additional assumptions are needed to perform the tests of significance on the slope and the intercept. I will discuss this in detail during the next lecture.

Published by: Tatjana Pavlenko.
Updated: 07/12-2017