KTH Mathematics  


Regression analysis SF2930
Course content and objectives:

This course offers an introduction to regression modeling with applications. The presentation begins with linear (simple and multiple) models, as they are simple yet tremendously useful in many applications. For these models, fitting, parametric and model inference, and prediction will be explained. Special attention will be paid to diagnostic strategies, which are key components of good model fitting. Further topics include transformations and weighting to correct model inadequacies, multicollinearity and shrinkage regression methods, and variable selection and model building techniques. Later in the course, some general strategies for regression modeling will be presented, with a particular focus on generalized linear models (GLM), using examples with binary and count response variables.

Because high-dimensional data, orders of magnitude larger than those that classical regression theory was designed for, are nowadays the rule rather than the exception in computer-age practice (examples include information technology, finance, genetics and astrophysics, to name just a few), regression methodologies that can deal with high-dimensional scenarios are also presented.

The twenty-first century has seen an efflorescence of computer-based regression techniques, which are integrated into the course using the statistical software package R.

The overall goal of the course is twofold: to acquaint students with the statistical methodology of regression modeling and to develop the advanced practical skills necessary for applying regression analysis to real-world data analytics problems. The course is taught and examined in English.

Recommended prerequisites:

  • SF1901 or equivalent course of the type 'a first course in probability and statistics'.
  • Multivariate normal distribution.
  • Basic differential and integral calculus, basic linear algebra.

Guest lecturers:

  • Filip Allard, Analyst at If P&C Insurance
  • Sara Aschan, Analyst at If P&C Insurance
  • Marianne Fjelberg, Analyst at If P&C Insurance
  • Ekaterina Kruglov, Data Analyst at Intrum Justitia AB

Course literature and supplementary reading:

  • D. Montgomery, E. Peck, G. Vining: Introduction to Linear Regression Analysis. Wiley-Interscience, 5th Edition (2012). ISBN-13: 978-0-470-54281-1. 645 pages. Acronym below: MPV.
The textbook MPV can be bought at THS Kårbokhandel, Drottning Kristinas väg 15-19. There are a number of other books that cover the topics of the course. Here are some recommendations:
  • G. James, D. Witten, T. Hastie, R. Tibshirani: An Introduction to Statistical Learning. Web page for the book by the publisher Springer.
  • A. J. Izenman: Modern Multivariate Statistical Techniques. Regression, Classification, and Manifold Learning. Web page for the book by the publisher Springer. Acronym below: Iz.
  • T. Hastie, R. Tibshirani, J. Friedman: The Elements of Statistical Learning. Web page for the book. Springer, 2nd Edition, 2017.

Preliminary plan of lectures and exercise sessions:

  • Lecturers (in alphabetical order): AC = Alexandre Chotard (guest lecturer from KTH), DB = Danie Berglund, EK = Ekaterina Kruglov (guest lecturer from Intrum Justitia AB), TP = Tatjana Pavlenko. Guest lecturers from If: Guest (If). The addresses of the lecture halls and guiding instructions are found by clicking on the Hall links below.
  • Problems to be solved during the exercise sessions and recommended exercises to be solved on your own are found here.


No. Day Date Time Hall Topic Lecturer
1. Wed 17/01 10-12 M1 Lecture 1: Introduction (the course work and computer projects). Introduction of If P&C Insurance. Introduction to regression modeling. Simple linear regression: model fitting and inference. Chapter 2 in MPV. TP
2. Thu 18/01 13-15 M1 Lecture 2: Simple linear regression: inference and prediction. Chapter 2 in MPV. TP
3. Fri 19/01 08-10 D1 Exercise 1: Simple regression. Problem solving at the board and applications with R. DB
4. Mon 22/01 13-15 M1 Lecture 3: Multiple linear regression: matrix notation, model fitting and properties of the estimates. Chapter 3 in MPV. TP
5. Tue 23/01 15-17 M1 Lecture 4: Multiple linear regression: inference and prediction. Chapter 3 in MPV. Project I handout. TP
6. Wed 24/01 08-10 E1 Exercise 2: Multiple regression. Problem solving at the board and applications with R. DB
7. Fri 26/01 08-10 F2 Lecture 5: Model adequacy checking. Residual analysis. Chapter 4 in MPV. TP
8. Mon 29/01 08-10 E1 Lecture 6: Model adequacy checking (cont.). Transformations to correct model inadequacies. Chapters 4-5 in MPV. TP
9. Wed 31/01 10-12 F2 Exercise 3: Model adequacy checking, theoretical exercises and applications with R. DB
10. Thu 01/02 10-12 F2 Lecture 7: Methods for detecting influential observations: leverage and measures of influence. Chapter 6 in MPV. TP
11. Fri 02/02 08-10 D1 Lecture 8: Multicollinearity: sources and effects. Chapter 9 in MPV. TP
12. Mon 05/02 08-10 D1 Exercise 4: Diagnostics for leverage, influence and multicollinearity. Chapters 6 and 9 in MPV. Model diagnostics with R. DB
13. Tue 06/02 13-15 D1 Lecture 9: Methods for dealing with multicollinearity. Model respecification: ridge and PCA regression. Chapter 9 in MPV. TP
14. Thu 08/02 10-12 D1 Lecture 10: Variable selection and model building. Chapter 10 in MPV. Sparse modeling in high dimensions and the Lasso. Chapter 5 in Iz. TP
15. Mon 12/02 13-15 M1 Exercise 5: Multicollinearity, ridge and Lasso regression, principal component regression (PCR). Ch. 10: Variable selection and model building with R. DB
16. Tue 13/02 13-15 M1 Lecture 11: Resampling techniques for model assessment and comparison. Chapter 5 in Iz. Bootstrapping in regression. Chapter 15.4 in MPV. TP
17. Thu 15/02 08-10 F2 Lecture 12: Relation to other methods of statistical machine learning: Regression and Classification, CART. AC
18. Fri 16/02 08-10 M1 Lecture 13: Models with a binary response variable. Introduction to logistic regression. EK
19. Mon 19/02 13-15 M1 Lecture 14: Generalized Linear Models (GLM) and exponential families. GLM modeling of binary response variables using logit-link functions. Project II handout. Guest (If)
20. Tue 20/02 13-15 D1 Exercise 6: GLM modeling of Poisson regression. Hypothesis testing and model validation: likelihood ratio test, deviance and Wald test. Guest (If)
21. Wed 21/02 10-12 Baltzar, Christopher, Nils Exercise 7: GLM modeling with R. Guest (If)
22. Mon 26/02 10-12 M1 Lecture 15: Repetition/Reserve. TP
23. Tue 27/02 13-15 M1 Lecture 16: Discussion on the Project II results. If presentation. Guest (If)
Tue 13/03 08-13 Q33 etc. Exam. Deadline for Project I. TP
Fri 07/06 08-13 L51 etc. Re-exam. TP