Course content and objectives:
This course offers an introduction to regression modeling with applications. The presentation begins with linear models (simple and multiple), as they are simple yet tremendously useful in many applications. For these models, fitting, parameter and model inference, and prediction will be explained.
Special attention will be paid to diagnostic strategies, which are key components of good model fitting. Further topics include transformations and weighting to correct model inadequacies, the multicollinearity issue and shrinkage regression methods, and variable selection and model-building techniques. Later in the course, some general strategies for regression modeling will be presented, with a particular focus on generalized linear models (GLM), illustrated by examples with binary and count response variables.
High-dimensional data, orders of magnitude larger than those the classical regression theory was designed for, are nowadays the rule rather than the exception in computer-age practice (examples include information technology, finance, genetics and astrophysics, to name just a few). The course therefore also presents regression methodologies that can deal with high-dimensional scenarios.
The twenty-first century has seen an efflorescence of computer-based regression techniques, which are integrated into the course using the statistical software package R.
The overall goal of the course is twofold: to acquaint students with the statistical methodology of regression modeling and to develop the advanced practical skills needed to apply regression analysis to real-world data analytics problems.
The course is taught and examined in English.
Recommended prerequisites:
 SF1901 or equivalent course of the type 'a first course in probability and statistics'.
 Multivariate normal distribution.
 Basic differential and integral calculus, basic linear algebra.
Guest lecturers:
 Filip Allard, Analyst at If P&C Insurance
 Sara Aschan, Analyst at If P&C Insurance
 Marianne Fjelberg, Analyst at If P&C Insurance
 Ekaterina Kruglov, Data Analyst at Intrum Justitia AB
Course literature and supplementary reading:
 D. Montgomery, E. Peck, G. Vining: Introduction to Linear Regression Analysis. Wiley-Interscience, 5th Edition (2012). ISBN-13: 978-0-470-54281-1. 645 pages. Acronym below: MPV.
The textbook MPV can be bought at THS Kårbokhandel, Drottning Kristinas väg 15-19.
A number of other books cover the topics of the course. Here are some recommendations:
 G. James, D. Witten, T. Hastie, R. Tibshirani: An Introduction to Statistical Learning. Web page for the book by the publisher Springer.
 A. J. Izenman: Modern Multivariate Statistical Techniques. Regression, Classification, and Manifold Learning. Web page for the book by the publisher Springer. Acronym below: Iz.
 T. Hastie, R. Tibshirani, J. Friedman: The Elements of Statistical Learning. Web page for the book. Springer, 2nd Edition, 2017.
Preliminary plan of lectures and exercise sessions.
 Lecturers (in alphabetical order): AC=Alexandre Chotard (guest lecturer from KTH), DB=Daniel Berglund, TP=Tatjana Pavlenko. Guest lecturers from If: Guest (If). The addresses of the lecture halls and directions are found by clicking on the Hall links below.
 Problems to be solved during the exercise sessions and recommended exercises to be solved on your own are found here.
| Day | Date | Time | Hall | Topic | Lecturer |
|---|---|---|---|---|---|
| 1. Wed | 17/01 | 10-12 | M1 | Lecture 1: Introduction (the course work and computer projects). Introduction of If P&C Insurance. Introduction to regression modeling. Simple linear regression: model fitting and inference. Chapter 2 in MPV. | TP |
| 2. Thu | 18/01 | 13-15 | M1 | Lecture 2: Simple linear regression: inference and prediction. Chapter 2 in MPV. | TP |
| 3. Fri | 19/01 | 08-10 | D1 | Exercise 1: Simple regression. Problem solving at the board and applications with R. | DB |
| 4. Mon | 22/01 | 13-15 | M1 | Lecture 3: Multiple linear regression: matrix notation, model fitting and properties of the estimates. Chapter 3 in MPV. | TP |
| 5. Tue | 23/01 | 15-17 | M1 | Lecture 4: Multiple linear regression: inference and prediction. Chapter 3 in MPV. Project I handout. | TP |
| 6. Wed | 24/01 | 08-10 | E1 | Exercise 2: Multiple regression. Problem solving at the board and applications with R. | DB |
| 7. Fri | 26/01 | 08-10 | F2 | Lecture 5: Model adequacy checking. Residual analysis. Chapter 4 in MPV. | TP |
| 8. Mon | 29/01 | 08-10 | E1 | Lecture 6: Model adequacy checking (cont.). Transformations to correct model inadequacies. Chapters 4-5 in MPV. | TP |
| 9. Wed | 31/01 | 10-12 | F2 | Exercise 3: Model adequacy checking, theoretical exercises and applications with R. | DB |
| 10. Thu | 01/02 | 10-12 | F2 | Lecture 7: Methods for detecting influential observations: leverage and measures of influence. Chapter 6 in MPV. | TP |
| 11. Fri | 02/02 | 08-10 | D1 | Lecture 8: Multicollinearity: sources and effects. Chapter 9 in MPV. | TP |
| 12. Mon | 05/02 | 08-10 | D1 | Exercise 4: Diagnostics for leverage, influence and multicollinearity. Chapters 6 and 9 in MPV. Model diagnostics with R. | DB |
| 13. Tue | 06/02 | 13-15 | D1 | Lecture 9: Methods for dealing with multicollinearity. Model respecification: ridge and PCA regression. Chapter 9 in MPV. | TP |
| 14. Thu | 08/02 | 10-12 | D1 | Lecture 10: Variable selection and model building. Chapter 10 in MPV. Sparse modeling in high dimensions and the Lasso. Chapter 5 in Iz. | TP |
| 15. Mon | 12/02 | 13-15 | M1 | Exercise 5: Multicollinearity, ridge and Lasso regression, principal component regression (PCR). Ch. 10: Variable selection and model building with R. | DB |
| 16. Tue | 13/02 | 13-15 | M1 | Lecture 11: Resampling techniques for model assessment and comparison. Chapter 5 in Iz. Bootstrapping in regression. Chapter 15.4 in MPV. | TP |
| 17. Thu | 15/02 | 08-10 | F2 | Lecture 12: Relation to other methods of statistical machine learning: Regression and Classification, CART. | AC |
| 18. Fri | 16/02 | 08-10 | M1 | Lecture 13: Models with a binary response variable. Introduction to logistic regression. | EK |
| 19. Mon | 19/02 | 13-15 | M1 | Lecture 14: Generalized Linear Models (GLM) and exponential families. GLM modeling of binary response variables using logit link functions. Project II handout. | Guest (If) |
| 20. Tue | 20/02 | 13-15 | D1 | Exercise 6: GLM modeling of Poisson regression. Hypothesis testing and model validation: likelihood ratio test, deviance and Wald test. | Guest (If) |
| 21. Wed | 21/02 | 10-12 | Baltzar, Christopher, Nils | Exercise 7: GLM modeling with R. | Guest (If) |
| 22. Mon | 26/02 | 10-12 | M1 | Lecture 15: Repetition/Reserve. | TP |
| 23. Tue | 27/02 | 13-15 | M1 | Lecture 16: Discussion of the Project II results. If presentation. | Guest (If) |
| Tue | 13/03 | 08-13 | Q33 etc. | Exam. Deadline for Project I. | TP |
| Fri | 07/06 | 08-13 | L51 etc. | Re-exam | TP |
