KTH Matematik  

Matematisk Statistik

Tid: 3 juni 2015 kl 13.15-14.00.

Seminarierummet 3424, Institutionen för matematik, KTH, Lindstedtsvägen 25, plan 4. Karta!

Föredragshållare: Anders Pettersson

Titel: High-dimensional classification models with applications to email targeting (Master's thesis)

Abstract Email communication is valuable for any modern company, since it offers an easy mean for spreading important information or advertising new products, features or offers and much more. To be able to identify which customers that would be interested in certain information would make it possible to significantly improve a company's email communication and as such avoiding that customers start ignoring messages and creating unnecessary badwill. This thesis focuses on trying to target customers by applying statistical learning methods to historical data provided by the music streaming company Spotify. An important aspect was the high-dimensionality of the data, creating certain demands on the applied methods. A binary classification model was created, where the target was whether a customer will open the email or not. Two approaches were used for trying to target the costumers, logistic regression, both with and without regularization, and random forest classifier, for their ability to handle the high-dimensionality of the data. Performance accuracy of the suggested models were then evaluated on both a training set and a test set using statistical validation methods, such as cross-validation, ROC curves and lift charts. The models were studied under both large-sample and high-dimensional scenarios. The high-dimensional scenario represents when the number of observations, N, is of the same order as the number of features, p and the large sample scenario represents when N  p. Lasso-based variable selection was performed for both these scenarios, to study the informative value of the features. This study demonstrates that it is possible to greatly improve the opening rate of emails by targeting users, even in the high dimensional scenario. The results show that increasing the amount of training data over a thousand fold will only improve the performance marginally. Rather efficient customer targeting can be achieved by using

The full report (pdf)

Till seminarielistan
To the list of seminars

Sidansvarig: Filip Lindskog
Uppdaterad: 25/02-2009