---
title: "Illustration of PCA"
author: "Pierre Nyquist"
date: "September 20, 2016"
output: html_document
---

```{r echo=FALSE}
# Clear workspace and close any open graphics devices
rm(list = ls())
graphics.off()
# Load MASS for mvrnorm()
library(MASS)
```

This is an illustration of the method known as Principal Component Analysis (PCA). You are encouraged to play around with the parameters, particularly the sample size $N$ and the correlation $\rho$, to see what effect any changes have on the outcome.

As our data we draw a sample from the two-dimensional normal distribution with mean vector $\mu$

```{r}
mu <- c(2, 4)
mu
```

and covariance matrix $\Sigma$

```{r}
rho <- 0.8
sigma <- matrix(c(1, rho, rho, 1), nrow = 2, ncol = 2)
rho
sigma
```

where $\rho$ determines the correlation between the two components. We take $N$ independent copies of such a two-dimensional random variable $X = (X_1, X_2)$; the following figure shows a scatterplot of such a sample.

```{r}
N <- 10000
X <- mvrnorm(N, mu, sigma)

# Scatterplot of X = (X_1, X_2)
plot(X[,1], X[,2], xlab = "", ylab = "", main = "Correlation > 0")
```

Here $\rho > 0$ and we observe an ellipsoidal shape whose main axes are not parallel to the coordinate axes. For comparison, we also plot a sample where $\rho = 0$.

```{r}
rho <- 0
sigma2 <- matrix(c(1, rho, rho, 1), nrow = 2, ncol = 2)
X2 <- mvrnorm(N, mu, sigma2)
plot(X2[,1], X2[,2], xlab = "", ylab = "", main = "No correlation")
```

## PCA

We now extract the principal components associated with our two data sets $\mathbf{X}_1$ (correlation $> 0$) and $\mathbf{X}_2$ (no correlation). First, we estimate the mean vector (both samples have the same population mean, so we use the estimate based on $\mathbf{X}_1$ for both) and the two covariance matrices:

```{r}
muEst <- colMeans(X)
covEst <- cov(X)
covEst2 <- cov(X2)

# Display estimates
muEst
covEst
covEst2
```

Second, we find the eigenvectors of the two estimated covariance matrices.

```{r}
# Eigendecomposition of the covariance matrix; the columns of $vectors are the eigenvectors
eigen <- eigen(covEst)
# o1 and o2 are the (orthogonal) eigenvectors for the correlated case
o1 <- eigen$vectors[,1]
o2 <- eigen$vectors[,2]

eigen2 <- eigen(covEst2)
# o1.nocorr and o2.nocorr are the eigenvectors for the uncorrelated case
o1.nocorr <- eigen2$vectors[,1]
o2.nocorr <- eigen2$vectors[,2]
```

For the case where $\rho > 0$ we have

```{r}
o1
o2
```

and for $\rho = 0$ the eigenvectors are

```{r}
o1.nocorr
o2.nocorr
```

The next two figures show the eigenvectors for the two cases:

```{r echo=FALSE}
o <- rbind(o1, o2)
plot(o, xlim = c(-1, 1), ylim = c(-1, 1), main = "Principal components, correlation > 0")
segments(0, 0, o1[1], o1[2])
segments(0, 0, o2[1], o2[2])
```

```{r echo=FALSE}
o.nocorr <- rbind(o1.nocorr, o2.nocorr)
plot(o.nocorr, xlim = c(-1, 1), ylim = c(-1, 1), main = "Principal components, correlation = 0")
segments(0, 0, o1.nocorr[1], o1.nocorr[2])
segments(0, 0, o2.nocorr[1], o2.nocorr[2])
```

We now plot the original data and the transformed data, obtained by centering the data around $0$ and then multiplying by the orthogonal matrix of eigenvectors, in the same scatterplot.
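In matrix form, with $O = (o_1 \; o_2)$ the orthogonal matrix whose columns are the eigenvectors computed above and $\hat{\mu}$ subtracted from every row of the data, the transformed data are

$$
\mathbf{X}^{*} = O^{\top} \left( \mathbf{X} - \hat{\mu} \right)^{\top},
$$

which is exactly what the (hidden) code behind the next two figures computes.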
```{r echo=FALSE}
# Center the point cloud around 0
X[,1] <- X[,1] - muEst[1]
X[,2] <- X[,2] - muEst[2]

# Scatterplot of the centered data
plot(X[,1], X[,2], xlab = "X1", ylab = "X2", main = "Correlation > 0")

# Transformed data: rotate the centered data onto the eigenvector basis
Xstar <- t(eigen$vectors) %*% t(X)
points(t(Xstar), col = "red")
```

```{r echo=FALSE}
# Center the point cloud around 0 (both samples share the same population mean,
# so the mean estimate from the first sample is reused here)
X2[,1] <- X2[,1] - muEst[1]
X2[,2] <- X2[,2] - muEst[2]

# Scatterplot of the centered data
plot(X2[,1], X2[,2], xlab = "X1", ylab = "X2", main = "No correlation")

# Transformed data for the uncorrelated case
Xstar2 <- t(eigen2$vectors) %*% t(X2)
points(t(Xstar2), col = "red")
```

Comparing the two, we clearly see the effect the transformation has (or does not have) on the shape of the original data.

Lastly, we can compute the ratio between the individual eigenvalues and the sum of the eigenvalues in both cases. For $\rho > 0$,

```{r}
eigen$values / sum(eigen$values)
```

and for $\rho = 0$,

```{r}
eigen2$values / sum(eigen2$values)
```

This further illustrates how, in the first case, PCA is able to determine directions of importance, whereas in the second case the analysis does not provide any new, useful information.
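As an optional cross-check, not part of the analysis above, the same quantities can be obtained from R's built-in `prcomp` function: its `rotation` matrix should agree with the eigenvectors `o1` and `o2` up to sign (eigenvectors are only defined up to sign), and the squared standard deviations are the eigenvalues, so the proportions of variance explained should match the ratios just computed. A minimal sketch:

```{r}
# Sanity check with prcomp(), which centers the data internally
pca <- prcomp(X)
pca$rotation                   # compare with cbind(o1, o2); signs may differ
pca$sdev^2 / sum(pca$sdev^2)   # compare with eigen$values / sum(eigen$values)
```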