This is an illustration of the method known as Principal Component Analysis (PCA). You are encouraged to play around with the parameters, particularly sample size $$N$$ and correlation $$\rho$$, to see what effect any changes might have on the outcome.

As our data we draw a sample from the two-dimensional normal distribution with mean vector $$\mu$$

mu <- c(2,4)
mu
## [1] 2 4

and covariance matrix $$\sigma$$

rho = 0.8
sigma <- matrix(c(1,rho, rho,1), nrow=2, ncol=2)
rho
## [1] 0.8
sigma
##      [,1] [,2]
## [1,]  1.0  0.8
## [2,]  0.8  1.0

where $$\rho$$ determins the correlation between the two components. We take $$N$$ copies of such two-dimensional random variables $$X=(X_1, X_2)$$; the following figure shows a scatterplot of such a sample.

N = 10000
X <- mvrnorm(N,mu,sigma)
# Scatter plot of X=(X_1,X_2)
plot(X[,1], X[,2], xlab="", ylab="", main="Correlation > 0")

Here $$\rho > 0$$ and we observe an ellipsodial shape with main axes not parallel to the coordinate axes. To compare we also plot a sample where $$\rho = 0$$

rho = 0
sigma2 <- matrix(c(1,rho, rho,1), nrow=2, ncol=2)
X2 <- mvrnorm(N,mu,sigma2)
plot(X2[,1], X2[,2], xlab="", ylab="", main="No correlation")

## PCA

We now extract the principal components associated with our two data sets $$\mathbf{X}_1$$ (correlation >0) and $$\mathbf{X}_2$$ (no correlation). First, we estimate mean vector (same for both) and covariance matrices:

muEst <- colMeans(X)
covEst <- cov(X)
covEst2 <- cov(X2)
# Display estimates
muEst
## [1] 2.014046 4.009036
covEst
##           [,1]      [,2]
## [1,] 0.9947950 0.7916366
## [2,] 0.7916366 0.9862532
covEst2
##            [,1]       [,2]
## [1,] 1.00113238 0.01787527
## [2,] 0.01787527 1.02141634

Second, we find the eigenvectors associated with the two covariance matrices.

# Find eigenvectors & eigenvalues
# Columns are eigenvectors
eigen <- eigen(covEst)
# Define o1 and o2 as orthogonal vectors
o1 <- eigen$vectors[,1] o2 <- eigen$vectors[,2]
eigen2 <- eigen(covEst2)
# Define o1 and o2 as orthogonal vectors
o1.nocorr <- eigen2$vectors[,1] o2.nocorr <- eigen2$vectors[,2]

For the case where $$\rho >0$$ we have

o1
## [1] -0.7090116 -0.7051968
o2
## [1]  0.7051968 -0.7090116

and for $$\rho = 0$$ the eigenvectors are

o1.nocorr
## [1] 0.5032500 0.8641409
o2.nocorr
## [1] -0.8641409  0.5032500

The next two figures show the eigenvectors for the two cases: