How to interpret principal component analysis (PCA) output

I'm trying to understand how to interpret the output from PCA (prcomp):

             PC1     PC2
Variable1   0.777  -0.762
Variable2  -0.378   0.762
Variable3  -0.547  -1.934
Variable4  -1.085  -0.017

  1. Which variables are the most important in PC1? Would it be Variables 1 and 4?
  2. Which variable is more important in PC2: Variable 1 or Variable 2?
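  1. Yes. Variables 4 and 1 have the largest absolute values on PC1 (1.085 and 0.777).
  2. Variable 3 is the most important for PC2 (absolute value 1.934). Variables 1 and 2 contribute equally: their values along PC2 have the same magnitude (0.762) and opposite signs.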
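
The ranking above can be reproduced with a short sketch. The matrix is retyped by hand from the table in the question, and the object names (loadings, pc1_rank, pc2_rank) are made up for illustration; with a fitted prcomp object you would typically take the matrix from its rotation element instead.

```r
# Values retyped from the table in the question (illustrative only).
loadings <- matrix(
  c( 0.777, -0.762,
    -0.378,  0.762,
    -0.547, -1.934,
    -1.085, -0.017),
  ncol = 2, byrow = TRUE,
  dimnames = list(paste0("Variable", 1:4), c("PC1", "PC2"))
)

# Importance is judged by absolute value; the sign only gives the direction.
abs_loadings <- abs(loadings)

# Variable names ordered from largest to smallest absolute value per component.
pc1_rank <- names(sort(abs_loadings[, "PC1"], decreasing = TRUE))
pc2_rank <- names(sort(abs_loadings[, "PC2"], decreasing = TRUE))

print(pc1_rank)
print(pc2_rank)
```

Variables 1 and 2 are tied on PC2, so their relative order in pc2_rank carries no meaning.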

However, keep in mind that the contribution of each variable to the principal components also depends on its variance. This can be misleading when the variables are measured in different units, because you would then be comparing apples with oranges. To avoid this problem, standardize the data first if your variables have different units of measurement; see for example

https://onlinecourses.science.psu.edu/stat505/node/55/

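In R, you can do this simply with the following call, where X is your data matrix (the argument is named scale. with a trailing dot; scale = TRUE also works through R's partial argument matching):

prcomp(X, scale. = TRUE)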
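
To see why scaling matters, here is a minimal sketch that uses R's built-in USArrests data as a stand-in for X (an assumption; the original data are not shown in this thread). Its columns are on very different scales, so the unscaled PCA is dominated by the column with the largest variance.

```r
# USArrests ships with base R; it stands in for X here (assumption).
X <- USArrests

pca_raw    <- prcomp(X)                  # centered, not scaled
pca_scaled <- prcomp(X, scale. = TRUE)   # centered and scaled to unit variance

# Variable with the largest absolute loading on PC1 in each fit.
top_raw    <- names(which.max(abs(pca_raw$rotation[, "PC1"])))
top_scaled <- names(which.max(abs(pca_scaled$rotation[, "PC1"])))

# Compare the PC1 loadings side by side.
print(round(cbind(raw = pca_raw$rotation[, "PC1"],
                  scaled = pca_scaled$rotation[, "PC1"]), 3))

# Scaling inside prcomp matches standardizing the columns by hand first.
pca_manual <- prcomp(scale(X))
print(all.equal(pca_scaled$sdev, pca_manual$sdev))
```

In the unscaled fit, PC1 is almost entirely the Assault column, simply because that column has by far the largest variance.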

By the way, regardless of whether you scale your original variables, you should always center them before computing the PCA. prcomp does this by default (center = TRUE), which you can verify by running

prcomp(X)

and

prcomp(X, center = TRUE)

and checking that the results are the same.
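
That check can be scripted as below. This is a sketch only: the built-in USArrests data stand in for X, and the object names are invented for illustration.

```r
# USArrests stands in for X here (assumption).
X <- as.matrix(USArrests)

p_default    <- prcomp(X)                  # relies on the default center = TRUE
p_centered   <- prcomp(X, center = TRUE)   # centering requested explicitly
p_uncentered <- prcomp(X, center = FALSE)  # for contrast only

# Both the standard deviations and the loadings should agree.
same_as_default <- isTRUE(all.equal(p_default$sdev, p_centered$sdev)) &&
  isTRUE(all.equal(p_default$rotation, p_centered$rotation))

# Without centering, the result changes.
differs_uncentered <- !isTRUE(all.equal(p_default$sdev, p_uncentered$sdev))

print(same_as_default)
print(differs_uncentered)
```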
