Principal Component Question

statistics

#1

Hello

I am a relatively inexperienced RStudio user. I have performed a PCA using the princomp() function. From the loadings I can tell that the actual values of all the variables I used in the analysis are inversely related to the Principal Component 1 scores (all the PC1 loadings are negative). This means that when I then use the PC1 scores to generate e.g. a boxplot, higher PC1 scores indicate lower actual values of my variables, which is quite unintuitive for a reader. Is there any way I can reverse the 'direction' of the PC1 scores to make the plot more intuitive?

Thank you in advance. Any advice much appreciated!


#2

One of the raps on PCA is the difficulty of intuitively understanding multi-dimensional projections onto the Cartesian plane. Our little heads can barely handle 3-D at rest, let alone in motion (4-D).

From the princomp documentation:

## The variances of the variables in the
## USArrests data vary by orders of magnitude, so scaling is appropriate
princomp(USArrests, cor = TRUE) # =^= prcomp(USArrests, scale=TRUE)
summary(pc.cr <- princomp(USArrests, cor = TRUE))
loadings(pc.cr)  # note that blank entries are small but not zero
## The signs of the columns of the loadings are arbitrary
plot(pc.cr) # shows a screeplot.
biplot(pc.cr)

This shows the two standard representations. The first (the screeplot) is really of help only to the analyst; the second (the biplot) may help non-technical audiences get a sense of the relative contributions of the PCs.

In any non-trivial dataset we're dealing with variation around a mean, so it will usually be the case that the decomposition produces loadings and scores of both signs.
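A quick way to see this with the USArrests example above (nothing here beyond base R):

```r
# Fit the PCA from the documentation example
pc.cr <- princomp(USArrests, cor = TRUE)

# Both the loadings and the scores contain entries of both signs;
# the overall sign of each column is arbitrary.
range(unclass(loadings(pc.cr)))  # spans negative and positive values
range(pc.cr$scores)
```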


#3

Also note that the components are unique only up to sign if the data are centered (which they should be). In other words, you get the same scores solution using a rotation matrix of

[-1.0,  0.5,  0.3,
  2.1, -0.1,  1.1,
 -3.5,  9.1,  0.1]

as you would with

[ 1.0, -0.5, -0.3,
 -2.1,  0.1, -1.1,
  3.5, -9.1, -0.1]

This means that you can't really rely on the sign of a single component's loadings to draw conclusions about direction. You can still draw conclusions from magnitude and similarity: for example, if two predictors have almost identical values in a loading vector, they are likely to be highly correlated.
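Since the signs are arbitrary, a practical fix for the original question is simply to negate the first column of both the scores and the loadings; this is still a valid PCA solution, just with the opposite orientation. A minimal sketch using princomp() on the built-in USArrests data (the boxplot at the end is only illustrative):

```r
# Fit the PCA as in the original question
pc <- princomp(USArrests, cor = TRUE)

# Flip the sign of PC1 in both the scores and the loadings,
# so that the scores and their interpretation stay consistent.
pc$scores[, 1]   <- -pc$scores[, 1]
pc$loadings[, 1] <- -pc$loadings[, 1]

# The flipped scores can now be used directly, e.g. in a boxplot
boxplot(pc$scores[, 1], main = "PC1 scores (sign flipped)")
```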