Tidyverse solutions for Factor Analysis / Principal Component Analysis

tbradley · January 25, 2018, 5:08pm

You can do it similar to other types of modelling methods using list-columns and purrr, broom, and dplyr functions.

library(tidyverse)
library(broom)
library(ggfortify) # for plotting pca

iris_pca <- iris %>% 
  nest() %>% 
  mutate(pca = map(data, ~prcomp(.x %>% select(-Species), center = TRUE, scale = TRUE)), 
         pca_tidy = map2(pca, data, ~broom::augment(.x, data = .y)))

That will give you this:

> iris_pca
# A tibble: 1 x 3
  data                   pca          pca_tidy               
  <list>                 <list>       <list>                 
1 <data.frame [150 x 5]> <S3: prcomp> <data.frame [150 x 10]>

From there you can calculate the variance explained by each dimension like this:

iris_pca %>%
  unnest(pca_tidy) %>% 
  summarize_at(.vars = vars(contains("PC")), .funs = funs(var)) %>% 
  gather(key = pc, value = variance) %>% 
  mutate(var_exp = variance/sum(variance))

which gives you this:

# A tibble: 4 x 3
  pc         variance var_exp
  <chr>         <dbl>   <dbl>
1 .fittedPC1   2.92   0.730  
2 .fittedPC2   0.914  0.229  
3 .fittedPC3   0.147  0.0367 
4 .fittedPC4   0.0207 0.00518

You can also plot using this same setup:

iris_pca %>% 
  mutate(
    pca_graph = map2(
      .x = pca, 
      .y = data,
      ~ autoplot(.x, loadings = TRUE, loadings.label = TRUE, 
                 data = .y, colour = "Species")
    )
  ) %>% 
  pull(pca_graph)

which gives you this:

You can extend this as far as you would like, with multiple models by grouping prior to nesting. You can also extend the graph easily just like you would for any other ggplot. To see some of the options for the autoplot pca specific graph, you can look at ?ggfortify::autoplot.prcomp. It is worth noting that the autoplot.prcomp method only takes the UK spelling of colour = .

I am not that familiar with factor analysis but I would imagine you could use this same workflow with it. The big dependency is whether broom has a tidy/augment/glance method for that model type. But even if it doesn't, you can still access the model once it is in a list column, it just may look a little more hacky.