Understanding implications of factor_key TRUE/FALSE in gather


#1

Hi,
I wanted to draw boxplots for about 50 variables and wasn’t aware of the factor_key option in gather() ;-(
As a result, all the titles in my faceted boxplot got mixed up. Since I don’t have domain knowledge, I didn’t notice the error until it was (kind of) too late. (Nobody was hurt!)

I was just wondering what the reasoning is behind the factor_key option and its default setting of FALSE. When would we not want to keep the ordering of the key columns?

I’ve attached the example code to clarify the problem.

Would be great to hear your thoughts on this.

Thanks,
Till

This is how it looks with the default factor_key=FALSE setting:

library(tidyverse)
#> Loading tidyverse: ggplot2
#> Loading tidyverse: tibble
#> Loading tidyverse: tidyr
#> Loading tidyverse: readr
#> Loading tidyverse: purrr
#> Loading tidyverse: dplyr
#> Conflicts with tidy packages ----------------------------------------------
#> filter(): dplyr, stats
#> lag():    dplyr, stats
library(ggplot2)
library(reprex)
my_selection <- c("disp", "hp", "qsec", "wt", "vs", "carb")
p <- mtcars %>%
  select(mpg, cyl, my_selection) %>%
  gather(key = variables, value = var_values, my_selection, factor_key = FALSE)  %>%
  ggplot(aes(x = as.factor(cyl), y = var_values)) +
  geom_boxplot() +
  geom_point() +
  labs(x = "cyl") +
  labs(y = "var_values") +
  facet_wrap(~ variables) +
  theme_bw()
plot(p)


And this is how it looks with factor_key=TRUE:

library(tidyverse)
#> Loading tidyverse: ggplot2
#> Loading tidyverse: tibble
#> Loading tidyverse: tidyr
#> Loading tidyverse: readr
#> Loading tidyverse: purrr
#> Loading tidyverse: dplyr
#> Conflicts with tidy packages ----------------------------------------------
#> filter(): dplyr, stats
#> lag():    dplyr, stats
library(ggplot2)
library(reprex)
my_selection <- c("disp", "hp", "qsec", "wt", "vs", "carb")
p <- mtcars %>%
  select(mpg, cyl, my_selection) %>%
  gather(key = variables, value = var_values, my_selection, factor_key = TRUE)  %>%
  ggplot(aes(x = as.factor(cyl), y = var_values)) +
  geom_boxplot() +
  geom_point() +
  labs(x = "cyl") +
  labs(y = "var_values") +
  facet_wrap(~ variables) +
  theme_bw()
plot(p)
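
A quick way to see what differs between the two calls, without plotting anything, is to inspect the key column directly. This is a minimal sketch on the same mtcars columns; the names `wide`, `long_chr` and `long_fct` are only illustrative. With factor_key = FALSE the key is a plain character column, which facet_wrap() orders alphabetically; with factor_key = TRUE it is a factor whose levels follow the original column order.

```r
library(tidyr)

my_selection <- c("disp", "hp", "qsec", "wt", "vs", "carb")
wide <- mtcars[, c("mpg", "cyl", my_selection)]

# Gather everything except mpg and cyl, once per setting of factor_key
long_chr <- gather(wide, key = variables, value = var_values,
                   -mpg, -cyl, factor_key = FALSE)
long_fct <- gather(wide, key = variables, value = var_values,
                   -mpg, -cyl, factor_key = TRUE)

class(long_chr$variables)
#> [1] "character"

# Character keys carry no order of their own, so they end up sorted alphabetically
sort(unique(long_chr$variables))
#> [1] "carb" "disp" "hp"   "qsec" "vs"   "wt"

# Factor keys keep the order in which the columns appeared
levels(long_fct$variables)
#> [1] "disp" "hp"   "qsec" "wt"   "vs"   "carb"
```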

#2

Here comes the second plot! (As a new user, I wasn’t allowed to put two images in one post.)


#3

I just realized that this example doesn’t reproduce my problem: here the facets themselves are reordered, whereas in my original code only the titles changed. So I’ll post the whole script in case someone can spot the real problem.

activating_receptors_blod = c("NKp46blod", "NKp30blod", "NKG2Dblod", "NKG2Cblod", "CD161blod", "CD16blod") 
activating_receptors_bonemarrow = c("NKp46BM", "NKp30BM", "NKG2DBM", "NKG2CBM", "CD161BM", "CD16BM")
inhibiting_receptors_blod = c("PD1blood", "NKG2Ablod")
inhibiting_receptors_bonemarrow = c("PD1BM", "NKG2ABM")
chemotactic_receptors_blod = c("CX3CR1blod", "CXCR4blod", "CXCR6blod", "CXCR3blod")
chemotactic_receptors_bonemarrow = c("CX3CR1BM", "CXCR4BM", "CXCR6BM", "CXCR3BM")
adhesion_receptors_blod = c("DNAMblod")
adhesion_receptors_bonemarrow= c("DNAMBM")
other_receptors_blod = c("CD8blod", "CD57blod")
other_receptors_bonemarrow= c("CD8BM", "CD57BM")

list_of_receptor_groups = list(
  activating_receptors_blod, activating_receptors_bonemarrow,
  inhibiting_receptors_blod, inhibiting_receptors_bonemarrow,
  chemotactic_receptors_blod, chemotactic_receptors_bonemarrow,
  adhesion_receptors_blod, adhesion_receptors_bonemarrow,
  other_receptors_blod, other_receptors_bonemarrow
)

leukemi2 <- mutate(leukemi2, tp_controls = Timepoint) %>%
  mutate(tp_controls = replace(tp_controls, is.na(Timepoint), 0)) 

leukemi2 <- filter(leukemi2, Diagnose %in% c("0", "3")) %>%
  filter(tp_controls == 0) %>%
  filter(Celle == 3)

View(leukemi2)

make_receptor_facet <- function(dataset, receptor_list){
  dataset %>%
    select(ID, Riskgrp, receptor_list) %>%
    gather(key=receptor_name, value=percent_receptor_expression, receptor_list, factor_key=TRUE) %>%
    ggplot(aes_string(x=as.factor(.$Riskgrp), y=.$percent_receptor_expression )) +
    geom_boxplot() +
    geom_jitter() +
    labs(x = "Risikogruppe") +
    labs(y = "%-reseptor") +
    facet_wrap(~ receptor_name) +
    theme_bw()
}

plots <- lapply(list_of_receptor_groups, make_receptor_facet, data=leukemi2)

lapply(plots, plot)

#4

I just found out that mixing standard and non-standard evaluation inside the function caused my problems. Once I cleaned this up, everything worked as expected, which means that the value of factor_key only affects the order of the panels in the facet plot.
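
For anyone running into the same thing, here is a rough sketch of what a consistent version might look like. It is not the exact cleaned-up script: it uses mtcars instead of the leukemi2 data so that it runs on its own, `make_facet` and `columns` are hypothetical names, and it assumes a dplyr version recent enough to provide all_of(). The main change is that aes() gets bare column names, so ggplot looks them up in the gathered data and each value stays tied to its own facet, instead of passing in vectors pulled out with `.$`.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Hypothetical helper: one faceted boxplot for a character vector of column names
make_facet <- function(dataset, columns) {
  dataset %>%
    # all_of() makes it explicit that `columns` is an external character vector
    select(cyl, all_of(columns)) %>%
    # only cyl and the selected columns are left, so gather everything but cyl
    gather(key = variables, value = var_values, -cyl, factor_key = TRUE) %>%
    # bare column names inside aes(): evaluated in the gathered data frame
    ggplot(aes(x = as.factor(cyl), y = var_values)) +
    geom_boxplot() +
    geom_jitter() +
    labs(x = "cyl", y = "var_values") +
    facet_wrap(~ variables) +
    theme_bw()
}

groups <- list(c("disp", "hp", "qsec"), c("wt", "vs", "carb"))
plots <- lapply(groups, make_facet, dataset = mtcars)
```

Each element of `plots` can then be printed as before, and the panels appear in the order the column names were given.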