How to re-order levels of a categorical variable (turn OFF alphabetic ordering??)

Hi-

I am looking to re-order the levels of a categorical variable. I have the data frame ordered the way I want:

library(tidyverse)  # provides read_csv(), %>%, group_by() and summarise()
Aminoacid_data <- read_csv("~/Desktop/Aminoacid_data.csv")

In this dataset there is a categorical variable called "Species", and I can see its labels in the correct order here:

Aminoacid_data$Species

[1] "At" "At" "At" "At" "At" "At" "At" "At" "At" "At" "At" "At" "At" "At" "At" "At" "At" "At" "Ns" "Ns" "Ns"
[22] "Ns" "Ns" "Ns" "Ns" "Ns" "Ns" "Ns" "Ns" "Ns" "Ns" "Ns" "Ns" "Ns" "Ns" "Ns" "Mc" "Mc" "Mc" "Mc" "Mc" "Mc"
[43] "Mc" "Mc" "Mc" "Mc" "Mc" "Mc" "Mc" "Mc" "Mc" "Mc" "Mc" "Mc" "Lj" "Lj" "Lj" "Lj" "Lj" "Lj" "Lj" "Lj" "Lj"
[64] "Lj" "Lj" "Lj" "Lj" "Lj" "Lj" "Lj" "Lj" "Lj" "Ps" "Ps" "Ps" "Ps" "Ps" "Ps" "Ps" "Ps" "Ps" "Ps" "Ps" "Ps"
[85] "Ps" "Ps" "Ps" "Ps" "Ps" "Ps"

OK, so the levels go in the order At, Ns, Mc, Lj, Ps.

Next, I make a summary of the mean and SD of each group so that I can plot them, using this code:

AA_summary <- Aminoacid_data %>% 
  group_by(Species, Genotype, Timepoint) %>%  
  summarise(mean_Asn=mean(Asn), mean_Ile=mean(Ile), mean_Total=mean(Total_amino_acids), sd_Asn=sd(Asn), sd_Ile=sd(Ile), sd_Total=sd(Total_amino_acids))

I then check the order of the levels:

AA_summary$Species
[1] At At At At At At Lj Lj Lj Lj Lj Lj Mc Mc Mc Mc Mc Mc Ns Ns Ns Ns Ns Ns Ps Ps Ps Ps Ps Ps

THE ORDER OF THE LEVELS CHANGED GRRRR

So I tried re-ordering the levels:

AA_summary$Species <- factor(AA_summary$Species, levels=c("At", "Ns", "Mc", "Lj", "Ps"))

and I checked the order again, but still got the output above, with the levels in this order:

At, Lj, Mc, Ns, Ps

It looks like the levels are in alphabetical order. Is there some way to turn this off?

Thanks,

Erik

In the tidyverse, you can use fct_relevel() from the forcats package to reorder factor levels however you like. See its help page (?fct_relevel) for details.
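A minimal sketch of how fct_relevel() behaves, using a hypothetical toy vector `species` in place of the real Species column. It shows the default alphabetical levels from factor(), a full custom order, and the `after` argument for moving named levels to the front (the default, `after = 0`) or to the end (`after = Inf`):

```r
library(forcats)

# Hypothetical vector standing in for Species; factor() sorts levels alphabetically
species <- factor(c("At", "Ns", "Mc", "Lj", "Ps", "At", "Ns"))
levels(species)  # "At" "Lj" "Mc" "Ns" "Ps"

# Name every level in the order you want
species <- fct_relevel(species, "At", "Ns", "Mc", "Lj", "Ps")
levels(species)  # "At" "Ns" "Mc" "Lj" "Ps"

# Levels you name move to the front by default (after = 0) ...
front <- fct_relevel(species, "Ps")
levels(front)    # "Ps" "At" "Ns" "Mc" "Lj"

# ... or, with after = Inf, to the end
back <- fct_relevel(species, "At", after = Inf)
levels(back)     # "Ns" "Mc" "Lj" "Ps" "At"
```

Levels you don't name keep their existing relative order, which is handy when only one or two groups need to move.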


Running this code line by line should give you some insight:

set.seed(421267)
x <- sample(x = LETTERS[1:5], size = 20, replace = TRUE)
x
x <- factor(x)
x
factor(x, levels = rev(levels(x)))
factor(x, levels = c("A", "D", "E", "C", "B"))

Hope it helps! 🙂

> set.seed(421267)
> x <- sample(x = LETTERS[1:5], size = 20, replace = TRUE)
> x
 [1] "E" "D" "E" "A" "C" "A" "C" "A" "D" "A" "C" "B" "C" "C" "E" "E" "B" "B" "C" "A"
> x <- factor(x)
> x
 [1] E D E A C A C A D A C B C C E E B B C A
Levels: A B C D E
> factor(x, levels = rev(levels(x)))
 [1] E D E A C A C A D A C B C C E E B B C A
Levels: E D C B A
> factor(x, levels = c("A", "D", "E", "C", "B"))
 [1] E D E A C A C A D A C B C C E E B B C A
Levels: A D E C B

Thanks for the suggestion. However, I am still a bit confused: it looks like the order hasn't changed. In the last three commands the levels are supposed to be in a different order, but the output looks the same each time. Am I missing something?

Thanks,

Erik

The levels dictate the order when plotting.

Example:

library("tidyverse")
set.seed(774724)
d <- tibble(label = sample(x = LETTERS[1:3], size = 20, replace = TRUE),
            value = rnorm(n = 20, mean = 10, sd = 2))

d %>% 
  mutate(label = factor(label, levels = c("A", "B", "C"))) %>% 
  ggplot(aes(x = label, y = value, colour = label)) + 
  geom_point()

d %>% 
  mutate(label = factor(label, levels = c("C", "B", "A"))) %>% 
  ggplot(aes(x = label, y = value, colour = label)) + 
  geom_point()

So set the order prior to plotting.
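The same rule holds outside ggplot2: base R sorts and tabulates a factor by its level order, not alphabetically. That gives a quick way to check the order a plot will use without drawing anything. A small sketch with a hypothetical vector `x`:

```r
# Hypothetical factor with a deliberately non-alphabetical level order
x <- factor(c("C", "A", "B", "A", "C"), levels = c("C", "B", "A"))

# Sorting a factor follows its level order, not the alphabet
sort(x)           # C C B A A
# Tabulating reports counts in level order as well
names(table(x))   # "C" "B" "A"
```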

Hope it helps! 🙂

Hi Leon-

Getting closer... Here is what I did, just as a test:

AA_summary %>% 
  mutate(Species = factor(Species, levels = c("At", "Ns", "Mc", "Lj", "Ps"))) %>% 
  ggplot(aes(x = Species, y = mean_Asn, colour = Genotype)) + 
  geom_point()

and I got this error message:

> AA_summary %>% 
+   mutate(Species = factor(Species, levels = c("At", "Ns", "Mc", "Lj", "Ps"))) %>% 
+   ggplot(aes(x = Species, y = mean_Asn, colour = Genotype)) + 
+   geom_point()
Error: Column `Species` can't be modified because it's a grouping variable

When I calculated the mean and SD earlier, I used this code:

AA_summary <- Aminoacid_data %>% 
  group_by(Species, Genotype, Timepoint) %>%  
  summarise(mean_Asn=mean(Asn), mean_Ile=mean(Ile), mean_Total=mean(Total_amino_acids), sd_Asn=sd(Asn), sd_Ile=sd(Ile), sd_Total=sd(Total_amino_acids))

I have to group it by the species or else it will just pool all the genotypes together and I won't be able to plot my data the way I want.

Thanks,

Erik

Either set the factor order of Species before grouping and summarizing by Species:

AA_summary <- Aminoacid_data %>% 
  mutate(Species = factor(Species, levels = c("At", "Ns", "Mc", "Lj", "Ps"))) %>% 
  group_by(Species, Genotype, Timepoint) %>%  
  ...
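A sketch of the first option on hypothetical toy data (`toy` and `toy_summary` are made-up names; only the column names come from the thread). Because Species is already a factor when group_by() runs, the groups, and therefore the summary rows, come out in level order rather than alphabetically:

```r
library(dplyr)

# Hypothetical stand-in for Aminoacid_data (column names from the thread, values made up)
toy <- tibble(
  Species  = rep(c("At", "Ns", "Mc", "Lj", "Ps"), each = 2),
  Genotype = rep(c("WT", "pgm"), times = 5),
  Asn      = 1:10
)

toy_summary <- toy %>%
  # Fix the level order first ...
  mutate(Species = factor(Species, levels = c("At", "Ns", "Mc", "Lj", "Ps"))) %>%
  # ... so group_by() orders the groups by level instead of alphabetically
  group_by(Species, Genotype) %>%
  summarise(mean_Asn = mean(Asn), .groups = "drop")

toy_summary$Species
```

`.groups = "drop"` (available in dplyr 1.0.0 and later) returns an ungrouped result, which avoids the grouping-variable error when Species is modified later.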

Or, after grouping and summarizing, run ungroup() and then you'll be able to set the factor order:

AA_summary %>% 
   ungroup() %>% 
   mutate(Species=factor(Species, levels=c("At", "Ns", "Mc", "Lj", "Ps"))) %>% 
   ...

You've grouped your data by your Species variable. Run ungroup() before mutate().
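A sketch of the ungroup() route on hypothetical toy data (`toy`, `AA_toy` and `still_grouped` are illustrative names). With the default settings, summarise() drops only the last grouping variable, so the result is still grouped by Species; ungroup() removes that before the column is changed. Note that the result has to be assigned back: a pipeline that isn't assigned only prints. As far as I know, dplyr 1.0.0 and later also allow mutate() on a grouping column, but ungroup() works either way.

```r
library(dplyr)

# Hypothetical toy data (column names from the thread, values made up)
toy <- tibble(
  Species  = rep(c("At", "Ns", "Mc", "Lj", "Ps"), each = 2),
  Genotype = rep(c("WT", "pgm"), times = 5),
  Asn      = 1:10
)

# summarise() peels off only the last grouping variable (Genotype)
AA_toy <- toy %>%
  group_by(Species, Genotype) %>%
  summarise(mean_Asn = mean(Asn))
still_grouped <- is_grouped_df(AA_toy)   # TRUE: still grouped by Species

# Drop the grouping, set the levels, and assign the result back
AA_toy <- AA_toy %>%
  ungroup() %>%
  mutate(Species = factor(Species, levels = c("At", "Ns", "Mc", "Lj", "Ps")))
```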

Hi Joels-

Thanks, that worked to re-order the levels of the Species variable. However, when I use the newly made AA_summary to make a new figure, it STILL orders the panels as before:

[Attached figure: Rplot_Asn_Erik.pdf]

Here is the code I used to make the new figure:

bar_Asn <- ggplot(AA_summary, aes(x=Genotype, y=mean_Asn, fill=Genotype))+ 
  geom_col(show.legend=FALSE, alpha=1/2, color="black")+ 
  geom_jitter(data=Aminoacid_data, aes(x=Genotype, y=Asn), show.legend=FALSE)+
  facet_grid(Species~Timepoint, scales="free_y")+
  geom_errorbar(aes(ymin = mean_Asn - sd_Asn, ymax = mean_Asn + sd_Asn), width=0.2)+
  labs(title="Asparagine", y="µmol/gFW", x="Genotype")+
  theme_classic()+
  theme(plot.title=element_text(hjust=0, vjust=1, face="bold", size=14))+
  theme(axis.title.x=element_text(size=12, face="bold"), axis.title.y=element_text(size=12, face="bold"))+
  theme(axis.text.x=element_text(size=12), axis.text.y=element_text(size=12))+
  theme(strip.text.x=element_text(size=12))+
  theme(strip.text.y=element_text(size=12))+
  scale_fill_manual(breaks = c("WT", "pgm"), values=c("black", "red"))

bar_Asn

Do I need to mutate the variable in the code to make the figure too??

Thanks,

Erik

Hi Leon and @joels -

So I tried both methods of setting the factor levels, and both seemed to reorder the levels in AA_summary.

Here is the code I used:

AA_summary <- Aminoacid_data %>% 
  mutate(Species=factor(Species, levels=c("At", "Ns", "Mc", "Lj", "Ps"))) %>%
  group_by(Species, Genotype, Timepoint) %>%  
  summarise(mean_Asn=mean(Asn), mean_Ile=mean(Ile), mean_Total=mean(Total_amino_acids), sd_Asn=sd(Asn), sd_Ile=sd(Ile), sd_Total=sd(Total_amino_acids))

And the other method, ungrouping after making the AA_summary dataset and before the mutation:

AA_summary <- AA_summary %>%
  ungroup() %>%
  mutate(Species=factor(Species, levels=c("At", "Ns", "Mc", "Lj", "Ps")))

I realize in my last comment I included a lot of superfluous code when making the figure, so here is a simplified version that hopefully you guys can help me troubleshoot:

test_Asn <- ggplot(AA_summary, aes(x=Genotype, y=mean_Asn, fill=Genotype))+ 
  geom_col(show.legend=FALSE, alpha=1/2, color="black")+ 
  geom_jitter(data=Aminoacid_data, aes(x=Genotype, y=Asn), show.legend=FALSE)+
  facet_grid(Species~Timepoint, scales="free_y")

I ran this code after checking with View(AA_summary) that the levels were in the correct order,

but it still drew the facet grid in alphabetical order. Here is the figure:

[Attached figure: test_Asn.pdf]

Just to reiterate, I want the order of the horizontal panels to be (from top to bottom) "At", "Ns", "Mc", "Lj", "Ps"

Thanks again for all of your help!!

Erik

I was actually able to figure it out with a friend from my lab.

Basically, I used factor() directly instead of mutate().

Then I needed to make sure that both AA_summary and Aminoacid_data (the data for the geom_jitter() layer) had Species as a factor with the same, correct order of levels.
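A sketch of why both data frames matter, on hypothetical toy data (`raw`, `summ` and `species_order` are made-up names standing in for Aminoacid_data, AA_summary and the desired order). ggplot2 builds the facet panels from the Species values of every layer combined; my understanding is that when one layer still has Species as plain character, the combined values lose the factor levels and the panels fall back to alphabetical order. Giving both data frames the same levels keeps the custom order, which can be checked from the built plot's panel layout without drawing it:

```r
library(ggplot2)

species_order <- c("At", "Ns", "Mc", "Lj", "Ps")

# Hypothetical raw data and per-group means (column names from the thread, values made up)
raw <- data.frame(
  Species  = rep(species_order, each = 4),
  Genotype = rep(c("WT", "pgm"), times = 10),
  Asn      = 1:20
)
summ <- aggregate(Asn ~ Species + Genotype, data = raw, FUN = mean)

# Give BOTH data frames the same factor levels
raw$Species  <- factor(raw$Species,  levels = species_order)
summ$Species <- factor(summ$Species, levels = species_order)

p <- ggplot(summ, aes(x = Genotype, y = Asn, fill = Genotype)) +
  geom_col(alpha = 1/2, colour = "black", show.legend = FALSE) +
  geom_jitter(data = raw, width = 0.1, show.legend = FALSE) +
  facet_grid(Species ~ ., scales = "free_y")

# Panel layout of the built plot; ROW runs from top to bottom
lay <- ggplot_build(p)$layout$layout
as.character(lay$Species[order(lay$ROW)])
```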


OK, so I tried using the forcats package. Here is the code:

install.packages("forcats")

library(forcats)

fct_relevel(AA_summary$Species, c("At", "Ns", "Mc", "Lj", "Ps"))

I went to check the levels again and they were still unchanged! BUT, here is the output:

[1] At At At At At At Lj Lj Lj Lj Lj Lj Mc Mc Mc Mc Mc Mc Ns Ns Ns Ns Ns Ns Ps Ps Ps Ps Ps Ps
Levels: At Ns Mc Lj Ps

If you look at the bottom line, the code worked: it says the levels are in the correct order. But somehow this doesn't carry over to the actual dataset.
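A likely explanation: fct_relevel(), like most R functions, returns a new factor rather than changing its input, so the call above printed a re-levelled copy and left AA_summary$Species as it was. A small sketch with a hypothetical stand-in vector `sp`:

```r
library(forcats)

# Hypothetical stand-in for AA_summary$Species
sp <- factor(c("At", "Lj", "Mc", "Ns", "Ps"))

# Called on its own, fct_relevel() only prints a re-levelled copy ...
fct_relevel(sp, c("At", "Ns", "Mc", "Lj", "Ps"))
untouched <- levels(sp)
untouched   # still "At" "Lj" "Mc" "Ns" "Ps"

# ... assigning the result back is what makes the change stick
sp <- fct_relevel(sp, c("At", "Ns", "Mc", "Lj", "Ps"))
levels(sp)  # "At" "Ns" "Mc" "Lj" "Ps"
```

For the real data the same pattern would be AA_summary$Species <- fct_relevel(AA_summary$Species, ...).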

Thanks,

Erik

The Levels: line shows the levels of your factor; the line above it, starting with [1], shows the elements of the variable. If you look at the Levels: line for the last three commands, it does change: from alphabetical, to reverse alphabetical, to the custom, hard-coded order.
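A sketch of the difference, using a hypothetical vector `x` and a hypothetical data frame `df` with made-up values. Changing the levels leaves the elements (and so the rows) where they are; to put the rows themselves in level order, sort by the factor. With dplyr, arrange(Species) does the same, since factors sort by level order:

```r
x <- factor(c("At", "Lj", "Mc", "Ns", "Ps"))
x <- factor(x, levels = c("At", "Ns", "Mc", "Lj", "Ps"))

# The elements (the [1] line) keep their original order ...
as.character(x)   # "At" "Lj" "Mc" "Ns" "Ps"
# ... while the Levels: line reflects the new order
levels(x)         # "At" "Ns" "Mc" "Lj" "Ps"

# To reorder the rows themselves, sort by the factor (values are made up)
df <- data.frame(Species = x, mean_Asn = c(1, 2, 3, 4, 5))
df_sorted <- df[order(df$Species), ]
as.character(df_sorted$Species)   # "At" "Ns" "Mc" "Lj" "Ps"
```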

OK, that makes sense. Now for why I want to change the order of the levels: I'm trying to make a figure with a specific order of panels. Here is the code I am using:

bar_Asn <- ggplot(AA_summary, aes(x=Genotype, y=mean_Asn, fill=Genotype))+ 
  geom_col(show.legend=FALSE, alpha=1/2, color="black")+ 
  geom_jitter(data=Aminoacid_data, aes(x=Genotype, y=Asn), show.legend=FALSE)+
  facet_grid(Species~Timepoint, scales="free_y")

bar_Asn

That makes this figure:
[Attached figure: bar_Asn_test1.pdf]

So the problem is that even though the levels are in the correct order, the figure reverts to alphabetical order (i.e. At, Lj, Mc, Ns, Ps). Ideally the panels would appear in the order I mentioned above, from top to bottom: At, Ns, Mc, Lj, Ps.

Thanks to everyone for your help so far!!

Erik