How to get R to maintain the order of variables on the X axis?

Hi everyone. I've been trying to learn how to use ggplot2 to make a presentable dotplot, and for the most part it seems doable. However, one major issue so far is that R seems to automatically rearrange the variables on the X axis in alphabetical and/or numerical order.

So what I've tried so far is setting the scale to discrete(?):
scale_x_discrete(limits = c("PC50", "P50", "PC100", "P100", "OC50", "O50", "OC100", "O100", "POC50", "PO50", "POC100", "PO100", "Autofluor"))
(This is also the vertical order in which these categorical variables appear in my data file.)

But the addition of this line doesn't seem to change anything. Am I missing a piece of code that tells R to use this particular order? (Whole code is below)

library(ggplot2)
library(readxl)

#Read data file
AT1910 <- read_excel("F:/Data/AT19-10/Final Analysis/AT1910.xlsx")
scale_x_discrete(limits = c("PC50", "P50", "PC100", "P100", "OC50", "O50", "OC100", "O100", "POC50", "PO50", "POC100", "PO100", "Autofluor"))

#Cd11c Data Live gate
dataLiveCd11c <- ggplot(AT1910, aes(x = Condition, y = `live Cd11c MFI Mean`)) + 
  geom_dotplot(binaxis = 'y', stackdir = 'center',
                stackratio=1.0, dotsize=0.8)
dataLiveCd11c

#Mhc2 Data Live gate
dataLiveMhc2 <- ggplot(AT1910, aes(x = Condition, y = `live Mhc2 MFI Mean`)) + 
  geom_dotplot(binaxis = 'y', stackdir = 'center',
                stackratio=1.0, dotsize=0.8)
dataLiveMhc2

ggplot2 treats a character column on a discrete axis as a factor, and by default factor levels are sorted alphabetically, which is the reordering you observed.

To get a different order, you can either convert your x-axis column to a factor whose levels are listed in the order you want, or use the reorder() function to order the levels by another variable, depending on your scenario.
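A minimal sketch of both options in base R, on a made-up five-value vector standing in for the Condition column (the condition subset and the `mfi` numbers are invented for illustration). It also shows why the default order can look odd: the labels are sorted as text, character by character, so "PC100" lands before "PC50".

```r
# Made-up stand-in for the Condition column (not the real data)
conditions <- c("PC50", "P50", "PC100", "P100", "Autofluor")

# Default: factor() sorts the unique values as text, so the file order is lost
levels(factor(conditions))
# → "Autofluor" "P100" "P50" "PC100" "PC50"

# Option 1: list the levels explicitly, in the order wanted on the axis
cond_fixed <- factor(conditions,
                     levels = c("PC50", "P50", "PC100", "P100", "Autofluor"))
levels(cond_fixed)
# → "PC50" "P50" "PC100" "P100" "Autofluor"

# Option 2: reorder() sorts the levels by a summary (mean by default) of another variable
mfi <- c(120, 80, 150, 95, 10)   # hypothetical MFI values, one per condition
cond_by_mfi <- reorder(factor(conditions), mfi)
levels(cond_by_mfi)
# → "Autofluor" "P50" "P100" "PC50" "PC100"
```

Either factor can go straight into a plot, e.g. aes(x = factor(Condition, levels = ...)). Option 1 suits a fixed, known set of conditions; option 2 suits ordering by the measured values.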

Thanks for answering. :slightly_smiling_face: Which of those two options would be preferable if I intend to reuse the script for multiple data sets?

EDIT: I tried out specifying the order, but it gives me this error message:

Error in `levels<-`(`*tmp*`, value = if (nl == nL) as.character(labels) else paste0(labels,  : 
  factor level [3] is duplicated

I'm guessing this is because my data set contains triplicates, which R detects as duplicates... is that what's going on? If so, how do I fix it?
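For what it's worth, repeated values in the data are not the problem: factor() only raises this error when the `levels` argument itself lists a value more than once. A small sketch with a made-up vector, catching the error with tryCatch() so the script keeps running:

```r
# Made-up triplicates: repeated values in the data itself are fine
condition <- c("PC50", "PC50", "PC50", "P50", "P50", "P50")

# Passing the raw column as `levels` repeats "PC50", which triggers the error
msg <- tryCatch(
  factor(condition, levels = condition),
  error = function(e) conditionMessage(e)
)
msg   # an error message about a duplicated factor level

# Fix: give each level once; unique() keeps the order of first appearance
fixed <- factor(condition, levels = unique(condition))
levels(fixed)
# → "PC50" "P50"
```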

It depends on what your ordering requirements are.

Here are some functions which provide various options:
https://forcats.tidyverse.org/reference/index.html
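Two of the helpers on that page that fit this problem, sketched on a made-up vector (this assumes the forcats package is installed). fct_inorder() keeps the order in which values first appear, which matches "the order in my data file", while fct_relevel() moves the named levels and leaves the rest in alphabetical order.

```r
library(forcats)

# Made-up values in file order, with repeats as in triplicate data
condition <- c("PC50", "PC50", "P50", "P50", "Autofluor")

# Levels follow the order of first appearance
levels(fct_inorder(condition))
# → "PC50" "P50" "Autofluor"

# Start from the alphabetical order, then move "Autofluor" to the end
levels(fct_relevel(condition, "Autofluor", after = Inf))
# → "P50" "PC50" "Autofluor"
```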


Oh, that link is very helpful. Thanks! :grinning:

EDIT: I tried the fct_inorder(f) function, but it says:

Error in fct_inorder(f) : could not find function "fct_inorder"

Is it not part of the standard packages in RStudio?

You need to load the package (and possibly install it first): library(forcats)
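A sketch of the usual pattern: install once per machine, then load in every script. forcats is also one of the core tidyverse packages, so library(tidyverse) attaches it too, but it is not attached automatically when R (or RStudio) starts.

```r
# Install only if the package is missing (needs an internet connection / CRAN mirror)
if (!requireNamespace("forcats", quietly = TRUE)) {
  install.packages("forcats")
}

# Attach it in every session or script that calls fct_inorder() and friends
library(forcats)
```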

Ah the forcats library, I see! Thanks.

Hmm... I installed the package and now the code seems to be working, but it is still reordering the variables on the X axis in alphabetical order. :frowning: Am I missing something really simple?

library(ggplot2)
library(readxl)
library(forcats)

#Read data file
AT1910 <- read_excel("F:/Data/AT19-10/Final Analysis/AT1910.xlsx")
f <- factor(c("PC50", "PC50", "PC50", "P50", "P50", "P50", 
              "PC100", "PC100", "PC100", "P100", "P100", "P100", 
              "OC50",  "OC50",  "OC50", "O50", "O50", "O50", 
              "OC100", "OC100", "OC100", "O100", "O100", "O100", 
              "POC50", "POC50", "POC50", "PO50", "PO50", "PO50",
              "POC100", "POC100", "POC100", "PO100", "PO100", "PO100",
              "Autofluor"))
fct_inorder(f)
#> [1] PC50 PC50 PC50 P50 P50 P50 PC100 PC100 PC100 P100 P100 P100 OC50 OC50 OC50 O50 O50 O50 OC100 OC100 OC100 O100 O100 O100 POC50 POC50 POC50 PO50 PO50 PO50 POC100 POC100 POC100 PO100 PO100 PO100 Autofluor
#> Levels: PC50 P50 PC100 P100 OC50 O50 OC100 O100 POC50 PO50 POC100 PO100 Autofluor

#Cd11c Data Live gate
dataLiveCd11c <- ggplot(AT1910, aes(x = Condition, y = `live Cd11c MFI Mean`)) + 
  geom_dotplot(binaxis = 'y', stackdir = 'center',
                stackratio=1.0, dotsize=0.8)
dataLiveCd11c

#Mhc2 Data Live gate
dataLiveMhc2 <- ggplot(AT1910, aes(x = Condition, y = `live Mhc2 MFI Mean`)) + 
  geom_dotplot(binaxis = 'y', stackdir = 'center',
                stackratio=1.0, dotsize=0.8)
dataLiveMhc2

EDIT: wait... maybe I should set the x variable in the ggplot function to the factor?
EDIT: Ah, nope. That only changes the x axis name it seems...

This should hopefully fix it, but it's impossible to know without your data:

dataLiveCd11c <- ggplot(AT1910, aes(x = fct_inorder(Condition), y = `live Cd11c MFI Mean`)) + 
  geom_dotplot(binaxis = 'y', stackdir = 'center',
                stackratio=1.0, dotsize=0.8)
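Since the same ordering is needed for several plots (and possibly several data sets), an alternative is to convert the column once, right after reading the file, rather than wrapping it inside each aes(). A sketch with a made-up data frame standing in for the Excel sheet (the numbers are invented); because aes() still refers to Condition, the axis title stays "Condition".

```r
library(ggplot2)
library(forcats)

# Made-up stand-in for AT1910 (read_excel() would return a tibble with these columns)
AT1910 <- data.frame(
  Condition = rep(c("PC50", "P50", "PC100", "Autofluor"), each = 3),
  `live Cd11c MFI Mean` = c(510, 530, 525, 300, 310, 290, 640, 600, 620, 40, 42, 39),
  check.names = FALSE
)

# Convert once; every plot made from AT1910 afterwards uses the file order
AT1910$Condition <- fct_inorder(AT1910$Condition)

dataLiveCd11c <- ggplot(AT1910, aes(x = Condition, y = `live Cd11c MFI Mean`)) +
  geom_dotplot(binaxis = "y", stackdir = "center",
               stackratio = 1.0, dotsize = 0.8)
# Print dataLiveCd11c to draw it; the x axis should read PC50, P50, PC100, Autofluor
```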

Oh that worked, thank you so much!
But now the X axis is labeled fct_inorder(Condition) as well. XD How do I fix that?

The easiest way is via the ggplot2 labs() function, which you can use for all titles and axis labels:

... + 
labs(x = "Condition")

Aaah I've seen that before, now that I think about it! So labs is short for labels :o

EDIT: that worked perfectly! All that's left is fixing the aesthetics now. Thanks for your help and patience!

Using limits in scale_x_discrete() to set the axis order is a standard approach (setting factor levels is another). However, a scale_*() function does nothing on its own: like geom_dotplot(), it has to be added to the plot with +. Called on a separate line, as in your script, it just creates a scale object that is never used, which is why, as you saw, nothing changed. :slightly_smiling_face:

Your code for one plot would then look something like:

ggplot(AT1910, aes(x = Condition, y = `live Cd11c MFI Mean`)) + 
  geom_dotplot(binaxis = 'y', stackdir = 'center',
                stackratio=1.0, dotsize=0.8) +
  scale_x_discrete(limits = c("PC50", "P50", "PC100", "P100", "OC50", 
                               "O50", "OC100", "O100", "POC50", "PO50", 
                               "POC100", "PO100", "Autofluor"))
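If you stick with limits, one option is to store the order in a vector once and reuse it in both plots, so the list only has to be edited in one place. A sketch with made-up data (the numbers are invented). Conditions listed in the limits but absent from the data still get an empty slot on the axis, and values in the data that are not in the limits (for example a condition spelled slightly differently in the sheet) should be dropped from the plot with a warning rather than shown elsewhere.

```r
library(ggplot2)

# Axis order, defined once and reused by every plot
condition_order <- c("PC50", "P50", "PC100", "P100", "OC50", "O50", "OC100",
                     "O100", "POC50", "PO50", "POC100", "PO100", "Autofluor")

# Made-up data with two conditions in triplicate (numbers are invented)
AT1910 <- data.frame(
  Condition = rep(c("P50", "PC50"), each = 3),
  `live Cd11c MFI Mean` = c(300, 310, 290, 510, 530, 525),
  `live Mhc2 MFI Mean`  = c(150, 160, 155, 220, 210, 230),
  check.names = FALSE
)

dataLiveCd11c <- ggplot(AT1910, aes(x = Condition, y = `live Cd11c MFI Mean`)) +
  geom_dotplot(binaxis = "y", stackdir = "center", stackratio = 1.0, dotsize = 0.8) +
  scale_x_discrete(limits = condition_order)

dataLiveMhc2 <- ggplot(AT1910, aes(x = Condition, y = `live Mhc2 MFI Mean`)) +
  geom_dotplot(binaxis = "y", stackdir = "center", stackratio = 1.0, dotsize = 0.8) +
  scale_x_discrete(limits = condition_order)
```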

Ah I see, so the scale has to be added to the plot with + just like a layer. I heard about those in the DataCamp course, but of course the part in which they are actually covered wasn't freely available haha.
I will make a note of that!
