How do I put box plots in the right order without changing colors or variables?

Hey all,

I'm an R beginner and couldn't find a solution through my own research.

My box plot code gives me the right colors, tick labels, and legend, but the boxes are not in the right order. The control should be the first box on the left, followed by coccidia, S.T., and coccidia + S.T.:

library(ggplot2)

# Distance of each sample to its group centroid, grouped by treatment
disprt <- vegan::betadisper(clr_dist_matrix, phyloseq::sample_data(ps_clr)$treatment)
disprt
# Collect group labels and distances in a data frame for ggplot
dis <- data.frame(group=disprt$group, distances=disprt$distances)
disg <- ggplot(dis, aes(x=group, y=distances, fill=group)) + geom_boxplot() +
  xlab("") +
  ylab("Distance to centroid") +
  theme(axis.text.x = element_text(angle = 45, vjust = 1, hjust = 1))
disg + 
  labs(fill="Treatment")

[Plot: Beta D, distance to centroid]

I can use the variable group to plot them in the right order (1 = control, 2 = coccidia, 3 = S.T., 4 = both), but then the colors, tick labels and legend are off:

disprt
dis <- data.frame(group=disprt$group, distances=disprt$distances)
disg <- ggplot(dis, aes(x=group, y=distances, fill=group)) + geom_boxplot() +
  xlab("") +
  ylab("Distance to centroid") +
  theme(axis.text.x = element_text(angle = 45, vjust = 1, hjust = 1))
disg + 
  labs(fill="Treatment")

[Plot: sorted by group]

My attempt to reorder the boxes failed completely:

disprt <- vegan::betadisper(clr_dist_matrix, phyloseq::sample_data(ps_clr)$treatment)
disprt
disprt$group <- factor(disprt$group , levels=c("Control", "Coccidia", "S. Typhimurium", "Coccidia and S. Typhimurium"))
dis <- data.frame(group=disprt$group, distances=disprt$distances)
disg <- ggplot(dis, aes(x=group, y=distances, fill=group)) + geom_boxplot() +
  xlab("") +
  ylab("Distance to centroid") +
  theme(axis.text.x = element_text(angle = 45, vjust = 1, hjust = 1))
disg + 
  labs(fill="Treatment")

[Plot: reorder fail]

Could you please help me put my box plots in the right order while keeping the right colors and tick labels?

Something like this should be close:

mutate(disprt, group = forcats::fct_relevel("Control", "Coccidia", "S. Typhimurium"))

If it's not quite right, see ?forcats::fct_relevel.

An error message popped up:

library(forcats)
disprt <- vegan::betadisper(clr_dist_matrix, phyloseq::sample_data(ps_clr)$treatment)
disprt

disprt_m <- mutate(disprt, group= forcats::fct_relevel("Control", "Coccidia", "S. Typhimurium", "Coccidia and S. Typhimurium"))

Error in UseMethod("mutate_") :
  no applicable method for 'mutate_' applied to an object of class "betadisper"

My R knowledge isn't sufficient to figure out why.
levels(disprt) # NULL
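Why levels(disprt) returns NULL, and why mutate() does not accept the object, can be sketched without vegan. This is a minimal sketch assuming only that a betadisper result is a list with class "betadisper" whose `group` element is a factor and whose `distances` element is numeric (which is how the code above uses them); `fake_disp` is a hypothetical stand-in, not a real vegan object.

```r
# Hypothetical stand-in for a vegan::betadisper() result: a plain list
# carrying the "betadisper" class, with made-up groups and distances.
fake_disp <- structure(
  list(
    group = factor(c("Control", "Coccidia", "Control", "Coccidia")),
    distances = c(57.3, 48.6, 59.7, 70.7)
  ),
  class = "betadisper"
)

# The object itself is not a factor, so it has no levels ...
levels(fake_disp)        # NULL
# ... but the factor stored inside it does.
levels(fake_disp$group)  # "Coccidia" "Control"

# dplyr verbs such as mutate() work on data frames, so pull the two
# components out into one first and relevel the column there.
dis <- data.frame(group = fake_disp$group, distances = fake_disp$distances)
class(dis)               # "data.frame"
```

In other words, the levels live in `disprt$group` (or in `dis$group` once the data frame exists), not in `disprt`.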

And my powers of ESP are not sufficient to guess at your data. If the help page at ?forcats::fct_relevel doesn't help, perhaps consider posting a reproducible example (instructions).

If I had such a reprex, I would probably have caught the typo. Did you read the help page at ?forcats::fct_relevel? If so, maybe you could catch the typo? I'm not being facetious: I know that the help files are very difficult to get to grips with when you are learning R, but it is well worth learning to read them as early as you can.

The problem might be that the column was missing from the fct_relevel() call. Try this instead:

mutate(disprt, group = forcats::fct_relevel(group, "Control", "Coccidia", "S. Typhimurium"))

You could also try the base R equivalent, which will look something like this (again, I'm guessing because I don't have an example of your data):

disprt$group <- factor(disprt$group, levels(disprt$group)[c(3,1,4,2)])
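How the positional indexing works can be checked on made-up labels. This is a sketch assuming the four treatment names are spelled exactly as below (the real labels may differ, which would change the positions): factor() sorts levels alphabetically by default, so c(3, 1, 4, 2) picks them out in the wanted order.

```r
# Hypothetical treatment labels; the real ones may be spelled differently.
grp <- factor(c("Control", "Coccidia", "S. Typhimurium",
                "Coccidia and S. Typhimurium", "Control"))

# factor() sorts the levels alphabetically by default:
levels(grp)
# "Coccidia" "Coccidia and S. Typhimurium" "Control" "S. Typhimurium"

# Pick them out by position: 3 = Control, 1 = Coccidia,
# 4 = S. Typhimurium, 2 = Coccidia and S. Typhimurium.
grp <- factor(grp, levels(grp)[c(3, 1, 4, 2)])
levels(grp)
# "Control" "Coccidia" "S. Typhimurium" "Coccidia and S. Typhimurium"
```

Listing the names directly, as in factor(grp, levels = c("Control", "Coccidia", ...)), avoids depending on the alphabetical positions, but then every name must match the data exactly.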

Posting a reprex is a pain (I know), but you will get the correct help much quicker if you provide one for future queries (or if this still doesn't work).

This is the reprex I got using dput(head(disprt, 6)):

library(ggplot2)
dis <- data.frame(group=disprt$group, distances=disprt$distances)
#> Error in data.frame(group = disprt$group, distances = disprt$distances): object 'disprt' not found
library(ggplot2)
disg <- ggplot(dis, aes(x=group, y=distances, fill=group)) + geom_boxplot() +
  xlab("") +
  ylab("Distance to centroid") +
  theme(axis.text.x = element_text(angle = 45, vjust = 1, hjust = 1))
#> Error in ggplot(dis, aes(x = group, y = distances, fill = group)): object 'dis' not found
disg + 
  labs(fill="Treatment")
#> Error in eval(expr, envir, enclos): object 'disg' not found

Created on 2020-11-11 by the reprex package (v0.3.0)

Lacona, I don't know what that code is. It seems only barely related to your original question. Please remember I have none of your context or data.

If this is still the same issue, please read the help file indicated and also read the instructions at the link posted above (here it is again).

I am trying to help, but I really need you to help me help you!

Can you perhaps post the output from dput(head(name_of_your_data))?

jmcvw,
I am so sorry! I am desperately trying to create a reprex right now, and I posted the result to see whether it works.
Just give me a little more time to figure out how reprex works. I am really trying.
Thank you for your help so far; I greatly appreciate it.

> dput(head(dis))
structure(list(group = structure(c(NA_integer_, NA_integer_, 
NA_integer_, NA_integer_, NA_integer_, NA_integer_), .Label = character(0), class = "factor"), 
    distances = c(57.3039396454089, 48.6048873728772, 59.7256598344438, 
    70.6678419043269, 52.1992713821929, 50.5899888666069)), row.names = c("Hauck-013-1A31", 
"Hauck-014-1B31", "Hauck-015-1C31", "Hauck-016-2A31", "Hauck-017-2B31", 
"Hauck-018-2C31"), class = "data.frame")

No worries, Lacona. Like I said, I know this is a painful part of learning R. I remember my own struggles well.

Before you post your reprex, run it yourself to check that it reproduces the problem. This one currently does not, because there are still no groups (all are NA_integer_). Also note that head() returns only 6 rows by default, which may not be enough to reproduce the problem. You can use head(data, 20), for example, to get more rows.
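One common way to end up with groups that are all NA, offered here as a guess rather than a diagnosis, is passing factor() a `levels` vector whose strings do not match the stored labels exactly: any value not listed becomes NA without an error. The labels below are hypothetical.

```r
# Hypothetical labels as stored in the data; note the trailing space
# in "Coccidia " and the missing space in "S.Typhimurium".
raw <- c("Control", "Coccidia ", "S.Typhimurium", "Control")

# Levels typed by hand that do not match the stored strings exactly.
wanted <- c("Control", "Coccidia", "S. Typhimurium",
            "Coccidia and S. Typhimurium")

grp <- factor(raw, levels = wanted)
sum(is.na(grp))               # 2: values not listed in `levels` became NA

# Compare the stored labels with the typed ones before relevelling.
setdiff(unique(raw), wanted)  # "Coccidia " "S.Typhimurium"
```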

How do I test my reprex?

It seems that no approach I try will generate a reprex with my data.

All your groups are NA. Can you produce data that has groups included?

Or can you create fake data that resembles your actual data enough to work with?

Or can you filter out the NA groups and then try head(data, 20) again?
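The filter-then-head step might look like this in base R. `dis` here is a hypothetical stand-in with a few NA groups; with real data you would skip the data.frame() call and start from your own `dis`.

```r
# Hypothetical stand-in for `dis`; two rows have no group.
dis <- data.frame(
  group = factor(c("Control", NA, "Coccidia", "Control", NA, "Coccidia")),
  distances = c(57.3, 48.6, 59.7, 70.7, 52.2, 50.6)
)

# Drop rows without a group, then keep at most the first 20 rows.
dis_small <- head(dis[!is.na(dis$group), ], 20)

# Copy the printed structure(...) call into your post.
dput(dis_small)
```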

Keep trying, you'll get there.

You almost have it. See here; I added a couple of groups:

df<-data.frame(stringsAsFactors=FALSE,
               row.names = c("Hauck-013-1A31","Hauck-014-1B31",
                             "Hauck-015-1C31",
                             "Hauck-016-2A31",
                             "Hauck-017-2B31",
                             "Hauck-018-2C31"),
               distances = c(57.3039396454089,48.6048873728772,
                             59.7256598344438,
                             70.6678419043269,
                             52.1992713821929,
                             50.5899888666069),
               group = as.factor(LETTERS[1:2])
)

library(ggplot2)

ggplot(df, aes(x=group, y=distances, fill=group)) + geom_boxplot() +
  xlab("") +
  ylab("Distance to centroid") +
  theme(axis.text.x = element_text(angle = 45, vjust = 1, hjust = 1)) +
  labs(fill="Treatment")


Is this what a proper reprex looks like?

df<-data.frame(
                   row.names = c("Hauck-013-1A31","Hauck-014-1B31","Hauck-015-1C31",
                                 "Hauck-016-2A31","Hauck-017-2B31",
                                 "Hauck-018-2C31"),
                   distances = c(57.3039396454089,48.6048873728772,59.7256598344438,
                                 70.6678419043269,52.1992713821929,
                                 50.5899888666069),
                       group = as.factor(c("Control","Control",
                                           "Control","Coccidia ","Coccidia ",
                                           "Coccidia "))
                )
disg <- ggplot(df, aes(x=group, y=distances, fill=group)) + geom_boxplot() +
  xlab("") +
  ylab("Distance to centroid") +
  theme(axis.text.x = element_text(angle = 45, vjust = 1, hjust = 1))
#> Error in ggplot(df, aes(x = group, y = distances, fill = group)): konnte Funktion "ggplot" nicht finden
disg + 
  labs(fill="Treatment")
#> Error in eval(expr, envir, enclos): object 'disg' not found

Created on 2020-11-11 by the reprex package (v0.3.0)

Ok, well done, this is getting closer.

The following switches the order of the factor levels without any problem, as your original question requested, but it does not reproduce any errors.

library(dplyr)
library(ggplot2)

df %>% 
  # Move "Control" to the first level; the other levels keep their order
  mutate(group = forcats::fct_relevel(group, 'Control')) %>%
  ggplot(aes(x=group, y=distances, fill=group)) + 
  geom_boxplot() +
  theme(axis.text.x = element_text(angle = 45, vjust = 1, hjust = 1)) +
  labs(x = "", y = "Distance to centroid", fill="Treatment")

Also, I don't speak German(?), but does 'konnte Funktion "ggplot" nicht finden' mean "can't find function ggplot"? If so, are you sure you have attached the ggplot2 package with library(ggplot2)?


YES!!! That looks like a reprex, doesn't it? The missing library(ggplot2) was the error.

library(ggplot2)
df<-data.frame(
                   row.names = c("Hauck-013-1A31","Hauck-014-1B31","Hauck-015-1C31",
                                 "Hauck-016-2A31","Hauck-017-2B31",
                                 "Hauck-018-2C31"),
                   distances = c(57.3039396454089,48.6048873728772,59.7256598344438,
                                 70.6678419043269,52.1992713821929,
                                 50.5899888666069),
                       group = as.factor(c("Control","Control",
                                           "Control","Coccidia ","Coccidia ",
                                           "Coccidia "))
                )
disg <- ggplot(df, aes(x=group, y=distances, fill=group)) + geom_boxplot() +
  xlab("") +
  ylab("Distance to centroid") +
  theme(axis.text.x = element_text(angle = 45, vjust = 1, hjust = 1))
disg + 
  labs(fill="Treatment")

Created on 2020-11-11 by the reprex package (v0.3.0)

Ahhh, that is so funny! But that wasn't the original issue, though, right?

You had ggplot working but were struggling to arrange the boxes in your boxplot. Is that working now too?

Your code is not the solution, because it changes the colors of the Control and Coccidia groups. I need code where the colors stay the same (Control = cyan, Coccidia = tomato)!
And your code gave me this:
[Plot: Rplot]

OK, well, that's easy:

+ scale_fill_manual(values = c("cyan", "tomato"))

OK. So I have to add the colors manually, right? I did that before, but I found it hard to find the exact default colors so that they match the rest of my plots.

I have Control as the first box now. How do I order the other three?

Yes, if you want colours to deviate from the defaults (either the colour values or their order), you must specify them manually.
The ggplot default colours for the first 4 groups are c("#F8766D", "#7CAE00", "#00BFC4", "#C77CFF").
As far as I know, these are determined dynamically from the number of groups. See ?scale_fill_hue for details.

So this will give cyan, red, green, purple:
scale_fill_manual(values = c("#00BFC4", "#F8766D", "#7CAE00", "#C77CFF"))

See also scales::hue_pal()(4)
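Giving scale_fill_manual() a named vector ties each colour to a level name rather than to a position, so the default colours survive any reordering of the boxes. This is a sketch assuming the four labels are spelled as below and that the original plot used the default alphabetical level order (which is how its default colours were assigned); the distance values are made up.

```r
library(ggplot2)

# Made-up distances for the four treatments (two samples each).
df <- data.frame(
  group = c("Control", "Control", "Coccidia", "Coccidia",
            "S. Typhimurium", "S. Typhimurium",
            "Coccidia and S. Typhimurium", "Coccidia and S. Typhimurium"),
  distances = c(57.3, 48.6, 59.7, 70.7, 52.2, 50.6, 61.0, 55.4)
)

# Default hue colours in the default (alphabetical) level order, named by
# level, so each treatment keeps the colour it had in the original plot.
default_levels <- levels(factor(df$group))
treatment_cols <- setNames(scales::hue_pal()(length(default_levels)),
                           default_levels)

# Now put the boxes in the wanted order; the names keep colours attached.
df$group <- factor(df$group, levels = c("Control", "Coccidia", "S. Typhimurium",
                                        "Coccidia and S. Typhimurium"))

p <- ggplot(df, aes(x = group, y = distances, fill = group)) +
  geom_boxplot() +
  scale_fill_manual(values = treatment_cols) +
  labs(x = "", y = "Distance to centroid", fill = "Treatment") +
  theme(axis.text.x = element_text(angle = 45, vjust = 1, hjust = 1))
p
```

With unnamed values the colours are matched to levels by position, which is why reordering the levels appeared to "change" the colours.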


Now the colors are nice, Control is the first box and Coccidia the second one.
But! I want to have S. Typhimurium third and both combined last...
I am so sorry.
[Plot: Rplot01]

And that brings us back to here, I think:

There you specify the factor levels in the order you want (the final level can be omitted, because the listed levels are merely "brought to the front" of the vector and any unlisted level stays after them). Remember to use the correct data frame name and variable name if either has changed since then.
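Applied to the `group` column of the data frame (not to the betadisper object itself), that looks roughly like this. It is a sketch assuming the labels below match the real data exactly; `dis` is a hypothetical stand-in, and if a name does not match, forcats warns about an unknown level instead of moving it.

```r
library(forcats)

# Hypothetical stand-in for the data frame built from betadisper().
dis <- data.frame(
  group = factor(c("Control", "Coccidia", "S. Typhimurium",
                   "Coccidia and S. Typhimurium")),
  distances = c(57.3, 48.6, 59.7, 70.7)
)
levels(dis$group)   # alphabetical by default

# Move the listed levels to the front in this order. The unlisted level
# ("Coccidia and S. Typhimurium") keeps its place after them, so it can
# be left out.
dis$group <- fct_relevel(dis$group, "Control", "Coccidia", "S. Typhimurium")
levels(dis$group)
# "Control" "Coccidia" "S. Typhimurium" "Coccidia and S. Typhimurium"
```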