No worries, Lacona. Like I said, I know this is a painful part of learning R; I remember my own struggles well.

Before you post your reprex, run it yourself to check that it reproduces the problem. This one currently does not, because there are still no groups (all are NA_integer_). Also note that head() only returns 6 rows by default, which may not be enough to reproduce the problem. You can use head(data, 20), for example, to get more rows.

How do I test my reprex?

It seems that no approach I try will generate a reprex with my data.

All your groups are NA. Can you produce data that has groups included?

Or can you create fake data that resembles your actual data enough to work with?

Or can you filter out the NA groups then try head(data, 20) again?
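A minimal sketch of that last suggestion, assuming the data frame is called data and its grouping column is group (both names, and the values below, are hypothetical stand-ins): drop the rows whose group is NA, keep up to 20 rows, and let dput() print code that recreates them for the reprex.

```r
library(dplyr)

# Hypothetical stand-in for the real data: some rows have no group (NA)
data <- data.frame(
  distances = c(57.3, 48.6, 59.7, 70.7, 52.2, 50.6),
  group = factor(c(NA, "Control", "Control", NA, "Coccidia", "Coccidia"))
)

# Keep only rows that have a group, then take (up to) the first 20 rows
small <- data %>%
  filter(!is.na(group)) %>%
  head(20)

# Prints R code that recreates `small`; paste that output into the reprex
dput(small)
```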

Keep trying, you'll get there.

You almost have it. See here, I added a couple of groups:

library(ggplot2)

df <- data.frame(stringsAsFactors = FALSE,
               row.names = c("Hauck-013-1A31","Hauck-014-1B31",
                             "Hauck-015-1C31",
                             "Hauck-016-2A31",
                             "Hauck-017-2B31",
                             "Hauck-018-2C31"),
               distances = c(57.3039396454089,48.6048873728772,
                             59.7256598344438,
                             70.6678419043269,
                             52.1992713821929,
                             50.5899888666069),
               # two placeholder groups, recycled over the 6 rows (A, B, A, B, A, B)
               group = as.factor(LETTERS[1:2])
)

ggplot(df, aes(x=group, y=distances, fill=group)) + geom_boxplot() +
  xlab("") +
  ylab("Distance to centroid") +
  theme(axis.text.x = element_text(angle = 45, vjust = 1, hjust = 1)) +
  labs(fill="Treatment")


Is this what a proper reprex looks like?

df<-data.frame(
                   row.names = c("Hauck-013-1A31","Hauck-014-1B31","Hauck-015-1C31",
                                 "Hauck-016-2A31","Hauck-017-2B31",
                                 "Hauck-018-2C31"),
                   distances = c(57.3039396454089,48.6048873728772,59.7256598344438,
                                 70.6678419043269,52.1992713821929,
                                 50.5899888666069),
                       group = as.factor(c("Control","Control",
                                           "Control","Coccidia ","Coccidia ",
                                           "Coccidia "))
                )
disg <- ggplot(df, aes(x=group, y=distances, fill=group)) + geom_boxplot() +
  xlab("") +
  ylab("Distance to centroid") +
  theme(axis.text.x = element_text(angle = 45, vjust = 1, hjust = 1))
#> Error in ggplot(df, aes(x = group, y = distances, fill = group)): konnte Funktion "ggplot" nicht finden
disg + 
  labs(fill="Treatment")
#> Error in eval(expr, envir, enclos): Objekt 'disg' nicht gefunden

Created on 2020-11-11 by the reprex package (v0.3.0)

Ok, well done, this is getting closer.

The following will switch the order of the factor levels without any problem, as your original question requested, but it does not reproduce any errors.

library(dplyr)
library(ggplot2)

df %>% 
  # move "Control" to the front of the factor levels
  mutate(group = forcats::fct_relevel(group, 'Control')) %>%
  ggplot(aes(x=group, y=distances, fill=group)) + 
  geom_boxplot() +
  theme(axis.text.x = element_text(angle = 45, vjust = 1, hjust = 1)) +
  labs(x = "", y = "Distance to centroid", fill="Treatment")
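To see why the boxes were in the "wrong" order to begin with: a factor's levels default to alphabetical order, and ggplot2 draws the boxes in level order. fct_relevel() moves the named level(s) to the front. A small check, using the group labels from the reprex above (trailing space included):

```r
library(forcats)

group <- factor(c("Control", "Control", "Control",
                  "Coccidia ", "Coccidia ", "Coccidia "))

# Default levels are sorted alphabetically, so "Coccidia " comes first
levels(group)

# Move "Control" to the front; the remaining level follows it
group2 <- fct_relevel(group, "Control")
levels(group2)
```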

Also, I don't speak German (?), but does 'konnte Funktion "ggplot" nicht finden' mean "can't find function ggplot"? If so, are you sure you have attached the ggplot2 package with library(ggplot2)?


YES!!! That looks like a reprex, doesn't it? The missing library(ggplot2) was the error.

library(ggplot2)
df<-data.frame(
                   row.names = c("Hauck-013-1A31","Hauck-014-1B31","Hauck-015-1C31",
                                 "Hauck-016-2A31","Hauck-017-2B31",
                                 "Hauck-018-2C31"),
                   distances = c(57.3039396454089,48.6048873728772,59.7256598344438,
                                 70.6678419043269,52.1992713821929,
                                 50.5899888666069),
                       group = as.factor(c("Control","Control",
                                           "Control","Coccidia ","Coccidia ",
                                           "Coccidia "))
                )
disg <- ggplot(df, aes(x=group, y=distances, fill=group)) + geom_boxplot() +
  xlab("") +
  ylab("Distance to centroid") +
  theme(axis.text.x = element_text(angle = 45, vjust = 1, hjust = 1))
disg + 
  labs(fill="Treatment")

Created on 2020-11-11 by the reprex package (v0.3.0)

Ahhh 😄 that is so funny! But that wasn't the original issue, though, right?

You had ggplot working but were struggling to arrange the boxes in your boxplot. Is that working now too?

Your code is not the solution, because it changes the colors of the Control and Coccidia groups. I need code where the colors remain the same (Control = cyan, Coccidia = tomato)!

OK, well, that's easy:

`+ scale_fill_manual(values = c("cyan", "tomato"))`
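A variation on that line, sketched with small hypothetical data: giving scale_fill_manual() a named vector ties each colour to a level name rather than to a position, so reordering the levels later does not shuffle the colours.

```r
library(ggplot2)

# Hypothetical data with the two treatment groups, Control listed first
df2 <- data.frame(
  distances = c(57.3, 48.6, 59.7, 70.7, 52.2, 50.6),
  group = factor(rep(c("Control", "Coccidia"), each = 3),
                 levels = c("Control", "Coccidia"))
)

p <- ggplot(df2, aes(x = group, y = distances, fill = group)) +
  geom_boxplot() +
  # Named values: each colour is matched to the level with that name
  scale_fill_manual(values = c(Control = "cyan", Coccidia = "tomato"))

# Fill colour actually used for each box, in level order
box_fills <- layer_data(p)$fill
box_fills
```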

OK. So I have to add the colors manually, right? I did that before, but I found it hard to find out the exact default colors so that they match the rest of my plots.

I have Control as the first plot now. How do I order the other 3 plots?

Yes, if you want colours to deviate from the defaults (either the colour values or their order), you must specify them manually.
The ggplot default colours when there are 4 groups are c("#F8766D", "#7CAE00", "#00BFC4", "#C77CFF").
As far as I know, these are determined dynamically: the hues are spaced evenly around the colour wheel, so they depend on the number of groups. See ?scale_fill_hue for details.

So this will give cyan, red, green, purple
scale_fill_manual(values = c("#00BFC4", "#F8766D", "#7CAE00", "#C77CFF"))

See also scales::hue_pal()(4)
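A quick check of those defaults with scales::hue_pal() (the scales package is installed alongside ggplot2). It also suggests why the colours swapped earlier: with two groups the palette is recomputed as "#F8766D" then "#00BFC4", so whichever level comes first gets the reddish colour; moving Control to the front therefore gave it the red instead of the cyan.

```r
library(scales)

# Default discrete ggplot2 hue palette for 4 groups and for 2 groups
four <- hue_pal()(4)
two <- hue_pal()(2)
four
two
```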


Now the colors are nice, Control is the first plot and Coccidia the second one.
But! I want S. Typhimurium as the third one and "both" as the last....
I am so sorry.

And that brings us back to here, I think,

where you specify the factor levels in the order you want (the final level can be omitted, because the listed levels are merely "brought to the front" and the remaining levels keep their existing order after them). Remember to use the correct data frame name and variable name if either has changed since then.
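A small check of the "final level can be omitted" point, using the four treatment names that appear in this thread:

```r
library(forcats)

treatment <- factor(c("Control", "Coccidia", "S. Typhimurium", "Coccidia and S. T."))

# Only three levels are named; the fourth stays in place after them
treatment2 <- fct_relevel(treatment, "Control", "Coccidia", "S. Typhimurium")
levels(treatment2)
```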

Thank you for your patience! Now both the colors and the order are off.

disg <-dis %>% 
  mutate(group = forcats::fct_relevel(group, "Control", "Coccidia", "S. Typhimurium")) %>%
  ggplot(aes(x=group, y=distances, fill=group)) + 
  geom_boxplot() +
  scale_fill_manual(values = c("#00BFC4", "#F8766D","#C77CFF","#7CAE00")) +
  theme(axis.text.x = element_text(angle = 45, vjust = 1, hjust = 1)) +
  labs(x = "", y = "Distance to centroid", fill="Treatment")

Error message:
1: Problem with mutate() input group.
ℹ Unknown levels in f: Coccidia
ℹ Input group is forcats::fct_relevel(group, "Control", "Coccidia", "S. Typhimurium").
2: Unknown levels in f: Coccidia

Ah, you don't have 'Coccidia' in your data; you have 'Coccidia ' (note the trailing space).
You should be able to fix it with

library(dplyr)

# trimws() strips the trailing space before the levels are reordered
df %>% 
  mutate(group = forcats::fct_relevel(trimws(group), "Control", "Coccidia", "S. Typhimurium"))
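One way to spot this kind of problem is to print the levels wrapped in visible delimiters, so leading or trailing spaces stand out. A sketch with hypothetical labels:

```r
library(forcats)

group <- factor(c("Control", "Coccidia ", "S. Typhimurium"))

# Brackets make the trailing space in "Coccidia " visible
sprintf("[%s]", levels(group))

# The exact label "Coccidia" (no space) is not a level, hence "Unknown levels"
"Coccidia" %in% levels(group)

# trimws() returns a character vector; fct_relevel() converts it to a factor
fixed <- fct_relevel(trimws(group), "Control", "Coccidia", "S. Typhimurium")
levels(fixed)
```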

Thank you so so so much for your help!
It was a long and exhausting journey, but I learned a lot (my first reprex!) and at the end I could even find the space you mentioned.
Wow. I am very, very grateful that you took all that time to guide me through it.

In my next plot I managed to sort the boxplots in the right order.
However, it is not obvious to me where I should change the colors.
scale_colour_manual(Treatment=c("#00BFC4", "#F8766D", "#7CAE00", "#C77CFF")) + didn't work, and scale_fill_manual doesn't seem to work with gather.

library(dplyr)
library(tidyr)
library(ggplot2)

# One row per sample: three alpha-diversity metrics plus the treatment label
adiv16 <- data.frame(
  "Observed" = phyloseq::estimate_richness(gps16, measures = "Observed"),
  "Shannon" = phyloseq::estimate_richness(gps16, measures = "Shannon"),
  "PD" = picante::pd(samp = data.frame(t(data.frame(phyloseq::otu_table(gps16)))), tree = phyloseq::phy_tree(gps16)) [ ,1],
  "Treatment" = phyloseq::sample_data(gps16)$treatment)
adiv_plot16 <- adiv16 %>%
  # put the treatments in the desired order
  mutate(Treatment = forcats::fct_relevel(Treatment, "Control", "Coccidia", "S. Typhimurium")) %>%
  # long format: one row per sample and metric
  gather(key = metric, value = value, c("Observed", "Shannon", "PD")) %>%
  mutate(metric = factor(metric, levels = c("Observed", "Shannon", "PD"))) %>%
  ggplot(aes(group=Treatment, x = Treatment, y = value)) +
  geom_boxplot(outlier.color = NA) +
  # the points are coloured through the colour aesthetic, not fill
  geom_jitter(aes(color = Treatment), height = 0, width = .2) +
  labs(x = "", y = "") +
  facet_wrap(~ metric, scales = "free") +
  theme(legend.position="bottom") +
  theme(axis.text.x = element_blank(), axis.ticks.x=element_blank())
adiv_plot16


Hi Lacona,

Well done on your first reprex! It was perfect for that question. But suppose the space had been on 'S. Typhimurium'; then I would not have been able to spot the issue and would have had to ask for another reprex!

And, unfortunately, that's the issue with your new query. Again, I don't have adiv16 or gps16, so I'm afraid I can't tell what the data actually looks like.

And I think that is probably important here, because as far as I know there should be no issue with using scale_fill_manual after gather. I suspect adiv16 is a somewhat nested data frame? If so, to use ggplot2 you would have to flatten it out somehow; see the unnest functions in the tidyr package.
You should either supply a new dataset that has the same structure but fake data, or else use dput again. I reckon I could probably find the issue then.
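A sketch of that claim with made-up data in a similar shape (one column per metric plus Treatment; all names and values here are hypothetical): after gather(), a fill scale works as usual, provided fill is actually mapped in aes().

```r
library(ggplot2)
library(tidyr)

# Made-up wide data: one row per sample, one column per diversity metric
wide <- data.frame(
  Observed = c(120, 135, 98, 110),
  Shannon = c(3.1, 3.4, 2.8, 3.0),
  Treatment = factor(c("Control", "Control", "Coccidia", "Coccidia"),
                     levels = c("Control", "Coccidia"))
)

# Reshape to long format: one row per sample and metric
long <- gather(wide, key = metric, value = value, c("Observed", "Shannon"))

p <- ggplot(long, aes(x = Treatment, y = value, fill = Treatment)) +
  geom_boxplot() +
  facet_wrap(~ metric, scales = "free") +
  scale_fill_manual(values = c(Control = "#00BFC4", Coccidia = "#F8766D"))

# Distinct fill colours used by the boxes across both panels
fills <- unique(layer_data(p)$fill)
```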

However, even better in this instance, since I know nothing about Bioconductor, would be to create your reprex and post it as a separate question, where someone who does know Bioconductor could help.


The solution was to create a color scale object first:
scale_col <- scale_color_manual(values = c('Control' = "#00BFC4", 'Coccidia' = "#F8766D", 'S. Typhimurium' = "#C77CFF", 'Coccidia and S. T.' = "#7CAE00"))
and then add scale_col + to my plot code.
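Why this works, as far as I can tell: in that plot the colours come from geom_jitter(aes(color = Treatment)), so the colour scale (not the fill scale) controls them, and scale_color_manual() takes the colours through its values argument rather than Treatment =. Naming the values ties each colour to a treatment regardless of level order. A self-contained sketch with made-up data standing in for the gathered adiv16:

```r
library(ggplot2)

lv <- c("Control", "Coccidia", "S. Typhimurium", "Coccidia and S. T.")

# Made-up long-format data: 3 values per treatment
set.seed(42)
long <- data.frame(
  Treatment = factor(rep(lv, each = 3), levels = lv),
  value = round(runif(12, 2, 4), 2)
)

# Named colours, as in the solution above
scale_col <- scale_color_manual(values = c("Control" = "#00BFC4", "Coccidia" = "#F8766D",
                                           "S. Typhimurium" = "#C77CFF",
                                           "Coccidia and S. T." = "#7CAE00"))

p <- ggplot(long, aes(x = Treatment, y = value)) +
  geom_boxplot(outlier.color = NA) +
  geom_jitter(aes(color = Treatment), height = 0, width = 0.2) +
  scale_col

# Colour of the jittered points (layer 2), one entry per treatment group
pts <- layer_data(p, 2)
col_by_group <- as.vector(tapply(pts$colour, pts$group, function(x) x[1]))
```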

