R ggplot2: reorder a stacked bar plot?

Hi,

I want to order my x-axis variable by the frequency of Swelling 1. That means the x axis has to be ordered like this: Genotype 2, Genotype 3, Genotype 1.

The pictures show you what I have and what I want.

Thank you for your help

Here you can find my R data:

df <- data.frame("Swelling" = c("Swelling 1","Swelling 2", "Swelling 3","Swelling 3", "Swelling 2","Swelling 1","Swelling 3","Swelling 1","Swelling 1", "Swelling 3","Swelling 1", "Swelling 2", "Swelling 1","Swelling 2", "Swelling 3","Swelling 3", "Swelling 2","Swelling 1","Swelling 3","Swelling 1","Swelling 1", "Swelling 3","Swelling 1", "Swelling 2","Swelling 3","Swelling 1", "Swelling 2" ), 
                "Genotype" = c("Genotype 1","Genotype 2", "Genotype 3", "Genotype 1","Genotype 2", "Genotype 3","Genotype 1","Genotype 2", "Genotype 3","Genotype 1","Genotype 2", "Genotype 3", "Genotype 1","Genotype 2", "Genotype 3","Genotype 1","Genotype 2", "Genotype 3","Genotype 1","Genotype 2", "Genotype 3", "Genotype 1","Genotype 2", "Genotype 3","Genotype 1","Genotype 2", "Genotype 3"), 
                "Freq" = c("1","1", "1","1", "1","1","1","1","1", "1","1", "1","1","1", "1","1", "1","1","1","1","1", "1","1", "1","1","1", "1"))
df
library(plyr)     # ddply() and .()
library(ggplot2)
library(scales)   # percent_format()

### want to know frequency (%) of each Swelling depending on genotype -> do not show when frequency is 0
df2 <- ddply(df, .(Swelling), function(x) with(x, data.frame(100 * round(table(Genotype) / length(Genotype), 2))))

ggplot(df2, aes(x = Genotype, y = Freq, fill = Swelling)) + 
  geom_bar(position = "fill", stat = "identity") + 
  scale_y_continuous(labels = percent_format())  # to have percentage on y axis


I'm sure there must be a more elegant solution, but in the meantime this works:

df <- data.frame("Swelling" = c("Swelling 1","Swelling 2", "Swelling 3","Swelling 3", "Swelling 2","Swelling 1","Swelling 3","Swelling 1","Swelling 1", "Swelling 3","Swelling 1", "Swelling 2", "Swelling 1","Swelling 2", "Swelling 3","Swelling 3", "Swelling 2","Swelling 1","Swelling 3","Swelling 1","Swelling 1", "Swelling 3","Swelling 1", "Swelling 2","Swelling 3","Swelling 1", "Swelling 2" ), 
                 "Genotype" = c("Genotype 1","Genotype 2", "Genotype 3", "Genotype 1","Genotype 2", "Genotype 3","Genotype 1","Genotype 2", "Genotype 3","Genotype 1","Genotype 2", "Genotype 3", "Genotype 1","Genotype 2", "Genotype 3","Genotype 1","Genotype 2", "Genotype 3","Genotype 1","Genotype 2", "Genotype 3", "Genotype 1","Genotype 2", "Genotype 3","Genotype 1","Genotype 2", "Genotype 3"), 
                 "Freq" = as.numeric(c("1","1", "1","1", "1","1","1","1","1", "1","1", "1","1","1", "1","1", "1","1","1","1","1", "1","1", "1","1","1", "1"))
)
library(dplyr)
library(ggplot2)

df2 <- df %>% 
    group_by(Swelling, Genotype) %>% 
    summarise(Freq = round(sum(Freq)/sum(df$Freq) * 100, 2)) %>% 
    ungroup()

df2 %>% 
    left_join(df2 %>%
                  filter(Swelling == 'Swelling 1') %>%
                  select(-Swelling, Freq_1 = Freq),
              by = 'Genotype') %>% 
    ggplot(aes(x = reorder(Genotype, desc(Freq_1)), y = Freq, fill = Swelling)) + 
    geom_bar(position = "fill",stat = "identity") + 
    scale_y_continuous(labels = scales::percent_format()) +
    labs(x = 'Genotype')

Created on 2019-02-14 by the reprex package (v0.2.1)


Any time you have a factor you can reorder the items using factor(levels= ):

myfactor <- c("them", "items", "you", "want")
myfactor <- factor(myfactor)
myfactor <- factor(myfactor, levels=c("items", "in", "the", "order", "you", "want", "them"))
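
To make the effect of levels= a bit more concrete, here is a minimal sketch on a made-up vector (the values "low", "mid" and "high" are only for illustration and are not from the question's data). It shows that only the level order changes, not the values themselves:

```r
# Hypothetical toy vector, not the data from the question
x <- c("low", "high", "mid", "low")

# Default: levels are sorted alphabetically
f_default <- factor(x)
levels(f_default)

# Explicit levels set the order that ggplot2 uses for axes and legends
f_ordered <- factor(x, levels = c("low", "mid", "high"))
levels(f_ordered)

# The values are unchanged; only the level order differs
identical(as.character(f_default), as.character(f_ordered))
```

ggplot2 draws discrete axes and legends in level order, which is why resetting the levels is enough to reorder the bars.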

factor(levels= ) works for manually setting the order, but the question is about ordering one variable by the frequency of a given level of another variable.

You can generate the ordering on the fly, and the Freq column isn't necessary. In the code below, we group by Genotype in order to count the frequency of "Swelling 1" within each Genotype. Then we use those frequencies to set the factor order of Genotype.

library(tidyverse)
library(scales)

df %>% 
  # Get frequency of "Swelling 1" within each level of Genotype
  group_by(Genotype) %>% 
  mutate(freq.s1 = sum(Swelling=="Swelling 1")) %>% 
  ungroup %>% 
  # Order by frequency of "Swelling 1"
  arrange(desc(freq.s1)) %>% 
  # Set Genotype factor order based on the sorting we just created
  mutate(Genotype = factor(Genotype, levels=unique(Genotype))) %>% 
  # Get percents for each bar segment
  group_by(Genotype, Swelling) %>% 
  tally %>% 
  mutate(pct=n/sum(n)) %>% 
  ggplot(aes(x = Genotype, y = pct, fill = Swelling)) + 
    geom_col() + 
    scale_y_continuous(labels = percent)


Yes, so you need to do it programmatically. Insert this before the ggplot.

df3 <- df2 %>% 
	filter(Swelling=="Swelling 1") %>% 
	arrange(desc(Freq))
df2$Genotype <- factor(df2$Genotype, levels=df3$Genotype)
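
For reference, the same idea can be sketched in base R directly on the raw rows. The toy data frame and its genotype names "A", "B" and "C" are made up for illustration; only the "count one level, then use the sorted names as factor levels" pattern carries over:

```r
# Toy data in the same shape as the question (values are made up)
toy <- data.frame(
  Genotype = c("A", "A", "A", "B", "B", "B", "C", "C", "C"),
  Swelling = c("Swelling 1", "Swelling 2", "Swelling 2",
               "Swelling 1", "Swelling 1", "Swelling 1",
               "Swelling 1", "Swelling 1", "Swelling 3"),
  stringsAsFactors = FALSE
)

# Count "Swelling 1" rows within each genotype (named integer vector)
s1_counts <- vapply(
  split(toy$Swelling == "Swelling 1", toy$Genotype),
  sum,
  integer(1)
)

# Genotype names sorted by decreasing "Swelling 1" count
ord <- names(s1_counts)[order(s1_counts, decreasing = TRUE)]

# Use that order as the factor levels; ggplot2 then follows it on the x axis
toy$Genotype <- factor(toy$Genotype, levels = ord)
levels(toy$Genotype)
```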

Hi Joels,

Thank you for your answer. It works great!

Is there a way to arrange the bars with Swelling 1 at the bottom and Swelling 3 at the top?

Thanks

Alan

Hi Alan,

It's the same trick Joels used before to reorder the Genotype levels: just use it to reorder your Swelling levels as well. Both variables can be treated as factors.

Here is my extension of Joels' nice solution for your request:

library(tidyverse)
library(scales)

df <- data.frame("Swelling" = c("Swelling 1","Swelling 2", "Swelling 3","Swelling 3", "Swelling 2","Swelling 1","Swelling 3","Swelling 1","Swelling 1", "Swelling 3","Swelling 1", "Swelling 2", "Swelling 1","Swelling 2", "Swelling 3","Swelling 3", "Swelling 2","Swelling 1","Swelling 3","Swelling 1","Swelling 1", "Swelling 3","Swelling 1", "Swelling 2","Swelling 3","Swelling 1", "Swelling 2" ), 
                 "Genotype" = c("Genotype 1","Genotype 2", "Genotype 3", "Genotype 1","Genotype 2", "Genotype 3","Genotype 1","Genotype 2", "Genotype 3","Genotype 1","Genotype 2", "Genotype 3", "Genotype 1","Genotype 2", "Genotype 3","Genotype 1","Genotype 2", "Genotype 3","Genotype 1","Genotype 2", "Genotype 3", "Genotype 1","Genotype 2", "Genotype 3","Genotype 1","Genotype 2", "Genotype 3"), 
                 "Freq" = as.numeric(c("1","1", "1","1", "1","1","1","1","1", "1","1", "1","1","1", "1","1", "1","1","1","1","1", "1","1", "1","1","1", "1")))

df %>% 
  # Get frequency of "Swelling 1" within each level of Genotype
  group_by(Genotype) %>% 
  mutate(freq.s1 = sum(Swelling=="Swelling 1")) %>% 
  ungroup %>% 
  # Order by frequency of "Swelling 1"
  arrange(desc(freq.s1)) %>% 
  # Set Genotype factor order based on the sorting we just created
  mutate(Genotype = factor(Genotype, levels=unique(Genotype))) %>% 
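  ##### set Swelling factor levels to decreasing order #####
  # unique() is used instead of levels() so this also works when Swelling is a character column
  mutate(Swelling = factor(Swelling, levels=sort(unique(as.character(Swelling)), decreasing=TRUE))) %>% 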
  # Get percents for each bar segment
  group_by(Genotype, Swelling) %>% 
  tally %>% 
  mutate(pct=n/sum(n)) %>% 
  ggplot(aes(x = Genotype, y = pct, fill = Swelling)) + 
  geom_col() + 
  scale_y_continuous(labels = percent)
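
A note on the Swelling releveling step, as a hedged sketch: levels() returns NULL for a character vector, so a level set built from levels() ends up empty when the column is not a factor (for example when data.frame() is called with stringsAsFactors = FALSE, which is the default since R 4.0). An empty level set can leave ggplot2 with no fill values to colour. The vector below is made up for illustration:

```r
# Hypothetical character column (not a factor)
swelling_chr <- c("Swelling 1", "Swelling 3", "Swelling 2", "Swelling 1")

# levels() is NULL for a character vector
is.null(levels(swelling_chr))

# Deriving the levels from the values works for character and factor input
lvls <- sort(unique(as.character(swelling_chr)), decreasing = TRUE)
swelling_fct <- factor(swelling_chr, levels = lvls)
levels(swelling_fct)
```

Checking is.factor() on the column before releveling is a quick way to see which case applies to your own data.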

Hi Adam,

Thank you for your help.

However, with your alternative I get this error: “Error: Must request at least one colour from a hue palette.”

Additionally, I tried Joels' code on my real data (not the example I showed before), but it doesn't seem to work. The bars are not ordered by the frequency of Swelling 1 for each genotype.

Below you will find my data (unfortunately I cannot send the csv file). I would be very happy to find out what I'm doing wrong.

Thank you for the help.

my_legend_title <- "Swelling"
df %>% 
 # Get frequency of "Swelling 1" within each level of Genotype
 group_by(Genotype) %>% 
 mutate(freq.s1 = sum(SwellingComb=="Swelling 1")) %>% 
 ungroup %>% 
 # Order by frequency of "Swelling 1"
 arrange(desc(freq.s1)) %>% 
 # Set Genotype factor order based on the sorting we just created
 mutate(Genotype = factor(Genotype, levels=unique(Genotype))) %>% 
 # Get percents for each bar segment
 group_by(Genotype, SwellingComb) %>% 
 tally %>% 
 mutate(pct=n/sum(n)) %>% 
 ggplot(aes(x = Genotype, y = pct, fill = SwellingComb)) + 
 geom_col() + 
 scale_y_continuous(name= "Percentage of plants showing different swelling symptoms",labels = percent)+
 scale_fill_brewer(palette="RdPu", name= my_legend_title, 
                   labels=c("No Swelling", "Slight swelling (L-axil)", "Strong swelling (L-axil)", "Swelling (hypocotyl)", "Crackings (L-axil)") ) + 
 theme(axis.text.x = element_text(angle = 90)) +
 theme(legend.title = element_text(size=16, face="bold"))

We could write a function that is applied to the geom call to handle this kind of issue, since it is quite common and the usual workaround is verbose and awkward. I took a shot at it below; it should work with all geom_*(), stat_*(), and ggplot() calls.

One cool thing about it is that for once the different operator precedence of + and %>% works for us and not against.
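
As a small illustration of that precedence point (a sketch on plain numbers, assuming magrittr is installed): %>% binds more tightly than +, so in a + b %>% f() only b is piped into f(). That is what lets a single geom be piped into a helper while the rest of the plot is still added with +.

```r
library(magrittr)  # provides %>%

# %>% is evaluated before +, so this is 1 + sqrt(16), not sqrt(1 + 16)
res <- 1 + 16 %>% sqrt()
res

# Parentheses are needed to pipe the whole sum instead
res_sum <- (1 + 16) %>% sqrt()
res_sum
```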

setup

df <- data.frame("Swelling" = c("Swelling 1","Swelling 2", "Swelling 3","Swelling 3", "Swelling 2","Swelling 1","Swelling 3","Swelling 1","Swelling 1", "Swelling 3","Swelling 1", "Swelling 2", "Swelling 1","Swelling 2", "Swelling 3","Swelling 3", "Swelling 2","Swelling 1","Swelling 3","Swelling 1","Swelling 1", "Swelling 3","Swelling 1", "Swelling 2","Swelling 3","Swelling 1", "Swelling 2" ), 
                 "Genotype" = c("Genotype 1","Genotype 2", "Genotype 3", "Genotype 1","Genotype 2", "Genotype 3","Genotype 1","Genotype 2", "Genotype 3","Genotype 1","Genotype 2", "Genotype 3", "Genotype 1","Genotype 2", "Genotype 3","Genotype 1","Genotype 2", "Genotype 3","Genotype 1","Genotype 2", "Genotype 3", "Genotype 1","Genotype 2", "Genotype 3","Genotype 1","Genotype 2", "Genotype 3"), 
                 "Freq" = c("1","1", "1","1", "1","1","1","1","1", "1","1", "1","1","1", "1","1", "1","1","1","1","1", "1","1", "1","1","1", "1"))
library(plyr)
library(magrittr)  # provides %>%, used inside arrange_gg() and in the calls below
df2<-plyr::ddply(df,.(Swelling), function(x) with(x, data.frame(100*round(table(Genotype)/length(Genotype),2))))

function

arrange_gg <- function(gg, var, by, focus = TRUE){
  fun <- 
    eval(eval(substitute(quote(function(x) {
      lvls <- x %>% 
        dplyr::filter(focus) %>% 
        dplyr::arrange(by) %>% 
        dplyr::pull(var) %>% 
        as.character() %>%
        unique()
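      # Fall back to the sorted unique values when the column is character (levels(.) would be NULL)
      x <- dplyr::mutate_at(x, dplyr::vars(var), ~factor(., union(lvls, if (is.factor(.)) levels(.) else sort(unique(.)))))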
    })))[-4])
  
  data <- gg$data
  if(is.function(data)){
    gg$data <- purrr::compose(fun,data)
  } else if(inherits(data, "waiver")){  # same check as the internal ggplot2:::is.waive()
    gg$data <- fun
  } else if(is.data.frame(data)) {
    gg$data <- fun(data)
  } else {
    stop("unexpected class")
  }
  gg
}

Former request

library(ggplot2)
ggplot(df2, aes(x = Genotype, y = Freq, fill = Swelling)) + 
  geom_bar(position = "fill",stat = "identity") %>% 
  arrange_gg(Genotype, desc(Freq), Swelling=="Swelling 1") %>% 
  arrange_gg(Swelling, desc(Freq)) +
  scale_y_continuous(labels = scales::percent)

Additional request

Your additional request to get Swelling 1 at the bottom can be done in different ways, which yield the same result here but not in the general case.

We can pass the condition Swelling != "Swelling 1" as the focus argument. The levels that match the condition are ordered first, and the remaining level (Swelling 1) is appended last. The first levels are drawn on top, so Swelling 1 ends up at the bottom.

ggplot(df2, aes(x = Genotype, y = Freq, fill = Swelling)) + 
  geom_bar(position = "fill",stat = "identity") %>% 
  arrange_gg(Genotype, desc(Freq), Swelling =="Swelling 1") %>% 
  arrange_gg(Swelling, desc(Freq), Swelling !="Swelling 1") +
  scale_y_continuous(labels = scales::percent)
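
To show roughly what happens to the levels in that call, here is a base R sketch of the same level-building logic on a made-up summary table (the Freq values are hypothetical, and this is a simplified stand-in for what arrange_gg does internally, not the function itself):

```r
# Hypothetical summary table (values are made up)
tab <- data.frame(
  Swelling = c("Swelling 1", "Swelling 2", "Swelling 3"),
  Freq     = c(50, 20, 30),
  stringsAsFactors = FALSE
)

# Rows matching the focus condition, sorted by decreasing Freq
focus_rows <- tab[tab$Swelling != "Swelling 1", ]
focus_rows <- focus_rows[order(-focus_rows$Freq), ]

# Focus levels first, all remaining levels appended afterwards
lvls <- union(unique(focus_rows$Swelling), sort(unique(tab$Swelling)))
lvls

# The last level is stacked at the bottom of each bar
tab$Swelling <- factor(tab$Swelling, levels = lvls)
```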

This reverses the existing order: the alphabetical order if by is a character column, and the factor level order otherwise. No focus argument is needed here.

ggplot(df2, aes(x = Genotype, y = Freq, fill = Swelling)) + 
  geom_bar(position = "fill",stat = "identity") %>% 
  arrange_gg(Genotype, desc(Freq), Swelling=="Swelling 1") %>% 
  arrange_gg(Swelling, desc(Swelling)) +
  scale_y_continuous(labels = scales::percent)

To be sure to reverse the alphabetical order, add as.character:

ggplot(df2, aes(x = Genotype, y = Freq, fill = Swelling)) + 
  geom_bar(position = "fill",stat = "identity") %>% 
  arrange_gg(Genotype, desc(Freq), Swelling=="Swelling 1") %>% 
  arrange_gg(Swelling, desc(as.character(Swelling))) +
  scale_y_continuous(labels = scales::percent)

Apply on ggplot call

You can also apply it to the ggplot object to make these changes permanent through the chain.

ggplot(df2, aes(x = Genotype, y = Freq, fill = Swelling))  %>% 
  arrange_gg(Genotype, desc(Freq), Swelling =="Swelling 1") %>% 
  arrange_gg(Swelling, desc(Freq), Swelling !="Swelling 1") + 
  geom_bar(position = "fill",stat = "identity") +
  scale_y_continuous(labels = scales::percent)

A quick way to do this is to reverse the natural alphabetical order of the levels of Swelling using fct_rev from the forcats package (forcats is included in tidyverse, so it's already loaded):

df %>% 
  mutate(Swelling = fct_rev(Swelling)) %>% 
  # without forcats: mutate(Swelling = factor(Swelling, levels=rev(sort(unique(Swelling))))) %>%  
  group_by(Genotype) %>% 
  mutate(freq.s1 = sum(Swelling=="Swelling 1")) %>% 
  ungroup %>% 
  arrange(desc(freq.s1)) %>% 
  mutate(Genotype = factor(Genotype, levels=unique(Genotype))) %>%
  group_by(Genotype, Swelling) %>% 
  tally %>% 
  mutate(pct=n/sum(n)) %>% 
  ggplot(aes(x = Genotype, y = pct, fill = Swelling)) + 
    geom_col() + 
    scale_y_continuous(labels = percent) 

