Can I sort the x-axis of a ggplot to appear as it does in the raw data file?

I am making a number of figures to illustrate trends in diversity data across seven different sites that I surveyed. I used one landscape-level characteristic (the amount of non-agricultural land in a 2-km radius) as a "master variable" to determine the order that I'd like to compare all of the sites against each other. Now I want to create a number of plots that reflect different diversity metrics of these seven sites, but have the x-axis in the same order as the first plot for easy comparison.

So one diversity plot might be:

#Creating ISI figure
nISI_summary<-anovadata %>%
  group_by(Site) %>%
  summarise(mean_ISI= mean(nISI),
            sd_ISI= sd(nISI),
            n_ISI= n(),
            se_ISI= sd(nISI)/sqrt(n()))

nISIPlot<-ggplot(nISI_summary, aes(Site, mean_ISI))+
  geom_col(color= "gray20", fill="cornsilk3")+
  geom_errorbar(aes(ymin=mean_ISI-se_ISI, ymax=mean_ISI+se_ISI), width=0.2)+
  geom_text(label=c("d","ab","abcd","a","bcd","cd","abc"), aes(y=mean_ISI+se_ISI, x=Site), 
            vjust=-0.5, size=5)+
  ylim(0, 3.5)

nISIfigure<-nISIPlot + labs(y="Mean ISI ± SE", x="Site")

nISIfigure + theme(axis.title.x = element_text(size=14, face = "bold", vjust = -0.5),
                 axis.title.y = element_text(size=14, face = "bold", vjust = 0.5),
                 axis.text.x = element_text(size=12),
                 axis.text.y = element_text(size=12))

Where "nISI" is the diversity metric I am testing. My data are sorted in the source file by site, according to the order I determined with my first figure, but when I create this plot in R, the x-axis gets sorted automatically in alphabetical order.
Is there a way to custom-sort the x-axis to match the custom order I want? I know there are "factors" in ggplot that can be used to apply sorting rules to the axes but due to the transformed nature of the factors I am plotting (i.e. I'm plotting a summary of statistics of the variables in the data, not the variables themselves), I don't know if I can apply a factor without screwing up the statistical calculations within the code. Any help would be greatly appreciated!

Setting the order of the levels of your Site variable will not cause any problems with the later calculations. You can use the factor() function and set the levels argument manually (ordered = TRUE is optional, since ggplot2 follows the level order either way), or you can use the reorder() function with x = Site and X = nISI, which sorts the levels by the mean of nISI by default. Take a look at the help information on either function and don't hesitate to ask if you get stuck.
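To illustrate the point that setting levels leaves the statistics alone, here is a minimal sketch in base R. The data frame `demo`, its site names, and its values are made up for illustration and are not the poster's `anovadata`.

```r
# Hypothetical data, invented for this sketch
demo <- data.frame(
  Site = rep(c("Marsh", "Dune", "Orchard"), each = 4),
  nISI = c(2.1, 2.4, 1.9, 2.2, 0.8, 1.1, 0.9, 1.0, 1.5, 1.7, 1.6, 1.4),
  stringsAsFactors = FALSE
)

# Per-site means with the default (alphabetical) grouping
means_before <- tapply(demo$nISI, demo$Site, mean)

# Set the levels by hand, in the order wanted on the x-axis
demo$Site <- factor(demo$Site, levels = c("Marsh", "Dune", "Orchard"))
means_after <- tapply(demo$nISI, demo$Site, mean)

# Compare the means site by site; only the order of the groups has changed
all.equal(as.numeric(means_before[levels(demo$Site)]),
          as.numeric(means_after))
```

The factor only controls the order in which the groups are listed and drawn; each site's mean is computed from the same rows as before.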

Hi, thank you for your input. So, I would create a factor() separately from the aes() that I'm plotting? In that case, where would I insert the sorting factor into the code? I'm still learning to navigate ggplot...

You would redefine the Site column in your data frame as a factor whose levels are ordered by the mean of nISI (if I understand correctly what you want your order to be). Here is an example.

library(dplyr)
library(ggplot2)
#Standard alphabetical ordering
DF <- data.frame(Site = rep(c("A", "B", "C"), each = 10), 
                 Value = c(rnorm(10, 10,2), rnorm(10, 1,1), rnorm(10, 5,0.5)),
                 stringsAsFactors = TRUE) #R >= 4.0 keeps strings as character unless asked
str(DF)
#> 'data.frame':    30 obs. of  2 variables:
#>  $ Site : Factor w/ 3 levels "A","B","C": 1 1 1 1 1 1 1 1 1 1 ...
#>  $ Value: num  9.72 8.97 10.1 8.74 11.5 ...
Site_summary <- DF %>%
  group_by(Site) %>%
  summarise(mean_Site= mean(Value),
            sd_Site = sd(Value),
            n_Site = n(),
            se_Site = sd(Value)/sqrt(n()))
Site_summary
#> # A tibble: 3 x 5
#>   Site  mean_Site sd_Site n_Site se_Site
#>   <fct>     <dbl>   <dbl>  <int>   <dbl>
#> 1 A         10.2    1.32      10   0.417
#> 2 B          1.20   0.858     10   0.271
#> 3 C          5.14   0.371     10   0.117
ggplot(Site_summary, aes(Site, mean_Site)) + geom_col()


#Now redo it but first order Site according to the mean of Value
DF <- DF %>% mutate(Site = reorder(Site, Value, FUN = mean))
str(DF)
#> 'data.frame':    30 obs. of  2 variables:
#>  $ Site : Factor w/ 3 levels "B","C","A": 3 3 3 3 3 3 3 3 3 3 ...
#>   ..- attr(*, "scores")= num [1:3(1d)] 10.2 1.2 5.14
#>   .. ..- attr(*, "dimnames")=List of 1
#>   .. .. ..$ : chr  "A" "B" "C"
#>  $ Value: num  9.72 8.97 10.1 8.74 11.5 ...
Site_summary <- DF %>%
  group_by(Site) %>%
  summarise(mean_Site= mean(Value),
            sd_Site = sd(Value),
            n_Site = n(),
            se_Site = sd(Value)/sqrt(n()))
Site_summary
#> # A tibble: 3 x 5
#>   Site  mean_Site sd_Site n_Site se_Site
#>   <fct>     <dbl>   <dbl>  <int>   <dbl>
#> 1 B          1.20   0.858     10   0.271
#> 2 C          5.14   0.371     10   0.117
#> 3 A         10.2    1.32      10   0.417
ggplot(Site_summary, aes(Site, mean_Site)) + geom_col()

ggplot(Site_summary, aes(Site, sd_Site)) + geom_col()

Created on 2019-08-30 by the reprex package (v0.2.1)
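If the goal is to keep the order in which sites appear in the raw file, rather than ordering by a mean, one possible sketch is below. It assumes the file is already sorted by site in the desired order, as described in the question. The data frame `raw`, the `richness` column, and the site names are hypothetical stand-ins.

```r
library(ggplot2)

# Hypothetical raw data, already sorted by site in the desired order
raw <- data.frame(
  Site = rep(c("Marsh", "Dune", "Orchard"), each = 2),
  nISI = c(2.1, 2.4, 0.8, 1.1, 1.5, 1.7),
  richness = c(12, 14, 5, 7, 9, 8),
  stringsAsFactors = FALSE
)

# unique() keeps values in order of first appearance, i.e. file order
site_order <- unique(raw$Site)

# Per-site means for two metrics (aggregate() sorts Site alphabetically)
summary_df <- aggregate(cbind(nISI, richness) ~ Site, data = raw, FUN = mean)

# Apply the stored order once; every plot built from summary_df follows it
summary_df$Site <- factor(summary_df$Site, levels = site_order)

p_isi  <- ggplot(summary_df, aes(Site, nISI)) + geom_col()
p_rich <- ggplot(summary_df, aes(Site, richness)) + geom_col()
```

Storing the order in one vector means each new summary table can reuse it. An alternative that leaves the data untouched is adding `scale_x_discrete(limits = site_order)` to each plot.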
