back to back barplot

Need some help with a plot. The one below is not exactly what I'm looking for, but it's the closest I could get with what I know...

Instead of y = diff I'd like to plot res1 and res2 back-to-back, centered at 0, and then arrange by diff (or possibly cat).

Any suggestions would be greatly appreciated! Thank you!

library(ggplot2)
library(tibble)
set.seed(1)
dframe <- tibble(id = 1:20, 
                 res1 = sample(0:5, 20, replace = TRUE), 
                 res2 = sample(0:5, 20, replace = TRUE), 
                 diff = res1-res2,
                 cat = sample(c("a", "b", "c", "d"), 20, replace = TRUE))

ggplot(dframe, aes(x = id, y = diff, fill = cat)) +
  geom_bar(stat = "identity", position = "identity") +
  coord_flip() +
  scale_x_discrete(limits = 1:20) +
  scale_y_discrete(limits = -4:5)

Created on 2018-10-26 by the reprex package (v0.2.1)

Hello @bragks!

I have code below that does what I think you're asking for. Let me know if I misunderstood your post. Allow me to explain the changes that I've made.

Reordering factor based on continuous variable

  • Instead of relying on scale_x_discrete(limits = 1:20) to force the X-axis to be discrete, I'm coercing the id column to a factor (using mutate() from the dplyr package).
  • The secret sauce for reordering factors based on other values (e.g. other columns in a data frame) is the forcats tidyverse package.
  • For the first plot, I reorder based on the diff column using fct_reorder(id, diff).
  • Because we are plotting the values in the res1 and res2 columns along the same axis (i.e. in ggplot2 terms, using the same aesthetic), we need to store the values from those two columns in the same column. You can achieve this with gather() from the tidyr package (akin to melt() from reshape2, in case you're more familiar with that name).
  • I have added a line at zero to mark where the two sets of bars meet, using geom_hline(yintercept = 0). It appears vertical in the plot because of coord_flip().
  • Bonus: You can replace geom_bar(stat = "identity", position = "identity") with geom_col().
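
To make the reshaping step concrete, here is a minimal sketch on a hypothetical two-row tibble (wide is made up for illustration, it is not the data from the post). It also shows pivot_longer(), which supersedes gather() in tidyr 1.0.0 and later; I'm assuming that version or newer is installed.

```r
library(tibble)
library(tidyr)

# Hypothetical toy data: one row per id, one column per result
wide <- tibble(id = 1:2, res1 = c(3, 1), res2 = c(-2, -4))

# gather() as used in this post: res1 and res2 end up in a single value column
long_gather <- gather(wide, result_name, value, res1, res2)

# The same reshape with pivot_longer() (assumes tidyr >= 1.0.0)
long_pivot <- pivot_longer(wide, c(res1, res2),
                           names_to = "result_name", values_to = "value")

long_gather
long_pivot
```

Both results have the columns id, result_name and value, but the row order differs, which doesn't matter for plotting.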

Extra credit

Note that here, there is a one-to-one mapping between id and diff. In many cases, there would be multiple values for diff for each value of id. The third argument for fct_reorder() is a function that aggregates the multiple values in the second argument for each value in the first argument. By default, this function is median(), and the median of a single value is that value itself, which is why your single diff value per id is used as-is.
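
Here is a small sketch of what that aggregation does when there are several values per level. The vectors id and diff_vals are hypothetical toy data, not the data from the post.

```r
library(forcats)

# Hypothetical toy data: each id has three diff values
id <- factor(c("a", "a", "a", "b", "b", "b"))
diff_vals <- c(0, 0, 9, 2, 2, 2)

# Default aggregation is median(): a has median 0, b has median 2
by_median <- levels(fct_reorder(id, diff_vals))

# Swap in mean() through the .fun argument: a has mean 3, b has mean 2
by_mean <- levels(fct_reorder(id, diff_vals, .fun = mean))

by_median
by_mean
```

With the median, a sorts before b; with the mean, the outlier pulls a after b.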

Example

library(ggplot2)
library(tibble)
library(forcats)
library(dplyr, warn.conflicts = FALSE)
library(tidyr)

set.seed(1)
dframe <- tibble(id = 1:20, 
                 res1 = sample(0:5, 20, replace = TRUE), 
                 res2 = sample(0:5, 20, replace = TRUE), 
                 diff = res1-res2,
                 cat = sample(c("a", "b", "c", "d"), 20, replace = TRUE))

dframe <- mutate(dframe, 
                 id = as.factor(id),
                 id = fct_reorder(id, diff),
                 res2 = -res2)

dframe <- gather(dframe, result_name, value, res1, res2)

ggplot(dframe, aes(x = id, y = value, fill = cat)) +
  geom_col() +
  geom_hline(yintercept = 0) +
  coord_flip() +
  scale_y_continuous(breaks = -5:5)  # value is numeric, so use a continuous scale

Reordering factor based on discrete variable

This one is a bit trickier, but it still uses fct_reorder().

  • I convert the cat column to a factor so that each value is encoded by an integer behind the scenes. Try running as.integer() on a factor to see what I mean.
  • I use fct_reorder() in the same way I did above, but this time I reorder the id column by the integer codes stored in the factor, which I extract with as.integer().

Extra credit

If you want a different order for the cat column, you can use the fct_relevel() function (before running fct_reorder()) from the forcats package to specify the order you want.
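
As a rough sketch of how these pieces fit together, using hypothetical toy vectors (cat_col and id are made up for illustration):

```r
library(forcats)

# Hypothetical toy data: four ids, each with a category
id <- factor(1:4)
cat_col <- factor(c("b", "d", "a", "c"))

# Default levels are alphabetical (a, b, c, d), so the integer codes are:
codes_default <- as.integer(cat_col)

# Move d and c to the front; the remaining levels keep their order
cat_releveled <- fct_relevel(cat_col, "d", "c")
codes_releveled <- as.integer(cat_releveled)

# Reorder id by the integer codes of the releveled category
id_by_cat <- fct_reorder(id, codes_releveled)

levels(cat_releveled)
levels(id_by_cat)
```

Because fct_relevel() runs first, the ids end up grouped in the custom category order (d, c, a, b) rather than alphabetically.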

Example


dframe <- mutate(dframe, 
                 cat = as.factor(cat),
                 id = fct_reorder(id, as.integer(cat)))

ggplot(dframe, aes(x = id, y = value, fill = cat)) +
  geom_col() +
  geom_hline(yintercept = 0) +
  coord_flip() +
  scale_y_continuous(breaks = -5:5)  # value is numeric, so use a continuous scale

Created on 2018-10-26 by the reprex package (v0.2.1)

Wow, thank you, excellent answer!

This does exactly what I'm looking for. I really appreciate you taking the time to explain the details as well!

Hmm, this does introduce another problem. How do I label the y (value) axis for both sides of the hline when the values are from one variable?

Hello @bragks!

I'm glad you found my answer useful!

Labelling each side using manual annotations

For labelling each side of the X-axis, my first thought was to use manual annotations with the annotate() function from ggplot2. However, I found this finicky because you have to add extra space at the top of your plot in order to fit the extra text labels. It's certainly possible, but I thought that the second solution I came up with was more elegant.
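
For reference, the annotation approach might look roughly like the sketch below. It uses hypothetical toy data (long), and the label positions (x = 3.8, y = -2.5 and 2.5) are values I picked by eye for this toy data, so they would need adjusting for a real plot.

```r
library(ggplot2)

# Hypothetical toy data in the long format produced by gather()
long <- data.frame(
  id = factor(rep(1:3, times = 2)),
  result_name = rep(c("res1", "res2"), each = 3),
  value = c(-2, -4, -1, 3, 1, 5)
)

p <- ggplot(long, aes(x = id, y = value)) +
  geom_col() +
  geom_hline(yintercept = 0) +
  # x = 3.8 sits just past the last of the three bars, which makes
  # room for the labels above the bars once the axes are flipped
  annotate("text", x = 3.8, y = c(-2.5, 2.5), label = c("res1", "res2")) +
  coord_flip()

p
```

The hard-coded positions are what makes this approach finicky: they depend on both the number of bars and the range of the values.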

Labelling each side using facets

I realized that we have the values res1 and res2 in a column called result_name thanks to our gather() operation. This means that we can leverage facets in ggplot2. Here's a summary of the changes I made to adapt the code posted above:

  • You facet the plot using the result_name column we created in our gather operation using facet_grid(~ result_name, scales = "free_x").
    • The scales = "free_x" argument ensures that the X-axis scales of each facet are not forced to be the same across the facets. In other words, it allows each facet to "fit" the data it contains.
    • Because facets are shown alphabetically by default, I decided to convert res1 to a negative scale instead of res2. Alternatively, you could convert the result_name column to a factor and use fct_relevel() to set res2 as the first level.
  • In order to "collapse" the facets together (i.e. eliminate the gap between the two vertical bars), we need to do two things:
    • We eliminate the gap between facets using theme(panel.spacing.x = unit(0, "pt")).
    • We eliminate the gap between the bars from the barplot and the edge of the plot using scale_y_continuous(expand = c(0, 0)).
  • One issue that's introduced by setting expand = c(0, 0) in scale_y_continuous() is that there is no space "above" the bars, which makes the plot feel a bit claustrophobic.
    • Normally, there are a few solutions to this (e.g. expand_limits() or expand_scale() from ggplot2), but most of them don't work here.
    • Instead, I had to add geom_blank(aes(y = value * 1.05)) to account for the fact that value contains a mix of positive and negative values.
    • The 1.05 value is simulating the default behaviour of the expand argument for scale_x_continuous() (see help).
  • To emphasize the facet titles/strips, I added the following to the theme: strip.background = element_rect(colour = "black").
    • I also set colour = "black" for geom_hline() for consistency.

Display only positive values on X-axis

While I was improving the labelling, I also made a small tweak so that the X-axis displays positive values on each side (instead of negative values on one side). For this, here's what I changed:

  • I create a vector called breaks that contains the values I want to label on the X-axis. Then, I set the names of the vector to the absolute values of that vector. See below for what I mean.
    > breaks
     6  5  4  3  2  1  0  1  2  3  4  5  6 
    -6 -5 -4 -3 -2 -1  0  1  2  3  4  5  6 
    
  • N.B. You normally shouldn't have duplicate names in a vector or list. In this case, we won't be using the names to access the elements, so it's not a problem. Alternatively, you could store the absolute values in a separate vector.
  • We can fix our X-axis labels by setting scale_y_continuous(breaks = breaks, labels = names(breaks)). This ensures that positive values are shown on both sides of the vertical bar.
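
The alternative mentioned above (keeping the absolute values in a separate vector) could be sketched as follows. The toy data frame is hypothetical. The second plot passes abs directly as the labels function, which is another option since labels accepts a function of the breaks.

```r
library(ggplot2)

# Hypothetical toy data with a mix of negative and positive values
toy <- data.frame(id = factor(1:3), value = c(-4, 2, 5))

breaks <- -6:6
labels <- as.character(abs(breaks))  # separate vector, no duplicate names

# Option 1: explicit label vector, same length as breaks
p_vec <- ggplot(toy, aes(x = id, y = value)) +
  geom_col() +
  coord_flip() +
  scale_y_continuous(breaks = breaks, labels = labels)

# Option 2: let the scale compute the labels from the breaks
p_fun <- ggplot(toy, aes(x = id, y = value)) +
  geom_col() +
  coord_flip() +
  scale_y_continuous(breaks = breaks, labels = abs)

p_vec
```

Either way, the data stay negative on one side and only the displayed tick labels change.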

Example

library(ggplot2)
library(tibble)
library(forcats)
library(dplyr, warn.conflicts = FALSE)
library(tidyr)

set.seed(1)

dframe <- tibble(id = 1:20, 
                 res1 = sample(0:5, 20, replace = TRUE), 
                 res2 = sample(0:5, 20, replace = TRUE), 
                 diff = res1-res2,
                 cat = sample(c("a", "b", "c", "d"), 20, replace = TRUE))

dframe <- mutate(dframe, 
                 id = as.factor(id),
                 id = fct_reorder(id, diff),
                 res1 = -res1)

dframe <- gather(dframe, result_name, value, res1, res2)

breaks <- -6:6
names(breaks) <- abs(breaks)

ggplot(dframe, aes(x = id, y = value, fill = cat)) +
  geom_col() +
  geom_hline(yintercept = 0, colour = "black") +
  geom_blank(aes(y = value * 1.05)) +
  coord_flip() +
  facet_grid(~ result_name, scales = "free_x") +
  scale_y_continuous(breaks = breaks, labels = names(breaks), 
                     expand = c(0, 0)) +
  theme(panel.spacing.x = unit(0, "pt"), 
        strip.background = element_rect(colour = "black"))

Created on 2018-10-28 by the reprex package (v0.2.1)

If you want to force each facet to have the same limits/range, you can manually set the data for geom_blank(). Here are the changes in the example below:

  • I changed the range of res2 to 0:4 to simulate the case where the data isn't "symmetrical".
  • I created the expand_data data frame that contains the limit I want to display in each facet. At a minimum, it needs one column for the facet variable and one for the aesthetic I want to expand (in this case, y).
  • Because I don't have all of the columns from dframe in expand_data, I moved the aes(x = id, y = value, fill = cat) from the ggplot() call to geom_col(). Otherwise, geom_blank() is going to inherit the requirement for x = id and fill = cat.
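
If you'd rather not hard-code the limits, one possible variation is to derive them from the data. This is only a sketch on hypothetical toy data (long), and it assumes you want both facets to extend to the largest absolute value found on either side.

```r
# Hypothetical long-format data: res1 is negative, res2 is positive
long <- data.frame(
  result_name = rep(c("res1", "res2"), each = 3),
  value = c(-5, -2, -3, 4, 1, 0)
)

# Largest absolute value across both sides
limit <- max(abs(long$value))

# One row per facet: the negative limit for res1, the positive one for res2
expand_data <- data.frame(result_name = c("res1", "res2"),
                          value = c(-limit, limit))

expand_data
```

The resulting expand_data can then be passed to geom_blank() in the same way as the hand-written one in the example below.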

Example

library(ggplot2)
library(tibble)
library(forcats)
library(dplyr, warn.conflicts = FALSE)
library(tidyr)

set.seed(1)

dframe <- tibble(id = 1:20, 
                 res1 = sample(0:5, 20, replace = TRUE), 
                 res2 = sample(0:4, 20, replace = TRUE), 
                 diff = res1-res2,
                 cat = sample(c("a", "b", "c", "d"), 20, replace = TRUE))

dframe <- mutate(dframe, 
                 id = as.factor(id),
                 id = fct_reorder(id, diff),
                 res1 = -res1)

dframe <- gather(dframe, result_name, value, res1, res2)

breaks <- -6:6
names(breaks) <- abs(breaks)

expand_data <- data.frame(result_name = c("res1", "res2"),
                          value = c(-5, 5))

ggplot(dframe) +
  geom_col(aes(x = id, y = value, fill = cat)) +
  geom_hline(yintercept = 0, colour = "black") +
  geom_blank(aes(y = value * 1.05), data = expand_data) +
  coord_flip() +
  facet_grid(~ result_name, scales = "free_x") +
  scale_y_continuous(breaks = breaks, labels = names(breaks), 
                     expand = c(0, 0)) +
  theme(panel.spacing.x = unit(0, "pt"), 
        strip.background = element_rect(colour = "black"))

Created on 2018-11-14 by the reprex package (v0.2.1)

Awesome, thank you again! There should've been a "Send Champagne" button here...
