Aligning labels under geom_col with varying widths

I'm making a column graph of building energy consumption. The x axis is ordered by construction year and building name, the y axis is the energy metric, and I'm trying to make the width of each column proportional to the square footage. When I run geom_col without assigning width, the x axis labels line up perfectly well. When I add the calls for width and for position_dodge2 to eliminate overlap (seen below), the labels bunch up in the center. Any help or direction to resources would be greatly appreciated.

library(tidyverse)
help.data <- data.frame(
  stringsAsFactors = FALSE,
         row.names = c("1", "6", "9", "16", "18"),
             bName = c("AEROSPACE/MECHANICAL ENG",
                       "AHSC","ANTHROPOLOGY","APACHE HALL","ARBOL DE LA VIDA"),
           constYr = c(1997, 1968, 1962, 1957, 2009),
               gsf = c(184586, 460019, 38906, 30876, 234455),
          bldgType = c("Academic","Medical",
                       "Academic","Dormitory","Dormitory"),
               eui = c(214.404397950007,
                       526.612007183703,33.4818633521822,86.3320853089779,79.3320687466678)
)

ggplot() + geom_col(data = help.data, 
                    aes(x = reorder(bName, constYr), y = eui, fill = bldgType), 
                    width = help.data$gsf, 
                    position_dodge2(preserve = c("total"))) + 
  ggtitle("Energy Use Indices - Width by gsf") + 
  labs(x = NULL, y = bquote('EUI (kBTU/gsft)')) + 
  theme(axis.text.x = element_text(angle = 90, size = 7, hjust = 1),
        plot.title = element_text(hjust = 0.5))

Created on 2020-03-12 by the reprex package (v0.3.0)

Hi, and welcome!

Thanks for the great reprex. I'm working on this now. My usual way is to start with a base object

p <- ggplot() + geom_col(data = help.data, 
                aes(x = reorder(bName, constYr), y = eui, fill = bldgType), 
                width = help.data$gsf, 
                position_dodge2(preserve = c("total")))

and remove the embellishments to get to the core issue, then the embellishments can be added back in. Be back in a bit.


Can you clarify what the widths mean to you?

Since the x axis is made up of a factor with 5 levels, each tick mark is 1 unit apart so the width of the whole plot is about 5 units on the x axis. But your gsf values are extremely large, so the bar widths are huge compared to how far apart each axis label is. I believe that's why things look "bunched".
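To make the mismatch concrete, here is a minimal sketch using the values from the reprex. The names x_pos and tick_spacing are only illustrative; the one ggplot2 behaviour relied on is that a discrete x scale places its levels at 1, 2, 3, and so on.

```r
# Compare where the five bars sit on the x axis with the widths requested.
bName   <- c("AEROSPACE/MECHANICAL ENG", "AHSC", "ANTHROPOLOGY",
             "APACHE HALL", "ARBOL DE LA VIDA")
constYr <- c(1997, 1968, 1962, 1957, 2009)
gsf     <- c(184586, 460019, 38906, 30876, 234455)

# reorder() returns a factor whose levels are sorted by constYr;
# the integer codes are the x positions a discrete scale would use.
x_pos <- as.integer(reorder(bName, constYr))

# Distance between neighbouring ticks on that axis.
tick_spacing <- min(diff(sort(x_pos)))

# Each bar is asked to be tens of thousands of ticks wide.
data.frame(bName, x_pos, width = gsf, width_in_ticks = gsf / tick_spacing)
```

With bars that wide, the axis has to stretch to hold them, so the five tick labels end up squeezed together in the middle.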

Could it make sense to do something like have the widths be based on each value divided by the max value; i.e., width = help.data$gsf/max(help.data$gsf)? (I'm just taking a wild guess here.)

That would look like

ggplot() + 
     geom_col(data = help.data, 
                    aes(x = reorder(bName, constYr), y = eui, fill = bldgType), 
                    width = help.data$gsf/max(help.data$gsf)
              )  + 
     ggtitle("Energy Use Indices - Width by gsf") + 
     labs(x = NULL, y = bquote('EUI (kBTU/gsft)')) + 
     theme(axis.text.x = element_text(angle = 90, size = 7, hjust = 1),
           plot.title = element_text(hjust = 0.5))

Created on 2020-03-12 by the reprex package (v0.3.0)


Only, here we're showing gsf on the x-axis, whereas I need the x-axis to be ordered by year of construction, and then by bName.

OK, so the bars are ordered chronologically but labelled by name, i.e. the names appear in chronological rather than alphabetical order?

Yes, at least that was my intent with the reorder(bName, constYr) call.

Would it help if I push my whole working code up to Rpubs?


Thanks.

Probably no need, since the reprex illustrates the problem sufficiently. I can get the bars ordered by constYr and could label the buildings with geom_text(), but now I'm having problems with width overlaps.

I love the smell of napalm in the morning!


I'm going to leave it here for the evening. I can get a bare plot with the building names in chronological order (by re-arranging the data frame) and the names as the x-axis label at 90º . As soon, however, as I introduce geom_col(), it goes all blooey. (An artifact of tearing my hair out all day over this, I fear).

Will try again mañana.

I've re-arranged the data frame to put the data in chronological order and gotten as far as

suppressPackageStartupMessages(library(dplyr))
suppressPackageStartupMessages(library(ggplot2)) 
help.data <- data.frame(
        stringsAsFactors = TRUE,
        row.names = c("16", "9", "6", "1", "18"),
        bName = c("APACHE HALL","ANTHROPOLOGY","AHSC","AEROSPACE/MECHANICAL ENG","ARBOL DE LA VIDA"),
        constYr = c(1957, 1962, 1968, 1997, 2009),
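        # gsf and eui follow the same row order as bName (values as in the original reprex)
        gsf = c(30876, 38906, 460019, 184586, 234455),
        bldgType = c("Dormitory", "Academic","Medical","Academic","Dormitory"),
        eui = c(86.3320853089779,33.4818633521822,526.612007183703, 214.404397950007,79.3320687466678))
# factor() with explicit levels reorders them chronologically; assigning to levels() would only rename them
help.data$bName <- factor(help.data$bName, levels = c("APACHE HALL","ANTHROPOLOGY","AHSC","AEROSPACE/MECHANICAL ENG","ARBOL DE LA VIDA"))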

p <- ggplot(help.data, aes(bName,eui, fill = bldgType))
p + theme(axis.text.x = element_text(angle = 90, size = 7, hjust = 1)) + 
    xlab(NULL) +
    ylab("EUI (kBTU/gsft)")

Created on 2020-03-13 by the reprex package (v0.3.0)

Adding `geom_col()` works to add the bars, colored by building type, but adding the width adjustment clobbers the x-axis. Argggg
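One thing worth checking when putting factor levels into chronological order by hand: assigning to levels() renames the existing levels, while factor(x, levels = ...) reorders them. A small sketch with three of the building names (the object names chrono, renamed and reordered are made up for illustration):

```r
# Factor levels default to alphabetical order.
bName <- factor(c("APACHE HALL", "ANTHROPOLOGY", "AHSC"))
levels(bName)

# The order we actually want on the axis (by construction year).
chrono <- c("APACHE HALL", "ANTHROPOLOGY", "AHSC")

# levels<- overwrites the level names position by position,
# so the values attached to each row change.
renamed <- bName
levels(renamed) <- chrono
as.character(renamed)

# factor(..., levels = ) keeps each row's value and only changes the order.
reordered <- factor(bName, levels = chrono)
as.character(reordered)
levels(reordered)
```

If the first form is used, rows can silently end up with another building's name, which is easy to miss in a plot.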

What would happen if I sort the data frame by construction year and name, use the row names (which would just be an index number) as x axis, but use bName as x axis labels? Not clear if this would allow the labels to spread out to sit under the varying width columns.
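A variation on that idea, offered as a sketch rather than anything settled in this thread: put the bars on a continuous x axis, give each one an explicit left and right edge from the cumulative gsf, and place the name labels at the bar midpoints. This swaps geom_col() for geom_rect(); the columns xmin, xmax and xmid are helper names introduced here.

```r
library(ggplot2)

help.data <- data.frame(
  stringsAsFactors = FALSE,
  bName    = c("AEROSPACE/MECHANICAL ENG", "AHSC", "ANTHROPOLOGY",
               "APACHE HALL", "ARBOL DE LA VIDA"),
  constYr  = c(1997, 1968, 1962, 1957, 2009),
  gsf      = c(184586, 460019, 38906, 30876, 234455),
  bldgType = c("Academic", "Medical", "Academic", "Dormitory", "Dormitory"),
  eui      = c(214.404397950007, 526.612007183703, 33.4818633521822,
               86.3320853089779, 79.3320687466678)
)

# Sort by construction year, then name, so row order is plotting order.
d <- help.data[order(help.data$constYr, help.data$bName), ]

# Lay the bars side by side: each bar spans [xmin, xmax], a width of gsf.
d$xmax <- cumsum(d$gsf)
d$xmin <- d$xmax - d$gsf
d$xmid <- (d$xmin + d$xmax) / 2   # label position: centre of each bar

ggplot(d) +
  geom_rect(aes(xmin = xmin, xmax = xmax, ymin = 0, ymax = eui,
                fill = bldgType), colour = "white") +
  # One tick per bar, at its midpoint, labelled with the building name.
  scale_x_continuous(breaks = d$xmid, labels = d$bName) +
  labs(x = NULL, y = "EUI (kBTU/gsft)",
       title = "Energy Use Indices", subtitle = "Width proportional to gsf") +
  theme(axis.text.x = element_text(angle = 90, size = 7, hjust = 1, vjust = 0.5))
```

Because the labels are tied to the midpoints, they stay under their bars whatever the widths are. With many very small buildings the labels of neighbouring thin bars may still collide, so this is not a complete fix for a large dataset.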

suppressPackageStartupMessages(library(dplyr))
suppressPackageStartupMessages(library(ggplot2)) 
help.data <- data.frame(
        stringsAsFactors = TRUE,
        row.names = c("16", "9", "6", "1", "18"),
        bName = as.factor(c("APACHE HALL","ANTHROPOLOGY","AHSC","AEROSPACE/MECHANICAL ENG","ARBOL DE LA VIDA")),
        constYr = c(1957, 1962, 1968, 1997, 2009),
        # gsf and eui follow the same row order as bName (values as in the original reprex)
        gsf = c(30876, 38906, 460019, 184586, 234455),
        bldgType = c("Dormitory", "Academic","Medical","Academic","Dormitory"),
        eui = c(86.3320853089779,33.4818633521822,526.612007183703, 214.404397950007,79.3320687466678))
p <- ggplot(help.data, aes(bName,eui, fill = bldgType))
p + theme(axis.text.x = element_text(angle = 90, size = 7, hjust = 1)) + 
    xlab(NULL) +
    ylab("EUI (kBTU/gsft)") +
    geom_col(aes(x = reorder(bName, constYr), y = eui, fill = bldgType), 
                    width = help.data$gsf/max(help.data$gsf)) +
    ggtitle("Energy Use Indices", subtitle = "Width proportional to gsf")

Created on 2020-03-13 by the reprex package (v0.3.0)


Thanks, and apologies for not stripping it down far enough. I'm at a point in my work with R where I don't trust myself not to remove an element that affects my problem. I really appreciate the assist!


I think I've spotted the big problem, but I'm still working on a fix.

bName = c("AEROSPACE/MECHANICAL ENG",  "AHSC","ANTHROPOLOGY","APACHE HALL","ARBOL DE LA VIDA")

amounts to labelling each tick with something as long as "Now is the time for all good men to come to the aid of their party".


Thanks for looking at this. That's an interesting approach, although if I want the width to represent the proportional square footage, would I set width = gsf / sum(gsf)?

I just re-ran the plot of my entire dataset - the columns in some cases are so slim they are not visible.

Thanks, that's where I was heading. The gsf values are on the order of 10^5, while the ticks are only 1 unit apart. Maybe fixable with a log scale.

library(tidyverse)
help.data <- data.frame(
  stringsAsFactors = TRUE,
         row.names = c("1", "6", "9", "16", "18"),
             bName = c("A","B","C","D","E"),
           constYr = c(1997, 1968, 1962, 1957, 2009),
               gsf = c(184586, 460019, 38906, 30876, 234455),
          bldgType = c("Academic","Medical",
                       "Academic","Dormitory","Dormitory"),
               eui = c(214.404397950007,
                    526.612007183703,33.4818633521822,
                    86.3320853089779,79.3320687466678)
)


p <- ggplot() + geom_col(data = help.data, 
                aes(x = gsf, group = bName, y = eui, fill = bldgType), 
                width = help.data$gsf,
                position_dodge2(preserve = c("total")))
p

Created on 2020-03-12 by the reprex package (v0.3.0)


I was thinking it would be useful to have the max width be 1 in terms of plotting.

I feel like I might be missing something, but the relative widths of each bar will be the same if you divide them all by the same thing. I was picturing that the widths should be compared among groups, so the interest would be in relative widths.

For example, the square footage of AHSC is ~11.8 times that of ANTHROPOLOGY (460019/38906). If you use the proportion of the max square footage instead, you get the same ratio (1/0.08457477).
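A quick check of that point, as a sketch with the gsf values from the reprex (w_max and w_sum are just illustrative names): dividing by max(gsf) or by sum(gsf) changes the absolute widths but not the ratios between bars.

```r
gsf <- c("AEROSPACE/MECHANICAL ENG" = 184586,
         "AHSC"                     = 460019,
         "ANTHROPOLOGY"             = 38906,
         "APACHE HALL"              = 30876,
         "ARBOL DE LA VIDA"         = 234455)

w_max <- gsf / max(gsf)   # widest bar gets width 1 (one full tick)
w_sum <- gsf / sum(gsf)   # widths add up to 1 across all bars

# Ratio of AHSC to ANTHROPOLOGY under each scaling, and from the raw values.
w_max[["AHSC"]] / w_max[["ANTHROPOLOGY"]]
w_sum[["AHSC"]] / w_sum[["ANTHROPOLOGY"]]
gsf[["AHSC"]]   / gsf[["ANTHROPOLOGY"]]

# The two scalings differ only by a constant factor.
all.equal(w_max / w_sum, rep(sum(gsf) / max(gsf), length(gsf)),
          check.attributes = FALSE)
```

The practical difference is how much of each tick a bar fills: with sum(gsf) every bar is narrower, which gets worse as more buildings are added, so very small buildings can become too thin to see.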
