Coloring/modifying two dots of a waterfall plot in ggplot2?

cwright1 · April 28, 2020, 4:03pm

I've adapted an example plot from a thread I found online demonstrating how to make waterfall plots in ggplot:

library(ggplot2)

#dummy data
mydat <- mtcars[ , c("cyl", "mpg")]
mydat$Cancer <- paste0("cancer_", mydat$cyl)

#Order by cancer and mpg 
mydat <- mydat[ order(mydat$Cancer, mydat$mpg), ]
mydat$x <- seq(nrow(mydat))

#For this purpose get rid of cancer_4 and cancer_6
mydat<-mydat[!mydat$Cancer=="cancer_4",]
mydat<-mydat[!mydat$Cancer=="cancer_6",]


# plot
ggplot(mydat, aes(x, mpg, col = Cancer)) +
  geom_point() +
  scale_x_continuous(breaks = NULL)  # no x-axis ticks; mydat has no xlabel column to label them with

My result is this:

How can I highlight the dots corresponding to the rows "Maserati Bora" and "Merc 450SLC", and how can I make them bigger than the rest of the dots? For example, make both dots bigger, with "Maserati Bora" in green and "Merc 450SLC" in blue.

mara · April 29, 2020, 3:40pm

Since you're colouring the points by the "Cancer" variable, you'd have to give the rows for Maserati Bora and Merc 450SLC different values of that variable to get them in different colours. Right now every point has the same value, so the colour isn't really encoding any information. (Note that it's difficult to use rownames for aesthetics in ggplot2, so I'd recommend calling tibble::rownames_to_column() on the data to make the car names available as a regular column.)
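A minimal sketch of that idea, assuming the cancer_8 subset from the question: the car names are moved into a column with `tibble::rownames_to_column()`, and a new helper column (called `highlight` here, a hypothetical name) holds the car name for the two cars of interest and `"other"` for everything else. `scale_colour_manual()` then assigns green, blue, and a neutral grey.

```r
library(ggplot2)
library(tibble)

# car names become a regular "rowname" column
mydat <- rownames_to_column(mtcars[, c("cyl", "mpg")])
mydat$Cancer <- paste0("cancer_", mydat$cyl)

# order by cancer and mpg, number the points, keep only cancer_8
mydat <- mydat[order(mydat$Cancer, mydat$mpg), ]
mydat$x <- seq_len(nrow(mydat))
mydat <- mydat[mydat$Cancer == "cancer_8", ]

# hypothetical helper column: the car name for the two highlighted cars, "other" otherwise
select_cars <- c("Maserati Bora", "Merc 450SLC")
mydat$highlight <- ifelse(mydat$rowname %in% select_cars, mydat$rowname, "other")

p <- ggplot(mydat, aes(x, mpg, col = highlight)) +
  geom_point() +
  scale_colour_manual(values = c("Maserati Bora" = "green",
                                 "Merc 450SLC"   = "blue",
                                 "other"         = "grey50"))
p
```

Because the colour now comes from `highlight` rather than `Cancer`, the legend shows the two car names plus "other".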

You might want to take a look at the gghighlight package, too.

Edit: You can control size with gghighlight as well; see the section on customization in the gghighlight documentation.
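A rough gghighlight version might look like the sketch below. It assumes a gghighlight release that supports the `unhighlighted_params` argument. Colour is mapped to the car name, `gghighlight()` keeps only the rows matching the predicate in full colour, and the remaining points are drawn small and grey. Direct labels are switched off (`use_direct_label = FALSE`) so the example doesn't depend on ggrepel.

```r
library(ggplot2)
library(tibble)
library(gghighlight)  # assumed installed; unhighlighted_params needs a reasonably recent version

mydat <- rownames_to_column(mtcars[, c("cyl", "mpg")])
mydat$Cancer <- paste0("cancer_", mydat$cyl)
mydat <- mydat[order(mydat$Cancer, mydat$mpg), ]
mydat$x <- seq_len(nrow(mydat))
mydat <- mydat[mydat$Cancer == "cancer_8", ]

select_cars <- c("Maserati Bora", "Merc 450SLC")

p <- ggplot(mydat, aes(x, mpg, colour = rowname)) +
  geom_point(size = 4) +                      # size used for the highlighted points
  gghighlight(rowname %in% select_cars,
              unhighlighted_params = list(size = 1.5, colour = "grey70"),
              use_direct_label = FALSE) +
  scale_colour_manual(values = c("Maserati Bora" = "green",
                                 "Merc 450SLC"   = "blue"))
p
```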

For size, you would again want to map a variable to the size aesthetic to make those points bigger (a dummy variable should do the trick). For example, here I'm creating a new variable with a larger value for those two points, then hiding its legend, since it's not actually a meaningful numeric scale.

suppressPackageStartupMessages(library(tidyverse))

#dummy data

mydat <- rownames_to_column(mtcars[ , c("cyl", "mpg")])  # car names become the "rowname" column
mydat$Cancer <- paste0("cancer_", mydat$cyl)

#Order by cancer and mpg 
mydat <- mydat[ order(mydat$Cancer, mydat$mpg), ]
mydat$x <- seq(nrow(mydat))

#For this purpose get rid of cancer_4 and cancer_6
mydat<-mydat[!mydat$Cancer=="cancer_4",]
mydat<-mydat[!mydat$Cancer=="cancer_6",]

select_cars <- c("Maserati Bora", "Merc 450SLC")
mydat <- mydat %>%
  mutate(special_cars = if_else(rowname %in% select_cars, 1.5, 1))

# plot
ggplot(mydat, aes(x, mpg, col = Cancer)) +
  geom_point(aes(size = special_cars)) +
  scale_x_continuous(breaks = NULL) +  # no x-axis ticks; mydat has no xlabel column
  guides(size = "none") # get rid of legend for size

Created on 2020-04-29 by the reprex package (v0.3.0.9001)
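One thing to be aware of: mapping the dummy values 1 and 1.5 to `size` goes through ggplot2's default `scale_size()`, which rescales them to its default `range`, so the two values only matter relative to each other. If you'd rather have the numbers used literally as point sizes, a sketch with `scale_size_identity()` follows; the column name `pt_size` and the sizes 1.5 and 4 are illustrative choices (1.5 is the default `geom_point()` size). Identity scales draw no legend by default, so no `guides()` call is needed.

```r
library(ggplot2)
library(tibble)

mydat <- rownames_to_column(mtcars[, c("cyl", "mpg")])
mydat$Cancer <- paste0("cancer_", mydat$cyl)
mydat <- mydat[order(mydat$Cancer, mydat$mpg), ]
mydat$x <- seq_len(nrow(mydat))
mydat <- mydat[mydat$Cancer == "cancer_8", ]

# hypothetical column of literal point sizes: 4 for the two selected cars, 1.5 otherwise
select_cars <- c("Maserati Bora", "Merc 450SLC")
mydat$pt_size <- ifelse(mydat$rowname %in% select_cars, 4, 1.5)

p <- ggplot(mydat, aes(x, mpg, col = Cancer)) +
  geom_point(aes(size = pt_size)) +
  scale_size_identity()  # use pt_size values as-is instead of rescaling them
p
```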

cwright1 · May 1, 2020, 2:54am

That package served my purpose beautifully. Thanks for your time! Solved.

system · May 8, 2020, 2:54am

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.