Is it possible to conditionally layer/raise certain text using geom_text?

Is it possible to conditionally raise certain data points with ggplot2? Here's an example where I'm trying to highlight the name "Phillip", but it's difficult to see with all the grey. I still want the grey values, I just want to raise all the "Phillip" layers to the top, over the grey "Billip" layers.

library(tidyverse)

mydata <- tibble(name = c(rep("Phillip", 10), rep("Billip", 9990)),
                 value_1 = runif(n = 10000, min = 0, max = 5),
                 value_2 = runif(n = 10000, min = 2, max = 4))

mydata %>% 
  ggplot(aes(x = value_1, y = value_2, label = name)) +
  geom_text(aes(color = ifelse(name == "Phillip", "black", "grey")), fontface = "bold", family = "mono") + 
  scale_color_manual(values = c("black", "grey"), guide = "none")

Created on 2018-05-30 by the reprex package (v0.2.0).

I believe the plotting will be in order of appearance in the data frame. So it might work to arrange the data frame before it is fed into ggplot. Or you could split into two geom_text layers, making the second one just filtered to the name you want to highlight, with black text.
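
A rough sketch of the first suggestion, in case it helps: within one geom_text layer the rows are drawn in data-frame order, so sorting the highlighted rows to the end should put them on top. This assumes the same columns as the question; is_phillip is just a helper column made up for the example, and guide = "none" assumes a reasonably recent ggplot2.

```r
library(dplyr)
library(ggplot2)

set.seed(47)
mydata <- tibble(name = c(rep("Phillip", 10), rep("Billip", 990)),
                 value_1 = runif(n = 1000, min = 0, max = 5),
                 value_2 = runif(n = 1000, min = 2, max = 4))

# is_phillip is a hypothetical helper column, not part of the original data.
# FALSE sorts before TRUE, so the Phillip rows end up last and are drawn last.
sorted <- mydata %>%
  mutate(is_phillip = name == "Phillip") %>%
  arrange(is_phillip)

p <- sorted %>%
  ggplot(aes(x = value_1, y = value_2, label = name, color = is_phillip)) +
  geom_text(fontface = "bold", family = "mono") +
  scale_color_manual(values = c(`FALSE` = "grey", `TRUE` = "black"), guide = "none")

p
```

This keeps everything in a single pipe, at the cost of relying on row order rather than on explicit layers.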

Yeah, making multiple calls with subsetted data is the usual way to highlight:

library(tidyverse)
set.seed(47)

mydata <- tibble(name = c(rep("Phillip", 10), rep("Billip", 2990)),
                 value_1 = runif(n = 3000, min = 0, max = 5),
                 value_2 = runif(n = 3000, min = 2, max = 4))

ggplot(mapping = aes(x = value_1, y = value_2, label = name)) +
    geom_text(data = mydata[mydata$name != "Phillip",], color = "grey") + 
    geom_text(data = mydata[mydata$name == "Phillip",], color = "black")

There's also gghighlight, but that only works for points and lines so far, not text.

Personally, what I really want with geom_text is a simple way to add drop shadows or other means of creating contrast without resorting to geom_label, which covers up too much and looks a little clunky.

Another approach is to make a column in your data frame that determines whether a row should be highlighted and tying the colour aesthetic to that column:

library(tidyverse)
set.seed(47)

mydata =
  tibble(
    name = c(rep("Phillip", 10), rep("Billip", 2990)),
    value_1 = runif(n = 3000, min = 0, max = 5),
    value_2 = runif(n = 3000, min = 2, max = 4)) %>%
  mutate(hl_phillip = if_else(name == "Phillip", "Phillip", "Not Phillip"))

ggplot(mydata, aes(x = value_1, y = value_2, label = name)) +
  geom_text(aes(colour = hl_phillip), fontface = "bold", family = "mono") + 
  scale_color_manual(values = c("Not Phillip" = "grey", "Phillip" = "black"), guide = "none")

This approach also means that your data stays in one data frame, which is especially helpful if you need to highlight data in several different ways!

EDIT: you might also consider plotting only points for the full data set and only plotting text for the Phillips, because geom_text and geom_label can also handle empty strings. I've used this approach before, and it works really well 🙂
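
A minimal sketch of what the points-plus-labels idea could look like, assuming the same columns as above; label_text is a hypothetical helper column that holds the name for the Phillips and an empty string for everyone else.

```r
library(dplyr)
library(ggplot2)

set.seed(47)
mydata <- tibble(name = c(rep("Phillip", 10), rep("Billip", 2990)),
                 value_1 = runif(n = 3000, min = 0, max = 5),
                 value_2 = runif(n = 3000, min = 2, max = 4))

# label_text is a made-up column: the name for Phillips, "" for all other rows
labelled <- mydata %>%
  mutate(label_text = if_else(name == "Phillip", name, ""))

p <- ggplot(labelled, aes(x = value_1, y = value_2)) +
  geom_point(color = "grey") +   # every row gets a grey point
  geom_text(aes(label = label_text), fontface = "bold", family = "mono")  # only Phillips show text

p
```

Because the text is its own layer added after the points, the Phillip labels are drawn over the grey points without any reordering.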

That doesn't really simplify, as you'll still need to reorder the rows to get the highlighted rows to plot on top.

Sometimes you can avoid subsetting/respecifying data by setting one color value to "transparent" (which does what it says), though that's not useful in this particular case. Sometimes mapping to alpha can be a useful way to highlight, too.
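
For the alpha idea, something along these lines might work. It's only a sketch: is_phillip is a made-up helper column and the 0.15 alpha value is an arbitrary choice. Note that it fades the background text rather than changing the drawing order.

```r
library(dplyr)
library(ggplot2)

set.seed(47)
mydata <- tibble(name = c(rep("Phillip", 10), rep("Billip", 2990)),
                 value_1 = runif(n = 3000, min = 0, max = 5),
                 value_2 = runif(n = 3000, min = 2, max = 4))

# is_phillip is a hypothetical helper column used only to drive the alpha scale
faded <- mydata %>%
  mutate(is_phillip = name == "Phillip")

p <- ggplot(faded, aes(x = value_1, y = value_2, label = name, alpha = is_phillip)) +
  geom_text(fontface = "bold", family = "mono") +
  # mostly transparent for the background rows, fully opaque for the Phillips
  scale_alpha_manual(values = c(`FALSE` = 0.15, `TRUE` = 1), guide = "none")

p
```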

Nice, I'm gonna try this. Do you think there's a way to subset the data the way you did, but in a pipeable way? In my real data set, I manipulate the data and then pipe it into ggplot, like I did in the example.

Ah, that's true; I forgot about plotting order. I would probably just avoid plotting that much text in this case, but if it's important to print them all, then yeah, you'd need to arrange.

The issue is that you need the piped-in data in two locations, neither of which is the first parameter of ggplot. That's possible if you wrap the plotting code in braces and use . to specify where the data goes:

mydata %>% {
    ggplot(mapping = aes(x = value_1, y = value_2, label = name)) + 
        geom_text(data = filter(., name != "Phillip"), color = "grey") + 
        geom_text(data = filter(., name == "Phillip"), color = "black")
}

...but with a complicated plot, this arrangement can get considerably less readable than just saving a variable.

Yeah, that makes sense. My issue is that I plan on making ~6 different plots, so it's a bit of a pain in the ass to have 6 new objects floating around that I won't be using. I think this works pretty well though. Thanks a lot!

Yeah, apparently I saw you do that! Yeah for my chart, it's kinda nice to have all the text, but I can definitely see the value in what you did in most situations.

Yeah, looks like you were right!

I'm not exactly sure what you're referring to RE: drop shadows, but shadowtext::geom_shadowtext() may be one way to accomplish what you're describing.

If you’re going to do this 6 different times (and the plots are largely similar in their design), you could wrap the plotting code in a function that takes the data frame as its first argument. Whether this is more straightforward than what @alistaire suggested (using braces and dot syntax) depends on how simple it is to generalize those six plots into a single function.
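
A sketch of what such a function might look like, assuming every plot uses the same value_1, value_2 and name columns. The function name plot_highlight and its highlight argument are made up for illustration.

```r
library(dplyr)
library(ggplot2)

# Hypothetical helper: data comes first so the function can sit at the end of a pipe
plot_highlight <- function(data, highlight = "Phillip") {
  ggplot(mapping = aes(x = value_1, y = value_2, label = name)) +
    # background rows first, in grey
    geom_text(data = filter(data, name != highlight), color = "grey") +
    # highlighted rows in a second layer, so they are drawn on top
    geom_text(data = filter(data, name == highlight), color = "black")
}

set.seed(47)
mydata <- tibble(name = c(rep("Phillip", 10), rep("Billip", 2990)),
                 value_1 = runif(n = 3000, min = 0, max = 5),
                 value_2 = runif(n = 3000, min = 2, max = 4))

p <- mydata %>% plot_highlight()
p
```

With this, each plot is one piped call and there are no intermediate objects left floating around.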

That's pretty close, but I'm looking for something more like CSS's text-shadow, which is usually used with a significant Gaussian blur so it looks less meme-like but still adds contrast.

...and apologies to the OP; I've hijacked another thread

lol it's all good. I like the conversation, honestly.

Thanks, yeah, that makes sense. Turns out the 6 charts became more like 60. Tbh turning it into a function is probably the smart approach, but my motto for this past week has been "finish my projects, regardless of how pretty the code is".

No need to subset and make two layers. You just need to sort the data in the order in which you want to plot it:

library(tidyverse)

mydata <- tibble(name = c(rep("Phillip", 10), rep("Billip", 990)),
  value_1 = runif(n=1000, min = 0, max = 5),
  value_2 = runif(n=1000, min = 2, max = 4))

# set order, sort to make Phillip last (to be plotted last)
mydata$name <- factor(mydata$name, levels=c("Billip", "Phillip"))
mydata <- arrange(mydata, name)

ggplot(mydata) +
  geom_text(aes(x=value_1, y=value_2, label=name, color=name)) +
  scale_color_manual(values=c("grey", "black"))