Accessing ggplot's inherited data object from a layer

nicke5012 · September 15, 2017, 3:35pm

Hi there, is there a way to access the object passed to ggplot from one of its layers?

For example, if I wanted to do ggplot(some_data_frame, aes(x, y, label=z)) + geom_line() + geom_text(<access some_data_frame to do some operations on it and only show labels for certain points>)

tbradley · September 15, 2017, 4:05pm

You can do this by specifying the data argument within the geom_text call and setting it equal to a subset of the data. For example:

ggplot(some_data_frame, aes(x, y)) +
  geom_line() +
  geom_text(data = some_data_frame %>%
                     filter(x %in% c(4, 7, 9)), 
             aes(x, y, label = z))

knm · September 15, 2017, 4:08pm

If I am understanding you correctly, you want to do something similar to

ggplot(mpg, aes(x = hwy, y = displ, label = model)) +
  geom_point() +
  geom_text(data = filter(mpg, hwy > 40))

which labels only the points with hwy above 40, but you want to do so without explicitly using the name of the data frame, mpg, inside the geom_text() function. If that is what you want, you can achieve that by using a function to define your data, e.g.:

ggplot(mpg, aes(x = hwy, y = displ, label = model)) +
  geom_point() +
  geom_text(
    data = function(x) { filter(x, hwy > 40) }
  )
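As a rough sketch of the same idea without dplyr: the function given to data receives the data frame that was passed to ggplot(), so any named helper works too. The helper name label_extremes is made up for illustration; it keeps only the rows with the lowest or highest hwy.

```r
library(ggplot2)

# Hypothetical helper: keep only the rows with the lowest or highest hwy.
label_extremes <- function(d) {
  d[d$hwy == min(d$hwy) | d$hwy == max(d$hwy), ]
}

# geom_text() calls label_extremes() on the plot's data (mpg here),
# so the data frame never has to be named inside the layer.
p <- ggplot(mpg, aes(x = hwy, y = displ, label = model)) +
  geom_point() +
  geom_text(data = label_extremes)
```

Keeping the filter in a named function also makes it easy to reuse across several plots.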

cpsievert · September 15, 2017, 4:17pm

Assuming the desired labels can't be determined until print time (perhaps by inspecting the graphics device?), it sounds like you want to be leveraging/creating a custom ggplot2 geom extension like ggrepel. If ggrepel isn't what you need, you can learn about creating your own geom here.

To answer the question in the title: "how to access ggplot’s inherited data object from a layer", you can always use ggplot_build() to get a data structure sufficient for describing the plot, which includes each layer of data:

library(ggplot2)
library(sf)
library(albersusa)

usa <- usa_sf("laea")

# st_centroid gets the center POINT of polygons
uscenter <- st_centroid(usa)

p <- ggplot() + 
  geom_sf(data = usa) +
  geom_sf(data = uscenter)

b <- ggplot_build(p)
identical(usa, b$plot$layers[[1]]$data)
#> [1] TRUE
identical(uscenter, b$plot$layers[[2]]$data)
#> [1] TRUE
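When the data is passed to ggplot() and inherited by the layers, as in the original question, a minimal sketch might look like the following. It assumes the built-in mpg data and a reasonably recent ggplot2: p$data holds the plot-level data frame, and layer_data() returns the computed data for one layer.

```r
library(ggplot2)

p <- ggplot(mpg, aes(x = hwy, y = displ, label = model)) +
  geom_point() +
  geom_text()

# The data frame given to ggplot(), which layers inherit by default
plot_data <- p$data

# The computed data for layer 2 (geom_text), with aesthetics already mapped
text_data <- layer_data(p, 2)
```

Note that text_data uses aesthetic names (x, y, label) rather than the original column names.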

I'm not sure how that gets you any closer to the goal of only showing labels for certain points, though. You'd probably need to hack into the output of ggplot_gtable(), which is the actual grid object that ggplot2 passes on to grid to do the drawing (assuming you don't want to go the custom geom route):

> ggplot_gtable(b)
TableGrob (10 x 7) "layout": 17 grobs
    z         cells       name                                  grob
1   0 ( 1-10, 1- 7) background       rect[plot.background..rect.274]
2   5 ( 5- 5, 3- 3)     spacer                        zeroGrob[NULL]
3   7 ( 6- 6, 3- 3)     axis-l   absoluteGrob[GRID.absoluteGrob.268]
4   3 ( 7- 7, 3- 3)     spacer                        zeroGrob[NULL]
5   6 ( 5- 5, 4- 4)     axis-t                   null[GRID.null.254]
6   1 ( 6- 6, 4- 4)      panel              gTree[panel-1.gTree.253]
7   9 ( 7- 7, 4- 4)     axis-b   absoluteGrob[GRID.absoluteGrob.261]
8   4 ( 5- 5, 5- 5)     spacer                        zeroGrob[NULL]
9   8 ( 6- 6, 5- 5)     axis-r                   null[GRID.null.269]
10  2 ( 7- 7, 5- 5)     spacer                        zeroGrob[NULL]
11 10 ( 4- 4, 4- 4)     xlab-t                        zeroGrob[NULL]
12 11 ( 8- 8, 4- 4)     xlab-b                        zeroGrob[NULL]
13 12 ( 6- 6, 2- 2)     ylab-l                        zeroGrob[NULL]
14 13 ( 6- 6, 6- 6)     ylab-r                        zeroGrob[NULL]
15 14 ( 3- 3, 4- 4)   subtitle zeroGrob[plot.subtitle..zeroGrob.271]
16 15 ( 2- 2, 4- 4)      title    zeroGrob[plot.title..zeroGrob.270]
17 16 ( 9- 9, 4- 4)    caption  zeroGrob[plot.caption..zeroGrob.272]

nick · September 15, 2017, 6:15pm

I hadn't noticed the function option for the geom_* data argument in the help previously -- that's neat! If you wanted to make it a little more "tidyverse-ish", you could use a functional sequence:

ggplot(mpg, aes(x = hwy, y = displ, label = model)) + 
  geom_point() + 
  geom_text(data = . %>% filter(hwy > 40))

knm · September 15, 2017, 6:28pm

Nice, that is a lot cleaner and easier to read.

nicke5012 · September 15, 2017, 7:37pm

Oh, awesome! Thanks for all the responses, everyone. @knm hit the nail on the head when she suggested ggplot(mpg, aes(x=hwy,y=displ,label=model))+geom_point()+geom_text(data=function(x){filter(x,hwy>40)}), and @nick made it look nicer by using the dot (.).

To elaborate, I've been doing something like the following, where I create an intermediate tmp dataframe:

tmp <- mpg %>%
    filter(<something>)
tmp %>%
    ggplot(aes(x=hwy, y=displ, label=model)) + 
    geom_point() +
    geom_text(data = tmp %>% filter(hwy > 40))

But wanted to see if I could be super lazy and do it all in a single string of pipes. So this would totally work:

mpg %>%
    filter(<something>) %>%
    ggplot(aes(x=hwy, y=displ, label=model)) + 
    geom_point() +
    geom_text(data = . %>% filter(hwy > 40))

@nick -- as a follow-up, is the . (dot) object standard in R, and can I use it to reference inherited data in other functions? Or is that something tidyverse specific?

Thanks again!

nick · September 15, 2017, 8:11pm

In the tidyverse, much of the use of the dot comes from the magrittr pipe package. In this case, if the left argument of the pipe is ., then it makes a function instead of trying to evaluate the dot. In general, . %>% somefunction() is equivalent to function(x) {somefunction(x)}.
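A small sketch of that equivalence, using made-up names f_seq and f_plain:

```r
library(magrittr)

# A functional sequence built by starting the pipe with the dot
f_seq <- . %>% sqrt() %>% sum()

# The hand-written equivalent
f_plain <- function(x) sum(sqrt(x))

f_seq(c(1, 4, 9))
#> [1] 6
f_plain(c(1, 4, 9))
#> [1] 6
```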

You can also make chains, like from the help file for %>%:

# Building unary functions with %>%
trig_fest <- . %>% tan %>% cos %>% sin

1:10 %>% trig_fest
#>  [1]  0.0133878 -0.5449592  0.8359477  0.3906486 -0.8257855  0.8180174
#>  [7]  0.6001744  0.7640323  0.7829771  0.7153150

trig_fest(1:10)
#>  [1]  0.0133878 -0.5449592  0.8359477  0.3906486 -0.8257855  0.8180174
#>  [7]  0.6001744  0.7640323  0.7829771  0.7153150

If you look through the magrittr help, there are several references to functional sequences, such as the fact that you can extract them with [[.
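For instance, a minimal sketch (assuming a current magrittr, where functions() and the [[ method for functional sequences are available):

```r
library(magrittr)

trig_fest <- . %>% tan %>% cos %>% sin

# functions() returns the individual steps as a list
steps <- functions(trig_fest)

# [[ pulls out a single step, here the first one (tan)
first_step <- trig_fest[[1]]
first_step(1)
```

Here first_step(1) should give the same value as tan(1).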

yeedle · October 22, 2017, 4:36pm

In a similar vein to OP, can I do something like this:

 ggplot(some_data_frame, aes(x, y)) + 
    geom_whatever() +
    scale_x_discrete(breaks = <reference column x or dataframe$x to get breaks, e.g. min(x):max(x)>)

tjmahr · October 23, 2017, 11:18am

Whoa, I never knew this either!

jcblum · October 24, 2017, 5:58pm

You'll probably get more and better answers if you post this as a separate topic. Even better, use the "Reply as Linked Topic" feature (which, unfortunately, seems to only be accessible via keyboard shortcut?)

Try: type j to select the original post, then type t to begin a new topic that's automatically linked as a follow-up to this one.

If you do that, I'll happily reply in more detail there, but in the meantime depending on how you want to calculate your breaks, is this bit of the discrete_scale docs helpful?:

breaks control the breaks in the guide. There are four possible types of input:

[…]

a function, that when called with a single argument, a character vector giving the limits of the scale, returns a character vector specifying which breaks to display.
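A minimal sketch of that function option, assuming a discrete x such as mpg$class. The helper name every_other is made up for illustration.

```r
library(ggplot2)

# Hypothetical helper: receives the scale limits (a character vector)
# and returns every other one as the breaks to display.
every_other <- function(limits) {
  limits[seq_along(limits) %% 2 == 1]
}

p <- ggplot(mpg, aes(x = class)) +
  geom_bar() +
  scale_x_discrete(breaks = every_other)
```

Because the function is handed the limits by the scale itself, the data frame never needs to be referenced inside scale_x_discrete().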

stanwood · December 14, 2017, 6:53pm

I wasn't able to get the keyboard shortcut to work but I think I found this functionality in the GUI. Click on the "share" icon under a post (looks like a chain) and there is an option to "+ New Topic".

jcblum · December 14, 2017, 7:25pm

Hey! You're right! Thanks!!

Wish that was easier to find, because it's a nice feature. There is an abundance of little widgets to click on in Discourse, and I find it hard to guess which ones will hold which menus. Not to mention, do I want to reply to a post? To the first post? To the topic, which is somehow distinct from the originating post? (At least, it has a separate bank of widgets at the bottom of the page.) OK, off-topic rant over!