Implementing a new Geom of ggplot2 with tidyeval

yutannihilation · September 26, 2017, 8:53am

(Sorry for posting a vague question here, but I think neither SO nor GitHub issues are the appropriate place to ask this, so RStudio Community is my only hope...)

Hi. I'm trying to create an extension of ggplot2 that makes it possible to highlight data series interactively using dplyr and tidyeval.

GitHub: yutannihilation/gghighlight (Highlight points and lines in ggplot2)

Example:

library(gghighlight)
library(ggplot2)

# generate dummy data
set.seed(1)
generate_series <- function(cat) {
  data.frame(idx = 1:100, value = cumsum(runif(100, -1, 1)), category = cat,
             stringsAsFactors = FALSE)
}
d <- purrr::map_dfr(letters, generate_series)

# plot
ggplot(d) +
  geom_highlighted_line(max(value) > 10, aes(idx, value, colour = category))

Seems easy? The code above works because the mapping is passed to the layer itself, so the layer has enough information to construct a predicate function from the expression max(value) > 10.
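To illustrate what "constructing a predicate from the expression" involves, here is a minimal sketch using rlang: the expression is captured unevaluated with enquo() and evaluated later against a data frame with eval_tidy(). The helper name capture_predicate() and the toy data are hypothetical, for illustration only.

```r
# Hypothetical helper: capture the user's expression without evaluating it
capture_predicate <- function(.predicate) rlang::enquo(.predicate)

pred <- capture_predicate(max(value) > 10)

# The quosure keeps the original expression...
expr <- rlang::quo_get_expr(pred)

# ...and can be evaluated later, with `value` looked up in a data frame
hit  <- rlang::eval_tidy(pred, data = data.frame(value = c(3, 12)))
miss <- rlang::eval_tidy(pred, data = data.frame(value = c(3, 5)))
print(c(hit = hit, miss = miss))
```

The key point is that evaluation needs a data frame that still has a column called value.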

But if the mapping is provided only to another layer (here, the plot-level mapping in ggplot()), it fails: I want to filter the data group by group, but the layer has no way to know which key will be used to split the data into groups.
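Why the key matters can be shown with plain dplyr (toy data assumed): the same predicate gives different results with and without grouping. Without a key, max(value) is computed over all rows, so the predicate is a single TRUE and nothing is filtered out.

```r
library(dplyr)

d_toy <- data.frame(category = c("a", "a", "b", "b"),
                    value    = c(1, 12, 2, 3))

# Grouped: keep only the series whose own maximum exceeds 10
grouped <- ungroup(filter(group_by(d_toy, category), max(value) > 10))

# Ungrouped: max(value) is 12 over the whole data, so every row is kept
ungrouped <- filter(d_toy, max(value) > 10)

print(grouped)
print(ungrouped)
```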

ggplot(d, aes(idx, value, colour = category)) +
  geom_highlighted_line(max(value) > 10)

A simplified version of my implementation is below:

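library(ggplot2)

# Build a data function that filters the plot data group by group
build_grouped_filter_func <- function(predicate, key) {
  function(df) {
    df_grouped <- dplyr::group_by(df, !! key)
    df_filtered <- dplyr::filter(df_grouped, !! predicate)
    dplyr::ungroup(df_filtered)
  }
}

# unhighlighted_colour is an assumed default for the grey lines
geom_highlighted_line <- function(.predicate, mapping, ...,
                                  unhighlighted_colour = "grey") {
  # Capture the predicate expression, e.g. max(value) > 10
  predicate <- rlang::enquo(.predicate)
  # The grouping key comes from this layer's own colour mapping,
  # which is why this breaks when the mapping is only given to ggplot()
  key <- mapping$colour

  filter_func <- build_grouped_filter_func(predicate, key)

  mapping_orig <- mapping
  # Group the grey lines by the same key (their colour is fixed below)
  mapping$group <- key

  list(
    # grey layer: all series
    geom_line(mapping = mapping, colour = unhighlighted_colour),
    # coloured layer: only the series that satisfy the predicate
    geom_line(mapping = mapping_orig, data = filter_func, ...)
  )
}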
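For reference, a standalone check of the filter function above, outside any plot. The key is taken from a hand-written aes(colour = category), standing in for the layer's mapping; the toy data are assumed. This assumes ggplot2 3.0 or later, where aes() stores quosures (older versions stored plain symbols, which !! also handles).

```r
library(ggplot2)

build_grouped_filter_func <- function(predicate, key) {
  function(df) {
    df_grouped <- dplyr::group_by(df, !! key)
    df_filtered <- dplyr::filter(df_grouped, !! predicate)
    dplyr::ungroup(df_filtered)
  }
}

d_toy <- data.frame(idx = rep(1:3, 2), value = c(1, 5, 20, 1, 2, 3),
                    category = rep(c("a", "b"), each = 3))

# The key as the layer would see it when colour is mapped on the layer
key <- aes(colour = category)$colour
pred <- rlang::quo(max(value) > 10)

filter_func <- build_grouped_filter_func(pred, key)
out <- filter_func(d_toy)
print(out)
```

Only series "a" survives, because its own maximum (20) exceeds 10.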

IIUC, when a Geom or Stat object is created, it is too early to know which mappings will be used. On the other hand, the customizable methods of a Geom or Stat (e.g. compute_group()) run too late: the data passed to them has already been aes-mapped, so the quoted expression can no longer be evaluated against it.
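The problem can be seen directly with ggplot2::layer_data(), which returns a layer's data after ggplot_build(): the columns are named after the aesthetics (x, y, colour, group, ...), so a quoted predicate that mentions value cannot be evaluated against it. A minimal sketch with toy data:

```r
library(ggplot2)

# Toy data using the original column names idx, value and category
d_toy <- data.frame(idx = 1:4, value = c(1, 3, 12, 5), category = "a")
p <- ggplot(d_toy, aes(idx, value, colour = category)) + geom_line()

# Data of the first layer as Geom/Stat methods see it
ld <- layer_data(p)
print(names(ld))

# The captured predicate refers to `value`, which no longer exists here
pred <- rlang::quo(max(value) > 10)
res <- tryCatch(rlang::eval_tidy(pred, data = ld), error = function(e) e)
print(inherits(res, "error"))
```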

Is it possible for a layer to know the other layer's raw data and mappings? Any advice is appreciated.

cderv · September 26, 2017, 10:01pm

Did you look into the extension mechanism for ggplot2 with ggproto?

The vignette has a section on creating a new geom. As you can see there, this mechanism lets a layer inherit the plot's aesthetics. I think it is the way to go for creating a ggplot2 extension.
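A minimal ggproto sketch of this approach, with hypothetical names StatHighlightMax and stat_highlight_max() and an assumed threshold parameter: the layer inherits the plot's mapping through inherit.aes = TRUE, and the Stat keeps a group only if its maximum y exceeds the threshold. Note that the predicate can only be written in terms of aesthetics (y), not arbitrary columns such as value, which is the limitation raised in this thread.

```r
library(ggplot2)

# Hypothetical Stat: keep a group only if its maximum y exceeds `threshold`
StatHighlightMax <- ggproto("StatHighlightMax", Stat,
  required_aes = c("x", "y"),
  compute_group = function(data, scales, threshold = 10) {
    # `data` holds aes-named columns (x, y, colour, group, PANEL)
    if (max(data$y) > threshold) data else data[0, , drop = FALSE]
  }
)

# Hypothetical layer constructor; inherit.aes = TRUE picks up ggplot()'s aes
stat_highlight_max <- function(mapping = NULL, data = NULL, geom = "line",
                               position = "identity", ..., threshold = 10,
                               na.rm = FALSE, show.legend = NA,
                               inherit.aes = TRUE) {
  layer(
    stat = StatHighlightMax, data = data, mapping = mapping, geom = geom,
    position = position, show.legend = show.legend,
    inherit.aes = inherit.aes,
    params = list(threshold = threshold, na.rm = na.rm, ...)
  )
}

d_toy <- data.frame(idx = rep(1:3, 2), value = c(1, 5, 20, 1, 2, 3),
                    category = rep(c("a", "b"), each = 3))
p <- ggplot(d_toy, aes(idx, value, colour = category)) +
  stat_highlight_max(threshold = 10)

ld <- layer_data(p)
print(ld[, c("x", "y", "group")])
```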

Nice idea by the way!

yutannihilation · September 26, 2017, 11:09pm

@cderv Thanks! Yes, I do know the vignette.

Replacing the current toy implementation above with a new geom is exactly what I'm struggling with. Sorry, I forgot to mention this... To clarify, I've changed the title.

The difficult part is that Stats and Geoms have little control over how the plotting data is extracted from the original data and the mapping. layer_data is generated inside ggplot_build(), and Geoms have no control over that step. The methods of a Geom/Stat take effect only later, which is too late: by then the data passed to them has already been converted to a data frame with aesthetic column names such as x, y and group...