Interesting tidy eval use cases


#22

Hi @lionel,

This works really nicely for continuous outcomes, but not when I try to implement something similar for categorical outcomes, The issue is in the gather function, it seems to convert the values to character instead of preserving factors. So when you use complete it doesn't see the Value column as factor. (This is probably straying from the discussion around tidy evaluation though...)

library(tidyverse)
fac_mtcars <- mtcars %>%
  mutate_at(vars(cyl,carb), as.factor) %>%
  select(cyl, carb) %>%
  as_tibble()

table(fac_mtcars$cyl, fac_mtcars$carb)
#>    
#>     1 2 3 4 6 8
#>   4 5 6 0 0 0 0
#>   6 2 0 0 4 1 0
#>   8 0 4 3 6 0 1

## Doesn't honour the `complete` since Value is of type `character`
## rather than `factor`.
fac_mtcars %>%
  group_by(cyl) %>%
  gather("Variable", "Value", carb) %>%
  group_by(Variable, add = TRUE) %>%
  count(Value) %>%
  complete(Value, fill=list(n=0))
#> # A tibble: 9 x 4
#> # Groups:   cyl, Variable [3]
#>   cyl   Variable Value     n
#>   <fct> <chr>    <chr> <dbl>
#> 1 4     carb     1         5
#> 2 4     carb     2         6
#> 3 6     carb     1         2
#> 4 6     carb     4         4
#> 5 6     carb     6         1
#> 6 8     carb     2         4
#> 7 8     carb     3         3
#> 8 8     carb     4         6
#> 9 8     carb     8         1
```

#23

I take it back! Capturing by action is definetly better in that case. Here's a rather elegant way to implement word_dict() by forwarding inputs to transmute() and making use a tibble::deframe() to transform a two-column data frames to a named vector. cc @jennybryan

word_dict <- function(data, word, score) {
  word <- enquo(word)
  score <- enquo(score)

  data %>%
    transmute(!!word, !!score) %>%
    tibble::deframe()
}

This implementation allows simple selections as before:

test_data %>% word_dict(letter, number)
#>     a     b     c     d     e     f     g     h     i     j     k     l
#>  TRUE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE
#>     m     n     o     p     q     r     s     t     u     v     w     x
#> FALSE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE
#>     y     z
#> FALSE FALSE

However, because it is now taking actions, you can transform the vectors on the fly:

test_data %>% word_dict(toupper(letter), number * 10)
#>  A  B  C  D  E  F  G  H  I  J  K  L  M  N  O  P  Q  R  S  T  U  V  W  X  Y  Z
#> 10  0  0  0 10  0  0  0 10  0  0  0  0  0 10  0  0  0  0  0 10  0  0  0  0  0

#24

Here's an example to convert user input to a partial Stan code, allowing to unquote variables and expressions (Disclaimer: I'm not familiar with Stan at all, so I don't know how useful this is in actual :wink:). I think tidy eval is particularly interesting when it's used in between some DSL (or other language) and R code.

library(rlang)

generate_stan_model_code <- function(x, y) {
  quos <- enquos(x = x, y = y, .ignore_empty = "all")
  labels <- paste(names(quos), "~", purrr::map_chr(quos, as_label))
  labels <- paste0("    ", labels, ";", collapse = "\n")
  glue::glue("
parameters {
    real<lower=0,upper=1> x;
    real<lower=0,upper=1> y;
}
model {
{{labels}}
}",
    .open = "{{", .close = "}}"
  )
}

generate_stan_model_code(beta(1, 1))
#> parameters {
#>     real<lower=0,upper=1> x;
#>     real<lower=0,upper=1> y;
#> }
#> model {
#>     x ~ beta(1, 1);
#> }
generate_stan_model_code(beta(1, 1), normal(0, 100))
#> parameters {
#>     real<lower=0,upper=1> x;
#>     real<lower=0,upper=1> y;
#> }
#> model {
#>     x ~ beta(1, 1);
#>     y ~ normal(0, 100);
#> }

mu <- 0.1
generate_stan_model_code(normal(!!mu, 1), normal(!!mu, 100))
#> parameters {
#>     real<lower=0,upper=1> x;
#>     real<lower=0,upper=1> y;
#> }
#> model {
#>     x ~ normal(0.1, 1);
#>     y ~ normal(0.1, 100);
#> }

get_mu <- function() runif(1)
generate_stan_model_code(normal(!!get_mu(), 1), normal(!!get_mu(), 100))
#> parameters {
#>     real<lower=0,upper=1> x;
#>     real<lower=0,upper=1> y;
#> }
#> model {
#>     x ~ normal(0.994106655474752, 1);
#>     y ~ normal(0.661887304158881, 100);
#> }

Created on 2019-01-10 by the reprex package (v0.2.1)


#25

I think base::deparse() would be more appropriate than as_label() here. The latter will simplify complex expressions. You'll have to squash the quosure before deparsing. Though I would just capture with enexprs() instead of enquos(), since there's no evaluation going on.


#26

Thanks, true. When we want a more nested DSL, maybe quosures will be needed, but I don't come up with a good example...


#27

I've been using this to tidy up data that has subheaders embedded in a data variable. The subheaders are matched with regex and put into their own variable.

The function:

untangle2 <- function(df, regex, orig, new) {
  orig <- dplyr::enquo(orig)
  new <- dplyr::ensym(new)
  to_fill <- dplyr::mutate(
    df,
    !!new := dplyr::if_else(grepl(regex, !!orig), !!orig, NA_character_)
  )
  dffilled <- tidyr::fill(to_fill, !!new)
  dplyr::filter(dffilled, !grepl(regex, !!orig))
}

Example usage:

dat <- tibble::tibble(
  site = c("Wet Season", "a", "b", "Dry Season", "a", "b"),
  rain = c(NA, 52, 41, NA, 12, 9)
)

dat %>% untangle2("Season", site, Season)


#28

I often find myself wanting to use tidyselect helpers to specify a series of columns for which I want to calculate a rowwise sum or mean. I use the below to wrangle rowSums() or rowMeans() into accepting tidyselect helpers. I'm sure it's a bit hacky, but it seems to do the trick!

(I also added a .value argument so I can specify the name of the output column.)

library(tidyverse)

tidyselect_row_sums <- function (.data, ..., .value = "row_sum", na.rm = FALSE) {
  
  dots <- exprs(...)
  value <- sym(.value)
  cols <- select(.data, !!!dots)
  out <- mutate(.data, !!value := rowSums(cols, na.rm = na.rm))
  return (out)
}

iris %>%
  tidyselect_row_means(starts_with("Sepal"), .value = "Sepal.Mean")
# A tibble: 150 x 6
#   Sepal.Length Sepal.Width Petal.Length Petal.Width Species Sepal.Mean
#          <dbl>       <dbl>        <dbl>       <dbl> <fct>        <dbl>
# 1          5.1         3.5          1.4         0.2 setosa        4.3 
# 2          4.9         3            1.4         0.2 setosa        3.95
# 3          4.7         3.2          1.3         0.2 setosa        3.95
# 4          4.6         3.1          1.5         0.2 setosa        3.85
# 5          5           3.6          1.4         0.2 setosa        4.3 
# ... with 145 more rows

#29

A little off-topic, but if there's work being done on tidyeval learning resources, I'd love a cheat sheet. I often feel like I read through resources and grok the differences between different functions, but then it all falls out of my head later. Being able to just see which basic data types go in and out of each function would be really helpful :slightly_smiling_face:


#30

Have a look at https://www.rstudio.com/resources/cheatsheets/ :wink:


#31

Okay, yeah, that's on me :laughing:


#32

I'm not sure how interesting it is, but I often make some quick functions that are essentially boilerplate code to help with exploratory analysis, especially plotting e.g.:

myhist <- function(.data, field, binwidth = 50) {
    ggplot(.data, aes(!!enquo(field)) +
    geom_histogram(binwidth = binwidth)
}

I can then (re)make histograms (in this example) quickly, with different bins, and easily add other features that I might want to do (e.g. log10 scale x-axis if the data are difficult to see). I've also made similar functions for other types of plot, as well as numerical/statistical summaries of the data, too.


#33

Thanks for sharing! That's very much what I imagine the typical use of tidyeval to be.


#34

3 posts were split to a new topic: provocative question: Will tidyeval kill the tidyverse?


#37

For writing functions that use dplyr/DBI interface:

con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
DBI::dbWriteTable(con, "mtcars", mtcars)

filterFun <- function(db, tab, col, val){                   
    col <- rlang::enquo(col)
    
    tbl(db, tab)  %>% 
        filter(!!col > val) %>% 
        collect
}

filterFun(db = con,
          tab = "mtcars",
          col = cyl,
          val = 4)

provocative question: Will tidyeval kill the tidyverse?