Interesting tidy eval use cases


#24

Here's an example to convert user input to a partial Stan code, allowing to unquote variables and expressions (Disclaimer: I'm not familiar with Stan at all, so I don't know how useful this is in actual :wink:). I think tidy eval is particularly interesting when it's used in between some DSL (or other language) and R code.

library(rlang)

generate_stan_model_code <- function(x, y) {
  quos <- enquos(x = x, y = y, .ignore_empty = "all")
  labels <- paste(names(quos), "~", purrr::map_chr(quos, as_label))
  labels <- paste0("    ", labels, ";", collapse = "\n")
  glue::glue("
parameters {
    real<lower=0,upper=1> x;
    real<lower=0,upper=1> y;
}
model {
{{labels}}
}",
    .open = "{{", .close = "}}"
  )
}

generate_stan_model_code(beta(1, 1))
#> parameters {
#>     real<lower=0,upper=1> x;
#>     real<lower=0,upper=1> y;
#> }
#> model {
#>     x ~ beta(1, 1);
#> }
generate_stan_model_code(beta(1, 1), normal(0, 100))
#> parameters {
#>     real<lower=0,upper=1> x;
#>     real<lower=0,upper=1> y;
#> }
#> model {
#>     x ~ beta(1, 1);
#>     y ~ normal(0, 100);
#> }

mu <- 0.1
generate_stan_model_code(normal(!!mu, 1), normal(!!mu, 100))
#> parameters {
#>     real<lower=0,upper=1> x;
#>     real<lower=0,upper=1> y;
#> }
#> model {
#>     x ~ normal(0.1, 1);
#>     y ~ normal(0.1, 100);
#> }

get_mu <- function() runif(1)
generate_stan_model_code(normal(!!get_mu(), 1), normal(!!get_mu(), 100))
#> parameters {
#>     real<lower=0,upper=1> x;
#>     real<lower=0,upper=1> y;
#> }
#> model {
#>     x ~ normal(0.994106655474752, 1);
#>     y ~ normal(0.661887304158881, 100);
#> }

Created on 2019-01-10 by the reprex package (v0.2.1)


#25

I think base::deparse() would be more appropriate than as_label() here. The latter will simplify complex expressions. You'll have to squash the quosure before deparsing. Though I would just capture with enexprs() instead of enquos(), since there's no evaluation going on.


#26

Thanks, true. When we want a more nested DSL, maybe quosures will be needed, but I don't come up with a good example...


#27

I've been using this to tidy up data that has subheaders embedded in a data variable. The subheaders are matched with regex and put into their own variable.

The function:

untangle2 <- function(df, regex, orig, new) {
  orig <- dplyr::enquo(orig)
  new <- dplyr::ensym(new)
  to_fill <- dplyr::mutate(
    df,
    !!new := dplyr::if_else(grepl(regex, !!orig), !!orig, NA_character_)
  )
  dffilled <- tidyr::fill(to_fill, !!new)
  dplyr::filter(dffilled, !grepl(regex, !!orig))
}

Example usage:

dat <- tibble::tibble(
  site = c("Wet Season", "a", "b", "Dry Season", "a", "b"),
  rain = c(NA, 52, 41, NA, 12, 9)
)

dat %>% untangle2("Season", site, Season)


#28

I often find myself wanting to use tidyselect helpers to specify a series of columns for which I want to calculate a rowwise sum or mean. I use the below to wrangle rowSums() or rowMeans() into accepting tidyselect helpers. I'm sure it's a bit hacky, but it seems to do the trick!

(I also added a .value argument so I can specify the name of the output column.)

library(tidyverse)

tidyselect_row_sums <- function (.data, ..., .value = "row_sum", na.rm = FALSE) {
  
  dots <- exprs(...)
  value <- sym(.value)
  cols <- select(.data, !!!dots)
  out <- mutate(.data, !!value := rowSums(cols, na.rm = na.rm))
  return (out)
}

iris %>%
  tidyselect_row_means(starts_with("Sepal"), .value = "Sepal.Mean")
# A tibble: 150 x 6
#   Sepal.Length Sepal.Width Petal.Length Petal.Width Species Sepal.Mean
#          <dbl>       <dbl>        <dbl>       <dbl> <fct>        <dbl>
# 1          5.1         3.5          1.4         0.2 setosa        4.3 
# 2          4.9         3            1.4         0.2 setosa        3.95
# 3          4.7         3.2          1.3         0.2 setosa        3.95
# 4          4.6         3.1          1.5         0.2 setosa        3.85
# 5          5           3.6          1.4         0.2 setosa        4.3 
# ... with 145 more rows

#29

A little off-topic, but if there's work being done on tidyeval learning resources, I'd love a cheat sheet. I often feel like I read through resources and grok the differences between different functions, but then it all falls out of my head later. Being able to just see which basic data types go in and out of each function would be really helpful :slightly_smiling_face:


#30

Have a look at https://www.rstudio.com/resources/cheatsheets/ :wink:


#31

Okay, yeah, that's on me :laughing:


#32

I'm not sure how interesting it is, but I often make some quick functions that are essentially boilerplate code to help with exploratory analysis, especially plotting e.g.:

myhist <- function(.data, field, binwidth = 50) {
    ggplot(.data, aes(!!enquo(field)) +
    geom_histogram(binwidth = binwidth)
}

I can then (re)make histograms (in this example) quickly, with different bins, and easily add other features that I might want to do (e.g. log10 scale x-axis if the data are difficult to see). I've also made similar functions for other types of plot, as well as numerical/statistical summaries of the data, too.


#33

Thanks for sharing! That's very much what I imagine the typical use of tidyeval to be.


split this topic #34

3 posts were split to a new topic: provocative question: Will tidyeval kill the tidyverse?


#37

For writing functions that use dplyr/DBI interface:

con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
DBI::dbWriteTable(con, "mtcars", mtcars)

filterFun <- function(db, tab, col, val){                   
    col <- rlang::enquo(col)
    
    tbl(db, tab)  %>% 
        filter(!!col > val) %>% 
        collect
}

filterFun(db = con,
          tab = "mtcars",
          col = cyl,
          val = 4)

provocative question: Will tidyeval kill the tidyverse?
#38

I was working a Bayesian nonlinear model. I wanted to quickly plot samples from the prior predictive distribution, so I could see what kinds of curves were plausible (before the model saw the data).

I wrote a function that would take sampling statements like asymptote = rbeta(n, 10, 20) where n is an another argument to the function. The nonstandard evaluation in this case capturing the sampling statement and plugging the value of n.

plot_priors <- function(x, n, asymptote = rbeta(n, 10, 20),
                        midpoint = rnorm(n, 0, 1),
                        scale = abs(rnorm(n, .1, .025))) {
  # ...
}

# example use
plot_priors(
  x = 0:54,
  n = 40,
  midpoint = rnorm(n, 2, 2),
  scale = rnorm(n, 10, 5))

I posted a demo of the function and plots on Rpubs.


#39

My most convoluted attempt is:

add_first_by_row <- function(df, cols, new_name, predicate) {
	   cond <- as_mapper(predicate)
	   condq <- enexpr(cond)
	   new_name <- enexpr(new_name) %>% as.character 
	   arg_names <- c(cols, "...") 
	   col_syms <- cols %>% syms
	   
	   args <- rep(0, times = length(cols)+1) %>% set_names(arg_names)
	 	 new_fun <- new_function(args, 
	 	 	           expr(c(!!!col_syms) %>% {.[(!!condq)(.)]} %>% .[1] )
	 	             )
	 	 
	 #	 return(new_fun)
	 	 df %>% mutate(!!new_name := pmap_dbl(df,new_fun))
}

which allows the user to find the a first value in the a row, from a selection of columns, which meets a predicate function (originally my colleage wanted a way to find the first non-NA value, but I figured you could generalise the predicate).

so, you could do

df = tibble(a = c("b", "d", "l", "m"), x = c(NA,NA, 1, 3), y = c(2,NA, 2, 3), z = c(1,2, 0, NA))
vars <- c("x", "y", "z")
df %>% add_first_by_row(vars, new_column, ~ . > 1)
# A tibble: 4 x 5
  a         x     y     z new_column
  <chr> <dbl> <dbl> <dbl>      <dbl>
1 b        NA     2     1         NA
2 d        NA    NA     2         NA
3 l         1     2     0          2
4 m         3     3    NA          3

I use tidy eval a lot, but I often wonder if it's really needed in my use cases or not. But then if you feel comfortable with it, why prefer "standard" eval to "non-standard"?


#40

Here are a couple we used during our Big Data class:

my_mean <- function(.data, x){
  x <- enquo(x)
  summarise(
    .data, 
    !! quo_name(x) := mean(!! x, na.rm = TRUE)
  )
}
de_select <- function(.data, ...){
  vars <- enquos(...)
  vars <- map(vars, ~ expr(- !! .x))
  select(
    .data,
    !!! vars
  )
}

#41

I think your function feels so complex because you're attacking the problem at the slightly wrong level. I think it's easier if you think about writing a vectorised function that you can use in concert with mutate():

library(tidyverse)

df <- tibble(
  a = c("b", "d", "l", "m"), 
  x = c(NA, NA, 1, 3), 
  y = c(2, NA, 2, 3), 
  z = c(1, 2, 0, NA)
)

first_match <- function(.f, ...) {
  tibble(...) %>% 
    transpose() %>% 
    map_dbl(~ purrr::detect(.x, .f) %||% NA)
}

df %>% mutate(new_col = first_match(~ . > 1, x, y, z))  
#> # A tibble: 4 x 5
#>   a         x     y     z new_col
#>   <chr> <dbl> <dbl> <dbl>   <dbl>
#> 1 b        NA     2     1       2
#> 2 d        NA    NA     2       2
#> 3 l         1     2     0       2
#> 4 m         3     3    NA       3

We're planning on thinking about ways for first_match() to have selection semantics so you could simplify further to first_match(~ . > 1, x:y)


#42

Ah, transpose, yes I hadn't thought of that. I knew there was a better name out there too, ha! Thanks for the feedback.


#43

One more...

Sometimes you might have rows in your data which are duplicated in some columns, but not others, and you might have a good reason for wanting to merge the un-duplicated rows. Then you could use this function:

concat_groups <- function(data, group_var, cat_var) {

	group_varq <- enquo(group_var)
	cat_varq <- enquo(cat_var)
	cat_var <- as.character(cat_varq) %>% .[2] %>% paste0("s")
	
	data %>% group_by(!! group_varq) %>%
		       summarise(!! cat_var := !! cat_varq %>% unique %>% str_c(collapse = "/"))

}

So, a practical example might be:

site_info <- tibble(site = c("A", "B", "B", "C"),
	                  category = c("school", "office", "shop", "field"))
site_info %>% concat_groups(site, category)
# A tibble: 3 x 2
  site  categorys  
  <chr> <chr>      
1 A     school     
2 B     office/shop
3 C     field

function name probably needs work..


#44

Last one from me, promise:

If you want to write a complex if_else statement, you can do it with map, quo and reduce. Eg.


library(tidyverse)

set.seed(123)

# define a plausible mapping. 
i = 1
numbers <- list()
pot <- seq(from = 1, to = (26*3), by = 1)
while (i <= 26) { 
	    numbers[[i]] <- sample(pot, 3)  
	    pot <- setdiff(pot, numbers %>% unlist)
	    i <- i + 1
	    }
 

mapping <- list(letters, numbers) %>%
		            transpose

# here's a tbl with only numbers
number_tbl <- tibble(number = sample(1:120, 10))

# this function creates the if_else statement for the mapping
build_ifelse <- function(mapping_list, col) { 
	 
	              col <- enquo(col)
	                 
	              mapping_list %>% map(~ list(quo(!!col %in% !!.[[2]]), .[[1]] ) ) %>%
                { c("other", .) } %>%
	              reduce(~ quo(if_else(!!(.y[[1]]), !!(.y[[2]]), !!.x )))           
	 
}

# use the function to create the expression
mapping_exp <- build_ifelse(mapping, number)

# use the expression in a mutate for create a new column from the mapping
number_tbl %>% mutate(letter = !!mapping_exp)
# # A tibble: 10 x 2
#    number letter
#     <int> <chr> 
#  1     43 x     
#  2     14 m     
#  3     29 l     
#  4     79 other 
#  5     49 k     
#  6     91 other 
#  7     12 o     
#  8     50 i     
#  9    111 other 
# 10    100 other

closed #45

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.