Use quoted parameter as variable name for closure instantiation?

djmcdowell · April 9, 2020, 6:29pm

I'm looking to improve the semantics of my code and answer a specific question regarding quoted variables as parameters to closure-like functions.

I've provided a reprex to demonstrate my problem and while the reprex works, I'm unsure how to scale the function to handle different use-cases.

library(tidyverse)

# A df of file paths split so that all basenames
# are in the same column, while parent dirs
# are spread across an arbitrary number of columns
# and padded with NAs.
dat <- tibble(
  ref01 = rep("analysis", 5),
  ref02 = c(NA, NA, "next", "next", "next"),
  ref03 = c(NA, NA, NA, NA, "last"),
  target = c("analysis.test1", "analysis.test2",
             "next.test3", "next.test4",
             "last.test5")
)

# For example this reprex df shows file-paths
# from a file-tree that looks like:
# analysis
# ├── next
# │   ├── last
# │   │   └── last.test5
# │   ├── next.test3
# │   └── next.test4
# ├── analysis.test1
# └── analysis.test2
dat
#> # A tibble: 5 x 4
#>   ref01    ref02 ref03 target        
#>   <chr>    <chr> <chr> <chr>         
#> 1 analysis <NA>  <NA>  analysis.test1
#> 2 analysis <NA>  <NA>  analysis.test2
#> 3 analysis next  <NA>  next.test3    
#> 4 analysis next  <NA>  next.test4    
#> 5 analysis next  last  last.test5

This function cleans up the 'target' test basenames.
Each test name is preceded by its parent-dir name and a period
(e.g. 'last.test5').

This function takes a "target" column and an arbitrary number of parent-dir columns. It reverses the list of parent-dirs and finds the first non-NA value. It then matches that value to the target value and removes it.

My question lies within this function:

Is there a more semantic way of building this function so that it can be expressed inside of a `mutate()` call?
Currently, the replace_pattern() function relies on the fact that the .key column is titled "target" and is hardcoded as an input parameter.

This is because of the way `pmap()` works: it takes its arguments from a list (here, the selected columns) and matches the list names to the function's argument names.
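A small illustration of that name matching, as a sketch only: `strip_parent()` and the two-row lists are made up for this example and are not part of the reprex.

```r
library(purrr)

# Hypothetical helper: strips "<parent>." from a target string.
strip_parent <- function(target, parent) {
  sub(paste0(parent, "."), "", target, fixed = TRUE)
}

# List names are matched to argument names, so element order does not matter.
cols <- list(
  parent = c("analysis", "next"),
  target = c("analysis.test1", "next.test3")
)
by_name <- pmap_chr(cols, strip_parent)

# Rename the element and the call fails, because no argument is called `file`.
renamed <- list(parent = cols$parent, file = cols$target)
failed <- inherits(try(pmap_chr(renamed, strip_parent), silent = TRUE), "try-error")
```

Here `by_name` holds the stripped names, and `failed` is `TRUE` because `file` cannot be matched to any argument of `strip_parent()`.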

Since I want this function to work for arbitrarily deep file-paths, I need to find a way to handle varying .key names.

Is there a way to quote the .key variable so that its name becomes the name of the first parameter of the replace_pattern() function?

trim_target <- function(.tbl, .key, ...){
  key <- tidyselect::eval_select(expr(c(!!enquo(.key))), .tbl)
  loc <- tidyselect::eval_select(expr(c(...)), .tbl)

  # First param has to be "target" since that's the name
  # of the .key column.
  replace_pattern <- function(target, ...){
    args <- c(...)
    pattern <- args %>% 
      rev() %>% 
      discard(is.na) %>% 
      first() %>% 
      paste0("\\.")
    
    unlist(str_remove(target, pattern))
  }
  
  pmap(.tbl[,c(key, loc)], replace_pattern) %>% 
    unlist()
}

Expected Output:

This works as expected but is not scalable. Also, regarding my first question, I have to pass dat into the mutate() call, which I don't typically see done.

dat %>% 
  mutate(target = trim_target(dat, target, ref01:ref03))
#> # A tibble: 5 x 4
#>   ref01    ref02 ref03 target
#>   <chr>    <chr> <chr> <chr> 
#> 1 analysis <NA>  <NA>  test1 
#> 2 analysis <NA>  <NA>  test2 
#> 3 analysis next  <NA>  test3 
#> 4 analysis next  <NA>  test4 
#> 5 analysis next  last  test5

Created on 2020-04-08 by the reprex package (v0.3.0)

lionel · April 16, 2020, 11:36am

You can get the name with names(.tbl)[key].
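As a minimal sketch of that lookup, reusing the `eval_select()` call from the question (the helper name `key_name()` is made up for illustration, and `dat` is the reprex tibble):

```r
library(tidyverse)

dat <- tibble(
  ref01 = rep("analysis", 5),
  ref02 = c(NA, NA, "next", "next", "next"),
  ref03 = c(NA, NA, NA, NA, "last"),
  target = c("analysis.test1", "analysis.test2",
             "next.test3", "next.test4",
             "last.test5")
)

# Hypothetical helper: returns the name of the selected key column.
key_name <- function(.tbl, .key) {
  # eval_select() returns a named integer vector of column positions.
  key <- tidyselect::eval_select(expr(c(!!enquo(.key))), .tbl)
  names(.tbl)[key]
}

key_col <- key_name(dat, target)
```

`key_col` is the column name as a string, which could then be used to rename the selected columns before they are handed to `pmap()`.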

But I would take a different direction than dynamically changing the argument names of the replace_pattern() function. I would just pull the key column into a vector; then your replace_pattern() internal closure can refer to it directly.
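One possible reading of that suggestion, as a sketch rather than a definitive rewrite: the key column is pulled out with `pull()`, `pmap_chr()` only sees the parent-dir columns, and the removal is done in one vectorised `str_remove()` call. The inner helper `last_parent()` is a name introduced here, and `dat` is the reprex tibble from the question.

```r
library(tidyverse)

dat <- tibble(
  ref01 = rep("analysis", 5),
  ref02 = c(NA, NA, "next", "next", "next"),
  ref03 = c(NA, NA, NA, NA, "last"),
  target = c("analysis.test1", "analysis.test2",
             "next.test3", "next.test4",
             "last.test5")
)

trim_target <- function(.tbl, .key, ...) {
  # Pull the key column out as a plain vector; its name no longer matters.
  key_values <- pull(.tbl, {{ .key }})
  # Keep only the parent-dir columns for the row-wise step.
  parents <- select(.tbl, ...)

  # Hypothetical helper: the last non-NA parent dir of one row.
  last_parent <- function(...) {
    args <- unname(c(...))
    last(args[!is.na(args)])
  }
  deepest <- pmap_chr(parents, last_parent)

  # Remove "<parent>." from each key value (same regex as the original).
  str_remove(key_values, paste0(deepest, "\\."))
}

trimmed <- trim_target(dat, target, ref01:ref03)

# The key column can now have any name.
dat2 <- rename(dat, file = target)
trimmed2 <- trim_target(dat2, file, ref01:ref03)
```

Both calls give the same trimmed names, so nothing in the function depends on the key column being called "target". Inside `mutate()` the data frame still has to be passed explicitly, as in the original call.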

nirgrahamuk · April 17, 2020, 10:10am

Hello, apologies if I've misunderstood, but is it possible that your example data 'dat' doesn't fully express the complexity of the challenge you are writing code to overcome?
The transformation from 'dat' to your outcome seems much simpler than the approach you show, as I don't see that the intervening ref01:ref03 columns have any bearing on what happens to target, which seems to just need everything up to and including the final full stop thrown away.

dat %>%
  rowwise() %>%
  mutate(
    target = tail(unlist(stringr::str_split(
      string = target,
      pattern = "\\."
    )), 1)
  ) %>%
  ungroup() -> alternative
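If dropping everything up to the last full stop really is all that is needed, the same result can probably be had without `rowwise()`, since `str_remove()` is vectorised. A sketch, assuming the test names themselves never contain a full stop (the tibble here keeps only the `target` column from the reprex):

```r
library(tidyverse)

dat <- tibble(
  target = c("analysis.test1", "analysis.test2",
             "next.test3", "next.test4",
             "last.test5")
)

# "^.*\\." is greedy, so it matches up to and including the final full stop.
alternative <- dat %>%
  mutate(target = str_remove(target, "^.*\\."))
```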
