I'm looking to improve the semantics of my code and answer a specific question regarding quoted variables as parameters to closure-like functions.
I've provided a reprex to demonstrate the problem. While the reprex works, I'm unsure how to scale the function to handle different use cases.
library(tidyverse)
# A df of file-paths split so all basenames
# are in the same column, but parent-dirs
# are spread across an arbitrary number of columns
# and filled with NAs.
dat <- tibble(
  ref01 = rep("analysis", 5),
  ref02 = c(NA, NA, "next", "next", "next"),
  ref03 = c(NA, NA, NA, NA, "last"),
  target = c("analysis.test1", "analysis.test2",
             "next.test3", "next.test4",
             "last.test5")
)
# For example this reprex df shows file-paths
# from a file-tree that looks like:
# analysis
# ├── next
# │ ├── last
# │ │ └── last.test5
# │ ├── next.test3
# │ └── next.test4
# ├── analysis.test1
# └── analysis.test2
dat
#> # A tibble: 5 x 4
#> ref01 ref02 ref03 target
#> <chr> <chr> <chr> <chr>
#> 1 analysis <NA> <NA> analysis.test1
#> 2 analysis <NA> <NA> analysis.test2
#> 3 analysis next <NA> next.test3
#> 4 analysis next <NA> next.test4
#> 5 analysis next last last.test5
The `trim_target()` function (shown further down) cleans up the `target` test basenames.
Each test name is preceded by its parent-dir name and a period (e.g. 'last.test5').
The function takes a "target" column and an arbitrary number of parent-dir columns. For each row, it reverses the list of parent-dirs and finds the first non-NA value, i.e. the deepest directory. It then matches that value, followed by a period, against the target value and removes the matched prefix.
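To make the per-row logic concrete, here is a minimal base R walk-through for a single row. The variable names (`parents`, `deepest`, `trimmed`) are only illustrative, and `sub()` stands in for `str_remove()`; this is a sketch of the steps described above, not the function itself.

```r
# One row of the reprex: parent-dir columns and the target basename.
parents <- c("analysis", "next", NA)
target  <- "next.test4"

# Reverse the parent-dirs and drop the NAs.
reversed <- rev(parents)
non_na   <- reversed[!is.na(reversed)]

# The first remaining value is the deepest parent directory.
deepest <- non_na[1]

# Build the regex "<parent>\." and remove its first match from the target.
pattern <- paste0(deepest, "\\.")
trimmed <- sub(pattern, "", target)
trimmed
#> [1] "test4"
```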
My questions concern the `trim_target()` function shown below:
1. Is there a more semantic way of building this function so that it can be expressed inside of a `mutate()` call?
2. Currently, the `replace_pattern()` function relies on the fact that the `.key` column is titled "target", which is hardcoded as the name of its first parameter. This is because `pmap()` takes an arbitrary number of arguments from a list and matches them to parameters by name. Since I want this function to work for arbitrarily deep file-paths, I need to find a way to handle varying `.key` names. Is there a way to quote the `.key` variable so that it becomes the name of the first parameter of the `replace_pattern()` function?
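The name matching that causes the problem can be seen in a small example. The list `cols` and the column name `basename` below are made up for illustration. The second call shows one possible workaround (dropping the names so that `pmap()` matches by position); it is a sketch, not necessarily the best approach.

```r
library(purrr)

# Hypothetical data: a plain list standing in for the selected columns.
cols <- list(target = c("analysis.test1", "next.test3"),
             ref01  = c("analysis", "analysis"))

# The helper declares `target`, so pmap() binds the `target` element to it
# by name; `ref01` has no matching parameter and falls into `...`.
by_name <- pmap_chr(cols, function(target, ...) target)

# If the key column had another name (here the made-up `basename`), nothing
# would bind to `target`. Dropping the names makes pmap() match by position,
# so the first element is used as the key whatever it was called.
names(cols)[1] <- "basename"
by_position <- pmap_chr(unname(cols), function(key, ...) key)

identical(by_name, by_position)
#> [1] TRUE
```

For a data frame rather than a plain list, the columns would presumably need converting first, e.g. with `unname(as.list(...))`, and the key column would have to come first in the selection.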
trim_target <- function(.tbl, .key, ...){
  key <- tidyselect::eval_select(expr(c(!!enquo(.key))), .tbl)
  loc <- tidyselect::eval_select(expr(c(...)), .tbl)
  # First param has to be "target" since that's the name
  # of the .key column.
  replace_pattern <- function(target, ...){
    args <- c(...)
    pattern <- args %>%
      rev() %>%
      discard(is.na) %>%
      first() %>%
      paste0("\\.")
    unlist(str_remove(target, pattern))
  }
  pmap(.tbl[,c(key, loc)], replace_pattern) %>%
    unlist()
}
Expected Output:
This works as expected but is not scalable. Also, in reference to question 1, I have to pass `dat` into the `mutate()` call, which I don't typically see done.
dat %>%
  mutate(target = trim_target(dat, target, ref01:ref03))
#> # A tibble: 5 x 4
#> ref01 ref02 ref03 target
#> <chr> <chr> <chr> <chr>
#> 1 analysis <NA> <NA> test1
#> 2 analysis <NA> <NA> test2
#> 3 analysis next <NA> test3
#> 4 analysis next <NA> test4
#> 5 analysis next last test5
Created on 2020-04-08 by the reprex package (v0.3.0)
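For the first question, one possible direction is a helper that works on plain vectors, so the data frame does not need to be passed in. The sketch below is base R only and is not taken from the post. The name `trim_target_vec()` is hypothetical. It assumes every row has at least one non-NA parent-dir and that the parent columns are given shallowest first.

```r
# Hypothetical vector-based helper: `target` is a character vector and
# `...` are the parent-dir vectors, ordered from shallowest to deepest.
trim_target_vec <- function(target, ...) {
  parents <- cbind(...)  # character matrix, one column per parent-dir vector

  # For each row, keep the last non-NA parent (the deepest directory).
  deepest <- apply(parents, 1, function(row) {
    non_na <- row[!is.na(row)]
    non_na[length(non_na)]
  })

  # Remove the "<parent>." prefix from each target, element by element.
  mapply(function(x, p) sub(paste0(p, "\\."), "", x),
         target, deepest, USE.NAMES = FALSE)
}

# The reprex columns as plain vectors.
ref01  <- rep("analysis", 5)
ref02  <- c(NA, NA, "next", "next", "next")
ref03  <- c(NA, NA, NA, NA, "last")
target <- c("analysis.test1", "analysis.test2",
            "next.test3", "next.test4", "last.test5")

trimmed <- trim_target_vec(target, ref01, ref02, ref03)
trimmed
#> [1] "test1" "test2" "test3" "test4" "test5"
```

Because it only sees vectors, such a helper could be called as `mutate(dat, target = trim_target_vec(target, ref01, ref02, ref03))`, with the parent columns listed explicitly instead of selected with `ref01:ref03`.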