Custom Functions: Writing wrappers instead of variants for operating over vector or over a dataframe

emman · February 5, 2021, 10:20am

I'm curious what a "typical tidyverse approach" would be for writing custom functions when we have different types of data structures to apply those functions to.

For example, I've built my own function for converting values of TRUE and FALSE to 1 and 0.

library(magrittr)
#> Warning: package 'magrittr' was built under R version 4.0.3

convert_true_false_to_1_0 <- function(x) {
  
  gsub("^(?:TRUE)$", 1, x, ignore.case = TRUE) %>%
    gsub("^(?:FALSE)$", 0, ., ignore.case = TRUE)
}


set.seed(123)
my_vec <- sample(c(TRUE, FALSE, "true", "false"), 15, replace = TRUE)
my_vec
#>  [1] "true"  "true"  "true"  "FALSE" "true"  "FALSE" "FALSE" "FALSE" "true" 
#> [10] "TRUE"  "false" "FALSE" "FALSE" "TRUE"  "FALSE"

convert_true_false_to_1_0(my_vec)
#>  [1] "1" "1" "1" "0" "1" "0" "0" "0" "1" "1" "0" "0" "0" "1" "0"

Created on 2021-02-05 by the reprex package (v0.3.0)

Right now, convert_true_false_to_1_0() is designed to operate over vectors. If I wanted to make it work over columns in a data frame, I could do either of the following:

Use mutate(across(..., convert_true_false_to_1_0)) directly; or
Write an additional variant of convert_true_false_to_1_0(), such as:

library(dplyr)

convert_true_false_to_1_0 <- function(x) {
  
  gsub("^(?:TRUE)$", 1, x, ignore.case = TRUE) %>%
    gsub("^(?:FALSE)$", 0, ., ignore.case = TRUE)
}


convert_true_false_to_1_0_over_df <- function(my_data, my_cols) {
  
  my_data %>%
    mutate(across({{ my_cols }}, convert_true_false_to_1_0))
  
}

set.seed(123)
matrix(sample(c(TRUE, FALSE, "true", "false"), 20, replace = TRUE), ncol = 5) %>%
  as.data.frame() %>%
  convert_true_false_to_1_0_over_df(my_data = ., my_cols = V1:V3)
#>   V1 V2 V3    V4    V5
#> 1  1  1  1 FALSE false
#> 2  1  0  1  TRUE  TRUE
#> 3  1  0  0 FALSE  true
#> 4  0  0  0  true  true

Created on 2021-02-05 by the reprex package (v0.3.0)
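For comparison, the first option listed above (calling mutate(across()) directly, with no data-frame variant) might look like the sketch below. It reuses the same helper and the same seeded data; the helper is written with two plain gsub() calls instead of a pipe, which is an editorial simplification and not part of the original code.

```r
library(dplyr)

# Same vector-level logic as above, without the magrittr pipe
convert_true_false_to_1_0 <- function(x) {
  x <- gsub("^(?:TRUE)$", "1", x, ignore.case = TRUE)
  gsub("^(?:FALSE)$", "0", x, ignore.case = TRUE)
}

set.seed(123)
my_df <- as.data.frame(
  matrix(sample(c(TRUE, FALSE, "true", "false"), 20, replace = TRUE), ncol = 5)
)

# Option 1: apply the vector function to the selected columns in place
result <- my_df %>%
  mutate(across(V1:V3, convert_true_false_to_1_0))
result
```

The trade-off is that every call site has to repeat the mutate(across(...)) boilerplate, which is what motivates the variant function.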

Is there a third way? As an example of what I have in mind, something in the spirit of "adverbs": an over_df() wrapper that would be used as over_df(convert_true_false_to_1_0, cols = ...). Is there a typical "tidyverse-ish" way to deal with such things?

EDIT

I think I should clarify that my motivation is to write cleaner and more readable code. This is why I prefer a wrapper/adverb to using mutate(across(..., my_func)).

EDIT 2 (2021-02-17)

I have found some code that echoes my intention of writing "wrappers instead of variants": it creates a function that wraps any of dplyr's join functions so that they ignore upper/lower case when merging data frames: https://gist.github.com/jimhester/a060323a05b40c6ada34

Are there any guidelines or training for doing similar things (i.e., writing wrappers)?
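A minimal sketch of what such an adverb could look like, in the spirit of purrr's safely() and possibly(). The name over_df() and its arguments are hypothetical (taken from the idea in the question, not from any package), and the returned function takes the data and columns, which differs slightly from the over_df(f, cols = ...) call shape floated above.

```r
library(dplyr)

# Vector-level helper from the question (plain gsub() calls, no pipe)
convert_true_false_to_1_0 <- function(x) {
  x <- gsub("^(?:TRUE)$", "1", x, ignore.case = TRUE)
  gsub("^(?:FALSE)$", "0", x, ignore.case = TRUE)
}

# Hypothetical adverb: takes a vector function `f` and returns a new
# function that applies `f` to the selected columns of a data frame.
over_df <- function(f) {
  force(f)
  function(.data, cols) {
    mutate(.data, across({{ cols }}, f))
  }
}

# Build the data-frame version once, then reuse it
convert_df <- over_df(convert_true_false_to_1_0)

set.seed(123)
my_df <- as.data.frame(
  matrix(sample(c(TRUE, FALSE, "true", "false"), 20, replace = TRUE), ncol = 5)
)

converted <- convert_df(my_df, V1:V3)
converted
```

With this shape, any vector function can be lifted to a data-frame function without writing a separate _over_df variant for each one.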

nirgrahamuk · February 17, 2021, 10:25am

S3 methods let you define different functions for different classes of object; see
S3 · Advanced R (had.co.nz)

Generic functions and method dispatch

Method dispatch starts with a generic function that decides which specific method to dispatch to. Generic functions all have the same form: a call to UseMethod that specifies the generic name and the object to dispatch on. This means that generic functions are usually very simple, like mean:

 mean <- function (x, ...) {
   UseMethod("mean", x)
 }

Methods are ordinary functions that use a special naming convention: generic.class:

mean.numeric <- function(x, ...) sum(x) / length(x)
mean.data.frame <- function(x, ...) sapply(x, mean, ...)
mean.matrix <- function(x, ...) apply(x, 2, mean)

(These are somewhat simplified versions of the real code).

As you might guess from this example, UseMethod uses the class of x to figure out which method to call. If x had more than one class, e.g. c("foo","bar"), UseMethod would look for mean.foo and if not found, it would then look for mean.bar. As a final fallback, UseMethod will look for a default method, mean.default, and if that doesn’t exist it will raise an error. The same approach applies regardless of how many classes an object has ....
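The lookup order described in the quote can be tried out with a small sketch. The generic describe() and its methods are made up purely for illustration; only UseMethod() and structure() are base R.

```r
# Hypothetical generic, used only to illustrate dispatch order
describe <- function(x, ...) {
  UseMethod("describe")
}

# Methods for class "bar" and the fallback; there is deliberately no describe.foo
describe.bar <- function(x, ...) "bar method"
describe.default <- function(x, ...) "default method"

obj <- structure(list(), class = c("foo", "bar"))

res_obj <- describe(obj)  # describe.foo is missing, so describe.bar is used
res_int <- describe(1:3)  # no integer or numeric method, so describe.default is used

res_obj
#> [1] "bar method"
res_int
#> [1] "default method"
```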

emman · February 17, 2021, 7:08pm

Thanks for replying! Would you mind framing your answer in the context of the example function I gave (convert_true_false_to_1_0())? I'm not 100% sure I understand the link between my question and your answer.

Thanks!

nirgrahamuk · February 18, 2021, 12:17am

library(dplyr)
library(magrittr)

# Generic: dispatches on the class of x
convert_true_false_to_1_0 <- function(x, ...) {
  UseMethod("convert_true_false_to_1_0", x)
}

# Method for character vectors: the original vector-level logic
convert_true_false_to_1_0.character <- function(x, ...) {
  gsub("^(?:TRUE)$", 1, x, ignore.case = TRUE) %>%
    gsub("^(?:FALSE)$", 0, ., ignore.case = TRUE)
}

# Method for data frames: applies the generic to the selected columns
convert_true_false_to_1_0.data.frame <- function(x, my_cols, ...) {
  x %>%
    mutate(across({{ my_cols }}, convert_true_false_to_1_0))
}


set.seed(123)
my_vec <- sample(c(TRUE, FALSE, "true", "false"), 15, replace = TRUE)
my_vec

set.seed(123)
my_df <- matrix(sample(c(TRUE, FALSE, "true", "false"), 20, replace = TRUE), ncol = 5) %>%
  as.data.frame()

my_df

convert_true_false_to_1_0(my_vec)

convert_true_false_to_1_0(x = my_df, my_cols = V1:V3)
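One thing to watch with the S3 version: it only defines methods for character vectors and data frames, so a genuine logical vector such as c(TRUE, FALSE) would fail with a "no applicable method" error. A possible extension, offered as a sketch only, adds a logical method and a default method; both are assumptions and not part of the answer above.

```r
convert_true_false_to_1_0 <- function(x, ...) {
  UseMethod("convert_true_false_to_1_0")
}

# Character method, same logic as above (plain gsub() calls, no pipe)
convert_true_false_to_1_0.character <- function(x, ...) {
  x <- gsub("^(?:TRUE)$", "1", x, ignore.case = TRUE)
  gsub("^(?:FALSE)$", "0", x, ignore.case = TRUE)
}

# Assumed addition: real logical vectors, returned as character so the
# result type matches the character method
convert_true_false_to_1_0.logical <- function(x, ...) {
  as.character(as.integer(x))
}

# Assumed addition: fail with a clear message for unsupported classes
convert_true_false_to_1_0.default <- function(x, ...) {
  stop("No convert_true_false_to_1_0() method for class: ",
       paste(class(x), collapse = "/"), call. = FALSE)
}

from_lgl <- convert_true_false_to_1_0(c(TRUE, FALSE, NA))
from_chr <- convert_true_false_to_1_0(c("true", "False", "other"))

from_lgl
#> [1] "1" "0" NA
from_chr
#> [1] "1"     "0"     "other"
```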
