Hello!
I am trying to use a purrr approach to make pairwise calculations with specific columns of a dataframe and I am wondering if it is a good idea in terms of speed and memory efficiency. The steps I have followed are:
- Create a dataframe using expand.grid() that contains the column names I want to use in each calculation.
pairwise_df <- expand.grid(columns1, columns1, stringsAsFactors = FALSE)
- Define a function that takes 3 arguments: the two column names (col1 and col2) and the dataframe with the data of interest.
do_something <- function(col1, col2, df) {
  # Mean of the element-wise sum of the two selected columns
  value <- mean(df[[col1]] + df[[col2]])
  return(value)
}
- Use mutate() and map2_dbl() to add a new column to pairwise_df with the results of the calculation.
pairwise_df <- pairwise_df %>%
  mutate(calculation = map2_dbl(.x = Var1, .y = Var2, .f = do_something,
                                df = data))
The previous code is just an example to illustrate the idea. What I am wondering is whether the dataframe is being copied on each iteration, which would make this an inefficient strategy.
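In case it helps, here is a minimal self-contained sketch of the setup that can be run as-is. The toy dataframe and its column names (a, b, c) are made up for illustration and only stand in for my real data. I have also wrapped the call in base R's tracemem(), which, as far as I understand, prints a "tracemem[...]" message whenever the marked object is duplicated (it needs an R build with memory profiling enabled, which I believe the CRAN binaries have).

```r
library(dplyr)
library(purrr)

# Toy data (hypothetical; stands in for the real `data`)
data <- data.frame(a = c(1, 2, 3), b = c(4, 5, 6), c = c(7, 8, 9))
columns1 <- c("a", "b", "c")

# All pairs of column names, kept as character rather than factor
pairwise_df <- expand.grid(columns1, columns1, stringsAsFactors = FALSE)

do_something <- function(col1, col2, df) {
  # Mean of the element-wise sum of the two selected columns
  mean(df[[col1]] + df[[col2]])
}

# Mark `data` so that any duplication of it is reported on the console
tracemem(data)

pairwise_df <- pairwise_df %>%
  mutate(calculation = map2_dbl(.x = Var1, .y = Var2, .f = do_something,
                                df = data))

# Stop tracing
untracemem(data)
```

If no "tracemem[...]" line shows up while the mutate() call runs, I would read that as the dataframe not being copied, but I am not sure whether this check is enough.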
Thanks a lot!