How do I name interaction variables for all column combinations?

In the code below, my concern is with .fns = ~ .x * .x and .names = '{.col}_{.col}'.

Right now this code creates a squared term for each numeric variable (each column multiplied by itself). What I actually want is an interaction variable for every pair of numeric columns, with a name along the lines of column1_column2, but I'm not sure how to refer to the column2 part. {.col} gives me the name of column1, and .x * .y does not seem to work, so I don't know how to compute something like yearr * monthh, for example.
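
To make the limitation concrete, here is a minimal sketch on a toy tibble (the columns a and b are hypothetical, not from the retail data). As far as the dplyr documentation describes it, the .names glue specification only exposes {.col} and {.fn}, and the function is called with one column at a time, so there is no second column for a .y to refer to. A second column can still be named explicitly inside the lambda, which covers one fixed partner but not all combinations.

```r
library(dplyr)

# Hypothetical toy data, not the retail sales data from the question
toy <- tibble(a = 1:3, b = 4:6)

# The lambda receives one column at a time as .x, so .x * .x can only square it
squared <- toy |>
  mutate(across(everything(), ~ .x * .x, .names = "{.col}_{.col}"))
names(squared)

# A second column can be referenced by name inside the lambda,
# but it then has to be hard-coded in .names as well
with_b <- toy |>
  mutate(across(a, ~ .x * b, .names = "{.col}_b"))
names(with_b)
```

This works for a single known partner column such as monthh, but generating every pair this way would mean writing one across() call per partner.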

library(readr)
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(lubridate)
#> 
#> Attaching package: 'lubridate'
#> The following objects are masked from 'package:base':
#> 
#>     date, intersect, setdiff, union
library(reprex)

df = read_csv('https://github.com/andrew-couch/Tidy-Tuesday/raw/master/Season%202/Data/example_retail_sales.csv', show_col_types = FALSE)

print(df)
#> # A tibble: 293 × 2
#>    ds             y
#>    <chr>      <dbl>
#>  1 1/1/1992  146376
#>  2 2/1/1992  147079
#>  3 3/1/1992  159336
#>  4 4/1/1992  163669
#>  5 5/1/1992  170068
#>  6 6/1/1992  168663
#>  7 7/1/1992  169890
#>  8 8/1/1992  170364
#>  9 9/1/1992  164617
#> 10 10/1/1992 173655
#> # … with 283 more rows

dff = df |>
  mutate(yearr = year(ds),
         monthh = month(ds),
         dayy = day(ds),
         quarterr = quarter(ds),
         semesterr = semester(ds),
         ydayy = yday(ds)) |>
  select(-ds)

dff |>
  mutate(across(.cols = where(is.numeric),
                .fns = ~ .x * .x,
                .names = '{.col}_{.col}'),
         across(.cols = where(is.numeric),
                .fns = sqrt,
                .names = '{.col}_sqrt')) |>
  select(-starts_with('y_'))
#> # A tibble: 293 × 25
#>         y yearr monthh  dayy quarterr semesterr ydayy yearr_yearr monthh_monthh
#>     <dbl> <dbl>  <dbl> <int>    <int>     <int> <dbl>       <dbl>         <dbl>
#>  1 146376     1      1    19        1         1    19           1             1
#>  2 147079     2      1    19        1         1    19           4             1
#>  3 159336     3      1    19        1         1    19           9             1
#>  4 163669     4      1    19        1         1    19          16             1
#>  5 170068     5      1    19        1         1    19          25             1
#>  6 168663     6      1    19        1         1    19          36             1
#>  7 169890     7      1    19        1         1    19          49             1
#>  8 170364     8      1    19        1         1    19          64             1
#>  9 164617     9      1    19        1         1    19          81             1
#> 10 173655    10      1    19        1         1    19         100             1
#> # … with 283 more rows, and 16 more variables: dayy_dayy <int>,
#> #   quarterr_quarterr <int>, semesterr_semesterr <int>, ydayy_ydayy <dbl>,
#> #   yearr_sqrt <dbl>, monthh_sqrt <dbl>, dayy_sqrt <dbl>, quarterr_sqrt <dbl>,
#> #   semesterr_sqrt <dbl>, ydayy_sqrt <dbl>, yearr_yearr_sqrt <dbl>,
#> #   monthh_monthh_sqrt <dbl>, dayy_dayy_sqrt <dbl>,
#> #   quarterr_quarterr_sqrt <dbl>, semesterr_semesterr_sqrt <dbl>,
#> #   ydayy_ydayy_sqrt <dbl>

Created on 2022-01-28 by the reprex package (v2.0.1)

library(tidyverse)


# Your existing method: new columns that are direct transformations of a single column
(non_pair_df <- iris %>%
  mutate(across(.cols = where(is.numeric),
                .fns = ~ .x * .x,
                .names = '{.col}_{.col}'),
         across(.cols = where(is.numeric),
                .fns = sqrt,
                .names = '{.col}_sqrt')) %>%
  tibble())


# Names of the numeric columns, then every unordered pair of those names
(nms_1 <- names(iris %>% select(where(is.numeric))))
(pairnames <- combn(nms_1, 2, simplify = FALSE))

# Multiply one pair of columns and return a single column named col1_col2.
# x is a length-2 character vector of column names.
paircalcfunc <- function(df, x) {
  col1 <- x[[1]]
  col2 <- x[[2]]
  newcolname <- paste0(col1, "_", col2)
  transmute(df,
            !!newcolname := .data[[col1]] * .data[[col2]])
}

# A quick test on one pair
paircalcfunc(iris, c("Sepal.Length", "Petal.Length"))

# Apply it to every pair and bind the resulting columns together
(pairdf <- map_dfc(pairnames,
                   ~ paircalcfunc(iris, .x)) %>% tibble())

# Final result: transformed columns plus pairwise interaction columns
(findf <- bind_cols(non_pair_df, pairdf))
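
For comparison, the same pairwise idea can be sketched more compactly by naming the list of pairs first and letting the names carry through. This is only one possible arrangement, not a dplyr built-in; it assumes R 4.1 or later (for the native pipe and the \(p) lambda), and the object names num_cols, pairs and pair_df are made up for illustration.

```r
library(dplyr)
library(purrr)

# Names of the numeric columns in iris
num_cols <- iris |> select(where(is.numeric)) |> names()

# Every unordered pair of column names, as a list of length-2 character vectors
pairs <- combn(num_cols, 2, simplify = FALSE)

# Name each pair "col1_col2", multiply the two columns, and collect the
# named results into a tibble (one column per pair)
pair_df <- pairs |>
  set_names(map_chr(pairs, paste, collapse = "_")) |>
  map(\(p) iris[[p[1]]] * iris[[p[2]]]) |>
  as_tibble()

names(pair_df)
```

With four numeric columns this gives choose(4, 2) = 6 interaction columns, which can be attached to the original data with bind_cols() in the same way as pairdf.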

Ah, so I guess there is nothing built into dplyr for this.
Bummer.
Thank you.
