Create a new column from a vector of existing column names

Hello everyone! Thank you in advance for taking the time to read and help with this :slight_smile: I have a somewhat esoteric problem that I've somewhat solved, but I am positive there is a better way to do it...

I want to write a function that takes any existing columns in a data.frame/tibble and combines them into a new column, separated by -

I have the following data frame:

library(dplyr)
DFX <- data.frame(a = rep(LETTERS[1:5], 2),
                  b = sample(LETTERS[1:2], 10, replace = TRUE),
                  c = sample(LETTERS[1:5], 10, replace = TRUE),
                  x = 1:10,
                  y = rep(seq(2, 10, by = 2), 2),
                  z = sample(1:10, 10, replace = FALSE))
DFX

The function I've written can handle up to 3 columns (which I determine using some basic control flow), but I am wondering if there is another handy combination of .data[[cols]] and maybe stringr or glue?

create_new_column <- function(data, cols) {
    
     if (length(cols) == 3) {
        
      return_data <- mutate(data, 
                           new_col = paste(
                               .data[[cols[1]]], .data[[cols[2]]], 
                               .data[[cols[3]]], sep = "-")) 
      
    } else if (length(cols) == 2) {
        
     return_data <- mutate(data, 
                           new_col = paste(
                               .data[[cols[1]]], .data[[cols[2]]], sep = "-"))  
     
    } else {
        
     return_data <- mutate(data, 
                           new_col = paste(.data[[cols[1]]], sep = "-")) 
     
    }
    
    return(return_data)
    
}
# test
create_new_column(data = DFX, cols = c("b", "c"))
create_new_column(data = DFX, cols = c("b", "c" , "y"))

Thanks again in advance!

I though of pmap() but there may be a better way.

DFX <- data.frame(a = rep(LETTERS[1:5], 2),
                  b = sample(LETTERS[1:2], 10, replace = TRUE),
                  c = sample(LETTERS[1:5], 10, replace = TRUE),
                  x = 1:10,
                  y = rep(seq(2, 10, by = 2), 2),
                  z = sample(1:10, 10, replace = FALSE))
DFX
#>    a b c  x  y  z
#> 1  A B E  1  2 10
#> 2  B B E  2  4  2
#> 3  C B D  3  6  3
#> 4  D A E  4  8  5
#> 5  E B C  5 10  1
#> 6  A A D  6  2  7
#> 7  B A E  7  4  6
#> 8  C B E  8  6  4
#> 9  D A E  9  8  8
#> 10 E A C 10 10  9

library(purrr)
new_col <- function(DF,Nms){
  tmp <- DF[,Nms]
  DF$new_col <- pmap_chr(tmp,paste,sep="-")
  return(DF)
}

new_col(DFX,c("b","c","y","z"))
#>    a b c  x  y  z  new_col
#> 1  A B E  1  2 10 B-E-2-10
#> 2  B B E  2  4  2  B-E-4-2
#> 3  C B D  3  6  3  B-D-6-3
#> 4  D A E  4  8  5  A-E-8-5
#> 5  E B C  5 10  1 B-C-10-1
#> 6  A A D  6  2  7  A-D-2-7
#> 7  B A E  7  4  6  A-E-4-6
#> 8  C B E  8  6  4  B-E-6-4
#> 9  D A E  9  8  8  A-E-8-8
#> 10 E A C 10 10  9 A-C-10-9

Created on 2021-12-12 by the reprex package (v2.0.1)

1 Like

This is better. It doesn't fail if only one column is named.

library(purrr)
library(dplyr)
new_col <- function(DF,Nms){
  tmp <- select(DF,Nms)
  DF$new_col <- pmap_chr(tmp,paste,sep="-")
  return(DF)
}
1 Like

Thank you! I always forget about the pmap_ functions (but I won't now👌)

Cheers!

Maybe I'm missing something but I don't see the need for iteration or control structures

library(tidyverse)

DFX <- data.frame(a = rep(LETTERS[1:5], 2),
                  b = sample(LETTERS[1:2], 10, replace = TRUE),
                  c = sample(LETTERS[1:5], 10, replace = TRUE),
                  x = 1:10,
                  y = rep(seq(2, 10, by = 2), 2),
                  z = sample(1:10, 10, replace = FALSE))

create_new_column <- function(data, cols) {
    data %>% 
        unite(new_col, {{cols}}, remove = FALSE, sep = "-") %>% 
        relocate(new_col, everything())
}

create_new_column(DFX, cols = c("b","c","y","z"))
#>      new_col a b c  x  y  z
#> 1    B-B-2-6 A B B  1  2  6
#> 2    B-A-4-9 B B A  2  4  9
#> 3    A-B-6-5 C A B  3  6  5
#> 4    B-A-8-8 D B A  4  8  8
#> 5   B-D-10-7 E B D  5 10  7
#> 6    A-E-2-1 A A E  6  2  1
#> 7    B-B-4-4 B B B  7  4  4
#> 8    A-D-6-3 C A D  8  6  3
#> 9    A-C-8-2 D A C  9  8  2
#> 10 B-E-10-10 E B E 10 10 10

Created on 2021-12-13 by the reprex package (v2.0.1)

1 Like

Thank you @andresrcs ! You're absolutely right--this also works. I ended up needing a slightly different implementation (the function is going into a shiny app, and the users want to be able to specify the name of new joining variable).

This is what I am currently using:

create_join_column <- function(df, join_cols, by_col_name) {
  # select join_cols
  tmp <- select(df, all_of(join_cols))
  # rename original data 
  join_col_data <- df
  # assign new col with pmap_chr
  join_col_data$new_col <- pmap_chr(.l = tmp, .f = paste, sep = "-")
  # rename 
  names(join_col_data)[names(join_col_data) == "new_col"] <- by_col_name
  # relocate
  join_col_data <- relocate(join_col_data, all_of(by_col_name))
  # return
  return(join_col_data)
}

It's working for the moment, but now I'm curious how to use unite() and give a new column name:

library(tidyverse)
DFX <- data.frame(a = rep(LETTERS[1:5], 2),
                  b = sample(LETTERS[1:2], 10, replace = TRUE),
                  c = sample(LETTERS[1:5], 10, replace = TRUE),
                  x = 1:10,
                  y = rep(seq(2, 10, by = 2), 2),
                  z = sample(1:10, 10, replace = FALSE))

create_new_column <- function(data, cols, new_name) {
    data %>% 
        unite(new_name, {{cols}}, remove = FALSE, sep = "-") %>% 
        relocate(new_name, everything())
}

create_new_column(data = DFX, cols = c('a', 'b'), new_name = "joincol")
#>    new_name a b c  x  y  z <- 'new_name' should be 'joincol'
#> 1       A-B A B B  1  2 10
#> 2       B-B B B E  2  4  2
#> 3       C-B C B C  3  6  5
#> 4       D-B D B B  4  8  9
#> 5       E-B E B D  5 10  6
#> 6       A-B A B D  6  2  4
#> 7       B-B B B C  7  4  3
#> 8       C-A C A C  8  6  8
#> 9       D-A D A C  9  8  7
#> 10      E-B E B D 10 10  1

unite() has that strange 'quasiquotation (you can unquote strings and symbols)' for the col argument, and I am at a loss for how to get this to work (I thought maybe any_of(), but no luck).

Thank you so much for the other answer (and don't feel obligated to respond--you've helped a ton!)

Cheers!

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.

Sorry for the late answer, most likely you have already figured it out but I'm letting my answer in case is useful for somebody else coming across this topic

library(tidyverse)

DFX <- data.frame(a = rep(LETTERS[1:5], 2),
                  b = sample(LETTERS[1:2], 10, replace = TRUE),
                  c = sample(LETTERS[1:5], 10, replace = TRUE),
                  x = 1:10,
                  y = rep(seq(2, 10, by = 2), 2),
                  z = sample(1:10, 10, replace = FALSE))

create_new_column <- function(data, cols, new_name) {
    data %>% 
        unite({{new_name}}, {{cols}}, remove = FALSE, sep = "-") %>% 
        relocate({{new_name}}, everything())
}

create_new_column(DFX, cols = c("b","c","y","z"), new_name = "joincol")
#>      joincol a b c  x  y  z
#> 1    B-E-2-5 A B E  1  2  5
#> 2    A-A-4-3 B A A  2  4  3
#> 3    B-B-6-1 C B B  3  6  1
#> 4    B-C-8-6 D B C  4  8  6
#> 5   A-B-10-4 E A B  5 10  4
#> 6    B-C-2-7 A B C  6  2  7
#> 7    B-E-4-9 B B E  7  4  9
#> 8    B-C-6-2 C B C  8  6  2
#> 9    A-A-8-8 D A A  9  8  8
#> 10 B-B-10-10 E B B 10 10 10

Created on 2021-12-24 by the reprex package (v2.0.1)

1 Like