How do I concatenate columns whose column name matches certain string? (Conditional Concatenation)

How do I concatenate columns whose column name matches certain string? (Conditional Concatenation)

I'm not sure what you mean by "concatenate columns", so I assume you have a data frame with a number of columns and that you want to create a new column that contains the concatenation of the values of a subset of the remaining columns that match given pattern (string):

## data are in the df data frame
pattern = "This is the string to match"
df$new_column = do.call(paste, df[,grepl(pattern, names(df))])

I do not mean to hijack the original thread, but for curiosity's sake: is there a tidyverse solution to this?

Say a way to create a new column via dplyr::mutate() by applying paste() to variables identified by tidyselect::starts_with()?

1 Like

unite does the paste bit:

A grep could probably select the columns, but I haven't tried.

1 Like

Thanks! I was hoping for a general solution (or at least one that would be capable of creating a total for numerical values) but yes, unite does the paste bit, and does it well (including a separator).
It will likely not do for computing a sum of numeric columns, but for the use case described it is spot on.

For the benefit of posterity a comparison of base R and tidyverse approaches:

library(tidyverse)

animals <- data.frame(cats = letters[1:10],
                      dogs = letters[1:10],
                      catepillars = rev(letters[1:10]),
                      stringsAsFactors = F)

# base R solution
animals$base <- do.call(paste, animals[,grepl("cat", names(animals))])

# dplyr solution
animals <- animals %>% 
  unite(dplyr, starts_with("cat"), remove = F, sep = " ")

# just print the stuff...
animals

   dplyr cats dogs catepillars base
1    a j    a    a           j  a j
2    b i    b    b           i  b i
3    c h    c    c           h  c h
4    d g    d    d           g  d g
5    e f    e    e           f  e f
6    f e    f    f           e  f e
7    g d    g    g           d  g d
8    h c    h    h           c  h c
9    i b    i    i           b  i b
10   j a    j    j           a  j a

For a sum it's even easier:

df$rowsums = rowSums(df[,grepl(pattern, names(df))

For the paste solution, if you want to specifiy the separator, just pass it as a parameter to paste, and wrap the parameters in c():

pattern = "This is the string to match"
df$new_column = do.call(paste, c(df[,grepl(pattern, names(df))], sep="_"))
2 Likes

Thank you, rowSums to the rescue!

library(tidyverse)

animals <- data.frame(cats = 1:10,
                      dogs = 1:10,
                      catepillars = 10 * 1:10,
                      stringsAsFactors = F)

# base approach
animals$base <- rowSums(animals[,grepl("cat", names(animals))])

# dplyr approach
animals <- animals %>% 
  mutate(dplyr = select(., starts_with("cat")) %>% rowSums())

# just print the stuff....
animals
   cats dogs catepillars base dplyr
1     1    1          10   11    11
2     2    2          20   22    22
3     3    3          30   33    33
4     4    4          40   44    44
5     5    5          50   55    55
6     6    6          60   66    66
7     7    7          70   77    77
8     8    8          80   88    88
9     9    9          90   99    99
10   10   10         100  110   110

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.