How do I concatenate columns whose column name matches certain string? (Conditional Concatenation)
I'm not sure what you mean by "concatenate columns", so I assume you have a data frame with a number of columns and that you want to create a new column that contains the concatenation of the values of a subset of the remaining columns that match given pattern (string):
## data are in the df data frame pattern = "This is the string to match" df$new_column = do.call(paste, df[,grepl(pattern, names(df))])
I do not mean to hijack the original thread, but for curiosity's sake: is there a tidyverse solution to this?
Say a way to create a new column via
dplyr::mutate() by applying
paste() to variables identified by
unite does the
grep could probably select the columns, but I haven't tried.
Thanks! I was hoping for a general solution (or at least one that would be capable of creating a total for numerical values) but yes,
unite does the
paste bit, and does it well (including a separator).
It will likely not do for computing a sum of numeric columns, but for the use case described it is spot on.
For the benefit of posterity a comparison of base R and tidyverse approaches:
library(tidyverse) animals <- data.frame(cats = letters[1:10], dogs = letters[1:10], catepillars = rev(letters[1:10]), stringsAsFactors = F) # base R solution animals$base <- do.call(paste, animals[,grepl("cat", names(animals))]) # dplyr solution animals <- animals %>% unite(dplyr, starts_with("cat"), remove = F, sep = " ") # just print the stuff... animals dplyr cats dogs catepillars base 1 a j a a j a j 2 b i b b i b i 3 c h c c h c h 4 d g d d g d g 5 e f e e f e f 6 f e f f e f e 7 g d g g d g d 8 h c h h c h c 9 i b i i b i b 10 j a j j a j a
For a sum it's even easier:
df$rowsums = rowSums(df[,grepl(pattern, names(df))
For the paste solution, if you want to specifiy the separator, just pass it as a parameter to paste, and wrap the parameters in c():
pattern = "This is the string to match" df$new_column = do.call(paste, c(df[,grepl(pattern, names(df))], sep="_"))
rowSums to the rescue!
library(tidyverse) animals <- data.frame(cats = 1:10, dogs = 1:10, catepillars = 10 * 1:10, stringsAsFactors = F) # base approach animals$base <- rowSums(animals[,grepl("cat", names(animals))]) # dplyr approach animals <- animals %>% mutate(dplyr = select(., starts_with("cat")) %>% rowSums()) # just print the stuff.... animals cats dogs catepillars base dplyr 1 1 1 10 11 11 2 2 2 20 22 22 3 3 3 30 33 33 4 4 4 40 44 44 5 5 5 50 55 55 6 6 6 60 66 66 7 7 7 70 77 77 8 8 8 80 88 88 9 9 9 90 99 99 10 10 10 100 110 110
This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.