How do I concatenate columns whose column name matches certain string? (Conditional Concatenation)
I'm not sure what you mean by "concatenate columns", so I assume you have a data frame with a number of columns and that you want to create a new column that contains the concatenation of the values of a subset of the remaining columns that match given pattern (string):
## data are in the df data frame
pattern = "This is the string to match"
df$new_column = do.call(paste, df[,grepl(pattern, names(df))])
I do not mean to hijack the original thread, but for curiosity's sake: is there a tidyverse solution to this?
Say a way to create a new column via dplyr::mutate()
by applying paste()
to variables identified by tidyselect::starts_with()
?
unite
does the paste
bit:
A grep
could probably select the columns, but I haven't tried.
Thanks! I was hoping for a general solution (or at least one that would be capable of creating a total for numerical values) but yes, unite
does the paste
bit, and does it well (including a separator).
It will likely not do for computing a sum of numeric columns, but for the use case described it is spot on.
For the benefit of posterity a comparison of base R and tidyverse approaches:
library(tidyverse)
animals <- data.frame(cats = letters[1:10],
dogs = letters[1:10],
catepillars = rev(letters[1:10]),
stringsAsFactors = F)
# base R solution
animals$base <- do.call(paste, animals[,grepl("cat", names(animals))])
# dplyr solution
animals <- animals %>%
unite(dplyr, starts_with("cat"), remove = F, sep = " ")
# just print the stuff...
animals
dplyr cats dogs catepillars base
1 a j a a j a j
2 b i b b i b i
3 c h c c h c h
4 d g d d g d g
5 e f e e f e f
6 f e f f e f e
7 g d g g d g d
8 h c h h c h c
9 i b i i b i b
10 j a j j a j a
For a sum it's even easier:
df$rowsums = rowSums(df[,grepl(pattern, names(df))
For the paste solution, if you want to specifiy the separator, just pass it as a parameter to paste, and wrap the parameters in c():
pattern = "This is the string to match"
df$new_column = do.call(paste, c(df[,grepl(pattern, names(df))], sep="_"))
Thank you, rowSums
to the rescue!
library(tidyverse)
animals <- data.frame(cats = 1:10,
dogs = 1:10,
catepillars = 10 * 1:10,
stringsAsFactors = F)
# base approach
animals$base <- rowSums(animals[,grepl("cat", names(animals))])
# dplyr approach
animals <- animals %>%
mutate(dplyr = select(., starts_with("cat")) %>% rowSums())
# just print the stuff....
animals
cats dogs catepillars base dplyr
1 1 1 10 11 11
2 2 2 20 22 22
3 3 3 30 33 33
4 4 4 40 44 44
5 5 5 50 55 55
6 6 6 60 66 66
7 7 7 70 77 77
8 8 8 80 88 88
9 9 9 90 99 99
10 10 10 100 110 110
This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.