Hi everyone. I have a question about something that seems like it should be simple, but that I can't seem to wrap my head around.
Say I have a tibble, my_tibble
, containing columns that represent the scores a group of people received on 5 items. The columns are named item_1_score
, item_2_score
, and so on. Together they comprise a rating scale. I want to use mutate
to derive a summary_score
column by adding those five item-level scores together.
A straightforward way of doing it would be to try my_tibble %>% mutate(summary_score = item_1_score + item_2_score + item_3_score + item_4_score + item_5_score)
. But the repetition here is a bit silly, especially since the real dataframe I'm trying to do this with has 21 of these scores, not 5.
Another way of doing it might be to try to match the columns I'm interested in with a regex-based approach. However, the dataset also includes columns with names like item_1b_score
, which I'm not interested in. Regexes have always felt like black magic to me, and although I'm sure it's possible, I have no idea how to write one that matches item_1_score
but not item_1b_score.
So I thought, what if I programmatically generate strings representing the column names I'm interested in, and then get mutate
to recognize those strings as variable names? It turns out that getting R to recognize strings as variable names seems to involve black magic of a different sort. I went down a rabbit hole of trying to understand !!
, {{
, expr
, sym
and so on, which didn't go anywhere.
Finally, in frustration, I built a string that contains the literal argument to mutate
that I want, and then used eval
and parse
to read it.
This approach technically works, but its ugliness bothers me. I know there has to be a simple, elegant, idiomatic way to do what I want that doesn't involve parsing strings, but I just can't seem to figure out what it might be.
Any help would be appreciated.
Here's a reprex that includes my ugly non-solution:
library(tidyverse)
set.seed(1)
my_tibble <- tibble(
subject_id = c(1:20),
item_1_score = sample.int(10, 20, replace=TRUE),
item_1b_score = sample.int(10, 20, replace=TRUE),
item_2_score = sample.int(10, 20, replace=TRUE),
item_3_score = sample.int(10 ,20, replace=TRUE),
item_4_score = sample.int(10, 20, replace=TRUE),
item_5_score = sample.int(10, 20, replace=TRUE),
)
construct_column_name <- function(item_number) {
return(paste0('item_', item_number, '_score'))
}
my_string <-
1:4 %>%
map(construct_column_name) %>%
paste0(' + ') %>%
paste0(collapse='') %>%
paste0(construct_column_name(5))
my_tibble <- my_tibble %>%
mutate(summary_score = eval(parse(text=my_string)))
Created on 2019-09-28 by the reprex package (v0.3.0)