Hi folks!
Short version
After checking out the colwise and grouping vignettes, I still have no idea how to group_by() all columns except two and then summarize those two columns into one.
Input:
tribble(
~id, ~q1, ~q2, ~q3, ~v3, ~q4,
0, "a", "b", "x", 1, "c",
0, "a", "b", "y", 2, "c",
0, "a", "b", "z", 3, "c"
)
Desired output:
tribble(
~id, ~q1, ~q2, ~qv3, ~q4,
0, "a", "b", list("x" = 1, "y" = 2, "z" = 3), "c"
)
All the documentation keeps pointing me to using across() inside summarise(), and I looked into using {tidyselect} with those. At least that way I can group_by(id) and then use across(!contains("3"), head, n = 1L) to avoid grouping by all the other columns except q3 and v3, but it doesn't look like I can use a two-parameter function that would operate on q3 and v3. Maybe I'm missing something…
Also, as a rule of thumb I'm trying to stay away from any superseded functions for future-proofing. Manually listing the grouping columns isn't an option either: I'm dealing with 20+ columns with repeated values in the rows, and I just need to collapse the rows of two columns into 1 cell.
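A minimal sketch of one possible approach on the input above, assuming dplyr >= 1.0.0. It relies on the fact that summarise() can see both q3 and v3 at once, so no column-wise across() is needed for the two-argument step. On dplyr >= 1.1.0, pick() is the recommended replacement for an across() call without functions, so `group_by(pick(!c(q3, v3)))` may be the more future-proof spelling there.

```r
library(dplyr)
library(tibble)

input <- tribble(
  ~id, ~q1, ~q2, ~q3, ~v3, ~q4,
  0, "a", "b", "x", 1, "c",
  0, "a", "b", "y", 2, "c",
  0, "a", "b", "z", 3, "c"
)

collapsed <- input %>%
  # Group by every column except the two being collapsed.
  group_by(across(!c(q3, v3))) %>%
  # Build one named list per group (names from q3, values from v3);
  # the outer list() stores it as a single list-column cell.
  summarise(qv3 = list(setNames(as.list(v3), q3)), .groups = "drop") %>%
  # Put the new column where q3/v3 used to be.
  relocate(qv3, .before = q4)
```

With this toy input, `collapsed` has a single row and `collapsed$qv3[[1]]` holds the named list of x, y and z. Whether this scales nicely to 20+ columns with missing values is untested here.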
Any help would be greatly appreciated!!!
Long version
Context/background: I'm working with survey responses from Google Forms. One of the questions was a check matrix/grid, and Google turns that into a spreadsheet by making each row in the grid a column in the CSV; the values the user selected then become a concatenated list of values in the cell. For example:
package | use case 1 | use case 2 | use case 3 |
---|---|---|---|
dplyr | x | | x |
tidyr | x | x | |
The CSV of survey responses would then have
package_dplyr | package_tidyr |
---|---|
use case 1; use case 3 | use case 1; use case 2 |
So I used tidyr::pivot_longer() to collect those columns together:
... %>%
pivot_longer(
cols = starts_with("package_"),
names_to = "package",
values_to = "use_cases",
names_prefix = "package_"
)
But that makes
package | use_cases |
---|---|
dplyr | use case 1; use case 3 |
tidyr | use case 1; use case 2 |
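To make that concrete, here is a small reproducible sketch of the same pivot. The `respondent` and `other_question` columns and their values are made up for illustration; only the two `package_` columns mirror the example above.

```r
library(dplyr)
library(tidyr)
library(tibble)

# Hypothetical stand-in for the Google Forms CSV: one row per respondent.
responses <- tribble(
  ~respondent, ~other_question, ~package_dplyr,           ~package_tidyr,
  1,           "yes",           "use case 1; use case 3", "use case 1; use case 2"
)

long <- responses %>%
  pivot_longer(
    cols = starts_with("package_"),
    names_to = "package",
    values_to = "use_cases",
    names_prefix = "package_"
  )

# `long` has one row per package, so the non-pivoted columns
# (respondent, other_question) are repeated on every row.
```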
And all the other questions & responses turn into repeated rows, and I want to have 1 row per survey responder. So what I'm asking about is summarise()-ing (with the intention to then dplyr::mutate() & purrr::map()) those into:
package_use_cases |
---|
list(dplyr = c(1, 3), tidyr = c(1, 2)) |
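A rough sketch of how that collapse might look, assuming dplyr >= 1.0.0 and long-format data shaped like the pivot_longer() result. The `respondent` and `other_question` columns and the `parse_use_cases()` helper are hypothetical, and base lapply() stands in for purrr::map() to keep the example dependency-free.

```r
library(dplyr)
library(tibble)

# Hypothetical long-format data, shaped like the pivot_longer() output.
long <- tribble(
  ~respondent, ~other_question, ~package, ~use_cases,
  1, "yes", "dplyr", "use case 1; use case 3",
  1, "yes", "tidyr", "use case 1; use case 2"
)

# Hypothetical helper: turn "use case 1; use case 3" into c(1, 3).
parse_use_cases <- function(x) {
  parts <- strsplit(x, "; ", fixed = TRUE)[[1]]
  as.numeric(gsub("use case ", "", parts, fixed = TRUE))
}

wide <- long %>%
  # Group by everything except the two columns being collapsed.
  group_by(across(!c(package, use_cases))) %>%
  summarise(
    # One named list per respondent: names from `package`,
    # values parsed from `use_cases`.
    package_use_cases = list(
      setNames(lapply(use_cases, parse_use_cases), package)
    ),
    .groups = "drop"
  )
```

Here `wide` has one row per respondent, and `wide$package_use_cases[[1]]` is a named list with one numeric vector per package. The parsing step assumes every value follows the "use case N" pattern, which real responses may not.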
Thank you!!!