Hi folks!
Short version
After reading the colwise and grouping vignettes, I still can't work out how to group by all columns except two and then summarise those two columns into one.
Input:
tribble(
  ~id, ~q1, ~q2, ~q3, ~v3, ~q4,
  0, "a", "b", "x", 1, "c",
  0, "a", "b", "y", 2, "c",
  0, "a", "b", "z", 3, "c"
)
Desired output:
tribble(
  ~id, ~q1, ~q2, ~qv3, ~q4,
  0, "a", "b", list("x" = 1, "y" = 2, "z" = 3), "c"
)
All the documentation keeps pointing me to across() inside summarise(), so I looked into using {tidyselect} with it. That way I can at least group_by(id) and then use across(!contains("3"), head, n = 1L) to avoid grouping by every column other than q3 and v3. But it doesn't look like I can use a two-parameter function that would operate on q3 and v3 together. Maybe I'm missing something…
Also, as a rule of thumb, I'm trying to stay away from superseded functions for future-proofing. Manually listing the grouping columns isn't an option either: I'm dealing with 20+ columns with repeated values in the rows, and I just need to collapse the rows across two columns into one cell.
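One direction that might fit, as a minimal sketch rather than a confirmed answer: across() can also be used inside group_by(), so the selection can be negated to group by everything except q3 and v3, and summarise() then sees both columns at once. The names df and out, the use of setNames(), and the relocate() step are assumptions for illustration.

```r
library(dplyr)
library(tibble)

# Input from the example above
df <- tribble(
  ~id, ~q1, ~q2, ~q3, ~v3, ~q4,
  0, "a", "b", "x", 1, "c",
  0, "a", "b", "y", 2, "c",
  0, "a", "b", "z", 3, "c"
)

out <- df %>%
  # Group by every column except the two that should be collapsed
  group_by(across(!c(q3, v3))) %>%
  # Both columns are visible here, so v3 can be named by q3 and
  # wrapped in list() to become a single list-column cell
  summarise(qv3 = list(as.list(setNames(v3, q3))), .groups = "drop") %>%
  # Optional: put the new column where q3/v3 used to be
  relocate(qv3, .before = q4)

out$qv3[[1]]
```

With 20+ columns, only the two excluded names need to be spelled out; every other column becomes a grouping column automatically.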
Any help would be greatly appreciated!!!
Long version
Context/background: I'm working with survey responses from Google Forms, and one of the questions was a check matrix/grid. Google turns that into a spreadsheet by making each row in the grid a column in the CSV, and the values the user selected become a concatenated list of values in the cell. For example:
| package | use case 1 | use case 2 | use case 3 |
|---|---|---|---|
| dplyr | x | | x |
| tidyr | x | x | |
The CSV of survey responses would then have
| package_dplyr | package_tidyr |
|---|---|
| use case 1; use case 3 | use case 1; use case 2 |
So I used tidyr::pivot_longer() to collect those columns together:
... %>%
  pivot_longer(
    cols = starts_with("package_"),
    names_to = "package",
    values_to = "use_cases",
    names_prefix = "package_"
  )
But that makes
| package | use_cases |
|---|---|
| dplyr | use case 1; use case 3 |
| tidyr | use case 1; use case 2 |
All the other questions and responses then turn into repeated rows, whereas I want one row per survey responder. So what I'm asking about is summarise()-ing those rows (with the intention to then use dplyr::mutate() and purrr::map()) into:
| package_use_cases |
|---|
| list(dplyr = c(1, 3), tidyr = c(1, 2)) |
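To make the long version reproducible, here is a rough sketch of the whole pipeline on a made-up single response. The responder column, the object names (responses, long, collapsed), and the "; " separator and "use case " prefix used for parsing are assumptions based on the example cells above, not the real survey data.

```r
library(dplyr)
library(tidyr)
library(tibble)

# Hypothetical one-row survey export; `responder` stands in for the
# 20+ other question columns
responses <- tribble(
  ~responder, ~package_dplyr, ~package_tidyr,
  1, "use case 1; use case 3", "use case 1; use case 2"
)

# Same pivot as above: one row per responder and package
long <- responses %>%
  pivot_longer(
    cols = starts_with("package_"),
    names_to = "package",
    values_to = "use_cases",
    names_prefix = "package_"
  )

# Group by everything except the two pivoted columns, then fold them
# into one named list per responder
collapsed <- long %>%
  group_by(across(!c(package, use_cases))) %>%
  summarise(
    package_use_cases = list(
      setNames(
        # Split "use case 1; use case 3" and keep only the numbers
        lapply(
          strsplit(use_cases, "; ", fixed = TRUE),
          function(x) as.numeric(sub("use case ", "", x, fixed = TRUE))
        ),
        package
      )
    ),
    .groups = "drop"
  )

collapsed$package_use_cases[[1]]
```

Parsing inside summarise() is only one option; the same splitting could instead happen afterwards with mutate() and purrr::map() on the list column.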
Thank you!!!