Widening a dataframe by a set of columns,

Hello everyone

I have a problem that I think could be solved using pivot_wider, however I have not figured out how to do it. I have a dataset:

data <- data.frame(Person = c("Peter", "Peter", "Peter", "Peter", "Carol", "Carol", "Carol", "Carol"),
                   GroupID = c(1, 1, 2, 2, 3, 3, 4, 4),
                   Group = c("Scholar", "Scholar", "Zephyr", "Zephyr", "HOJ", "HOJ", "Colossus", "Colossus"),
                   Committee = c("Nature", "Transport", "Nature", "Transport", "State", "Transport", "State", "Transport"))

# Which looks like that:

   Person GroupID    Group Committee
1 Peter       1  Scholar    Nature
2 Peter       1  Scholar Transport
3 Peter       2   Zephyr    Nature
4 Peter       2   Zephyr Transport
5 Carol       3      HOJ     State
6 Carol       3      HOJ Transport
7 Carol       4 Colossus     State
8 Carol       4 Colossus Transport

I want to have one row per Person. For that, i need to widen the dataset by GroupID and group.

Note that the observations for "Committee" of a Person repeat for each Group. This is intended and is the case for every "Name" in the original dataset. Peter for example is in two committees and two groups.

The new dataset should as follows:

 Person GroupID_1 Group_1 GroupID_2  Group_2 Committee_1 Committee_2
1 Peter         1 Scholar         2   Zephyr      Nature   Transport
2 Carol         3     HOJ         4 Colossus       State   Transport

Any help is massively appreciated!
(I suppose it is likely that there is already a solution somewhere out there, but a big problem with being new to R and programming language in general is that I am missing the lingo to search for the right keywords and find already existing solutions.)

This is actually pretty tricky! I've pivoted each of GroupID, Group and Committee in turn via a function. I don't know if it would be possible to do it all in one step, but I bow to those with a better command of tidyr.

library(tidyverse)

data <- data.frame(Person = c("Peter", "Peter", "Peter", "Peter", "Carol", "Carol", "Carol", "Carol"),
                   GroupID = c(1, 1, 2, 2, 3, 3, 4, 4),
                   Group = c("Scholar", "Scholar", "Zephyr", "Zephyr", "HOJ", "HOJ", "Colossus", "Colossus"),
                   Committee = c("Nature", "Transport", "Nature", "Transport", "State", "Transport", "State", "Transport"))

widen = function(col, pre){
  
  data |> 
    select(Person, {{col}}) |> 
    distinct() |> 
    with_groups(Person, ~mutate(.x, n = row_number())) |> 
    pivot_wider(names_from = n, values_from = {{col}}, names_prefix = pre)
  
}

reduce(list(widen(Committee, "Committee_"),
            widen(GroupID, "GroupID_"),
            widen(Group, "Group_")), 
       left_join, by = "Person")
#> # A tibble: 2 x 7
#>   Person Committee_1 Committee_2 GroupID_1 GroupID_2 Group_1 Group_2 
#>   <chr>  <chr>       <chr>           <dbl>     <dbl> <chr>   <chr>   
#> 1 Peter  Nature      Transport           1         2 Scholar Zephyr  
#> 2 Carol  State       Transport           3         4 HOJ     Colossus

Created on 2021-12-15 by the reprex package (v2.0.1)

2 Likes

Fantastic, worked like a charm. Thank you so much Jack.

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.