Widening dataframe

mhakanda · March 4, 2021, 5:45am

I have my df1, where A and B column has same uniqe entry. C and D column may have different entries as shown. I like to trasform it to df2, where C column is expanded to C1,C2,C3 and same as D column.
I try to use pivot_wider to make it wider, but seems it does not work well in this case. Any better way to do that using dplyr tools? Thanks in advance.

library(tibble)
df1<-tribble(
  ~A,~B,~C,~D,
  4,1,2,7,
  4,1,3,9,
  4,1,4,10
)
df1
#> # A tibble: 3 x 4
#>       A     B     C     D
#>   <dbl> <dbl> <dbl> <dbl>
#> 1     4     1     2     7
#> 2     4     1     3     9
#> 3     4     1     4    10
# Transform df1 to df2
df2<-tribble(
  ~A,~B,~C1,~C2,~C3,~D1,~D2,~D3,
  4,1,2,3,4,7,9,10
)
df2
#> # A tibble: 1 x 8
#>       A     B    C1    C2    C3    D1    D2    D3
#>   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1     4     1     2     3     4     7     9    10

mgirlich · March 4, 2021, 7:11am

With a little work before pivot_wider() actually works here pretty nicely

library(tibble)
library(dplyr, warn.conflicts = FALSE)
library(tidyr)

df1 <- tribble(
  ~A,~B,~C,~D,
  4,1,2,7,
  4,1,3,9,
  4,1,4,10
)

# it seems that the columns `A` and `B` are supposed to uniquely
# identify an observation
# but the following doesn't work as we want because everything goes into
# a single column `C` resp. `D`
df1 %>% 
  pivot_wider(
    id_cols = c(A, B),
    values_from = c(C, D),
    names_sep = ""
  )
#> Warning: Values are not uniquely identified; output will contain list-cols.
#> * Use `values_fn = list` to suppress this warning.
#> * Use `values_fn = length` to identify where the duplicates arise
#> * Use `values_fn = {summary_fun}` to summarise duplicates

#> Warning: Values are not uniquely identified; output will contain list-cols.
#> * Use `values_fn = list` to suppress this warning.
#> * Use `values_fn = length` to identify where the duplicates arise
#> * Use `values_fn = {summary_fun}` to summarise duplicates
#> # A tibble: 1 x 4
#>       A     B C         D        
#>   <dbl> <dbl> <list>    <list>   
#> 1     4     1 <dbl [3]> <dbl [3]>

df1 %>% 
  # so we need to add a column that helps us to create the names `C1`, `C2`, ...
  group_by(A, B) %>% 
  mutate(id = 1:n()) %>% 
  ungroup() %>% 
  pivot_wider(
    id_cols = c(A, B),
    names_from = id,
    values_from = c(C, D),
    names_sep = ""
  )
#> # A tibble: 1 x 8
#>       A     B    C1    C2    C3    D1    D2    D3
#>   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1     4     1     2     3     4     7     9    10

^{Created on 2021-03-04 by the reprex package (v1.0.0)}

mhakanda · March 4, 2021, 7:21am

Thank you for the solution. It works perfectly what I want.
I noticed the following code also gives same result (the reason may be A and B column have unique elements).

df1 %>% 
  # so we need to add a column that helps us to create the names `C1`, `C2`, ...
  #group_by(A, B) %>% 
  mutate(id = 1:n()) %>% 
  #ungroup() %>% 
  pivot_wider(
    id_cols = c(A, B),
    names_from = id,
    values_from = c(C, D),
    names_sep = ""
  )
#> # A tibble: 1 x 8
#>       A     B    C1    C2    C3    D1    D2    D3
#>   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1     4     1     2     3     4     7     9    10

system · March 25, 2021, 7:21am

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.