how to efficiently go from long to wide with multiple variables?

Hello!

I know this has been asked several times, but I wonder if there is a tidyverse compliant function to spread multiple columns efficiently when the number of variables to spread out is not fixed. I am asking this because I heard there is a new pivot_wide() function in tidyr but I was not able to use it correctly.

Consider this simple example:

tibble(mycode = c('A','A','A','B','B'),
       mytime = c(ymd('2018-01-02',
                      '2018-01-03',
                      '2018-01-04',
                      '2018-01-02',
                      '2018-01-05')),
       var1 = c(1,2,3,4,5),
       othervar = c(10,20,30,10,20)) 

# A tibble: 5 x 4
  mycode mytime      var1 othervar
  <chr>  <date>     <dbl>    <dbl>
1 A      2018-01-02     1       10
2 A      2018-01-03     2       20
3 A      2018-01-04     3       30
4 B      2018-01-02     4       10
5 B      2018-01-05     5       20

This is in long format. I would like to end up with a wide format where the time is the index. Of course, the difficulty is that I do not know in advance how many columns I need to spread (here, only 2: var1 and othervar).

The ideal output would be the following (first two rows):

# A tibble: 2 x 5
  mytime     var1_A othervar_A var1_B othervar_B
  <date>      <dbl>      <dbl>  <dbl>      <dbl>
1 2018-01-02      1         10      4         10
2 2018-01-03      2         20     NA         NA

How can I do that? Thanks!

Like this? Note that the pivot_wider() and pivot_longer() verbs are still in development, so you need to use the dev version of tidyr to use them.

This vignette is basically what I followed:

https://tidyr.tidyverse.org/dev/articles/pivot.html

suppressPackageStartupMessages(library(tidyverse))
library(lubridate)
#> 
#> Attaching package: 'lubridate'
#> The following object is masked from 'package:base':
#> 
#>     date
df <- tibble(mycode = c('A','A','A','B','B'),
       mytime = c(ymd('2018-01-02',
                      '2018-01-03',
                      '2018-01-04',
                      '2018-01-02',
                      '2018-01-05')),
       var1 = c(1,2,3,4,5),
       othervar = c(10,20,30,10,20)) 

df %>%
  pivot_wider(names_from = mycode, values_from = c(var1, othervar))
#> # A tibble: 4 x 5
#>   mytime     var1_A var1_B othervar_A othervar_B
#>   <date>      <dbl>  <dbl>      <dbl>      <dbl>
#> 1 2018-01-02      1      4         10         10
#> 2 2018-01-03      2     NA         20         NA
#> 3 2018-01-04      3     NA         30         NA
#> 4 2018-01-05     NA      5         NA         20

Created on 2019-05-08 by the reprex package (v0.2.1)

2 Likes

ahhh thanks! I thought they were production-ready... sob sob

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.