How to use reshape() without a timevar? Can I automatically create one?

Hello, guys!

Currently I am looking for a way to use reshape() to convert a long data frame with grouped data into a wide one. I only have the ID that groups the values in LR, but no variable that acts as a timevar within the groups.

I'd like to use reshape() since it seems to be a simple function, but I still need a timevar to make it work. Is there a way to create one that numbers the individual rows within each group?

Or is there another approach?

The left one is how the data are right now and the right one is the kind of result I need.

Hi,

Here is one implementation using tidyverse functions

library(dplyr)
library(tidyr)

set.seed(1) #for reproducibility

#Generate random data: 20 rows spread over 5 IDs
myData <- data.frame(
  ID = sample(1:5, 20, replace = TRUE),
  LR = runif(20)
)

head(myData)
#>   ID        LR
#> 1  1 0.8696908
#> 2  4 0.3403490
#> 3  1 0.4820801
#> 4  2 0.5995658
#> 5  5 0.4935413
#> 6  3 0.1862176

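#Transforms
#Number the rows within each ID (1, 2, 3, ...) to act as the timevar,
#then spread LR into one column per timeVar value
myData <- myData %>% group_by(ID) %>% mutate(timeVar = row_number()) %>% 
  pivot_wider(id_cols = ID, names_from = timeVar, values_from = LR) %>% 
  arrange(ID)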

head(myData)
#> # A tibble: 5 x 7
#> # Groups:   ID [5]
#>      ID   `1`    `2`    `3`    `4`    `5`     `6`
#>   <int> <dbl>  <dbl>  <dbl>  <dbl>  <dbl>   <dbl>
#> 1     1 0.870  0.482  0.108  0.783  0.789  0.0233
#> 2     2 0.600  0.827  0.821  0.647 NA     NA     
#> 3     3 0.186  0.668  0.794 NA     NA     NA     
#> 4     4 0.340 NA     NA     NA     NA     NA     
#> 5     5 0.494  0.724  0.411  0.553  0.530  0.477

Created on 2020-09-14 by the reprex package (v0.3.0)

In short, you need a timevar for these functions to work. In this case, I just created one by giving every LR with the same ID a unique number from 1 to the number of LR values for that ID (that's the group_by and mutate functions). I then use pivot_wider (it's the latest tidyverse equivalent of reshape) to widen it into columns, generating NA where there are no values.
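Since the question was about reshape() itself, here is a minimal base R sketch of the same idea. The small data frame and the names long and wide are made up for illustration; ave() with seq_along is one common way to number rows within a group, not the only one.

```r
# Small made-up example in long format: an ID column and a value column, no timevar
long <- data.frame(
  ID = c(1, 2, 1, 3, 2, 1),
  LR = c(0.87, 0.34, 0.48, 0.60, 0.49, 0.19)
)

# Number the rows within each ID in their current order (1, 2, 3, ...)
long$timeVar <- ave(seq_along(long$ID), long$ID, FUN = seq_along)

# Reshape to wide: one row per ID, one LR column per timeVar value
wide <- reshape(long, idvar = "ID", timevar = "timeVar", direction = "wide")

# Sort by ID and reset the row names for readability
wide <- wide[order(wide$ID), ]
rownames(wide) <- NULL
wide
```

This should give the columns ID, LR.1, LR.2 and LR.3, with NA wherever an ID has fewer rows than the largest group.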

Beware that because we assign the timevar purely by row order, the columns have no meaning in themselves (e.g. the values in 1 could just as well have been in 2 or 3 if the rows had been in a different order).

I don't know what you want to use this for, but I think this is what you wanted for now, right?

Hope this helps,
PJ

Dear pieterjanvc,

This definitely helps. You have my thanks!

EDIT: I have checked the columns and in my case the order of the values in the columns from left to right in the wide data frame is the same as from top to bottom in the groups of the long data frame. It worked out just as it was supposed to do.
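For anyone who wants to run the same check in code rather than by eye, here is a rough base R sketch. It uses a small made-up data frame (the names long, wide, expected and observed are only for illustration) and assumes the original values contain no NA, since NA is also what pads the shorter groups.

```r
# Small made-up example in long format, with a row counter per ID as the timevar
long <- data.frame(
  ID = c(1, 2, 1, 3, 2, 1),
  LR = c(0.87, 0.34, 0.48, 0.60, 0.49, 0.19)
)
long$timeVar <- ave(seq_along(long$ID), long$ID, FUN = seq_along)
wide <- reshape(long, idvar = "ID", timevar = "timeVar", direction = "wide")

# Values per ID in their original top-to-bottom order
expected <- split(long$LR, long$ID)

# Values per ID read left to right from the wide table, with the padding NAs dropped
value_cols <- setdiff(names(wide), "ID")
observed <- lapply(seq_len(nrow(wide)), function(i) {
  row_values <- unlist(wide[i, value_cols], use.names = FALSE)
  row_values[!is.na(row_values)]
})
names(observed) <- wide$ID

# TRUE if every group keeps its original order in the wide table
order_preserved <- isTRUE(all.equal(observed[names(expected)], expected))
order_preserved
```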
