Reshape a dataset using gather()

Hello,

I'm working on a dataset that I want to plot.

Here is the structure of my dataset:

   ID   var1   var2    t0   t1   ...   t100   y_t0   y_t1   ...   y_t100
   A    10     blue    0    1    ...   100    1%     3%     ...   20%
   B    9      green   0    1    ...   100    2%     4%     ...   19%
   C    8      pink    0    1    ...   100    1%     6%     ...   22%

Here ID, var1 and var2 are the variables I want to keep as they are, t0 to t100 are the times (always the same), and y_t0 to y_t100 are the measurements taken at those times.

I want to plot the y values as a function of time with ggplot (geom_point), but since the data isn't in rows I need to reshape the dataset. I thought of using gather(), but I don't really see how to do it with this many variables.

Does anyone know how to handle such a problem?

Thanks in advance

Hi Martino,

gather() and spread() are obsolete as they are confusing everyone. I would suggest using tidyr's pivot_longer() and it will give you exactly what you need. You can check the code below:

library(dplyr)
library(tidyr)

df <- tibble(
  ID   = c("A", "B", "C"),
  var1 = c(10, 9, 8),
  var2 = c("blue", "green", "pink"),
  `t0` = c(0, 0, 0),
  `t1` = c(1, 1, 1),
  `t2` = c(2, 2, 2),
  `t100` = c(100, 100, 100)
)

df %>% 
  pivot_longer(`t0`:`t100`, names_to = "measurement_time", values_to = "measurement_value")
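
The layout in the question also has a y_t* column for every t* column, which the example above leaves out. The following is only a sketch of how both sets might be reshaped in one call and then plotted. It assumes the column names follow the t<number> / y_t<number> pattern exactly and that the y values are stored as percentage strings; the data frame and the names df_wide, df_long, step, time and y_raw are made up for illustration.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Hypothetical wide data in the question's layout, cut down to three time
# points; the y_t* values are assumed to be percentage strings.
df_wide <- tibble(
  ID     = c("A", "B", "C"),
  var1   = c(10, 9, 8),
  var2   = c("blue", "green", "pink"),
  t0     = c(0, 0, 0),
  t1     = c(1, 1, 1),
  t100   = c(100, 100, 100),
  y_t0   = c("1%", "2%", "1%"),
  y_t1   = c("3%", "4%", "6%"),
  y_t100 = c("20%", "19%", "22%")
)

df_long <- df_wide %>%
  pivot_longer(
    # Pick up both the t* and the y_t* columns.
    cols = matches("^(y_)?t[0-9]+$"),
    # ".value" turns the prefix ("t" or "y_t") into an output column;
    # the numeric suffix is stored in "step".
    names_to = c(".value", "step"),
    names_pattern = "^(y_t|t)([0-9]+)$"
  ) %>%
  rename(time = t, y_raw = y_t) %>%
  mutate(
    step = as.integer(step),
    # Strip the percent sign so y can be plotted as a number.
    y = as.numeric(sub("%", "", y_raw, fixed = TRUE))
  )

# One point per ID and time, as asked for in the question.
p <- ggplot(df_long, aes(x = time, y = y, colour = ID)) +
  geom_point()
```

Printing p draws the plot. With the real data the same call should pick up all columns from t0 to t100 and y_t0 to y_t100, provided the names really follow that pattern.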

P.S. a better data example would help a lot so people will not spend time inputting the values manually. You can check more on how to do that at the link below:
Tidyverse

Hope this helps.

Vlad

That's not quite right. They are merely superseded by the pivot_*() functions and can still be used:
Lifecycle stages • lifecycle (r-lib.org)

As for the confusion, that is entirely subjective. The new functions do offer more functionality, though.

Agreed, it was a poor choice of words on my part.

If you're curious what an approach looks like using gather():

Note:

  • Thank you @vlad_aluas for the data.
  • key and value are the names of the new columns.
  • 4:7 indicates the selection of columns that are to be gathered into key-value pairs (from wide to long).
  • arrange is used to match the pivot_longer solution provided by @vlad_aluas.

library(dplyr)
library(tidyr)

df <- tibble(
  ID   = c("A", "B", "C"),
  var1 = c(10, 9, 8),
  var2 = c("blue", "green", "pink"),
  `t0` = c(0, 0, 0),
  `t1` = c(1, 1, 1),
  `t2` = c(2, 2, 2),
  `t100` = c(100, 100, 100)
)

df %>% 
  gather(key = 'measurement_time',
         value = 'measurement_value',
         4:7) %>% 
  arrange(ID)

Output

#> # A tibble: 12 x 5
#>    ID     var1 var2  measurement_time measurement_value
#>    <chr> <dbl> <chr> <chr>                        <dbl>
#>  1 A        10 blue  t0                               0
#>  2 A        10 blue  t1                               1
#>  3 A        10 blue  t2                               2
#>  4 A        10 blue  t100                           100
#>  5 B         9 green t0                               0
#>  6 B         9 green t1                               1
#>  7 B         9 green t2                               2
#>  8 B         9 green t100                           100
#>  9 C         8 pink  t0                               0
#> 10 C         8 pink  t1                               1
#> 11 C         8 pink  t2                               2
#> 12 C         8 pink  t100                           100

Created on 2021-03-19 by the reprex package (v0.3.0)
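
As a rough check that the two approaches agree, the sketch below runs both on the same example data and compares the results. It selects the columns by name (t0:t100) instead of by position, which may be easier to keep correct when there are a hundred time columns; the names with_gather, with_pivot and same_result are made up for illustration.

```r
library(dplyr)
library(tidyr)

df <- tibble(
  ID   = c("A", "B", "C"),
  var1 = c(10, 9, 8),
  var2 = c("blue", "green", "pink"),
  t0   = c(0, 0, 0),
  t1   = c(1, 1, 1),
  t2   = c(2, 2, 2),
  t100 = c(100, 100, 100)
)

# Select the columns to gather by name (t0:t100) rather than by position (4:7).
with_gather <- df %>%
  gather(key = "measurement_time", value = "measurement_value", t0:t100) %>%
  arrange(ID)

with_pivot <- df %>%
  pivot_longer(t0:t100,
               names_to = "measurement_time",
               values_to = "measurement_value")

# Compare as plain data frames so only the contents are checked.
same_result <- isTRUE(all.equal(as.data.frame(with_gather),
                                as.data.frame(with_pivot)))
```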

Hello to all,

I see that a lot of interesting answers have been written. I will test all your recommendations and tell you which one I preferred.

A big thanks!
