Wide to Long format with regex and multiple value columns in tidyr

hru2294 · June 30, 2021, 12:22am

I'm trying to convert data from wide to long form with tidyr, but also open to other options. This is fake dataset with repeating values but it has the same structure as real dataset


    structure(list(Category = c("Pre", "Pre", "Pre", "post_med_1", 
    "post_med_1", "post_med_1", "post_med_2", "post_med_2", "post_med_2"
    ), Time = c(1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L), Subj_1_tox = c(4.2, 
    5, 2.3, 4.2, 5, 2.3, 4.2, 5, 2.3), Subj_2_tox = c(23L, 1L, 4L, 
    23L, 1L, 4L, 23L, 1L, 4L), Subj_3_tox = c(6, 4.9, 3.2, 6, 4.9, 
    3.2, 6, 4.9, 3.2), Subj_1_a1 = c(4.2, 5, 2.3, 4.2, 5, 2.3, 4.2, 
    5, 2.3), Subj_2_a1 = c(23L, 1L, 4L, 23L, 1L, 4L, 23L, 1L, 4L), 
        Subj_3_a1 = c(6, 4.9, 3.2, 6, 4.9, 3.2, 6, 4.9, 3.2)), class = "data.frame", row.names = c(NA, 
    -9L))

The confusing part for me is how in one call could I convert the tox columns and the a1 columns to long form (with each having their own tox and a1 columns with values inside) and maintain the category and time columns. First off is the regex for the name pattern. I have looked up regex patterns, but not clear how to get it and second how to include 2 different value columns in 1 call?

Basically something like this in one call

    df_longer<-df %>% 
      pivot_longer(
        cols=contains("tox") & contains("a1"), 
        names_to = c("subject", "tox", "a1"), 
        names_pattern = "(Subj_['all_numbers') (tox and a1) "
        values_to = c("tox_value", "a1_value"))

With the end result being the Subject(#) being in one column called subject and the tox values and the a1 values being in other columns. Is it possible to do this in one call? I'm also open to other solutions but am trying to learn tidyr more

The final result should look something like this, but values are not right in this one but the other parts are accurate.

FJCC · June 30, 2021, 3:03am

Is this what you are after?

library(tidyr)
DF <- structure(list(Category = c("Pre", "Pre", "Pre", "post_med_1", 
                                  "post_med_1", "post_med_1", "post_med_2", "post_med_2", "post_med_2"), 
                     Time = c(1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L), 
                     Subj_1_tox = c(4.2,5, 2.3, 4.2, 5, 2.3, 4.2, 5, 2.3), 
                     Subj_2_tox = c(23L, 1L, 4L, 23L, 1L, 4L, 23L, 1L, 4L), 
                     Subj_3_tox = c(6, 4.9, 3.2, 6, 4.9,3.2, 6, 4.9, 3.2), 
                     Subj_1_a1 = c(4.2, 5, 2.3, 4.2, 5, 2.3, 4.2, 5, 2.3), 
                     Subj_2_a1 = c(23L, 1L, 4L, 23L, 1L, 4L, 23L, 1L, 4L), 
                     Subj_3_a1 = c(6, 4.9, 3.2, 6, 4.9, 3.2, 6, 4.9, 3.2)), 
                class = "data.frame", row.names = c(NA, -9L))
DF_Long <- pivot_longer(DF,cols = Subj_1_tox:Subj_3_a1,
             names_pattern = "(Subj_\\d)_(tox|a1)",names_to = c("Subject",".value"))
DF_Long
#> # A tibble: 27 x 5
#>    Category    Time Subject   tox    a1
#>    <chr>      <int> <chr>   <dbl> <dbl>
#>  1 Pre            1 Subj_1    4.2   4.2
#>  2 Pre            1 Subj_2   23    23  
#>  3 Pre            1 Subj_3    6     6  
#>  4 Pre            2 Subj_1    5     5  
#>  5 Pre            2 Subj_2    1     1  
#>  6 Pre            2 Subj_3    4.9   4.9
#>  7 Pre            3 Subj_1    2.3   2.3
#>  8 Pre            3 Subj_2    4     4  
#>  9 Pre            3 Subj_3    3.2   3.2
#> 10 post_med_1     1 Subj_1    4.2   4.2
#> # ... with 17 more rows

^{Created on 2021-06-29 by the reprex package (v0.3.0)}

system · July 21, 2021, 3:03am

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.