Spread() with multiple `value` columns

joeystan · February 19, 2018, 3:13pm

I find myself needing to use spread on multiple value columns, as in this question here. In a nutshell, the original data frame is

df <- data.frame(month=rep(1:3,2),
                 student=rep(c("Amy", "Bob"), each=3),
                 A=c(9, 7, 6, 8, 6, 9),
                 B=c(6, 7, 8, 5, 6, 7))

and the top answer provides this tidyverse solution:

df %>% 
  gather(variable, value, -(month:student)) %>%
  unite(temp, student, variable) %>%
  spread(temp, value)

#   month Amy_A Amy_B Bob_A Bob_B
# 1     1     9     6     8     5
# 2     2     7     7     6     6
# 3     3     6     8     9     7

Would it be possible to implement this as one function like this?

df %>%
    spread(key = student, value = c(A, B))

I have to use this a lot and it would be really handy if it were all combined into one.

danr · February 20, 2018, 3:35pm

You can do this by using quosures. I think that the reprex ( https://www.jessemaegan.com/post/so-you-ve-been-asked-to-make-a-reprex ) below has the function you want.


suppressPackageStartupMessages(library(tidyverse))
df <- data.frame(month=rep(1:3,2),
                                 student=rep(c("Amy", "Bob"), each=3),
                                 A=c(9, 7, 6, 8, 6, 9),
                                 B=c(6, 7, 8, 5, 6, 7))

t1 <- df %>%
    gather(variable, value, -(month:student)) %>%
    unite(temp, student, variable) %>%
    spread(temp, value)

myspread <- function(df, key, value) {
    # quote key
    keyq <- rlang::enquo(key)
    # break value vector into quotes
    valueq <- rlang::enquo(value)
    s <- rlang::quos(!!valueq)
    df %>% gather(variable, value, !!!s) %>%
        unite(temp, !!keyq, variable) %>%
        spread(temp, value)
}

t2 <- df %>% myspread(student, c(A, B))

identical(t1, t2)
#> [1] TRUE

t2
#>   month Amy_A Amy_B Bob_A Bob_B
#> 1     1     9     6     8     5
#> 2     2     7     7     6     6
#> 3     3     6     8     9     7

Created on 2018-02-20 by the reprex package (v0.2.0).

joeystan · February 24, 2018, 10:08am

This is great. I coincidentally just watched Hadley Wickham's video on Tidy Evaluation this morning so this makes a lot more sense than it would have a week ago. I'll incorporate this into my code and probably call it spread_n or something since it works with more than just two columns for value. Looks like I've still got a ways to go to fully understand what's going on here, but this is a strong push in the right direction. Thank you.

JohnMount · February 24, 2018, 2:58pm

We have a tool for this sort of thing (with a diagrammatic theory) here. If you can draw a picture of what you want to happen (essentially the before or the after using column names) then you can perform the transform. In this case:

library("cdata")

# data
df <- data.frame(month=rep(1:3,2),
                 student=rep(c("Amy", "Bob"), each=3),
                 A=c(9, 7, 6, 8, 6, 9),
                 B=c(6, 7, 8, 5, 6, 7))

# transform description
cT <- dplyr::tribble(
  ~student, ~A,      ~B,
  "Amy",    "Amy_A", "Amy_B",
  "Bob",    "Bob_A", "Bob_B"
)

# do the work
blocks_to_rowrecs(df, cT, keyColumns = "month")

And we are now testing some new more concise notations for specifying the control table (requires development version of cdata):

cT <- cdata::qchar_frame(
  student, A    , B     |
  Amy    , Amy_A, Amy_B |
  Bob    , Bob_A, Bob_B )

rensa · November 15, 2018, 1:38am

This is an ooooooold thread, but I'm facing the same problem and decided to nest() the multiple value columns to achieve the same effect. It's still not a one line spread, but I found it to be a more flexible solution for more complex gather/spread problems:

library(tidyverse)
#> Warning: package 'ggplot2' was built under R version 3.5.1
#> Warning: package 'dplyr' was built under R version 3.5.1

df <- data.frame(month=rep(1:3,2),
                 student=rep(c("Amy", "Bob"), each=3),
                 A=c(9, 7, 6, 8, 6, 9),
                 B=c(6, 7, 8, 5, 6, 7))

df %>%
  nest(A, B, .key = 'value_col') %>%
  spread(key = student, value = value_col) %>%
  unnest(Amy, Bob, .sep = '_')
#>   month Amy_A Amy_B Bob_A Bob_B
#> 1     1     9     6     8     5
#> 2     2     7     7     6     6
#> 3     3     6     8     9     7

Created on 2018-11-15 by the reprex package (v0.2.0).

Update: the upcoming tidyr::pivot_*() functions (I believe they'll likely be called pivot_longer() and pivot_wider()?) are going to make these sorts of problems much, much easier

apreshill · April 5, 2019, 3:08am

If you are stumbling on this thread, you may want to read through the pivot functions vignette: https://tidyr.tidyverse.org/dev/articles/pivot.html

Here are some very simple worked examples using the new pivot functions: https://github.com/apreshill/teachthat/tree/master/pivot

One example involves multiple columns, as framed in the original question here:

github.com

apreshill/teachthat/blob/master/pivot/pivot_multiples.md

pivot\_multiples.R
================
alison
2019-03-29

Use the development version of tidyr from GitHub:

``` r
#install.packages("devtools")
#devtools::install_github("tidyverse/tidyr")
```

``` r
library(dplyr)
library(tidyr)

juniors_multiple <- tribble(
  ~ "baker", ~"score_1", ~"score_2", ~"score_3", ~ "guess_1", ~"guess_2", ~"guess_3",
  "Emma", 1L,   0L, 1L, "cinnamon",   "cloves", "nutmeg",
  "Harry", 1L,   1L, 1L, "cinnamon",   "cardamom", "nutmeg",

This file has been truncated. show original

Use the development version of tidyr from GitHub:

#install.packages("devtools")
#devtools::install_github("tidyverse/tidyr")

library(dplyr)
library(tidyr)

juniors_multiple <- tribble(
  ~ "baker", ~"score_1", ~"score_2", ~"score_3", ~ "guess_1", ~"guess_2", ~"guess_3",
  "Emma", 1L,   0L, 1L, "cinnamon",   "cloves", "nutmeg",
  "Harry", 1L,   1L, 1L, "cinnamon",   "cardamom", "nutmeg",
  "Ruby", 1L,   0L, 1L, "cinnamon",   "cumin", "nutmeg",
  "Zainab", 0L, NA, 0L, "cardamom", NA_character_, "cinnamon"
)

juniors_multiple %>% 
  knitr::kable()

baker	score_1	score_2	score_3	guess_1	guess_2	guess_3
Emma	1	0	1	cinnamon	cloves	nutmeg
Harry	1	1	1	cinnamon	cardamom	nutmeg
Ruby	1	0	1	cinnamon	cumin	nutmeg
Zainab	0	NA	0	cardamom	NA	cinnamon

I want three total columns:

first is order (1/2/3) –> these are the numbers at end of my column names
second is score (0/1/NA)
third is guess (cinnamon/cloves/nutmeg/NA)

juniors_multiple %>% 
  tidyr::pivot_longer(-baker,
                      names_to = c(".value", "order"),
                      names_sep = "_"
  )
#> # A tibble: 12 x 4
#>    baker  order score guess   
#>    <chr>  <chr> <int> <chr>   
#>  1 Emma   1         1 cinnamon
#>  2 Emma   2         0 cloves  
#>  3 Emma   3         1 nutmeg  
#>  4 Harry  1         1 cinnamon
#>  5 Harry  2         1 cardamom
#>  6 Harry  3         1 nutmeg  
#>  7 Ruby   1         1 cinnamon
#>  8 Ruby   2         0 cumin   
#>  9 Ruby   3         1 nutmeg  
#> 10 Zainab 1         0 cardamom
#> 11 Zainab 2        NA <NA>    
#> 12 Zainab 3         0 cinnamon

order is a character
make order a factor

juniors_multiple %>% 
  pivot_longer(-baker,
               names_to = c(".value", "order"),
               names_sep = "_",
               col_ptype = list(
                 order = factor(levels = c(1, 2, 3))
               )
  )
#> # A tibble: 12 x 4
#>    baker  order score guess   
#>    <chr>  <fct> <int> <chr>   
#>  1 Emma   1         1 cinnamon
#>  2 Emma   2         0 cloves  
#>  3 Emma   3         1 nutmeg  
#>  4 Harry  1         1 cinnamon
#>  5 Harry  2         1 cardamom
#>  6 Harry  3         1 nutmeg  
#>  7 Ruby   1         1 cinnamon
#>  8 Ruby   2         0 cumin   
#>  9 Ruby   3         1 nutmeg  
#> 10 Zainab 1         0 cardamom
#> 11 Zainab 2        NA <NA>    
#> 12 Zainab 3         0 cinnamon

make order a number instead

juniors_multiple %>% 
  pivot_longer(-baker,
               names_to = c(".value", "order"),
               names_sep = "_",
               col_ptype = list(
                 order = integer()
               )
  )
#> # A tibble: 12 x 4
#>    baker  order score guess   
#>    <chr>  <int> <int> <chr>   
#>  1 Emma       1     1 cinnamon
#>  2 Emma       2     0 cloves  
#>  3 Emma       3     1 nutmeg  
#>  4 Harry      1     1 cinnamon
#>  5 Harry      2     1 cardamom
#>  6 Harry      3     1 nutmeg  
#>  7 Ruby       1     1 cinnamon
#>  8 Ruby       2     0 cumin   
#>  9 Ruby       3     1 nutmeg  
#> 10 Zainab     1     0 cardamom
#> 11 Zainab     2    NA <NA>    
#> 12 Zainab     3     0 cinnamon

^{Created on 2019-04-04 by the reprex package (v0.2.1)}

JohnMount · April 5, 2019, 2:39pm

The cdata package (where a lot of the newer pivot methods are from) has some neat notation both for specifying and describing transforms like this. In the interest of breadth of solutions, you might want to consider including some of these. Nina Zumel, and I, worked really hard to develop the methodology and teaching materials.

juniors_multiple <- tibble::tribble(
  ~ "baker", ~"score_1", ~"score_2", ~"score_3", ~ "guess_1", ~"guess_2", ~"guess_3",
  "Emma", 1L,   0L, 1L, "cinnamon",   "cloves", "nutmeg",
  "Harry", 1L,   1L, 1L, "cinnamon",   "cardamom", "nutmeg",
  "Ruby", 1L,   0L, 1L, "cinnamon",   "cumin", "nutmeg",
  "Zainab", 0L, NA, 0L, "cardamom", NA_character_, "cinnamon"
)

library("cdata")

control_table <- wrapr::qchar_frame(
  "order"  , "score"  , "guess" |
    "1"    , score_1  , guess_1 |
    "2"    , score_2  , guess_2 |
    "3"    , score_3  , guess_3 )

rowrecs_to_blocks(juniors_multiple, control_table, columnsToCopy = "baker")
#>     baker order score    guess
#> 1    Emma     1     1 cinnamon
#> 2    Emma     2     0   cloves
#> 3    Emma     3     1   nutmeg
#> 4   Harry     1     1 cinnamon
#> 5   Harry     2     1 cardamom
#> 6   Harry     3     1   nutmeg
#> 7    Ruby     1     1 cinnamon
#> 8    Ruby     2     0    cumin
#> 9    Ruby     3     1   nutmeg
#> 10 Zainab     1     0 cardamom
#> 11 Zainab     2    NA     <NA>
#> 12 Zainab     3     0 cinnamon

juniors_multiple <- tibble::tribble(
  ~ "baker", ~"score_1", ~"score_2", ~"score_3", ~ "guess_1", ~"guess_2", ~"guess_3",
  "Emma", 1L,   0L, 1L, "cinnamon",   "cloves", "nutmeg",
  "Harry", 1L,   1L, 1L, "cinnamon",   "cardamom", "nutmeg",
  "Ruby", 1L,   0L, 1L, "cinnamon",   "cumin", "nutmeg",
  "Zainab", 0L, NA, 0L, "cardamom", NA_character_, "cinnamon"
)

library("cdata")

control_table <- wrapr::qchar_frame(
  "order"  , "score"  , "guess" |
    "1"    , score_1  , guess_1 |
    "2"    , score_2  , guess_2 |
    "3"    , score_3  , guess_3 )

transform <- cdata::rowrecs_to_blocks_spec(
  control_table,
  recordKeys = "baker")

juniors_multiple %.>% transform
#>     baker order score    guess
#> 1    Emma     1     1 cinnamon
#> 2    Emma     2     0   cloves
#> 3    Emma     3     1   nutmeg
#> 4   Harry     1     1 cinnamon
#> 5   Harry     2     1 cardamom
#> 6   Harry     3     1   nutmeg
#> 7    Ruby     1     1 cinnamon
#> 8    Ruby     2     0    cumin
#> 9    Ruby     3     1   nutmeg
#> 10 Zainab     1     0 cardamom
#> 11 Zainab     2    NA     <NA>
#> 12 Zainab     3     0 cinnamon

print(transform)
#> {
#>  row_record <- wrapr::qchar_frame(
#>    "baker"  , "score_1", "score_2", "score_3", "guess_1", "guess_2", "guess_3" |
#>      .      , score_1  , score_2  , score_3  , guess_1  , guess_2  , guess_3   )
#>  row_keys <- c('baker')
#> 
#>  # becomes
#> 
#>  block_record <- wrapr::qchar_frame(
#>    "baker"  , "order", "score", "guess" |
#>      .      , "1"    , score_1, guess_1 |
#>      .      , "2"    , score_2, guess_2 |
#>      .      , "3"    , score_3, guess_3 )
#>  block_keys <- c('baker', 'order')
#> 
#>  # args: c(checkNames = TRUE, checkKeys = TRUE, strict = FALSE)
#> }