Spread() with multiple `value` columns

I find myself needing to use spread on multiple value columns, as in this question here. In a nutshell, the original data frame is

df <- data.frame(month=rep(1:3,2),
                 student=rep(c("Amy", "Bob"), each=3),
                 A=c(9, 7, 6, 8, 6, 9),
                 B=c(6, 7, 8, 5, 6, 7))

and the top answer provides this tidyverse solution:

df %>% 
  gather(variable, value, -(month:student)) %>%
  unite(temp, student, variable) %>%
  spread(temp, value)

#   month Amy_A Amy_B Bob_A Bob_B
# 1     1     9     6     8     5
# 2     2     7     7     6     6
# 3     3     6     8     9     7

Would it be possible to implement this as one function like this?

df %>%
    spread(key = student, value = c(A, B))

I have to use this a lot and it would be really handy if it were all combined into one.

5 Likes

You can do this by using quosures. I think that the reprex ( https://www.jessemaegan.com/post/so-you-ve-been-asked-to-make-a-reprex ) below has the function you want.


suppressPackageStartupMessages(library(tidyverse))
df <- data.frame(month=rep(1:3,2),
                                 student=rep(c("Amy", "Bob"), each=3),
                                 A=c(9, 7, 6, 8, 6, 9),
                                 B=c(6, 7, 8, 5, 6, 7))

t1 <- df %>%
    gather(variable, value, -(month:student)) %>%
    unite(temp, student, variable) %>%
    spread(temp, value)

myspread <- function(df, key, value) {
    # quote key
    keyq <- rlang::enquo(key)
    # break value vector into quotes
    valueq <- rlang::enquo(value)
    s <- rlang::quos(!!valueq)
    df %>% gather(variable, value, !!!s) %>%
        unite(temp, !!keyq, variable) %>%
        spread(temp, value)
}

t2 <- df %>% myspread(student, c(A, B))

identical(t1, t2)
#> [1] TRUE

t2
#>   month Amy_A Amy_B Bob_A Bob_B
#> 1     1     9     6     8     5
#> 2     2     7     7     6     6
#> 3     3     6     8     9     7

Created on 2018-02-20 by the reprex package (v0.2.0).

12 Likes

This is great. I coincidentally just watched Hadley Wickham's video on Tidy Evaluation this morning so this makes a lot more sense than it would have a week ago. I'll incorporate this into my code and probably call it spread_n or something since it works with more than just two columns for value. Looks like I've still got a ways to go to fully understand what's going on here, but this is a strong push in the right direction. Thank you.

We have a tool for this sort of thing (with a diagrammatic theory) here. If you can draw a picture of what you want to happen (essentially the before or the after using column names) then you can perform the transform. In this case:

library("cdata")

# data
df <- data.frame(month=rep(1:3,2),
                 student=rep(c("Amy", "Bob"), each=3),
                 A=c(9, 7, 6, 8, 6, 9),
                 B=c(6, 7, 8, 5, 6, 7))

# transform description
cT <- dplyr::tribble(
  ~student, ~A,      ~B,
  "Amy",    "Amy_A", "Amy_B",
  "Bob",    "Bob_A", "Bob_B"
)

# do the work
blocks_to_rowrecs(df, cT, keyColumns = "month")

And we are now testing some new more concise notations for specifying the control table (requires development version of cdata):

cT <- cdata::qchar_frame(
  student, A    , B     |
  Amy    , Amy_A, Amy_B |
  Bob    , Bob_A, Bob_B )
4 Likes

This is an ooooooold thread, but I'm facing the same problem and decided to nest() the multiple value columns to achieve the same effect. It's still not a one line spread, but I found it to be a more flexible solution for more complex gather/spread problems:

library(tidyverse)
#> Warning: package 'ggplot2' was built under R version 3.5.1
#> Warning: package 'dplyr' was built under R version 3.5.1

df <- data.frame(month=rep(1:3,2),
                 student=rep(c("Amy", "Bob"), each=3),
                 A=c(9, 7, 6, 8, 6, 9),
                 B=c(6, 7, 8, 5, 6, 7))

df %>%
  nest(A, B, .key = 'value_col') %>%
  spread(key = student, value = value_col) %>%
  unnest(Amy, Bob, .sep = '_')
#>   month Amy_A Amy_B Bob_A Bob_B
#> 1     1     9     6     8     5
#> 2     2     7     7     6     6
#> 3     3     6     8     9     7

Created on 2018-11-15 by the reprex package (v0.2.0).

Update: the upcoming tidyr::pivot_*() functions (I believe they'll likely be called pivot_longer() and pivot_wider()?) are going to make these sorts of problems much, much easier :heart:

13 Likes

If you are stumbling on this thread, you may want to read through the pivot functions vignette: https://tidyr.tidyverse.org/dev/articles/pivot.html

Here are some very simple worked examples using the new pivot functions: https://github.com/apreshill/teachthat/tree/master/pivot

One example involves multiple columns, as framed in the original question here:

Use the development version of tidyr from GitHub:

#install.packages("devtools")
#devtools::install_github("tidyverse/tidyr")
library(dplyr)
library(tidyr)

juniors_multiple <- tribble(
  ~ "baker", ~"score_1", ~"score_2", ~"score_3", ~ "guess_1", ~"guess_2", ~"guess_3",
  "Emma", 1L,   0L, 1L, "cinnamon",   "cloves", "nutmeg",
  "Harry", 1L,   1L, 1L, "cinnamon",   "cardamom", "nutmeg",
  "Ruby", 1L,   0L, 1L, "cinnamon",   "cumin", "nutmeg",
  "Zainab", 0L, NA, 0L, "cardamom", NA_character_, "cinnamon"
)

juniors_multiple %>% 
  knitr::kable()
baker score_1 score_2 score_3 guess_1 guess_2 guess_3
Emma 1 0 1 cinnamon cloves nutmeg
Harry 1 1 1 cinnamon cardamom nutmeg
Ruby 1 0 1 cinnamon cumin nutmeg
Zainab 0 NA 0 cardamom NA cinnamon

I want three total columns:

  • first is order (1/2/3) –> these are the numbers at end of my column names
  • second is score (0/1/NA)
  • third is guess (cinnamon/cloves/nutmeg/NA)
juniors_multiple %>% 
  tidyr::pivot_longer(-baker,
                      names_to = c(".value", "order"),
                      names_sep = "_"
  )
#> # A tibble: 12 x 4
#>    baker  order score guess   
#>    <chr>  <chr> <int> <chr>   
#>  1 Emma   1         1 cinnamon
#>  2 Emma   2         0 cloves  
#>  3 Emma   3         1 nutmeg  
#>  4 Harry  1         1 cinnamon
#>  5 Harry  2         1 cardamom
#>  6 Harry  3         1 nutmeg  
#>  7 Ruby   1         1 cinnamon
#>  8 Ruby   2         0 cumin   
#>  9 Ruby   3         1 nutmeg  
#> 10 Zainab 1         0 cardamom
#> 11 Zainab 2        NA <NA>    
#> 12 Zainab 3         0 cinnamon

order is a character
make order a factor

juniors_multiple %>% 
  pivot_longer(-baker,
               names_to = c(".value", "order"),
               names_sep = "_",
               col_ptype = list(
                 order = factor(levels = c(1, 2, 3))
               )
  )
#> # A tibble: 12 x 4
#>    baker  order score guess   
#>    <chr>  <fct> <int> <chr>   
#>  1 Emma   1         1 cinnamon
#>  2 Emma   2         0 cloves  
#>  3 Emma   3         1 nutmeg  
#>  4 Harry  1         1 cinnamon
#>  5 Harry  2         1 cardamom
#>  6 Harry  3         1 nutmeg  
#>  7 Ruby   1         1 cinnamon
#>  8 Ruby   2         0 cumin   
#>  9 Ruby   3         1 nutmeg  
#> 10 Zainab 1         0 cardamom
#> 11 Zainab 2        NA <NA>    
#> 12 Zainab 3         0 cinnamon

make order a number instead

juniors_multiple %>% 
  pivot_longer(-baker,
               names_to = c(".value", "order"),
               names_sep = "_",
               col_ptype = list(
                 order = integer()
               )
  )
#> # A tibble: 12 x 4
#>    baker  order score guess   
#>    <chr>  <int> <int> <chr>   
#>  1 Emma       1     1 cinnamon
#>  2 Emma       2     0 cloves  
#>  3 Emma       3     1 nutmeg  
#>  4 Harry      1     1 cinnamon
#>  5 Harry      2     1 cardamom
#>  6 Harry      3     1 nutmeg  
#>  7 Ruby       1     1 cinnamon
#>  8 Ruby       2     0 cumin   
#>  9 Ruby       3     1 nutmeg  
#> 10 Zainab     1     0 cardamom
#> 11 Zainab     2    NA <NA>    
#> 12 Zainab     3     0 cinnamon

Created on 2019-04-04 by the reprex package (v0.2.1)

4 Likes

The cdata package (where a lot of the newer pivot methods are from) has some neat notation both for specifying and describing transforms like this. In the interest of breadth of solutions, you might want to consider including some of these. Nina Zumel, and I, worked really hard to develop the methodology and teaching materials.

juniors_multiple <- tibble::tribble(
  ~ "baker", ~"score_1", ~"score_2", ~"score_3", ~ "guess_1", ~"guess_2", ~"guess_3",
  "Emma", 1L,   0L, 1L, "cinnamon",   "cloves", "nutmeg",
  "Harry", 1L,   1L, 1L, "cinnamon",   "cardamom", "nutmeg",
  "Ruby", 1L,   0L, 1L, "cinnamon",   "cumin", "nutmeg",
  "Zainab", 0L, NA, 0L, "cardamom", NA_character_, "cinnamon"
)

library("cdata")

control_table <- wrapr::qchar_frame(
  "order"  , "score"  , "guess" |
    "1"    , score_1  , guess_1 |
    "2"    , score_2  , guess_2 |
    "3"    , score_3  , guess_3 )

rowrecs_to_blocks(juniors_multiple, control_table, columnsToCopy = "baker")
#>     baker order score    guess
#> 1    Emma     1     1 cinnamon
#> 2    Emma     2     0   cloves
#> 3    Emma     3     1   nutmeg
#> 4   Harry     1     1 cinnamon
#> 5   Harry     2     1 cardamom
#> 6   Harry     3     1   nutmeg
#> 7    Ruby     1     1 cinnamon
#> 8    Ruby     2     0    cumin
#> 9    Ruby     3     1   nutmeg
#> 10 Zainab     1     0 cardamom
#> 11 Zainab     2    NA     <NA>
#> 12 Zainab     3     0 cinnamon
juniors_multiple <- tibble::tribble(
  ~ "baker", ~"score_1", ~"score_2", ~"score_3", ~ "guess_1", ~"guess_2", ~"guess_3",
  "Emma", 1L,   0L, 1L, "cinnamon",   "cloves", "nutmeg",
  "Harry", 1L,   1L, 1L, "cinnamon",   "cardamom", "nutmeg",
  "Ruby", 1L,   0L, 1L, "cinnamon",   "cumin", "nutmeg",
  "Zainab", 0L, NA, 0L, "cardamom", NA_character_, "cinnamon"
)

library("cdata")

control_table <- wrapr::qchar_frame(
  "order"  , "score"  , "guess" |
    "1"    , score_1  , guess_1 |
    "2"    , score_2  , guess_2 |
    "3"    , score_3  , guess_3 )

transform <- cdata::rowrecs_to_blocks_spec(
  control_table,
  recordKeys = "baker")

juniors_multiple %.>% transform
#>     baker order score    guess
#> 1    Emma     1     1 cinnamon
#> 2    Emma     2     0   cloves
#> 3    Emma     3     1   nutmeg
#> 4   Harry     1     1 cinnamon
#> 5   Harry     2     1 cardamom
#> 6   Harry     3     1   nutmeg
#> 7    Ruby     1     1 cinnamon
#> 8    Ruby     2     0    cumin
#> 9    Ruby     3     1   nutmeg
#> 10 Zainab     1     0 cardamom
#> 11 Zainab     2    NA     <NA>
#> 12 Zainab     3     0 cinnamon

print(transform)
#> {
#>  row_record <- wrapr::qchar_frame(
#>    "baker"  , "score_1", "score_2", "score_3", "guess_1", "guess_2", "guess_3" |
#>      .      , score_1  , score_2  , score_3  , guess_1  , guess_2  , guess_3   )
#>  row_keys <- c('baker')
#> 
#>  # becomes
#> 
#>  block_record <- wrapr::qchar_frame(
#>    "baker"  , "order", "score", "guess" |
#>      .      , "1"    , score_1, guess_1 |
#>      .      , "2"    , score_2, guess_2 |
#>      .      , "3"    , score_3, guess_3 )
#>  block_keys <- c('baker', 'order')
#> 
#>  # args: c(checkNames = TRUE, checkKeys = TRUE, strict = FALSE)
#> }
2 Likes