Spread() with multiple `value` columns


#1

I find myself needing to use spread on multiple value columns, as in this question here. In a nutshell, the original data frame is

df <- data.frame(month=rep(1:3,2),
                 student=rep(c("Amy", "Bob"), each=3),
                 A=c(9, 7, 6, 8, 6, 9),
                 B=c(6, 7, 8, 5, 6, 7))

and the top answer provides this tidyverse solution:

df %>% 
  gather(variable, value, -(month:student)) %>%
  unite(temp, student, variable) %>%
  spread(temp, value)

#   month Amy_A Amy_B Bob_A Bob_B
# 1     1     9     6     8     5
# 2     2     7     7     6     6
# 3     3     6     8     9     7

Would it be possible to implement this as one function like this?

df %>%
    spread(key = student, value = c(A, B))

I have to use this a lot and it would be really handy if it were all combined into one.


#2

You can do this by using quosures. I think that the reprex ( https://www.jessemaegan.com/post/so-you-ve-been-asked-to-make-a-reprex ) below has the function you want.


suppressPackageStartupMessages(library(tidyverse))
df <- data.frame(month=rep(1:3,2),
                                 student=rep(c("Amy", "Bob"), each=3),
                                 A=c(9, 7, 6, 8, 6, 9),
                                 B=c(6, 7, 8, 5, 6, 7))

t1 <- df %>%
    gather(variable, value, -(month:student)) %>%
    unite(temp, student, variable) %>%
    spread(temp, value)

myspread <- function(df, key, value) {
    # quote key
    keyq <- rlang::enquo(key)
    # break value vector into quotes
    valueq <- rlang::enquo(value)
    s <- rlang::quos(!!valueq)
    df %>% gather(variable, value, !!!s) %>%
        unite(temp, !!keyq, variable) %>%
        spread(temp, value)
}

t2 <- df %>% myspread(student, c(A, B))

identical(t1, t2)
#> [1] TRUE

t2
#>   month Amy_A Amy_B Bob_A Bob_B
#> 1     1     9     6     8     5
#> 2     2     7     7     6     6
#> 3     3     6     8     9     7

Created on 2018-02-20 by the reprex package (v0.2.0).


#3

This is great. I coincidentally just watched Hadley Wickham’s video on Tidy Evaluation this morning so this makes a lot more sense than it would have a week ago. I’ll incorporate this into my code and probably call it spread_n or something since it works with more than just two columns for value. Looks like I’ve still got a ways to go to fully understand what’s going on here, but this is a strong push in the right direction. Thank you.


#4

We have a tool for this sort of thing (with a diagrammatic theory) here. If you can draw a picture of what you want to happen (essentially the before or the after using column names) then you can perform the transform. In this case:

library("cdata")

# data
df <- data.frame(month=rep(1:3,2),
                 student=rep(c("Amy", "Bob"), each=3),
                 A=c(9, 7, 6, 8, 6, 9),
                 B=c(6, 7, 8, 5, 6, 7))

# transform description
cT <- dplyr::tribble(
  ~student, ~A,      ~B,
  "Amy",    "Amy_A", "Amy_B",
  "Bob",    "Bob_A", "Bob_B"
)

# do the work
blocks_to_rowrecs(df, cT, keyColumns = "month")

And we are now testing some new more concise notations for specifying the control table (requires development version of cdata):

cT <- cdata::qchar_frame(
  student, A    , B     |
  Amy    , Amy_A, Amy_B |
  Bob    , Bob_A, Bob_B )