What is the best way to pivot_wider() and summarise() at the same time?

Suppose I have this data:

library(dplyr, warn.conflicts = FALSE)
library(tidyr)

d <- data.frame(
  id = rep(1:5, each = 3),
  update = c(as.Date("2020-07-01") + 1:15),
  key = rep(c("a", "b", "c"), 5),
  val = 1:15
)

d
#>    id     update key val
#> 1   1 2020-07-02   a   1
#> 2   1 2020-07-03   b   2
#> 3   1 2020-07-04   c   3
#> 4   2 2020-07-05   a   4
#> 5   2 2020-07-06   b   5
#> 6   2 2020-07-07   c   6
#> 7   3 2020-07-08   a   7
#> 8   3 2020-07-09   b   8
#> 9   3 2020-07-10   c   9
#> 10  4 2020-07-11   a  10
#> 11  4 2020-07-12   b  11
#> 12  4 2020-07-13   c  12
#> 13  5 2020-07-14   a  13
#> 14  5 2020-07-15   b  14
#> 15  5 2020-07-16   c  15

and want to

  • pivot key to columns and
  • get the latest update of each id

at the same time.

I hoped pivot_wider() chops non-id_cols column into a list column, so that I can mutate() it later, but actually it doesn't (it just drops).

d %>% 
  pivot_wider(id,
    names_from = key,
    values_from = val
  ) %>%
  rowwise() %>%
  mutate(last_update = max(update), .keep = "unused")

# or simply this
d %>% 
  pivot_wider(id,
    names_from = key,
    values_from = val,
    values_fn = max
  )

Are there some nicer way to do this? Does it seem reasonable to file a feature request on tidyr's repo?

This is the expected result, and the best way I can come up with at the moment:

d %>% 
  # summarise `update` by using `mutate()`
  group_by(id) %>% 
  mutate(last_update = max(update)) %>% 
  ungroup() %>% 
  # choose `id` and `last_update` as key
  pivot_wider(c(id, last_update),
    names_from = key,
    values_from = val
  )
#> # A tibble: 5 x 5
#>      id last_update     a     b     c
#>   <int> <date>      <int> <int> <int>
#> 1     1 2020-07-04      1     2     3
#> 2     2 2020-07-07      4     5     6
#> 3     3 2020-07-10      7     8     9
#> 4     4 2020-07-13     10    11    12
#> 5     5 2020-07-16     13    14    15

Created on 2020-07-04 by the reprex package (v0.3.0)

close...

#list which columns to get the max of
d %>% 
  pivot_wider(id,
              names_from = key,
              values_from = val
  ) %>% summarise_at(c("a","b","c")
                     ,max)

# or 
# list which column not to get the max of
d %>% 
  pivot_wider(id,
              names_from = key,
              values_from = val
  ) %>% summarise_at(setdiff(names(.),"id")
                     ,max)

Thanks for pointing it out.

d %>% 
  pivot_wider(id,
              names_from = key,
              values_from = c(val,update),
              values_fn = max
  ) %>% mutate(last_update=pmax(!!!syms(names(.)[startsWith(names(.) ,
                                                            "update")]))) %>%
  select(-names(.)[startsWith(names(.) ,
                                      "update")])

     id val_a val_b val_c last_update
  <int> <int> <int> <int> <date>     
1     1     1     2     3 2020-07-04 
2     2     4     5     6 2020-07-07 
3     3     7     8     9 2020-07-10 
4     4    10    11    12 2020-07-13 
5     5    13    14    15 2020-07-16 

gnarly :laughing:

not general, but succinct:

d %>% 
  pivot_wider(id,
              names_from = key,
              values_from = c(val,update),
              values_fn = max
  ) %>% select(1:4,7)

Sorry, my post was not clear about which is the expected result. Thanks @siddharthprabhu for helping!

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.

That doesn't do what the OP wants. He is asking for a way to arrive at the 5 x 5 tibble in his post with a single call to pivot_wider() or at least in a more elegant manner than having to first calculate the max value before calling pivot_wider().

@yutannihilation I couldn't come up with a better way of doing this besides the grouped mutate. You could try filing a feature request on the tidyr repository.

I ended up filing a feature request: https://github.com/tidyverse/tidyr/issues/990

1 Like