Suppose I have this data:
library(dplyr, warn.conflicts = FALSE)
library(tidyr)
d <- data.frame(
id = rep(1:5, each = 3),
update = c(as.Date("2020-07-01") + 1:15),
key = rep(c("a", "b", "c"), 5),
val = 1:15
)
d
#> id update key val
#> 1 1 2020-07-02 a 1
#> 2 1 2020-07-03 b 2
#> 3 1 2020-07-04 c 3
#> 4 2 2020-07-05 a 4
#> 5 2 2020-07-06 b 5
#> 6 2 2020-07-07 c 6
#> 7 3 2020-07-08 a 7
#> 8 3 2020-07-09 b 8
#> 9 3 2020-07-10 c 9
#> 10 4 2020-07-11 a 10
#> 11 4 2020-07-12 b 11
#> 12 4 2020-07-13 c 12
#> 13 5 2020-07-14 a 13
#> 14 5 2020-07-15 b 14
#> 15 5 2020-07-16 c 15
and want to
- pivot
key
to columns and - get the latest
update
of eachid
at the same time.
I hoped pivot_wider()
chops non-id_cols
column into a list column, so that I can mutate()
it later, but actually it doesn't (it just drops).
d %>%
pivot_wider(id,
names_from = key,
values_from = val
) %>%
rowwise() %>%
mutate(last_update = max(update), .keep = "unused")
# or simply this
d %>%
pivot_wider(id,
names_from = key,
values_from = val,
values_fn = max
)
Are there some nicer way to do this? Does it seem reasonable to file a feature request on tidyr's repo?
This is the expected result, and the best way I can come up with at the moment:
d %>%
# summarise `update` by using `mutate()`
group_by(id) %>%
mutate(last_update = max(update)) %>%
ungroup() %>%
# choose `id` and `last_update` as key
pivot_wider(c(id, last_update),
names_from = key,
values_from = val
)
#> # A tibble: 5 x 5
#> id last_update a b c
#> <int> <date> <int> <int> <int>
#> 1 1 2020-07-04 1 2 3
#> 2 2 2020-07-07 4 5 6
#> 3 3 2020-07-10 7 8 9
#> 4 4 2020-07-13 10 11 12
#> 5 5 2020-07-16 13 14 15
Created on 2020-07-04 by the reprex package (v0.3.0)