I have some data like this:
df <-
tibble::tibble(
x = rep(1:3, each = 2),
y = c(10, NA, NA, 20, NA, NA)
) |>
print()
#> # A tibble: 6 × 2
#> x y
#> <int> <dbl>
#> 1 1 10
#> 2 1 NA
#> 3 2 NA
#> 4 2 20
#> 5 3 NA
#> 6 3 NA
For each value of x
, I want to capture the first non-missing value of y
or NA if all values of y
are missing. So my desired output is this:
#> # A tibble: 3 × 2
#> x y
#> <int> <dbl>
#> 1 1 10
#> 2 2 20
#> 3 3 NA
I thought that a grouped summary using coalesce()
would do this, but it's not working...
df |>
dplyr::group_by(x) |>
dplyr::summarise(
y = dplyr::coalesce(y),
.groups = "drop"
)
#> # A tibble: 6 × 2
#> x y
#> <int> <dbl>
#> 1 1 10
#> 2 1 NA
#> 3 2 NA
#> 4 2 20
#> 5 3 NA
#> 6 3 NA
This is because coalesce()
wants separate arguments, not a vector. So I can get around this by transforming the vector into a list and giving that list as arguments to the function:
library(dplyr)
df |>
group_by(x) |>
summarise(
y = do.call("coalesce", as.list(y)),
.groups = "drop"
)
#> # A tibble: 3 × 2
#> x y
#> <int> <dbl>
#> 1 1 10
#> 2 2 20
#> 3 3 NA
But this is not ideal. Is there another function that does what coalesce()
does but within a vector?