df <-
tibble::tibble(
x = rep(1:3, each = 2),
y = c(10, NA, NA, 20, NA, NA)
) |>
print()
#> # A tibble: 6 × 2
#> x y
#> <int> <dbl>
#> 1 1 10
#> 2 1 NA
#> 3 2 NA
#> 4 2 20
#> 5 3 NA
#> 6 3 NA
For each value of x, I want to capture the first non-missing value of y or NA if all values of y are missing. So my desired output is this:
#> # A tibble: 3 × 2
#> x y
#> <int> <dbl>
#> 1 1 10
#> 2 2 20
#> 3 3 NA
I thought that a grouped summary using coalesce() would do this, but it's not working...
df |>
dplyr::group_by(x) |>
dplyr::summarise(
y = dplyr::coalesce(y),
.groups = "drop"
)
#> # A tibble: 6 × 2
#> x y
#> <int> <dbl>
#> 1 1 10
#> 2 1 NA
#> 3 2 NA
#> 4 2 20
#> 5 3 NA
#> 6 3 NA
This is because coalesce() wants separate arguments, not a vector. So I can get around this by transforming the vector into a list and giving that list as arguments to the function:
library(dplyr)
df |>
group_by(x) |>
summarise(
y = do.call("coalesce", as.list(y)),
.groups = "drop"
)
#> # A tibble: 3 × 2
#> x y
#> <int> <dbl>
#> 1 1 10
#> 2 2 20
#> 3 3 NA
But this is not ideal. Is there another function that does what coalesce() does but within a vector?
When I fall into the how rabbit hole, I return focus to what—f(x) = y where x is what's to hand, y is what is desired and f is the function to convert one to the other. Usually f must be composite, moving one step closer at a time.
dat <- data.frame(
x = rep(1:3, each = 2),
y = c(10, NA, NA, 20, NA, NA)
)
(the_na <- dat[!complete.cases(dat),] |> unique())
#> x y
#> 2 1 NA
#> 3 2 NA
#> 5 3 NA
(num_pairs <- dat[which(!is.na(dat$y)),])
#> x y
#> 1 1 10
#> 4 2 20
(na_pairs <- setdiff(the_na$x,num_pairs$x))
#> [1] 3
(leftover <- dat[dat$x == 3,] |> unique(x = _))
#> x y
#> 5 3 NA
(result <- (rbind(num_pairs,leftover)))
#> x y
#> 1 1 10
#> 4 2 20
#> 5 3 NA
I guess I could turn this into a function and call it within summary. I was hoping there might be another R or tidyverse function to do it already though.