Coalesce within a vector

jmgirard · December 13, 2022, 10:50pm

I have some data like this:

df <- 
  tibble::tibble(
    x = rep(1:3, each = 2),
    y = c(10, NA, NA, 20, NA, NA)
  ) |> 
  print()
#> # A tibble: 6 × 2
#>       x     y
#>   <int> <dbl>
#> 1     1    10
#> 2     1    NA
#> 3     2    NA
#> 4     2    20
#> 5     3    NA
#> 6     3    NA

For each value of x, I want to capture the first non-missing value of y or NA if all values of y are missing. So my desired output is this:

#> # A tibble: 3 × 2
#>       x     y
#>   <int> <dbl>
#> 1     1    10
#> 2     2    20
#> 3     3    NA

I thought that a grouped summary using coalesce() would do this, but it's not working...

df |> 
  dplyr::group_by(x) |> 
  dplyr::summarise(
    y = dplyr::coalesce(y), 
    .groups = "drop"
  )
#> # A tibble: 6 × 2
#>       x     y
#>   <int> <dbl>
#> 1     1    10
#> 2     1    NA
#> 3     2    NA
#> 4     2    20
#> 5     3    NA
#> 6     3    NA

This is because coalesce() wants separate arguments, not a vector. So I can get around this by transforming the vector into a list and giving that list as arguments to the function:

library(dplyr)
df |> 
  group_by(x) |> 
  summarise(
    y = do.call("coalesce", as.list(y)), 
    .groups = "drop"
  )
#> # A tibble: 3 × 2
#>       x     y
#>   <int> <dbl>
#> 1     1    10
#> 2     2    20
#> 3     3    NA
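
To make the mechanics concrete, here is a small sketch, assuming dplyr is installed; the vector g is a made-up stand-in for one group's y values. as.list() splits the vector into length-one elements and do.call() passes each one as its own argument, so coalesce() can fall through to the first non-missing value:

```r
library(dplyr)

# Hypothetical stand-in for the y values of a single group
g <- c(NA, 20, NA)

# One argument: there is nothing to fall back on, so g comes back unchanged
coalesce(g)
#> [1] NA 20 NA

# Three length-one arguments, equivalent to coalesce(NA_real_, 20, NA_real_)
do.call("coalesce", as.list(g))
#> [1] 20

# A group with no non-missing values still yields NA
do.call("coalesce", as.list(c(NA_real_, NA_real_)))
#> [1] NA
```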

But this is not ideal. Is there another function that does what coalesce() does but within a vector?

technocrat · December 14, 2022, 7:34pm

When I fall into the "how" rabbit hole, I return my focus to the "what": f(x) = y, where x is what's to hand, y is what is desired, and f is the function that converts one to the other. Usually f must be composite, moving one step closer at a time.

dat <- data.frame(
  x = rep(1:3, each = 2),
  y = c(10, NA, NA, 20, NA, NA)
)

(the_na <- dat[!complete.cases(dat),] |> unique())
#>   x  y
#> 2 1 NA
#> 3 2 NA
#> 5 3 NA
(num_pairs <- dat[which(!is.na(dat$y)),])
#>   x  y
#> 1 1 10
#> 4 2 20
(na_pairs <- setdiff(the_na$x,num_pairs$x))
#> [1] 3
(leftover <- dat[dat$x %in% na_pairs,] |> unique())
#>   x  y
#> 5 3 NA
(result <- (rbind(num_pairs,leftover)))
#>   x  y
#> 1 1 10
#> 4 2 20
#> 5 3 NA

Created on 2022-12-14 by the reprex package (v2.0.1)

jmgirard · December 14, 2022, 7:42pm

I'm a bit confused. Your code doesn't seem to give the desired output.

technocrat · December 14, 2022, 7:54pm

I was confused, too.

dat <- data.frame(
  x = rep(1:3, each = 2),
  y = c(10, NA, NA, 20, NA, NA)
)

(the_na <- dat[!complete.cases(dat),] |> unique())
#>   x  y
#> 2 1 NA
#> 3 2 NA
#> 5 3 NA
(num_pairs <- dat[which(!is.na(dat$y)),])
#>   x  y
#> 1 1 10
#> 4 2 20
(na_pairs <- setdiff(the_na$x,num_pairs$x))
#> [1] 3
(leftover <- dat[dat$x %in% na_pairs,] |> unique())
#>   x  y
#> 5 3 NA
(result <- (rbind(num_pairs,leftover)))
#>   x  y
#> 1 1 10
#> 4 2 20
#> 5 3 NA

jmgirard · December 14, 2022, 8:01pm

I guess I could turn this into a function and call it within summarise(). I was hoping there might already be a base R or tidyverse function that does this, though.
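
Such a function can be quite small. A minimal base R sketch, where the name first_non_na is made up for illustration and does not come from any package:

```r
# Hypothetical helper: first non-missing element of a vector, or NA if none
first_non_na <- function(v) {
  # Indexing position 1 of an empty vector yields NA of the same type
  v[!is.na(v)][1]
}

df <- data.frame(
  x = rep(1:3, each = 2),
  y = c(10, NA, NA, 20, NA, NA)
)

# Split y by group, then apply the helper to each group's values
firsts <- vapply(split(df$y, df$x), first_non_na, numeric(1))

result <- data.frame(x = as.integer(names(firsts)), y = unname(firsts))
print(result)
#>   x  y
#> 1 1 10
#> 2 2 20
#> 3 3 NA
```

Inside the grouped dplyr pipeline, the same helper would be called as summarise(y = first_non_na(y), .groups = "drop").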

nirgrahamuk · December 15, 2022, 12:13am

Here are two more options: the first stays within the tidyverse; the second borrows the Coalesce() function from DescTools.

library(dplyr)
library(tidyr)  # fill() is a tidyr function

df |> group_by(x) |> 
  fill(y, .direction = "up") |> 
  slice_head(n=1)

library(DescTools)

df |> group_by(x) |>
  summarise(y=Coalesce(y))

technocrat · December 15, 2022, 7:53am

dat <- data.frame(
  x = rep(1:3, each = 2),
  y = c(10, NA, NA, 20, NA, NA)
)

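reframe <- function(x,y,z){
  # x and y are unused; the column names x and y are hard-coded below
  # rows with a missing y, de-duplicated
  the_na = z[!complete.cases(z),] |> unique()
  # rows with a non-missing y (all of them, not only the first per group)
  num_pairs = z[which(!is.na(z$y)),]
  # groups whose y values are all missing
  na_pairs = setdiff(the_na$x,num_pairs$x)
  # one NA row for each all-missing group
  leftover = z[z$x %in% na_pairs,] |> unique()
  result = (rbind(num_pairs,leftover))
  return(result)
}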

reframe("x","y",dat)
#>   x  y
#> 1 1 10
#> 4 2 20
#> 5 3 NA
