Mutate multiple columns if grouping variable has all observations missing

I'm trying to mutate the columns "a" and "b" only if the grouping variable "group" has all observations missing. The attempted (and failed) solution changes the group "blue", in which not all observations are missing. The solution could also involve split() or nest().

Any help is greatly appreciated!

Code below:

library(tidyverse)

# sample data
a <- c(NA,NA,1,1,NA,1)
b <- c(1,1,NA,NA,1,NA)
c  <- letters[1:6]
group <- c("yellow","yellow","black","black", "blue", "blue")

(data <- as_tibble(data.frame(a,b,c,group)))

#         a     b c     group 
#    <dbl> <dbl> <fct> <fct> 
# 1    NA     1 a     yellow
# 2    NA     1 b     yellow
# 3     1    NA c     black 
# 4     1    NA d     black 
# 5    NA     1 e     blue  
# 6     1    NA f     blue 

# failed attempt: observations from group "blue" change 
(data %>% 
  dplyr::group_by(group) %>%
  dplyr::mutate(across(1:2, ~ ifelse(all(is.na(.x)), 99999,.x))))

#      a     b c     group 
#   <dbl> <dbl> <fct> <fct> 
# 1 99999     1 a     yellow
# 2 99999     1 b     yellow
# 3     1 99999 c     black 
# 4     1 99999 d     black 
# 5    NA     1 e     blue  
# 6    NA     1 f     blue

# desired output - observations from blue remain the same
a2 <- c(99999,99999,1,1,NA,1)
b2 <- c(1,1,99999,99999,1,NA)
c2  <- letters[1:6]
group2 <- c("yellow","yellow","black","black", "blue", "blue")

(data_desired <- as_tibble(data.frame(a2,b2,c2,group2)))

# a2    b2 c2    group2
# <dbl> <dbl> <fct> <fct> 
# 1 99999     1 a     yellow
# 2 99999     1 b     yellow
# 3     1 99999 c     black 
# 4     1 99999 d     black 
# 5    NA     1 e     blue  
# 6     1    NA f     blue

Interesting puzzle. I tried to do this by using sum(is.na()) instead and think I stumbled upon a bug.

library(tidyverse)

# sample data
a <- c(NA, NA, 1, 1, NA, 1)
b <- c(1, 1, NA, NA, 1, NA)
c <- letters[1:6]
group <- c("yellow", "yellow", "black", "black", "blue", "blue")

data <- as_tibble(data.frame(a, b, c, group))

# Attempting to do it in a single statement gives the wrong answer.
data %>% 
  group_by(group) %>% 
  mutate(a = ifelse(sum(is.na(a)) == n(), 99999, a))
#> # A tibble: 6 x 4
#> # Groups:   group [3]
#>       a     b c     group 
#>   <dbl> <dbl> <chr> <chr> 
#> 1 99999     1 a     yellow
#> 2 99999     1 b     yellow
#> 3     1    NA c     black 
#> 4     1    NA d     black 
#> 5    NA     1 e     blue  
#> 6    NA    NA f     blue

# But breaking it down into two steps works.
data %>% 
  group_by(group) %>% 
  mutate(lgl_test = sum(is.na(a)) == n(),
         a = ifelse(lgl_test == TRUE, 99999, a))
#> # A tibble: 6 x 5
#> # Groups:   group [3]
#>       a     b c     group  lgl_test
#>   <dbl> <dbl> <chr> <chr>  <lgl>   
#> 1 99999     1 a     yellow TRUE    
#> 2 99999     1 b     yellow TRUE    
#> 3     1    NA c     black  FALSE   
#> 4     1    NA d     black  FALSE   
#> 5    NA     1 e     blue   FALSE   
#> 6     1    NA f     blue   FALSE

Created on 2020-06-18 by the reprex package (v0.3.0)

1 Like

interesting when I use if_else instead of ifelse I get an explicit error

data %>% 
  group_by(group) %>% 
  mutate(   anew = if_else( sum(is.na(a)),== n(),999,a)) 

Error: `false` must be length 1 (length of `condition`), not 2

There seems to be a solution, but a bit ugly.
Either use an explicit interim, or an overly complex one liner

data %>% 
  group_by(group) %>% 
  mutate(atemp = sum(is.na(a)),
         anew = if_else(atemp== n(),999,a)) %>% select(-atemp)

data %>% 
  group_by(group) %>% 
  mutate(anew = if_else(rep(head(sum(is.na(a)),1),n())== n(),999,a))

It seems to me that the comparison is happening against all elements in each group (2 in case of the first group black). This problem is avoided when we compare against the logical vector which is done element-wise.

If you ask me, if_else should also compare element-wise as I don't see how the current behaviour is useful. Maybe someone from the tidyverse development team can weigh in?

Thanks all for the input!

Finally, this is how I resolved this with lists and purrr.

library(tidyverse)
library(purrr)

# sample data
a <- c(NA,NA,1,1,NA,1)
b <- c(1,1,NA,NA,1,NA)
c  <- letters[1:6]
group <- c("yellow","yellow","black","black", "blue", "blue")

(data <- as_tibble(data.frame(a,b,c,group)))

# list with groups in which all cases are NA
list1 <- data %>% 
  split(.,.$group) %>%
  map(~select(.x,as.vector(which(colSums(is.na(.)) == nrow(.))))) %>% 
  map(~mutate_all(.x, replace_na, 99999))  

# list with groups in which there is at least one valid observation
list2 <- data %>%
  split(.,.$group) %>%
  map(~select(.x, as.vector(which(colSums(is.na(.)) != nrow(.))))) 

# putting the groups together into a dataframe
list3 <- mapply(cbind, list1, list2, SIMPLIFY=FALSE)

(desired_output <- do.call(rbind.data.frame, list3))

Here is another option (tweaking my original offering to be more generic)


data %>% 
  group_by(group) %>% 
  mutate_if(is.numeric,~if_else(rep(head(sum(is.na(.)),1),n())== n(),999,.))

For reference, I raised an issue on the dplyr GitHub questioning this behaviour and apparently it will be fixed in a future version of if_else().

1 Like

Thanks for your help!

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.