Imputations based on grouped results

Hi all!
I have a very similar dataset to one presented below:

M1 <- matrix(rnorm(100),nrow=100)
M2 <- rep(c("2011", "2012", "2013", "2014"), times = 25)
M1 <- as.data.frame(sapply(M1, as.numeric))
M4 <- cbind(M1, M2)
colnames(M4) <- c("value", "year")
M4[2:5,1] <- 0

Now, what I would like to do is to eliminate "0" in the table and replace them with a mean value grouped by a given year.

I try to use the following code:

M4 <- M4 %>% group_by(year) %>% mutate(nc = if_else(M4[,1] == 0,  mean(M4[,1]), M4[,1]))

Unfortunately, I get the following error and I do not know how to solve it using similar solutions. I want to avoid splitting the data frame by year, calculating and then binding it again.

Error: Problem with mutate() column nk.
:information_source: nk = if_else(M4[, 1] == 0, mean(M4[, 1]), M4[, 1]).
:information_source: nk must be size 50 or 1, not 200.
:information_source: The error occurred in group 1: year = "2011".

I think a few things are going on and this might work a bit better. I also am guessing you want the group mean without the 0s so I've chosen to not include those in the means.

library(tidyverse)
M1 <- matrix(rnorm(100),nrow=100)
M2 <- rep(c("2011", "2012", "2013", "2014"), times = 25)
M1 <- as.data.frame(sapply(M1, as.numeric))
M4 <- cbind(M1, M2)
colnames(M4) <- c("value", "year")
M4[2:5,1] <- 0

M4_imp <- M4 %>% 
  group_by(year) %>%
  mutate(
    value_temp=if_else(value==0, NA_real_, value),
    group_mean=mean(value_temp, na.rm=TRUE),
    nc = if_else(is.na(value_temp),  group_mean, value)) %>%
  ungroup()

M4_imp
#> # A tibble: 100 x 5
#>      value year  value_temp group_mean        nc
#>      <dbl> <chr>      <dbl>      <dbl>     <dbl>
#>  1  0.0448 2011      0.0448  -0.000777  0.0448  
#>  2  0      2012     NA       -0.0562   -0.0562  
#>  3  0      2013     NA       -0.135    -0.135   
#>  4  0      2014     NA       -0.227    -0.227   
#>  5  0      2011     NA       -0.000777 -0.000777
#>  6  0.782  2012      0.782   -0.0562    0.782   
#>  7 -0.688  2013     -0.688   -0.135    -0.688   
#>  8  0.382  2014      0.382   -0.227     0.382   
#>  9 -0.117  2011     -0.117   -0.000777 -0.117   
#> 10  1.54   2012      1.54    -0.0562    1.54    
#> # ... with 90 more rows

Created on 2021-11-28 by the reprex package (v2.0.1)

Thank you very much, StatSteph! That helped. Based on your solution, it looks like it can be even a bit shortened:

M4 <- M4 %>% group_by(year) %>% mutate(nc = if_else(is.na(value),  mean(value, na.rm = TRUE),  value))

Almost. With your example, there are no NA values, only 0 values so you probably want something like this:

M4 <- M4 %>% group_by(year) %>%
   mutate(
     value=if_else(value==0, NA_real_, value), #change 0s to NA
     nc = if_else(is.na(value),  mean(value, na.rm = TRUE),  value))

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.