Fill not working with group_by

Hi,

I've spent a full day trying to use fill from tidyr to fill missing values by group, like so:

vars_to_fill <- c(3:4, 7:8)
df <- df %>%
  dplyr::arrange(ID, time) %>%
  dplyr::group_by(ID) %>%
  tidyr::fill(vars_to_fill)

And I cannot, for the life of me, get it to work with my dataset.
It works with small throwaway datasets that I create, but if I use my dataset or any subset of it, it no longer works.

I apologize for the inconvenience, but I am unable to provide the dataset in question due to confidentiality agreements.

If it helps, the packages that I had loaded were:

library(foreign)
library(Hmisc)
library(RODBC)
library(dplyr)
library(tidyr)
library(reshape)
library(magrittr)

I also had library(plyr) loaded, but I unloaded it and restarted the session, and still no luck.

First off, tidyr::fill does work with dplyr::group_by:

library(tidyverse)
df <- data.frame(Month = 1:12, Year = c(2000, rep(NA, 11)))

is_even <- function(x){
  x %% 2 == 0
}

vars_to_fill <- c(2)

df %>% 
  dplyr::mutate(even_month = is_even(Month)) %>%
  dplyr::group_by(even_month) %>%
  tidyr::fill(vars_to_fill)
#> # A tibble: 12 x 3
#> # Groups:   even_month [2]
#>    Month  Year even_month
#>    <int> <dbl> <lgl>     
#>  1     1  2000 FALSE     
#>  2     3  2000 FALSE     
#>  3     5  2000 FALSE     
#>  4     7  2000 FALSE     
#>  5     9  2000 FALSE     
#>  6    11  2000 FALSE     
#>  7     2    NA TRUE      
#>  8     4    NA TRUE      
#>  9     6    NA TRUE      
#> 10     8    NA TRUE      
#> 11    10    NA TRUE      
#> 12    12    NA TRUE

Created on 2018-08-04 by the reprex package (v0.2.0).

So we can concentrate on the problem you have, which is not in tidyr but somewhere else. First rule of debugging: isolate. Does it still behave the same way if you leave only one variable in your vars_to_fill vector? Does it work for some columns but not others? Is there a combination that reliably produces the wrong result? Does it work with, for example, only 10 rows?
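
To make the "one variable at a time" check concrete, here is a rough sketch. The data frame and the column names a and b are made up for illustration, as are cols_to_check and na_report; only ID and time come from your post. It also assumes a recent tidyverse in which all_of() is available. It fills each column on its own and counts the NAs before and after:

```r
library(dplyr)
library(tidyr)

# Stand-in data (hypothetical); swap in the real data frame when running locally.
df <- data.frame(
  ID   = c(1, 1, 1, 2, 2, 2),
  time = c(1, 2, 3, 1, 2, 3),
  a    = c(5, NA, NA, NA, 7, NA),
  b    = c(NA, "u", NA, "v", NA, NA),
  stringsAsFactors = FALSE
)

# Hypothetical column names to test, one at a time.
cols_to_check <- c("a", "b")

# For each column: fill it alone within each ID, then compare NA counts.
na_report <- sapply(cols_to_check, function(col) {
  filled <- df %>%
    arrange(ID, time) %>%
    group_by(ID) %>%
    fill(all_of(col)) %>%
    ungroup()
  c(before = sum(is.na(df[[col]])), after = sum(is.na(filled[[col]])))
})

na_report
```

A column whose "before" count is already zero has nothing for fill to act on, which may mean its gaps are stored as something other than NA (an empty string, for instance). A column whose count does not drop is the one to look at more closely.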

That being said, I don't know your dataset, but I would imagine (based on the names ID and time) that you can make a very small example with no real data. ID can be the vector 1:10, and time can be created on the fly too. The variables in vars_to_fill are harder to guess at, but I'm sure they can be mocked too.
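
Something along these lines could serve as a starting point for such a mock. The columns value_a and value_b are invented placeholders for the confidential variables, and the values are arbitrary; only ID and time are taken from your post:

```r
library(dplyr)
library(tidyr)

# Mock data with no real values: 3 IDs, 4 time points each.
# value_a and value_b are hypothetical stand-ins for the columns to fill.
mock <- data.frame(
  ID      = rep(1:3, each = 4),
  time    = rep(1:4, times = 3),
  value_a = c(10, NA, NA, 13, NA, 21, NA, NA, 30, NA, 32, NA),
  value_b = c("x", NA, "y", NA, NA, NA, "z", NA, "w", NA, NA, NA),
  stringsAsFactors = FALSE
)

# Same pipeline shape as in the question, with columns named explicitly.
filled <- mock %>%
  arrange(ID, time) %>%
  group_by(ID) %>%
  fill(value_a, value_b) %>%
  ungroup()

filled
```

If grouping is respected, the leading NA for ID 2 in value_a should stay NA rather than pick up the 13 from the end of ID 1. If the mock behaves and your real data does not, the differences between the two (column types, how missing values are encoded) are the place to dig.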
