Trying to diagnose why a function works once or twice and then errors out on the same data, even though the environment hasn't changed

I'm having a bizarre problem in which a tidyeval function I wrote works fine the first time I run it with a particular data frame, but might or might not work on subsequent attempts. I've provided two reprexes below, just to show a couple of different failure modes. Does anyone know what could be causing this and how to fix it?

library(tidyverse)

fnc = function(data, value.vars, group.vars=NULL) {
  data %>% 
    group_by(across({{group.vars}})) %>% 
    summarise(n=n(), across({{value.vars}}, 
                            list(mean=~mean(.x, na.rm=TRUE),
                                 n.miss=~sum(is.na(.x))), 
                            .names="{.fn}_{.col}"))
}

mtcars %>% fnc(mpg)
#> # A tibble: 1 x 3
#>       n mean_mpg n.miss_mpg
#>   <int>    <dbl>      <int>
#> 1    32     20.1          0

iris %>% fnc(c(Petal.Width, Sepal.Width), Species)
#> # A tibble: 3 x 6
#>   Species     n mean_Petal.Width n.miss_Petal.Wi… mean_Sepal.Width
#> * <fct>   <int>            <dbl>            <int>            <dbl>
#> 1 setosa     50            0.246                0             3.43
#> 2 versic…    50            1.33                 0             2.77
#> 3 virgin…    50            2.03                 0             2.97
#> # … with 1 more variable: n.miss_Sepal.Width <int>

diamonds %>% fnc(c(x,y), c(cut, color))
#> `summarise()` has grouped output by 'cut'. You can override using the `.groups` argument.
#> # A tibble: 35 x 7
#> # Groups:   cut [5]
#>    cut   color     n mean_x n.miss_x mean_y n.miss_y
#>    <ord> <ord> <int>  <dbl>    <int>  <dbl>    <int>
#>  1 Fair  D       163   6.02        0   5.96        0
#>  2 Fair  E       224   5.91        0   5.86        0
#>  3 Fair  F       312   5.99        0   5.93        0
#>  4 Fair  G       314   6.17        0   6.11        0
#>  5 Fair  H       303   6.58        0   6.50        0
#>  6 Fair  I       175   6.56        0   6.49        0
#>  7 Fair  J       119   6.75        0   6.68        0
#>  8 Good  D       662   5.62        0   5.63        0
#>  9 Good  E       933   5.62        0   5.63        0
#> 10 Good  F       909   5.69        0   5.71        0
#> # … with 25 more rows

iris %>% fnc(c(Petal.Width, Sepal.Width), Species)
#> Error: Can't subset elements that don't exist.
#> x Location 35 doesn't exist.
#> ℹ There are only 3 elements.

diamonds %>% fnc(c(x,y))
#> Error: Problem with `summarise()` input `..2`.
#> x subscript out of bounds
#> ℹ Input `..2` is `(function (.cols = everything(), .fns = NULL, ..., .names = NULL) ...`.

Created on 2021-02-18 by the reprex package (v1.0.0)

library(tidyverse)

fnc = function(data, value.vars, group.vars=NULL) {
  data %>% 
    group_by(across({{group.vars}})) %>% 
    summarise(n=n(), across({{value.vars}}, 
                            list(mean=~mean(.x, na.rm=TRUE),
                                 n.miss=~sum(is.na(.x))), 
                            .names="{.fn}_{.col}"))
}

diamonds %>% fnc(c(x,y))
#> # A tibble: 1 x 5
#>       n mean_x n.miss_x mean_y n.miss_y
#>   <int>  <dbl>    <int>  <dbl>    <int>
#> 1 53940   5.73        0   5.73        0

mtcars %>% fnc(mpg)
#> # A tibble: 1 x 3
#>       n mean_mpg n.miss_mpg
#>   <int>    <dbl>      <int>
#> 1    32     20.1          0

iris %>% fnc(c(Petal.Width, Sepal.Width), Species)
#> # A tibble: 3 x 6
#>   Species     n mean_Petal.Width n.miss_Petal.Wi… mean_Sepal.Width
#> * <fct>   <int>            <dbl>            <int>            <dbl>
#> 1 setosa     50            0.246                0             3.43
#> 2 versic…    50            1.33                 0             2.77
#> 3 virgin…    50            2.03                 0             2.97
#> # … with 1 more variable: n.miss_Sepal.Width <int>

diamonds %>% fnc(c(x,y), c(cut, color))
#> `summarise()` has grouped output by 'cut'. You can override using the `.groups` argument.
#> # A tibble: 35 x 7
#> # Groups:   cut [5]
#>    cut   color     n mean_x n.miss_x mean_y n.miss_y
#>    <ord> <ord> <int>  <dbl>    <int>  <dbl>    <int>
#>  1 Fair  D       163   6.02        0   5.96        0
#>  2 Fair  E       224   5.91        0   5.86        0
#>  3 Fair  F       312   5.99        0   5.93        0
#>  4 Fair  G       314   6.17        0   6.11        0
#>  5 Fair  H       303   6.58        0   6.50        0
#>  6 Fair  I       175   6.56        0   6.49        0
#>  7 Fair  J       119   6.75        0   6.68        0
#>  8 Good  D       662   5.62        0   5.63        0
#>  9 Good  E       933   5.62        0   5.63        0
#> 10 Good  F       909   5.69        0   5.71        0
#> # … with 25 more rows

iris %>% fnc(c(Petal.Width, Sepal.Width), Species)
#> Error: Can't subset elements that don't exist.
#> x Location 35 doesn't exist.
#> ℹ There are only 3 elements.

mtcars %>% fnc(mpg, cyl)
#> Error: Can't subset elements that don't exist.
#> x Location 35 doesn't exist.
#> ℹ There are only 3 elements.

diamonds %>% fnc(c(x,y), color)
#> Error: Can't subset elements that don't exist.
#> x Location 35 doesn't exist.
#> ℹ There are only 7 elements.

Created on 2021-02-18 by the reprex package (v1.0.0)

There was a very similar question here recently, where the order in which some functions were called separately affected the output. Somebody logged it as an issue on GitHub. Unfortunately I can't find it either here or on GitHub, but it may be related.

Sorry not to be of more help, but this might help you or somebody else locate the other issue.

Thanks Martin. I haven't been able to find it either. I've posted this as an issue on the dplyr GitHub repository.

Hi Joel, a small suggestion: share your sessionInfo(), because this may be version-related.
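If the full sessionInfo() dump is too long to compare by eye, a compact version report for the packages behind `across()` can help. This is a minimal sketch: `pkg_versions()` is a hypothetical helper, and the default package list is an assumption, not something established in this thread. It uses `utils::packageVersion()`, which reads the installed DESCRIPTION file without loading the namespace, so running it doesn't itself change which namespaces are loaded.

```r
# Hypothetical helper: report installed versions of a few packages that
# dplyr's across()/tidyselect machinery relies on (assumed list).
pkg_versions <- function(pkgs = c("dplyr", "tidyselect", "rlang",
                                  "vctrs", "pillar")) {
  vapply(pkgs, function(p) {
    # packageVersion() errors for a package that isn't installed;
    # report NA in that case instead of stopping.
    tryCatch(as.character(utils::packageVersion(p)),
             error = function(e) NA_character_)
  }, character(1))
}

pkg_versions()
```

The result is a named character vector, which is easy to paste into a reply or diff against someone else's output.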

Good suggestion, Nir. I reran the first reprex, but with calls to sessionInfo() before and after running the function (see below). It turns out that the namespaces of two additional packages, fansi and utf8, are loaded after the function is run for the first time. In the second and third calls to sessionInfo(), you can see them in the "loaded via a namespace" list at positions 35 and 47. Presumably this is the source of the problem, but I'm not sure what's actually going wrong. I'll try uninstalling both packages (none of the packages in my R setup seem to depend on them, and I don't recall installing them explicitly) and see if that fixes the problem.

library(tidyverse)

fnc = function(data, value.vars, group.vars=NULL) {
  data %>% 
    group_by(across({{group.vars}})) %>% 
    summarise(n=n(), across({{value.vars}}, 
                            list(mean=~mean(.x, na.rm=TRUE),
                                 n.miss=~sum(is.na(.x))), 
                            .names="{.fn}_{.col}"))
}

sessionInfo()
#> R version 4.0.3 (2020-10-10)
#> Platform: x86_64-apple-darwin17.0 (64-bit)
#> Running under: macOS Catalina 10.15.7
#> 
#> Matrix products: default
#> BLAS:   /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRblas.dylib
#> LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib
#> 
#> locale:
#> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] forcats_0.5.1   stringr_1.4.0   dplyr_1.0.4     purrr_0.3.4    
#> [5] readr_1.4.0     tidyr_1.1.2     tibble_3.0.6    ggplot2_3.3.3  
#> [9] tidyverse_1.3.0
#> 
#> loaded via a namespace (and not attached):
#>  [1] Rcpp_1.0.6        cellranger_1.1.0  pillar_1.4.7      compiler_4.0.3   
#>  [5] dbplyr_2.1.0      highr_0.8         tools_4.0.3       digest_0.6.27    
#>  [9] lubridate_1.7.9.2 jsonlite_1.7.2    evaluate_0.14     lifecycle_0.2.0  
#> [13] gtable_0.3.0      pkgconfig_2.0.3   rlang_0.4.10      reprex_1.0.0     
#> [17] cli_2.3.0         DBI_1.1.1         yaml_2.2.1        haven_2.3.1      
#> [21] xfun_0.20         withr_2.4.1       xml2_1.3.2        httr_1.4.2       
#> [25] styler_1.3.2      knitr_1.31        hms_1.0.0         generics_0.1.0   
#> [29] fs_1.5.0          vctrs_0.3.6       grid_4.0.3        tidyselect_1.1.0 
#> [33] glue_1.4.2        R6_2.5.0          readxl_1.3.1      rmarkdown_2.6    
#> [37] modelr_0.1.8      magrittr_2.0.1    backports_1.2.1   scales_1.1.1     
#> [41] ellipsis_0.3.1    htmltools_0.5.1.1 rvest_0.3.6       assertthat_0.2.1 
#> [45] colorspace_2.0-0  stringi_1.5.3     munsell_0.5.0     broom_0.7.4      
#> [49] crayon_1.4.0

mtcars %>% fnc(mpg)
#> # A tibble: 1 x 3
#>       n mean_mpg n.miss_mpg
#>   <int>    <dbl>      <int>
#> 1    32     20.1          0

sessionInfo()
#> R version 4.0.3 (2020-10-10)
#> Platform: x86_64-apple-darwin17.0 (64-bit)
#> Running under: macOS Catalina 10.15.7
#> 
#> Matrix products: default
#> BLAS:   /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRblas.dylib
#> LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib
#> 
#> locale:
#> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] forcats_0.5.1   stringr_1.4.0   dplyr_1.0.4     purrr_0.3.4    
#> [5] readr_1.4.0     tidyr_1.1.2     tibble_3.0.6    ggplot2_3.3.3  
#> [9] tidyverse_1.3.0
#> 
#> loaded via a namespace (and not attached):
#>  [1] Rcpp_1.0.6        cellranger_1.1.0  pillar_1.4.7      compiler_4.0.3   
#>  [5] dbplyr_2.1.0      highr_0.8         tools_4.0.3       digest_0.6.27    
#>  [9] lubridate_1.7.9.2 jsonlite_1.7.2    evaluate_0.14     lifecycle_0.2.0  
#> [13] gtable_0.3.0      pkgconfig_2.0.3   rlang_0.4.10      reprex_1.0.0     
#> [17] cli_2.3.0         DBI_1.1.1         yaml_2.2.1        haven_2.3.1      
#> [21] xfun_0.20         withr_2.4.1       xml2_1.3.2        httr_1.4.2       
#> [25] styler_1.3.2      knitr_1.31        hms_1.0.0         generics_0.1.0   
#> [29] fs_1.5.0          vctrs_0.3.6       grid_4.0.3        tidyselect_1.1.0 
#> [33] glue_1.4.2        R6_2.5.0          fansi_0.4.2       readxl_1.3.1     
#> [37] rmarkdown_2.6     modelr_0.1.8      magrittr_2.0.1    backports_1.2.1  
#> [41] scales_1.1.1      ellipsis_0.3.1    htmltools_0.5.1.1 rvest_0.3.6      
#> [45] assertthat_0.2.1  colorspace_2.0-0  utf8_1.1.4        stringi_1.5.3    
#> [49] munsell_0.5.0     broom_0.7.4       crayon_1.4.0

iris %>% fnc(c(Petal.Width, Sepal.Width), Species)
#> # A tibble: 3 x 6
#>   Species     n mean_Petal.Width n.miss_Petal.Wi… mean_Sepal.Width
#> * <fct>   <int>            <dbl>            <int>            <dbl>
#> 1 setosa     50            0.246                0             3.43
#> 2 versic…    50            1.33                 0             2.77
#> 3 virgin…    50            2.03                 0             2.97
#> # … with 1 more variable: n.miss_Sepal.Width <int>

diamonds %>% fnc(c(x,y), c(cut, color))
#> `summarise()` has grouped output by 'cut'. You can override using the `.groups` argument.
#> # A tibble: 35 x 7
#> # Groups:   cut [5]
#>    cut   color     n mean_x n.miss_x mean_y n.miss_y
#>    <ord> <ord> <int>  <dbl>    <int>  <dbl>    <int>
#>  1 Fair  D       163   6.02        0   5.96        0
#>  2 Fair  E       224   5.91        0   5.86        0
#>  3 Fair  F       312   5.99        0   5.93        0
#>  4 Fair  G       314   6.17        0   6.11        0
#>  5 Fair  H       303   6.58        0   6.50        0
#>  6 Fair  I       175   6.56        0   6.49        0
#>  7 Fair  J       119   6.75        0   6.68        0
#>  8 Good  D       662   5.62        0   5.63        0
#>  9 Good  E       933   5.62        0   5.63        0
#> 10 Good  F       909   5.69        0   5.71        0
#> # … with 25 more rows

iris %>% fnc(c(Petal.Width, Sepal.Width), Species)
#> Error: Can't subset elements that don't exist.
#> x Location 35 doesn't exist.
#> ℹ There are only 3 elements.

diamonds %>% fnc(c(x,y))
#> Error: Problem with `summarise()` input `..2`.
#> x subscript out of bounds
#> ℹ Input `..2` is `(function (.cols = everything(), .fns = NULL, ..., .names = NULL) ...`.

mtcars %>% fnc(mpg)
#> Error: Problem with `summarise()` input `..2`.
#> x subscript out of bounds
#> ℹ Input `..2` is `(function (.cols = everything(), .fns = NULL, ..., .names = NULL) ...`.

sessionInfo()
#> R version 4.0.3 (2020-10-10)
#> Platform: x86_64-apple-darwin17.0 (64-bit)
#> Running under: macOS Catalina 10.15.7
#> 
#> Matrix products: default
#> BLAS:   /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRblas.dylib
#> LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib
#> 
#> locale:
#> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] forcats_0.5.1   stringr_1.4.0   dplyr_1.0.4     purrr_0.3.4    
#> [5] readr_1.4.0     tidyr_1.1.2     tibble_3.0.6    ggplot2_3.3.3  
#> [9] tidyverse_1.3.0
#> 
#> loaded via a namespace (and not attached):
#>  [1] Rcpp_1.0.6        cellranger_1.1.0  pillar_1.4.7      compiler_4.0.3   
#>  [5] dbplyr_2.1.0      highr_0.8         tools_4.0.3       digest_0.6.27    
#>  [9] lubridate_1.7.9.2 jsonlite_1.7.2    evaluate_0.14     lifecycle_0.2.0  
#> [13] gtable_0.3.0      pkgconfig_2.0.3   rlang_0.4.10      reprex_1.0.0     
#> [17] cli_2.3.0         DBI_1.1.1         yaml_2.2.1        haven_2.3.1      
#> [21] xfun_0.20         withr_2.4.1       xml2_1.3.2        httr_1.4.2       
#> [25] styler_1.3.2      knitr_1.31        hms_1.0.0         generics_0.1.0   
#> [29] fs_1.5.0          vctrs_0.3.6       grid_4.0.3        tidyselect_1.1.0 
#> [33] glue_1.4.2        R6_2.5.0          fansi_0.4.2       readxl_1.3.1     
#> [37] rmarkdown_2.6     modelr_0.1.8      magrittr_2.0.1    backports_1.2.1  
#> [41] scales_1.1.1      ellipsis_0.3.1    htmltools_0.5.1.1 rvest_0.3.6      
#> [45] assertthat_0.2.1  colorspace_2.0-0  utf8_1.1.4        stringi_1.5.3    
#> [49] munsell_0.5.0     broom_0.7.4       crayon_1.4.0

Created on 2021-02-19 by the reprex package (v1.0.0)
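Comparing full sessionInfo() listings by eye is how fansi and utf8 were spotted above. The same comparison can be automated with base R's `loadedNamespaces()`. This is a minimal sketch; `new_namespaces()` is a hypothetical helper, not part of any package.

```r
# Hypothetical helper: evaluate an expression and return the names of any
# package namespaces that were loaded as a side effect of evaluating it.
new_namespaces <- function(expr) {
  before <- loadedNamespaces()
  force(expr)  # evaluate the lazily passed argument in the caller's frame
  setdiff(loadedNamespaces(), before)
}
```

With `fnc()` and the tidyverse loaded as in the reprex, something like `new_namespaces(print(mtcars %>% fnc(mpg)))` would list whatever got loaded on that call. Wrapping the pipeline in `print()` means namespaces loaded while the tibble is printed are captured too, which matters because printing is a separate step from computing the summary.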

Yes, here are the issues that might be related:


Thanks Stephanie! Those definitely look like the same underlying problem. After reading the second issue in your post, I installed the development version of dplyr from GitHub and the issue went away. I will update my GitHub issue to link to the issues you and Nir shared.
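For anyone landing here later, a quick way to check whether you are on the same dplyr release used in these reprexes (1.0.4, per the sessionInfo() output above) is sketched below. `check_dplyr_version()` is a hypothetical helper, and treating only 1.0.4 as "affected" is an assumption based on this thread alone. The install line assumes the remotes package and network access, so it is left commented out.

```r
# Hypothetical helper: warn if the installed dplyr matches the version seen
# in this thread's failing reprexes. Returns TRUE/FALSE invisibly, or NA if
# dplyr is not installed.
check_dplyr_version <- function(affected = "1.0.4") {
  # packageVersion() reads DESCRIPTION without loading the namespace
  v <- tryCatch(utils::packageVersion("dplyr"), error = function(e) NULL)
  if (is.null(v)) {
    message("dplyr is not installed")
    return(invisible(NA))
  }
  is_affected <- v == affected  # package_version compares with a string
  if (is_affected) {
    warning("dplyr ", v, " is the version used in this thread; ",
            "consider installing the development version")
  }
  invisible(is_affected)
}

# Development version from GitHub (assumes remotes is installed):
# remotes::install_github("tidyverse/dplyr")
check_dplyr_version()
```

After installing, restart R before reloading the tidyverse so the new dplyr namespace is the one in use.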