sparklyr + dplyr + rlang: Error during wrapup: object "column" not found

Hello everyone,

I hit the error below while using sparklyr against an Apache Spark cluster. The actual dataset is sensitive, so I am giving a reprex with mtcars on my laptop instead.

I am unable to reproduce the error in this reprex, but the error message I get for the real dataset I am using is:
Error during wrapup: object "column1" not found.

This is strange, since "column1" actually exists in the Spark dataframe.

library(sparklyr)
# spark_install()
options(sparklyr.java9 = TRUE)
sc <- spark_connect(master = "local")

# Data --------------------------------------------------------------------
# I am going to use mtcars as an example (My actual dataset is sensitive, so
# I am unable to share that)
data <- head(mtcars, 6)
data
#>                    mpg cyl disp  hp drat    wt  qsec vs am gear carb
#> Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
#> Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
#> Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
#> Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
#> Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
#> Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1

expr <- "dplyr::case_when(carb <= 2 ~'0 to 2', carb <= 4 ~'3 to 4', TRUE ~ '5+')"
col_name <- "carb"

# Works OK for local dataframes -------------------------------------------
data_local <- data
data_local %>% 
  dplyr::mutate(
    !!col_name := !!rlang::parse_expr(expr)
  )
#>    mpg cyl disp  hp drat    wt  qsec vs am gear   carb
#> 1 21.0   6  160 110 3.90 2.620 16.46  0  1    4 3 to 4
#> 2 21.0   6  160 110 3.90 2.875 17.02  0  1    4 3 to 4
#> 3 22.8   4  108  93 3.85 2.320 18.61  1  1    4 0 to 2
#> 4 21.4   6  258 110 3.08 3.215 19.44  1  0    3 0 to 2
#> 5 18.7   8  360 175 3.15 3.440 17.02  0  0    3 0 to 2
#> 6 18.1   6  225 105 2.76 3.460 20.22  1  0    3 0 to 2

# This example works here, but doesn't work on my actual dataset ----------
data_remote <- dplyr::copy_to(dest = sc, 
                                df = data, 
                                name = "data_remote", 
                                overwrite = TRUE)

data_remote %>% 
  dplyr::mutate(
    !!col_name := !!rlang::parse_expr(expr)
  )
#> # Source: spark<?> [?? x 11]
#>     mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear carb  
#>   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <chr> 
#> 1  21       6   160   110  3.9   2.62  16.5     0     1     4 3 to 4
#> 2  21       6   160   110  3.9   2.88  17.0     0     1     4 3 to 4
#> 3  22.8     4   108    93  3.85  2.32  18.6     1     1     4 0 to 2
#> 4  21.4     6   258   110  3.08  3.22  19.4     1     0     3 0 to 2
#> 5  18.7     8   360   175  3.15  3.44  17.0     0     0     3 0 to 2
#> 6  18.1     6   225   105  2.76  3.46  20.2     1     0     3 0 to 2

Created on 2019-11-29 by the reprex package (v0.3.0)

Many Thanks

So if the error occurred in the example above, it would read:
Error during wrapup: object "carb" not found.

Have you tried upgrading packages?

In my environment, I see no error:

data_remote %>% 
  dplyr::mutate(
    !!col_name := !!rlang::parse_expr(expr)
  )
# Source: spark<?> [?? x 11]
    mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear carb  
  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <chr> 
1  21       6   160   110  3.9   2.62  16.5     0     1     4 3 to 4
2  21       6   160   110  3.9   2.88  17.0     0     1     4 3 to 4
3  22.8     4   108    93  3.85  2.32  18.6     1     1     4 0 to 2
4  21.4     6   258   110  3.08  3.22  19.4     1     0     3 0 to 2
5  18.7     8   360   175  3.15  3.44  17.0     0     0     3 0 to 2
6  18.1     6   225   105  2.76  3.46  20.2     1     0     3 0 to 2

sessionInfo()
R version 3.5.1 (2018-07-02)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS  10.14.6

Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] sparklyr_1.0.5.9000

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.2       pillar_1.4.2     compiler_3.5.1   dbplyr_1.4.2     r2d3_0.2.3      
 [6] base64enc_0.1-3  tools_3.5.1      zeallot_0.1.0    digest_0.6.20    packrat_0.5.0   
[11] jsonlite_1.6     tibble_2.1.3     pkgconfig_2.0.2  rlang_0.4.0      DBI_1.0.0       
[16] cli_1.1.0        rstudioapi_0.10  curl_4.2         yaml_2.2.0       parallel_3.5.1  
[21] withr_2.1.2      httr_1.4.1       dplyr_0.8.3      askpass_1.1      generics_0.0.2  
[26] pins_0.3.0       vctrs_0.2.0      htmlwidgets_1.3  rappdirs_0.3.1   rprojroot_1.3-2 
[31] tidyselect_0.2.5 glue_1.3.1       forge_0.2.0      R6_2.4.0         fansi_0.4.0     
[36] purrr_0.3.2      magrittr_1.5     backports_1.1.4  htmltools_0.3.6  ellipsis_0.2.0.1
[41] assertthat_0.2.1 config_0.3       utf8_1.1.4       openssl_1.4.1    crayon_1.3.4 
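
In case it helps, here is a rough sketch for checking another environment against the versions listed in the sessionInfo() output above. The helper names (installed_version, compare_versions) are made up for illustration, and the choice of which four packages to compare is my assumption about what is relevant here, not a confirmed diagnosis.

```r
# Versions taken from the sessionInfo() output above (an environment
# where the reprex runs without error).
known_good <- c(
  sparklyr = "1.0.5.9000",
  dbplyr   = "1.4.2",
  dplyr    = "0.8.3",
  rlang    = "0.4.0"
)

# Hypothetical helper: installed version as a string, or NA if the
# package is not installed.
installed_version <- function(pkg) {
  tryCatch(
    as.character(utils::packageVersion(pkg)),
    error = function(e) NA_character_
  )
}

# Hypothetical helper: one row per package, flagging installs that are
# older than the known-good version.
compare_versions <- function(expected) {
  installed <- vapply(names(expected), installed_version, character(1))
  older <- vapply(seq_along(expected), function(i) {
    if (is.na(installed[[i]])) return(NA)
    utils::compareVersion(installed[[i]], expected[[i]]) < 0
  }, logical(1))
  data.frame(
    package    = names(expected),
    installed  = unname(installed),
    known_good = unname(expected),
    older      = older,
    stringsAsFactors = FALSE
  )
}

print(compare_versions(known_good))
```

Rows with older = TRUE are candidates for upgrading; NA means the package is not installed in that environment.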

Awesome! The error is due to package versions.
Our packages in production are quite old.
Thanks for your time, @javierluraschi
