Hello everyone,
I hit this error while using sparklyr with an Apache Spark cluster. The actual dataset is sensitive, so below is a reprex that uses mtcars on my laptop.
I can't reproduce the error with this reprex, but on the real dataset the error message is:
Error during wrapup: object "column1" not found.
This is strange, because "column1" does exist in the Spark DataFrame.
```r
library(sparklyr)
# spark_install()
options(sparklyr.java9 = TRUE)
sc <- spark_connect(master = "local")

# Data --------------------------------------------------------------------
# I am going to use mtcars as an example (My actual dataset is sensitive, so
# I am unable to share that)
data <- head(mtcars, 6)
data
#>                    mpg cyl disp  hp drat    wt  qsec vs am gear carb
#> Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
#> Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
#> Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
#> Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
#> Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
#> Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1

expr <- "dplyr::case_when(carb <= 2 ~'0 to 2', carb <= 4 ~'3 to 4', TRUE ~ '5+')"
col_name <- "carb"

# Works OK for local dataframes -------------------------------------------
data_local <- data
data_local %>%
  dplyr::mutate(
    !!col_name := !!rlang::parse_expr(expr)
  )
#>    mpg cyl disp  hp drat    wt  qsec vs am gear   carb
#> 1 21.0   6  160 110 3.90 2.620 16.46  0  1    4 3 to 4
#> 2 21.0   6  160 110 3.90 2.875 17.02  0  1    4 3 to 4
#> 3 22.8   4  108  93 3.85 2.320 18.61  1  1    4 0 to 2
#> 4 21.4   6  258 110 3.08 3.215 19.44  1  0    3 0 to 2
#> 5 18.7   8  360 175 3.15 3.440 17.02  0  0    3 0 to 2
#> 6 18.1   6  225 105 2.76 3.460 20.22  1  0    3 0 to 2

# This example works here, but doesn't work on my actual dataset ----------
data_remote <- dplyr::copy_to(dest = sc,
                              df = data,
                              name = "data_remote",
                              overwrite = TRUE)
data_remote %>%
  dplyr::mutate(
    !!col_name := !!rlang::parse_expr(expr)
  )
#> # Source: spark<?> [?? x 11]
#>     mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear carb
#>   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <chr>
#> 1  21       6   160   110  3.9   2.62  16.5     0     1     4 3 to 4
#> 2  21       6   160   110  3.9   2.88  17.0     0     1     4 3 to 4
#> 3  22.8     4   108    93  3.85  2.32  18.6     1     1     4 0 to 2
#> 4  21.4     6   258   110  3.08  3.22  19.4     1     0     3 0 to 2
#> 5  18.7     8   360   175  3.15  3.44  17.0     0     0     3 0 to 2
#> 6  18.1     6   225   105  2.76  3.46  20.2     1     0     3 0 to 2
```
Created on 2019-11-29 by the reprex package (v0.3.0)
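Since the error names a column that should exist, one cheap check on the real data is whether every name the parsed expression refers to matches a column name exactly, including case and any stray whitespace. Below is a minimal base-R sketch, assuming R >= 3.6.0 for `str2lang()`, which parses a single expression string much like `rlang::parse_expr()`. `unmatched_columns()` is a hypothetical helper, not part of sparklyr or dplyr; for the Spark table, the column names should be obtainable with `colnames(data_remote)`.

```r
# Hypothetical helper (not part of sparklyr/dplyr): return the names an
# expression string refers to that are NOT exact matches for any column name.
unmatched_columns <- function(expr_string, cols) {
  # str2lang() (base R >= 3.6.0) parses one expression, like rlang::parse_expr()
  e <- str2lang(expr_string)
  # all.vars() returns symbols used as variables; function names such as
  # dplyr::case_when, `<=` and `~` are skipped, as are string literals
  setdiff(all.vars(e), cols)
}

expr <- "dplyr::case_when(carb <= 2 ~'0 to 2', carb <= 4 ~'3 to 4', TRUE ~ '5+')"

# Every referenced name is a column of mtcars, so this is character(0)
unmatched_columns(expr, colnames(mtcars))

# Returns "column1": the only similar column name carries a leading space
unmatched_columns("column1 + 1", c(" column1", "column2"))
```

An empty result only shows that the names match exactly; it would point away from a simple naming mismatch rather than explain the error on its own.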
Many thanks