The spark_apply() signature is as follows:
spark_apply <- function(x,
                        f,
                        columns = NULL,
                        memory = TRUE,
                        group_by = NULL,
                        packages = NULL,
                        context = NULL)
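For reference, a minimal sketch of a call looks like this (assuming a live local Spark connection; the mtcars data and the kml column are purely illustrative):

sc <- sparklyr::spark_connect(master = "local")
mtcars_tbl <- dplyr::copy_to(sc, mtcars)

# f receives each partition (or each group, if group_by is set)
# as a plain R data frame, and must return a data frame
result <- sparklyr::spark_apply(mtcars_tbl, function(df) {
  df$kml <- df$mpg * 0.425  # miles per gallon to km per litre
  df
})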
This means you can only pass one Spark data frame (the x argument). If you need a second data frame inside f, you have two options.
If the second data frame is small enough, you can pass it through the context parameter as follows:
analysis1 <- analysis %>%
  filter(n() > 1 & k < n()) %>%    # n(), not count(), inside filter()
  spark_apply(function(select_data, context) {
    # f() must return a data frame for every group, so an else
    # branch is needed; context carries the second (small) data frame
    if (all(select_data$type == 'offer')) {
      mutate(context$analysis1, Won_offer = 1)
    } else {
      context$analysis1
    }
  },
  group_by = c("updated_actual_related_customer", "sku"),
  context = list(analysis1 = analysis1)) %>%
  compute("analysis1")
If the analysis1 data is too large for that, you would first have to join the two Spark data frames with dplyr and then operate over the joined result inside spark_apply().
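A sketch of that join approach (the key sku is taken from the group_by above; treating it as the join key, and the type/Won_offer column logic, are assumptions for illustration):

# both analysis and analysis1 are Spark data frames (tbl_spark);
# the join runs inside Spark, so neither side is collected to R
joined <- analysis %>%
  dplyr::left_join(analysis1, by = "sku")

result <- joined %>%
  sparklyr::spark_apply(function(select_data) {
    # columns from both data frames are now available in select_data
    select_data$Won_offer <- as.integer(select_data$type == "offer")
    select_data
  }, group_by = c("updated_actual_related_customer", "sku"))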