Problem with replacing summarise_all() with summarise(across()) in sparklyr

I tried to replace summarise_all() with its summarise(across()) equivalent in the following code, but I get an error.

library(sparklyr)
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union

sc <- spark_connect('local', version = '2.4')
data <- copy_to(sc, mtcars)

# 1st query
data %>%
  mutate(transmission = ifelse(am == 0, "automatic", "manual")) %>%
  group_by(transmission) %>%
  summarise_all(mean)
#> Warning: Missing values are always removed in SQL.
#> Use `mean(x, na.rm = TRUE)` to silence this warning
#> This warning is displayed only once per session.
#> # Source: spark<?> [?? x 12]
#>   transmission   mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb
#>   <chr>        <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 manual        24.4  5.08  144.  127.  4.05  2.41  17.4 0.538     1  4.38  2.92
#> 2 automatic     17.1  6.95  290.  160.  3.29  3.77  18.2 0.368     0  3.21  2.74

# 2nd query
data %>%
  mutate(transmission = ifelse(am == 0, "automatic", "manual")) %>%
  group_by(transmission) %>%
  summarise(across(.fns = mean))
#> Error: Can't rename variables in this context.

Created on 2021-02-05 by the reprex package (v0.3.0)

I can't run the following line for some reason:
sc <- spark_connect('local', version = '2.4')

In any case, the problem might be that you need summarise(across(everything(), mean)) to specify that you want to summarize across all columns:

mtcars %>%
  mutate(transmission = ifelse(am == 0, "automatic", "manual")) %>%
  group_by(transmission) %>%
  summarise(across(everything(), mean))

## `summarise()` ungrouping output (override with `.groups` argument)
## # A tibble: 2 x 12
##   transmission   mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb
##   <chr>        <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 automatic     17.1  6.95  290.  160.  3.29  3.77  18.2 0.368     0  3.21  2.74
## 2 manual        24.4  5.08  144.  127.  4.05  2.41  17.4 0.538     1  4.38  2.92
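
To check that the two spellings agree outside Spark, here is a minimal sketch on the local mtcars data frame. The names local_df, old_style and new_style are made up for illustration. It only shows that the across() form is equivalent in plain dplyr (1.0.0 or later); it says nothing about whether sparklyr can translate it.

```r
library(dplyr)

# Local data frame with the same derived grouping column as in the thread
local_df <- mtcars %>%
  mutate(transmission = ifelse(am == 0, "automatic", "manual")) %>%
  group_by(transmission)

# Old scoped verb and its across() replacement
old_style <- local_df %>% summarise_all(mean)
new_style <- local_df %>% summarise(across(everything(), mean))

# Compare the two results as plain data frames
all.equal(as.data.frame(old_style), as.data.frame(new_style))
```

If the comparison holds locally, the across() call itself is fine, and the error is more likely to come from how the Spark backend translates it.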

My code works on a local R data frame, as you said, but when I use it on a Spark table (data) it throws an error.

Are you sure that you installed sparklyr and that your Spark version is 2.4?

I used install.packages("sparklyr")....

Now run sparklyr::spark_installed_versions(), replace 2.4 with your Spark version, and then run the code again.
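
A rough sketch of that check, assuming sparklyr is already installed. The variable name installed and the choice of the first listed version are only for illustration; pick whichever version you actually want to use.

```r
library(sparklyr)

# List the Spark versions that sparklyr has installed locally
# (a data frame with one row per installed version)
installed <- spark_installed_versions()
print(installed)

# If the listing is empty, spark_install() can download a version first, e.g.:
# spark_install(version = "2.4")

if (nrow(installed) > 0) {
  # Connect using a version that appears in the listing above
  sc <- spark_connect(master = "local", version = installed$spark[1])
}
```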

I found the answer to this problem.
Support for summarise(across(...)) was added in the new version of sparklyr.
You should install the new version of sparklyr from GitHub.
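
For anyone landing here later, one way to do that install is sketched below. It assumes the remotes package and that the repository lives at sparklyr/sparklyr on GitHub; adjust the path if your setup differs.

```r
# Assumes the remotes package; install it first if it is missing
if (!requireNamespace("remotes", quietly = TRUE)) {
  install.packages("remotes")
}

# Install the development version of sparklyr from GitHub
# (repository path assumed to be sparklyr/sparklyr)
remotes::install_github("sparklyr/sparklyr")

# Confirm which version is installed now
packageVersion("sparklyr")
```

Restart the R session after installing so the new version is the one that gets loaded, then rerun the second query.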
