I tried to replace summarise_all() with the equivalent summarise(across()) call in the following code, but I got an error.
library(sparklyr)
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
sc <- spark_connect('local', version = '2.4')
data <- copy_to(sc, mtcars)
# 1st query
data %>%
  mutate(transmission = ifelse(am == 0, "automatic", "manual")) %>%
  group_by(transmission) %>%
  summarise_all(mean)
#> Warning: Missing values are always removed in SQL.
#> Use `mean(x, na.rm = TRUE)` to silence this warning
#> This warning is displayed only once per session.
#> # Source: spark<?> [?? x 12]
#> transmission mpg cyl disp hp drat wt qsec vs am gear carb
#> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 manual 24.4 5.08 144. 127. 4.05 2.41 17.4 0.538 1 4.38 2.92
#> 2 automatic 17.1 6.95 290. 160. 3.29 3.77 18.2 0.368 0 3.21 2.74
# 2nd query
data %>%
  mutate(transmission = ifelse(am == 0, "automatic", "manual")) %>%
  group_by(transmission) %>%
  summarise(across(.fns = mean))
#> Error: Can't rename variables in this context.
Created on 2021-02-05 by the reprex package (v0.3.0)
emman, February 5, 2021, 9:57am
I can't run the following line for some reason:
sc <- spark_connect('local', version = '2.4')
In any case, the problem might be that you need summarise(across(everything(), mean)) to specify that you want to summarize across all columns:
mtcars %>%
  mutate(transmission = ifelse(am == 0, "automatic", "manual")) %>%
  group_by(transmission) %>%
  summarise(across(everything(), mean))
## `summarise()` ungrouping output (override with `.groups` argument)
## # A tibble: 2 x 12
## transmission mpg cyl disp hp drat wt qsec vs am gear carb
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 automatic 17.1 6.95 290. 160. 3.29 3.77 18.2 0.368 0 3.21 2.74
## 2 manual 24.4 5.08 144. 127. 4.05 2.41 17.4 0.538 1 4.38 2.92
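As a quick local check, the sketch below compares summarise_all(mean) with summarise(across(everything(), mean)) on the plain mtcars data frame. It only needs dplyr and no Spark connection; the object names (by_transmission, res_all, res_across) are made up for illustration. It says nothing about how sparklyr translates across(), only that the two forms agree on local data.

```r
library(dplyr)

# Local data frame only; no Spark connection involved.
by_transmission <- mtcars %>%
  mutate(transmission = ifelse(am == 0, "automatic", "manual")) %>%
  group_by(transmission)

# Older scoped verb.
res_all <- by_transmission %>%
  summarise_all(mean)

# across() form with the column selection spelled out.
res_across <- by_transmission %>%
  summarise(across(everything(), mean))

# Compare the two results as plain data frames.
all.equal(as.data.frame(res_all), as.data.frame(res_across))
```

If the comparison holds locally, an error on the Spark table points at the backend translation rather than at the across() call itself.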
emman:
mtcars %>%
  mutate(transmission = ifelse(am == 0, "automatic", "manual")) %>%
  group_by(transmission) %>%
  summarise(across(everything(), mean))
The code works in plain R, as you said, but when I use it on a Spark table (data) it throws an error.
Are you sure that you installed sparklyr and that your Spark version is 2.4?
emman, February 5, 2021, 10:23am
I used install.packages("sparklyr")....
Now run sparklyr::spark_installed_versions(), replace 2.4 with your Spark version, and then run the code.
I found the answer to this problem.
Support for summarise(across(...)) has been added in the new version of sparklyr.
You should install the new version of sparklyr from GitHub.
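A minimal sketch of that fix, under a few assumptions: the remotes package is available, the GitHub repository path "sparklyr/sparklyr" is current (it may have been different when this thread was written), and a local Spark 2.4 installation exists. The helper name rerun_on_spark is hypothetical and only wraps the second query from the question so that nothing connects to Spark until you call it.

```r
# Assumption: install the development version first (run once, then restart R).
# install.packages("remotes")
# remotes::install_github("sparklyr/sparklyr")

library(dplyr)

# Hypothetical helper: reruns the failing query against a local Spark session.
rerun_on_spark <- function(version = "2.4") {
  sc <- sparklyr::spark_connect(master = "local", version = version)
  on.exit(sparklyr::spark_disconnect(sc), add = TRUE)

  data <- dplyr::copy_to(sc, mtcars, overwrite = TRUE)

  data %>%
    mutate(transmission = ifelse(am == 0, "automatic", "manual")) %>%
    group_by(transmission) %>%
    summarise(across(everything(), mean)) %>%
    collect()  # bring the small result back as a local tibble
}

# Usage (requires Spark to be installed locally):
# rerun_on_spark("2.4")
```

If the call still fails with "Can't rename variables in this context.", checking packageVersion("sparklyr") is a reasonable first step to confirm the newer build was actually loaded.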