I've tried to replace summarise_all() with its summarise(across()) equivalent in the following code, but I get an error.
library(sparklyr)
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
sc <- spark_connect('local', version = '2.4')
data <- copy_to(sc, mtcars)
# 1st query
data %>%
  mutate(transmission = ifelse(am == 0, "automatic", "manual")) %>%
  group_by(transmission) %>%
  summarise_all(mean)
#> Warning: Missing values are always removed in SQL.
#> Use `mean(x, na.rm = TRUE)` to silence this warning
#> This warning is displayed only once per session.
#> # Source: spark<?> [?? x 12]
#> transmission mpg cyl disp hp drat wt qsec vs am gear carb
#> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 manual 24.4 5.08 144. 127. 4.05 2.41 17.4 0.538 1 4.38 2.92
#> 2 automatic 17.1 6.95 290. 160. 3.29 3.77 18.2 0.368 0 3.21 2.74
# 2nd query
data %>%
  mutate(transmission = ifelse(am == 0, "automatic", "manual")) %>%
  group_by(transmission) %>%
  summarise(across(.fns = mean))
#> Error: Can't rename variables in this context.
Created on 2021-02-05 by the reprex package (v0.3.0)
emman
February 5, 2021, 9:57am
2
I can't run the following line for some reason:
sc <- spark_connect('local', version = '2.4')
In any case, the problem might be that you need summarise(across(everything(), mean)) to specify that you want to summarise across all columns:
mtcars %>%
  mutate(transmission = ifelse(am == 0, "automatic", "manual")) %>%
  group_by(transmission) %>%
  summarise(across(everything(), mean))
## `summarise()` ungrouping output (override with `.groups` argument)
## # A tibble: 2 x 12
## transmission mpg cyl disp hp drat wt qsec vs am gear carb
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 automatic 17.1 6.95 290. 160. 3.29 3.77 18.2 0.368 0 3.21 2.74
## 2 manual 24.4 5.08 144. 127. 4.05 2.41 17.4 0.538 1 4.38 2.92
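For what it's worth, on a local data frame the two spellings can be checked against each other. This is a minimal sketch, assuming dplyr 1.0.0 or later (where across() was introduced); the object names grouped, by_all and by_across are only for illustration:

```r
library(dplyr)

# Same grouping step as in the thread, on the local mtcars data frame
grouped <- mtcars %>%
  mutate(transmission = ifelse(am == 0, "automatic", "manual")) %>%
  group_by(transmission)

# Scoped-verb form (superseded since dplyr 1.0.0, but still available)
by_all <- grouped %>%
  summarise_all(mean)

# across() form; everything() selects all non-grouping columns
by_across <- grouped %>%
  summarise(across(everything(), mean), .groups = "drop")

# Compare the two results as plain data frames
all.equal(as.data.frame(by_all), as.data.frame(by_across))
```

This only says something about local data frames. On a Spark table the query is translated to SQL by sparklyr, so whether across() works there depends on the sparklyr version rather than on dplyr alone.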
emman:
mtcars %>%
  mutate(transmission = ifelse(am == 0, "automatic", "manual")) %>%
  group_by(transmission) %>%
  summarise(across(everything(), mean))
My code works in plain R, as you said, but when I use it on data from Spark (data) it throws an error.
Are you sure that you installed sparklyr and that your Spark version is 2.4?
emman
February 5, 2021, 10:23am
4
I used install.packages("sparklyr")
....
Now run sparklyr::spark_installed_versions(), replace 2.4 with your Spark version, and then run the code.
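In case it helps, those steps can be sketched as follows. This assumes sparklyr is installed and Java is available; spark_installed_versions() returns a data frame whose spark column holds the version strings, and taking the first row is just an arbitrary choice for illustration:

```r
library(sparklyr)

# List the Spark versions that sparklyr has installed locally
installed <- spark_installed_versions()
print(installed)

if (nrow(installed) > 0) {
  # Use an installed version instead of the hard-coded '2.4'
  spark_version <- installed$spark[1]
  sc <- spark_connect(master = "local", version = spark_version)
} else {
  # Nothing installed yet; spark_install() downloads a local copy of Spark
  message("No local Spark installation found; try spark_install() first.")
}
```

If the table is empty, something like spark_install(version = "2.4") should download a local copy that spark_connect() can then use.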
I found the answer to this problem.
Support for summarise(across(...)) has been added in the new version of sparklyr.
You should install the new version of sparklyr from GitHub.
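A rough sketch of that upgrade path, assuming the remotes package is available and that the development version lives in the sparklyr/sparklyr GitHub repository (worth checking before running, since the location is an assumption here). The thread does not say which release first supports across(), so no version number is assumed:

```r
# Install the development version of sparklyr from GitHub
# install.packages("remotes")  # uncomment if remotes is not installed yet
remotes::install_github("sparklyr/sparklyr")

# After restarting R, confirm which version is loaded
library(sparklyr)
library(dplyr)
packageVersion("sparklyr")

sc <- spark_connect("local", version = "2.4")
data <- copy_to(sc, mtcars, overwrite = TRUE)

# The kind of query that failed with the older sparklyr
data %>%
  mutate(transmission = ifelse(am == 0, "automatic", "manual")) %>%
  group_by(transmission) %>%
  summarise(across(everything(), mean))
```

If upgrading is not an option, the summarise_all(mean) form from the first query keeps working on the Spark table in the meantime.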