SparkR: window function count() using gapply

I'm trying to implement a simple query in Spark using "gapply", but I'm running into trouble.
This code works well:

    library(dplyr)
    df <- createDataFrame(iris)
    createOrReplaceTempView(df, "iris")
    display(SparkR::sql("SELECT *, COUNT(*) OVER(PARTITION BY Species) AS RowCount FROM iris"))

But I can't reproduce the same result with gapply:

    display(df %>%
              SparkR::group_by(df$Species) %>%
              gapply(function(key, x) { y <- data.frame(x, SparkR::count()) },
                     "Sepal_Length double, Sepal_Width double, Petal_Length double, Petal_Width double, Species string, RowCount integer"))

It returns this error:

SparkException: R unexpectedly exited. Caused by: EOFException: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 235.0 failed 4 times, most recent failure: Lost task 0.3 in stage 235.0 (TID 374) (10.150.202.5 executor 1): org.apache.spark.SparkException: R unexpectedly exited. R worker produced errors: Error in (function (classes, fdef, mtable) :
unable to find an inherited method for function ‘count’ for signature ‘"missing"’ Calls: compute ... computeFunc -> data.frame -> -> Execution halted

Is it possible to implement the window function "count" with gapply using pipes from dplyr?
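
A note on the error, plus a minimal sketch of one possible direction. The message says `count` was dispatched with signature "missing", i.e. `SparkR::count()` was called with no argument; it expects a Spark object such as a SparkDataFrame or a Column. Inside the function passed to gapply, `x` is a local R data.frame holding the rows of one group, so base R's `nrow(x)` should give the per-group count. The names `add_row_count` and `with_row_count` below are hypothetical, and the Spark part assumes `df` is the SparkDataFrame created above and that a Spark session is active.

```r
# Hypothetical helper: inside gapply, `x` is a plain R data.frame with one
# group's rows, so nrow(x) is that group's row count (recycled to every row).
add_row_count <- function(key, x) {
  data.frame(x, RowCount = nrow(x), stringsAsFactors = FALSE)
}

# Local sanity check with base R only (no Spark needed): emulate the grouping
# by splitting the built-in iris data on Species and applying the helper.
parts <- split(iris, iris$Species)
local_result <- do.call(
  rbind,
  lapply(names(parts), function(k) add_row_count(k, parts[[k]]))
)

# Hypothetical wrapper around the SparkR call. Assumes `df` is a SparkDataFrame
# with the iris columns (dots in names replaced by underscores).
with_row_count <- function(df) {
  schema <- paste(
    "Sepal_Length double, Sepal_Width double, Petal_Length double,",
    "Petal_Width double, Species string, RowCount integer"
  )
  SparkR::gapply(SparkR::group_by(df, df$Species), add_row_count, schema)
}
```

With this in place, `display(with_row_count(df))` should show the same columns as the SQL version. With dplyr loaded, the piped form `df %>% SparkR::group_by(df$Species) %>% SparkR::gapply(add_row_count, schema)` should be equivalent, since the pipe only passes the left-hand side as the first argument.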
