I am trying to run the sparklyr package on RStudio Cloud. Installing works; connecting sometimes works, but sometimes I get this error:
Invalid maximum heap size: -Xmx0.8
Error: Could not create the Java Virtual Machine.
Error: A fatal exception has occurred. Program will exit.
When it works, I still have problems copying data to Spark using the copy_to() or spark_read_csv() commands. Neither seems to work.
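The calls I am running look roughly like this (sc is the connection returned by spark_connect(), and the CSV path is just a placeholder):
# copy a small local data frame into Spark
mtcars_tbl <- copy_to(sc, mtcars, "mtcars", overwrite = TRUE)
# or read a CSV file directly into Spark
flights_tbl <- spark_read_csv(sc, name = "flights", path = "flights.csv")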
Unfortunately, rstudio.cloud instances are currently limited to 1GB of RAM, so it isn't a great place to be running Spark.
It is for teaching, so an example with a small amount of data would be fine as well. When I configure Spark to use less than 1GB, it still gives problems.
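For reference, my connection setup is something like this (I set the driver memory through spark_config(); the value below is what ends up in the -Xmx flag from the error above):
library(sparklyr)
config <- spark_config()
# intended as "0.8 GB"; this is what produces the -Xmx0.8 flag
config[["sparklyr.shell.driver-memory"]] <- "0.8"
sc <- spark_connect(master = "local", config = config)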
josh
July 6, 2018, 1:07pm
4
"0.8" is not a valid value for -Xmx
. You need to supply an integer, and it can be followed by "k", "m" or "g".
https://docs.oracle.com/javase/8/docs/technotes/tools/unix/java.html#BABHDABI
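Assuming you are setting the driver memory through spark_config(), a valid value below 1GB would look like this:
library(sparklyr)
config <- spark_config()
# an integer followed by "m" (megabytes); a decimal like "0.8" is rejected by the JVM
config[["sparklyr.shell.driver-memory"]] <- "800m"
sc <- spark_connect(master = "local", config = config)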
Thanks! Now I get a different error when I run the copy_to() command:
Error: Unexpected state in sparklyr backend, terminating connection: failed to invoke spark command
18/07/07 08:45:05 INFO DAGScheduler: Job 0 finished: collect at utils.scala:43, took 1.197268 s
josh
July 7, 2018, 1:39pm
6
Just before running copy_to(), what is the output of gc()?
My guess is you are running out of memory.
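That is, something like this right before the copy (a sketch, assuming your connection object is sc):
gc()   # reports R's memory use just before the transfer
mtcars_tbl <- copy_to(sc, mtcars, "mtcars", overwrite = TRUE)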
The result of the gc() command just before running copy_to():
used (Mb) gc trigger (Mb) max used (Mb)
Ncells 1093099 58.4 2033582 108.7 2033582 108.7
Vcells 6843222 52.3 22637548 172.8 18639753 142.3