sparklyr: Worker failed to run rscript: java.io.IOException: Cannot run program "Rscript" (in directory "."): error=2, No such file or directory

I am facing the error below while executing a script in RStudio:

19/09/19 09:24:25 ERROR sparklyr: Worker (8951) failed to run rscript: java.io.IOException: Cannot run program "Rscript" (in directory "."): error=2, No such file or directory
19/09/19 09:24:25 WARN storage.BlockManager: Putting block rdd_25_0 failed due to an exception
19/09/19 09:24:25 WARN storage.BlockManager: Block rdd_25_0 could not be removed as it was not found on disk or in memory
19/09/19 09:24:25 ERROR executor.Executor: Exception in task 0.3 in stage 5.0 (TID 9)
java.io.IOException: Cannot run program "Rscript" (in directory "."): error=2, No such file or directory

As far as I know, the workers are not able to find the Rscript path, even though R is installed on the worker nodes. Normally I would set the Rscript path in ~/.bash_profile.

I tried the command below to set the path:
conf[["spark.r.command"]] <- "/sys_apps_01/R/R-3.2.0/bin/Rscript"
but it gives me the error below:

19/09/19 08:36:34 INFO sparklyr: Worker (7155) launching command /sys_apps_01/R/R-3.2.0/bin/Rscript --vanilla /dfs/8/yarn/nm/usercache/e090479/appcache/application_1568480968769_12575/container_e87_1568480968769_12575_01_000003/./sparkworker.R 7155 41119 FALSE;8880;localhost;FALSE
19/09/19 08:36:34 INFO sparklyr: Worker (7155) is starting R process
/sys_apps_01/R/R-3.2.0/bin/exec/R: error while loading shared libraries: libgfortran.so.3: cannot open shared object file: No such file or directory
19/09/19 08:36:34 INFO sparklyr: Worker (7155) completed wait using lock for RScript
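
For reference, the way I am setting this option is roughly the following; the master value is a placeholder for my YARN setup:

library(sparklyr)

conf <- spark_config()
# Point the workers at the cluster-wide R installation instead of relying on PATH
conf[["spark.r.command"]] <- "/sys_apps_01/R/R-3.2.0/bin/Rscript"

sc <- spark_connect(master = "yarn-client", config = conf)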

Finally, what I am thinking is that I have to source the R profile:
source /sys_apps_01/R/R-3.2.0/profile/env.sh

Is there any way to run the above source command on the worker nodes, before executing other commands from RStudio?
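
The closest idea I have is to expose the same variables that env.sh sets through the Spark executor environment (spark.executorEnv.*), added to the same conf object as above before calling spark_connect(), but I have not been able to verify this. The paths below are placeholders for whatever env.sh actually exports:

# Placeholder paths: mirror whatever /sys_apps_01/R/R-3.2.0/profile/env.sh exports
conf[["spark.executorEnv.PATH"]] <- "/sys_apps_01/R/R-3.2.0/bin:/usr/bin:/bin"
conf[["spark.executorEnv.LD_LIBRARY_PATH"]] <- "/sys_apps_01/R/R-3.2.0/lib64/R/lib"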

All the nodes in your Spark cluster need to have the same system libraries installed for spark_apply() to work properly; see spark.rstudio.com/guides/distributed-r/#requirements.

My guess is that something like sudo apt-get install libgfortran3 was run on the driver node, but not on each worker node in the cluster. sparklyr does not currently provide support for installing system dependencies; the recommended approach is therefore to initialize the Spark cluster with all the system dependencies required by your tools and R packages. sparklyr will transfer R packages across nodes, but it does not auto-install system dependencies.
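
Once libgfortran3 (and anything else env.sh provides) is installed on every node, a quick sanity check is to run a trivial spark_apply() over a few partitions; a minimal sketch, assuming a YARN connection (the partition count is arbitrary):

library(sparklyr)
library(dplyr)

sc <- spark_connect(master = "yarn-client")

# Each partition forces a worker to start an R process; if Rscript and its
# shared libraries are healthy everywhere, this returns executor hostnames
# instead of the IOException above.
sdf_len(sc, 4, repartition = 4) %>%
  spark_apply(function(df) data.frame(host = Sys.info()[["nodename"]]))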
