We have a new RStudio Server setup on JupyterHub, built via jupyter-rsession-proxy
as available here (not sure how relevant this is):
https://github.com/jupyterhub/jupyter-rsession-proxy/blob/master/jupyter_rsession_proxy/__init__.py
I am trying to run some simple SparkR commands,
but I hit an error that doesn't happen in terminal R
or in Jupyter's IRkernel.
The following is in ~/.Rprofile:
DEFAULT_CONFIG = list(
  spark.cores.max = '8',
  spark.sql.sources.partitionColumnTypeInference.enabled = 'false',
  spark.executor.memory = '1g',
  spark.task.maxFailures = '10',
  spark.kubernetes.container.image = 'x' # suppressed
)

.start_sandbox = function() {
  library(SparkR, pos = 3L)
  library(magrittr, pos = 3L)
  eval(substitute(sparkR.session(
    appName = 'myTestApp',
    enableHiveSupport = TRUE,
    sparkConfig = DEFAULT_CONFIG
  )), envir = .GlobalEnv)
}
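For context, the eval(substitute(...), envir = .GlobalEnv) wrapper is there so the session call gets evaluated at top level rather than inside the function's frame. A minimal pure-R sketch of that pattern (the variable x is just for illustration, no Spark involved):

```r
# substitute() captures the unevaluated call; eval(..., envir = .GlobalEnv)
# then runs it in the global environment instead of the function's frame.
demo_global_eval <- function() {
  eval(substitute(x <- 42), envir = .GlobalEnv)
}
demo_global_eval()
exists("x", envir = .GlobalEnv)  # TRUE: x landed in the global environment
```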
Now if I start RStudio and run
.start_sandbox()
iris = iris
names(iris) = gsub('.', '_', names(iris), fixed = TRUE)
irisSDF = createDataFrame(iris)
irisSDF %>% head
Errors:
Error in (function (cl, name, valueClass) :
assignment of an object of class "list" is not valid for @'sdf' in an object of class "SparkDataFrame"; is(value, "jobj") is not TRUE
But it works as expected in R invoked from the IRkernel or from terminal R:
Sepal_Length Sepal_Width Petal_Length Petal_Width Species
1 5.1 3.5 1.4 0.2 setosa
2 4.9 3.0 1.4 0.2 setosa
3 4.7 3.2 1.3 0.2 setosa
4 4.6 3.1 1.5 0.2 setosa
5 5.0 3.6 1.4 0.2 setosa
6 5.4 3.9 1.7 0.4 setosa
It also works if we get the head
by piping without assignment:
iris %>% createDataFrame %>% head
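For what it's worth, the error message itself is ordinary S4 slot validation: it means the backend handed back a plain list where a jobj was expected. The same message shape can be reproduced in pure R with hypothetical stand-in classes (no Spark needed):

```r
library(methods)

# Stand-ins for SparkR's classes: "Jobj" plays the role of SparkR's jobj,
# "Wrapper" the role of SparkDataFrame with its jobj-typed 'sdf' slot.
setClass("Jobj", slots = c(id = "character"))
setClass("Wrapper", slots = c(sdf = "Jobj"))

w <- new("Wrapper", sdf = new("Jobj", id = "a"))
# Assigning the wrong class to the slot raises the same kind of error:
res <- tryCatch({ w@sdf <- list(); w }, error = identity)
conditionMessage(res)
# assignment of an object of class "list" is not valid for @'sdf' ...
```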
The only difference I can see that might lead to this is in how R is invoked:
commandArgs()
# RStudio
# [1] "RStudio" "--interactive"
# Terminal R
# [1] "/opt/conda/lib/R/bin/exec/R"
# Jupyter IRKernel
# [1] "/opt/conda/lib/R/bin/exec/R" "--slave" "-e" "IRkernel::main()" "--args" "/home/jovyan/.local/share/jupyter/runtime/kernel-4696a305-a022-45a0-be30-74bb2d4e8fa4.json"
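In case it matters, these are some other standard base-R probes I can compare across the three front ends (nothing here requires Spark):

```r
commandArgs()          # how this R process was launched
interactive()          # whether the session is interactive
.Platform$GUI          # reports "RStudio" inside an RStudio session
Sys.getenv("RSTUDIO")  # RStudio sets this to "1" in its sessions
```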
Any idea what is happening here?