Deviance Residuals:
     Min       1Q   Median       3Q      Max
-3.47339 -1.37936 -0.06554  1.05105  4.39057

Coefficients:
(Intercept) cyl_cyl_8.0 cyl_cyl_4.0        disp          hp        drat
16.15953652  3.29774653  1.66030673  0.01391241 -0.04612835  0.02635025
         wt        qsec          vs          am        gear        carb
-3.80624757  0.64695710  1.74738689  2.61726546  0.76402917  0.50935118

R-Squared: 0.8816
Root Mean Squared Error: 2.041
1. Is there a way to display standard errors when running this regression?
2. Is there a way to cluster standard errors in sparklyr?
3. I have also been trying to run a linear model with multiple group fixed effects in sparklyr. In R, I have done so with felm from the lfe package. Does anyone have experience doing this in sparklyr?
Solutions using SparkR are also highly appreciated.
For question 1, you can print the standard errors of the coefficients and the intercept with the following:
library(sparklyr)

spark_version <- "2.4.4" # This is the version of Spark I ran this example code with,
                         # but I think everything that follows should work in all versions of Spark anyways
sc <- spark_connect(master = "local", version = spark_version)
cached_cars <- copy_to(sc, mtcars)

model <- cached_cars %>%
  ml_linear_regression(mpg ~ .)

# Ask the underlying Spark model for its training summary, then read the standard errors
coeff_std_errs <- invoke(model$model$.jobj, "summary") %>%
  invoke("coefficientStandardErrors")
print(coeff_std_errs)
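The vector that comes back is unnamed, so it helps to line it up with the coefficient names. As far as I can tell from the Spark documentation, when an intercept is fitted its standard error is the last element, whereas model$coefficients in sparklyr lists (Intercept) first. Here is a minimal sketch of pairing the two under that assumption. tidy_std_errs is a hypothetical helper name, not part of sparklyr, and it only makes sense for models fitted with an intercept:

```r
# Hypothetical helper (not part of sparklyr): pair each coefficient with its
# standard error.
# Assumptions:
#   - `coefs` is a named numeric vector with "(Intercept)" as the first element
#   - `std_errs` uses Spark's ordering, i.e. the intercept's standard error last
tidy_std_errs <- function(coefs, std_errs) {
  std_errs <- unlist(std_errs)  # in case the values arrive as a list
  stopifnot(length(coefs) == length(std_errs))
  n <- length(std_errs)
  data.frame(
    term = names(coefs),
    estimate = unname(coefs),
    std_error = c(std_errs[n], std_errs[-n]),  # move the intercept's SE to the front
    stringsAsFactors = FALSE
  )
}

# Usage with the objects from the snippet above (needs a live Spark connection):
# tidy_std_errs(model$coefficients, coeff_std_errs)
```

It is worth comparing the result against summary(lm(mpg ~ ., data = mtcars)) on a small local copy of the data to confirm the ordering before relying on it.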
We probably should make those numbers part of the summary output in R.
I'm not sure I understand exactly what questions 2 and 3 are asking. Please elaborate, ideally with a small example or a link to the relevant formulas. I'll be more than happy to see what can be done in sparklyr to address those use cases.
Regarding 2) and 3): Spark ML doesn't support multilevel modeling. A quick search turned up https://github.com/linkedin/photon-ml, which might be worth considering if it has features many users want.