Hi
We installed Arrow in the RStudio environment. Arrow works as expected outside of a sparklyr context, but with sparklyr we get an error.
Please help.
Standalone Arrow Test ---------------------------------------------------
library(arrow)
a <- read_csv_arrow(file = "/home/rmunda1/installed_packages.csv")
head(a)
#> # A tibble: 6 x 4
#>   ``      Package Version Priority
#>
#> 1 abind   abind   1.4-5
#> 2 acepack acepack 1.4.1
#> 3 actuar  actuar  2.3-1
#> 4 ada     ada     2.0-5
#> 5 adabag  adabag  4.2
#> 6 ade4    ade4    1.7-13
detach("package:arrow")
SparklyR Arrow Test ---------------------------------------------------
With SparklyR:
library(arrow)
ptm <- proc.time()
collected <- sdf_len(sc, 10^6) %>% collect()
#> Error in record_batch_stream_reader(stream): could not find function "record_batch_stream_reader"
proc.time() - ptm
#> user system elapsed
#> 0.120 0.021 5.416
detach("package:arrow")
ptm <- proc.time()
collected <- sdf_len(sc, 10^6) %>% collect()
proc.time() - ptm
#> user system elapsed
#> 0.066 0.015 0.619
Thanks
Karan
Could you share the output of sessioninfo::session_info()?
─ Session info ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
setting  value
version  R version 3.6.0 (2019-04-26)
os       CentOS Linux 7 (Core)
system   x86_64, linux-gnu
ui       RStudio
language (EN)
collate  en_US.UTF-8
ctype    en_US.UTF-8
tz       America/Chicago
date     2020-04-14
─ Packages ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
package     * version   date       lib source
arrow         0.16.0.1  2020-02-10 [2] CRAN (R 3.6.0)
askpass       1.1       2019-01-13 [2] CRAN (R 3.6.0)
assertthat    0.2.1     2019-03-21 [2] CRAN (R 3.6.0)
backports     1.1.4     2019-04-10 [2] CRAN (R 3.6.0)
base64enc     0.1-3     2015-07-28 [2] CRAN (R 3.6.0)
bit           1.1-14    2018-05-29 [2] CRAN (R 3.6.0)
bit64         0.9-7     2017-05-08 [2] CRAN (R 3.6.0)
bitops        1.0-6     2013-08-17 [2] CRAN (R 3.6.0)
broom         0.5.2     2019-04-07 [2] CRAN (R 3.6.0)
cellranger    1.1.0     2016-07-27 [2] CRAN (R 3.6.0)
cli           1.1.0     2019-03-19 [2] CRAN (R 3.6.0)
colorspace    1.4-1     2019-03-18 [2] CRAN (R 3.6.0)
crayon        1.3.4     2017-09-16 [2] CRAN (R 3.6.0)
DBI         * 1.0.0     2018-05-02 [2] CRAN (R 3.6.0)
dbplyr        1.4.2     2019-06-17 [2] CRAN (R 3.6.0)
digest        0.6.19    2019-05-20 [2] CRAN (R 3.6.0)
dplyr       * 0.8.2     2019-06-29 [2] CRAN (R 3.6.0)
ellipsis      0.3.0     2019-09-20 [2] CRAN (R 3.6.0)
fansi         0.4.0     2018-10-05 [2] CRAN (R 3.6.0)
forcats     * 0.4.0     2019-02-17 [2] CRAN (R 3.6.0)
forge         0.2.0     2019-02-26 [2] CRAN (R 3.6.0)
generics      0.0.2     2018-11-29 [2] CRAN (R 3.6.0)
ggplot2     * 3.2.0     2019-06-16 [2] CRAN (R 3.6.0)
glue          1.3.1     2019-03-12 [2] CRAN (R 3.6.0)
gtable        0.3.0     2019-03-25 [2] CRAN (R 3.6.0)
h2o           3.24.0.4  2019-07-02 [2] local
haven         2.1.1     2019-07-04 [2] CRAN (R 3.6.0)
hms           0.5.0     2019-07-09 [2] CRAN (R 3.6.0)
htmltools     0.3.6     2017-04-28 [2] CRAN (R 3.6.0)
htmlwidgets   1.3       2018-09-30 [2] CRAN (R 3.6.0)
httr          1.4.1     2019-08-05 [2] CRAN (R 3.6.0)
jsonlite      1.6       2018-12-07 [2] CRAN (R 3.6.0)
lattice       0.20-38   2018-11-04 [2] CRAN (R 3.6.0)
lazyeval      0.2.2     2019-03-15 [2] CRAN (R 3.6.0)
lubridate     1.7.4     2018-04-11 [2] CRAN (R 3.6.0)
magrittr    * 1.5       2014-11-22 [2] CRAN (R 3.6.0)
modelr        0.1.4     2019-02-18 [2] CRAN (R 3.6.0)
munsell       0.5.0     2018-06-12 [2] CRAN (R 3.6.0)
nlme          3.1-139   2019-04-09 [2] CRAN (R 3.6.0)
openssl       1.4       2019-05-31 [2] CRAN (R 3.6.0)
pillar        1.4.2     2019-06-29 [2] CRAN (R 3.6.0)
pkgconfig     2.0.2     2018-08-16 [2] CRAN (R 3.6.0)
purrr       * 0.3.2     2019-03-15 [2] CRAN (R 3.6.0)
r2d3          0.2.3     2018-12-18 [2] CRAN (R 3.6.0)
R6            2.4.0     2019-02-14 [2] CRAN (R 3.6.0)
Rcpp          1.0.1     2019-03-17 [2] CRAN (R 3.6.0)
RCurl         1.95-4.12 2019-03-04 [2] CRAN (R 3.6.0)
readr       * 1.3.1     2018-12-21 [2] CRAN (R 3.6.0)
readxl        1.3.1     2019-03-13 [2] CRAN (R 3.6.0)
rJava       * 0.9-11    2019-03-29 [2] CRAN (R 3.6.0)
RJDBC       * 0.2-7.1   2018-04-16 [2] CRAN (R 3.6.0)
rlang       * 0.4.0     2019-06-25 [2] CRAN (R 3.6.0)
rprojroot     1.3-2     2018-01-03 [2] CRAN (R 3.6.0)
rsparkling  * 0.2.25    2019-07-02 [2] local
rstudioapi    0.10      2019-03-19 [2] CRAN (R 3.6.0)
rvest         0.3.4     2019-05-15 [2] CRAN (R 3.6.0)
scales        1.0.0     2018-08-09 [2] CRAN (R 3.6.0)
sessioninfo   1.1.1     2018-11-05 [2] CRAN (R 3.6.0)
sparklyr    * 1.0.1     2019-05-17 [2] CRAN (R 3.6.0)
stringi       1.4.3     2019-03-12 [2] CRAN (R 3.6.0)
stringr     * 1.4.0     2019-02-10 [2] CRAN (R 3.6.0)
tibble      * 2.1.3     2019-06-06 [2] CRAN (R 3.6.0)
tidyr       * 0.8.3     2019-03-01 [2] CRAN (R 3.6.0)
tidyselect    0.2.5     2018-10-11 [2] CRAN (R 3.6.0)
tidyverse   * 1.2.1     2017-11-14 [2] CRAN (R 3.6.0)
utf8          1.1.4     2018-05-24 [2] CRAN (R 3.6.0)
vctrs         0.2.0     2019-07-05 [2] CRAN (R 3.6.0)
withr         2.1.2     2018-03-15 [2] CRAN (R 3.6.0)
xml2          1.2.0     2018-01-24 [2] CRAN (R 3.6.0)
zeallot       0.1.0     2018-01-28 [2] CRAN (R 3.6.0)
[1] /home/XXXXXXXX/R/x86_64-redhat-linux-gnu-library/3.6
[2] /usr/lib64/R/library
[3] /usr/share/R/library
@Blair09M - Hi James - Any update on this please?
I haven't been able to reproduce the issue. Do you notice any glaring differences in this reprex?
library(sparklyr)
sc <- spark_connect(master = "local")
library(arrow)
#>
#> Attaching package: 'arrow'
#> The following object is masked from 'package:utils':
#>
#> timestamp
ptm <- proc.time()
collected <- sdf_len(sc, 10^6) %>% collect()
proc.time() - ptm
#> user system elapsed
#> 0.361 0.036 5.339
detach("package:arrow")
ptm <- proc.time()
collected <- sdf_len(sc, 10^6) %>% collect()
proc.time() - ptm
#> user system elapsed
#> 0.080 0.005 0.585
Created on 2020-04-21 by the reprex package (v0.3.0)
Session info
sessionInfo()
#> R version 3.6.2 (2019-12-12)
#> Platform: x86_64-pc-linux-gnu (64-bit)
#> Running under: CentOS Linux 7 (Core)
#>
#> Matrix products: default
#> BLAS/LAPACK: /usr/lib64/libopenblasp-r0.3.3.so
#>
#> locale:
#> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
#> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
#> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
#> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
#> [9] LC_ADDRESS=C LC_TELEPHONE=C
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
#>
#> attached base packages:
#> [1] stats graphics grDevices datasets utils methods base
#>
#> other attached packages:
#> [1] sparklyr_1.1.0.9001
#>
#> loaded via a namespace (and not attached):
#> [1] Rcpp_1.0.4 compiler_3.6.2 pillar_1.4.3 dbplyr_1.4.2
#> [5] highr_0.8 r2d3_0.2.3 base64enc_0.1-3 tools_3.6.2
#> [9] bit_1.1-15.2 digest_0.6.25 jsonlite_1.6.1 evaluate_0.14
#> [13] tibble_2.1.3 pkgconfig_2.0.3 rlang_0.4.5 DBI_1.1.0
#> [17] rstudioapi_0.11 yaml_2.2.1 parallel_3.6.2 xfun_0.12
#> [21] withr_2.1.2 httr_1.4.1 stringr_1.4.0 dplyr_0.8.5
#> [25] knitr_1.28 vctrs_0.2.4 askpass_1.1 rappdirs_0.3.1
#> [29] generics_0.0.2 htmlwidgets_1.5.1 bit64_0.9-7 rprojroot_1.3-2
#> [33] tidyselect_1.0.0 glue_1.3.2 forge_0.2.0 R6_2.4.1
#> [37] rmarkdown_2.1 purrr_0.3.3 magrittr_1.5 backports_1.1.5
#> [41] htmltools_0.4.0 ellipsis_0.3.0 assertthat_0.2.1 renv_0.9.3-58
#> [45] arrow_0.17.0 config_0.3 stringi_1.4.6 openssl_1.4.1
#> [49] crayon_1.3.4
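One difference between the two sessions above is the package versions: the failing session lists sparklyr 1.0.1 with arrow 0.16.0.1, while this reprex ran with sparklyr 1.1.0.9001 and arrow 0.17.0. The error message suggests that the installed arrow does not provide `record_batch_stream_reader`, and that can be checked directly. The following is only a minimal sketch, assuming both packages are installed in the library that RStudio uses; `has_function` is a hypothetical helper written for this check and is not part of sparklyr or arrow.

```r
# Hypothetical helper (not part of sparklyr or arrow): TRUE if the namespace
# of package `pkg` can be loaded and defines a function named `fn`.
has_function <- function(pkg, fn) {
  requireNamespace(pkg, quietly = TRUE) &&
    exists(fn, envir = asNamespace(pkg), mode = "function", inherits = FALSE)
}

# Print the installed versions to compare against the two session infos.
for (pkg in c("sparklyr", "arrow")) {
  if (requireNamespace(pkg, quietly = TRUE)) {
    cat(pkg, as.character(packageVersion(pkg)), "\n")
  } else {
    cat(pkg, "is not installed\n")
  }
}

# Does the installed arrow define the function named in the error message?
has_function("arrow", "record_batch_stream_reader")
```

If the last call returns FALSE in the failing environment, the installed sparklyr is probably calling a function that this arrow release does not have, and moving to a sparklyr version closer to the one in the working reprex may be worth trying.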