Hi, I was just reading the article https://spark.rstudio.com/
But I am not sure what the difference is between working in R directly and working through the sparklyr package (install.packages("sparklyr")).
Could you let me know? I am confused.
When you work with sparklyr, your data gets copied into a Spark instance (think of it as a database engine). You can then manipulate it using dplyr-like commands, but under the hood all the processing is done by Spark, much faster than is possible in R alone.
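As a minimal sketch (using the built-in mtcars dataset purely as an illustration): the dplyr verbs are translated to Spark SQL and run inside Spark, and only the small summarized result comes back to R when you collect() it.

library(sparklyr)
library(dplyr)

sc <- spark_connect(master = "local")
mtcars_tbl <- copy_to(sc, mtcars)     # the data now lives in Spark, not in R

# These verbs execute as Spark SQL inside Spark
result <- mtcars_tbl %>%
  group_by(cyl) %>%
  summarise(avg_mpg = mean(mpg, na.rm = TRUE)) %>%
  collect()                           # bring only the small result back into R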
Cool. Here is the sample I got.
install.packages("sparklyr")
library(sparklyr)
spark_install(version = "2.1.0")
sc <- spark_connect(master = "local")
library(dplyr)
iris_tbl <- copy_to(sc, iris)
Here the iris dataset is getting copied to Spark, right? So in order to copy the data into Spark, we first need to load the data into R, right? If that is the case, my data is very large; I cannot even import it into R, so I cannot copy it into Spark either.
Please correct me if my understanding is wrong.
You can read tabular data directly into Spark with spark_read_csv().
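Something along these lines (the file path and table name here are just placeholders for your own):

library(sparklyr)

sc <- spark_connect(master = "local")

# Read the CSV straight into Spark; the full dataset never has to fit in R's memory
big_tbl <- spark_read_csv(
  sc,
  name = "big_data",
  path = "path/to/your_file.csv",
  header = TRUE
)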
Thanks, let me try...
Hi, I tried to load the data using spark_read_csv() as below:
spark_read_csv(sc, name = "as", path = "D:/New folder/Copy.csv", header = TRUE)
It is working, but not all the rows are getting extracted. There are 2 lakh (200,000) rows and only 1000 rows are extracted. May I know why?
Sorry, I can't reproduce your issue. Could you give any other details that might be relevant?
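One thing worth checking (just a suggestion, since your output isn't shown): a 1000-row figure often comes from a preview rather than from the data itself, so you could count the rows on the Spark side to see how many actually loaded. The table name "as" below is the one you passed to spark_read_csv().

library(sparklyr)
library(dplyr)

as_tbl <- tbl(sc, "as")            # reference the table registered in Spark as "as"

sdf_nrow(as_tbl)                   # row count computed inside Spark

as_tbl %>% count() %>% collect()   # equivalent check using dplyr verbs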
Take a look at this book to learn how to work with sparklyr.