Read Data example in spark.rstudio.com/get-started/read-data.html# does not show how to read data

Harlananelson · August 24, 2022, 3:57pm

Am I not getting this? I'm looking for a good example of reading an RDD using sparklyr. I would expect a section labeled "Reading Data" to show that, but instead it shows how to write data to an RDD and then skips the example of reading the data back.

library(sparklyr)

sc <- spark_connect(master = "local")

tbl_mtcars <- copy_to(sc, mtcars, "spark_mtcars")

Note that copy_to is writing here, not reading.

Something closer to

dplyr::tbl(sc, dbplyr::in_schema("schema", "tablename"))

would use dplyr to read a table, but I was looking for working syntax.

So after the copy_to statement, the tutorial should include the sanity check

src_databases(sc)

so the user can verify the copy,
then (assuming this works)

mtcars_tbl <- dplyr::tbl(sc, "spark_mtcars")

This last step is not necessary for the example to work, but it is necessary for people who never had to copy the data to an RDD in the first place and just need to know how to read an existing table.
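
For reference, a minimal sketch of the write, check, read sequence described above. It assumes a local Spark installation (for example one set up with spark_install()) and reuses the "spark_mtcars" name from the tutorial. It lists tables with src_tbls(sc), since src_databases(sc) lists databases rather than tables. The names mtcars_tbl and four_cyl are only illustrative.

```r
library(sparklyr)
library(dplyr)

# Assumes a local Spark installation, e.g. one set up with spark_install()
sc <- spark_connect(master = "local")

# Write: copy the local data frame into Spark under the name "spark_mtcars"
copy_to(sc, mtcars, "spark_mtcars", overwrite = TRUE)

# Sanity check: list the tables visible on this connection
src_tbls(sc)

# Read: create a reference to the existing Spark table by name
mtcars_tbl <- tbl(sc, "spark_mtcars")

# The reference is lazy; collect() brings the filtered rows back into R
four_cyl <- mtcars_tbl %>%
  filter(cyl == 4) %>%
  collect()

spark_disconnect(sc)
```

Someone who already has a table in Spark would skip the copy_to line and start from tbl(sc, "name").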

I would prefer an example using dbplyr::in_schema or DBI::Id, because few people are accessing data that is not stored in a schema. This is the other problem with many examples of this type: not showing a read from a schema makes the example unusable for most people working in a production environment. It makes the R language look like something only hobbyists use.
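
A hedged sketch of what a schema-qualified read might look like with dbplyr::in_schema. It assumes a local Spark installation and treats Spark's built-in "default" database as the schema. The table name "mtcars_saved" is hypothetical and is created in the setup step only so that the read has something to find. I have not checked this against every sparklyr version.

```r
library(sparklyr)
library(dplyr)

# Assumes a local Spark installation, e.g. one set up with spark_install()
sc <- spark_connect(master = "local")

# Setup only: persist a table so there is something to read.
# "mtcars_saved" is a hypothetical name; it lands in Spark's "default" database.
tmp <- copy_to(sc, mtcars, "spark_mtcars", overwrite = TRUE)
spark_write_table(tmp, "mtcars_saved", mode = "overwrite")

# Read by schema-qualified name (Spark calls the schema a "database")
saved_tbl <- tbl(sc, dbplyr::in_schema("default", "mtcars_saved"))

# Select two columns in Spark, then bring the result back into R
saved_local <- saved_tbl %>%
  select(mpg, cyl) %>%
  collect()

spark_disconnect(sc)
```

In a production setting the setup lines would be dropped and "default" and "mtcars_saved" replaced with the real schema and table names.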

What a rant, but this type of problem occurs too often.
