Hello everyone,
We've installed the latest versions of R (4.4.3) and Posit RStudio (2024.12.1-563), and when we connect to Spark we get the following warning:
In sprintf(versions$pattern, version$spark, version$hadoop) : 2 arguments not used by format 'spark-3.5.5-bin-hadoop3'
The program works, but we would like to get rid of the warning. Here is our code:
library(sparklyr)
library(dplyr)

setwd("ruta")  # "ruta" is a placeholder for the working directory path

# Connect to a local Spark 3.5.5 instance and time the connection plus CSV load
ti <- proc.time()
conf <- spark_config()
conf$`sparklyr.shell-driver-memory` <- "16G"
conf$spark.memory.fraction <- 0.7
sparkc <- spark_connect(master = "local",
                        version = "3.5.5",
                        config = conf)
spark_version(sparkc)
DatChile <- spark_read_csv(sparkc, name = "datChile",
                           path = "Microdato_Censo2017-Personas.csv",
                           delimiter = ";", header = TRUE)
proc.time() - ti

# Count the rows and time the query
ti <- proc.time()
cnt <- DatChile %>% tally()
cnt
proc.time() - ti

spark_disconnect(sparkc)
We are using Java 17 and Apache Spark 3.5.5.
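In case it helps with diagnosis, the warning text can apparently be reproduced without Spark at all. The following is only a minimal sketch, assuming the pattern that reaches sprintf is the literal string quoted in the warning (so it has no %s placeholders left) and that the two extra arguments are the Spark and Hadoop versions. The helper quiet_sprintf_warning is a hypothetical name of ours, not part of sparklyr, and it matches on the English warning text, so it would not work in a session with translated messages.

```r
# Reproduce the warning outside sparklyr: a format string with no
# placeholders is given two extra arguments (R >= 4.1 warns about this).
pattern <- "spark-3.5.5-bin-hadoop3"  # copied from the warning text
msg <- tryCatch(
  sprintf(pattern, "3.5.5", "3"),     # assumed values for spark / hadoop
  warning = function(w) conditionMessage(w)
)
msg

# Hypothetical helper: muffle only this specific warning around an
# expression and let every other warning through unchanged.
quiet_sprintf_warning <- function(expr) {
  withCallingHandlers(
    expr,
    warning = function(w) {
      # Assumes English messages; the text is matched literally.
      if (grepl("not used by format", conditionMessage(w), fixed = TRUE)) {
        invokeRestart("muffleWarning")
      }
    }
  )
}

# The call still returns its value, but the warning is suppressed.
res <- quiet_sprintf_warning(sprintf(pattern, "3.5.5", "3"))
res
```

If the warning really is raised during the connection step, the same wrapper could presumably be used as sparkc <- quiet_sprintf_warning(spark_connect(master = "local", version = "3.5.5", config = conf)). That only hides the message; it does not address whatever makes sparklyr build the pattern this way.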
Thank you very much for your attention.
Miguel Araujo