Input File Name Function w/Sparklyr

nicholassharkey · April 15, 2020, 11:18am

In Spark's Python API I can add the file name easily with the following code:

from pyspark.sql import functions as f

df = spark.read.load(input_csv_file_path, format="csv", header='true', inferSchema='true')
df = df.withColumn("input_file", f.input_file_name())

SparkR documents an input_file_name function (https://spark.apache.org/docs/latest/api/R/index.html), but the entry only shows the signature input_file_name(x = "missing"), and I don't understand how to use it.

I adapted an example from Stack Overflow into the following R code:

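library(sparklyr)
library(tidyverse)

# sc is the Spark connection; a local master is assumed here
sc <- spark_connect(master = "local")

input_csv_file_path <- '/Users/me/my_path/*.csv'

# Read every matching CSV, then try to tag each row with its source file
df <- spark_read_csv(sc, name = 'df', path = input_csv_file_path)
df <- df %>% mutate(id = input_file_name())

# Pull the result into a local R data frame
df1 <- as.data.frame(df)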

There is no error, but the id column comes back blank.

nicholassharkey · April 22, 2020, 11:19am

Hoping to keep this open long enough for an answer.
