In Spark's Python API I can easily add the input file name as a column with the following code:
from pyspark.sql import functions as f

df = spark.read.load(input_csv_file_path, format="csv", header='true', inferSchema='true')
df = df.withColumn("input_file", f.input_file_name())
In SparkR there is an input_file_name function in the documentation (https://spark.apache.org/docs/latest/api/R/index.html), but the documentation merely says input_file_name(x = "missing") and I don't understand its usage.
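For context, the x = "missing" in that signature is S4 method notation: the method is defined for the case where no argument is supplied, so the function is called as input_file_name() with empty parentheses. A minimal sketch of what that might look like in SparkR follows. It assumes a local SparkR session (not sparklyr) and a hypothetical path pattern, so treat it as an illustration rather than a tested recipe:

```r
library(SparkR)

# Assumption: a local SparkR session; adjust for your own cluster setup.
sparkR.session()

# Hypothetical path pattern, for illustration only.
input_csv_file_path <- "/Users/me/my_path/*.csv"

# Read the CSV files into a SparkDataFrame.
df <- read.df(input_csv_file_path, source = "csv", header = "true", inferSchema = "true")

# input_file_name() takes no arguments and returns a Column
# holding the source file name for each row.
df <- withColumn(df, "input_file", input_file_name())

head(df)
```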
I adapted an example from Stack Overflow into the following R code, which uses sparklyr rather than SparkR:
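library(sparklyr)
library(tidyverse)
# Assumed: a local connection (how sc was created is not shown in the original)
sc <- spark_connect(master = "local")
input_csv_file_path <- '/Users/me/my_path/*.csv'
df <- spark_read_csv(sc, name = 'df', path = input_csv_file_path)
# input_file_name() is not an R function here; dplyr passes it through to Spark SQL
df <- df %>% mutate(id = input_file_name())
df1 <- as.data.frame(df)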
There is no error, but the id field is blank.
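One possible cause, offered as an assumption rather than a confirmed diagnosis: spark_read_csv caches the table by default (memory = TRUE), and the source file name may no longer be available once rows are served from the cache. The sketch below is the same code with caching disabled. The local connection is an assumption, since how sc was created is not shown above:

```r
library(sparklyr)
library(dplyr)

# Assumption: a local connection; replace with your own master/config.
sc <- spark_connect(master = "local")

input_csv_file_path <- '/Users/me/my_path/*.csv'

# memory = FALSE skips caching, so the table is read lazily from the files.
df <- spark_read_csv(sc, name = 'df', path = input_csv_file_path, memory = FALSE)

# dplyr does not translate input_file_name(); it is passed through to Spark SQL.
df <- df %>% mutate(id = input_file_name())

# Bring the result back into a local R data frame.
df1 <- collect(df)
```

If the id column is still blank with memory = FALSE, the cause is likely something else.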