Spark R - Setting Max Columns

Hi all,

I would like to read a CSV file (with ~100,000 columns) using spark_read_csv. I succeeded with small datasets containing a few hundred columns but failed with the larger dataset. The error message says to change an option, settings.setMaxColumns(int), because I have exceeded the limit of 20480 columns.

I've checked the sparklyr vignette and couldn't find this option. The only thing I found was not useful at all:
options: A list of strings with additional options. Super not useful! Do you know where I can find the list of available options for spark_read_csv in R? I tried to google it but was not successful.

The only option I could find is maxCharsPerColumn (see below), but it's not the one I need:

DB <- spark_read_csv(sc, "DB.csv", options = list(maxCharsPerColumn = 100000L))

Thanks !

This thread explains how to configure that option.

You can specify the maxColumns option as follows:

csv_data <- spark_read_csv(sc,
                           path = "path-to-csv.csv",
                           options = list(maxColumns = 100000L))

All the options are available here; I've created this suggestion in the GitHub repo as well.
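For reference, the options list is passed straight through to Spark's CSV reader, so several parser limits can be raised in one call. A sketch combining maxColumns with the maxCharsPerColumn option mentioned earlier (assuming an active Spark connection `sc`; the path and table name below are placeholders):

```r
library(sparklyr)

# Assumes `sc` is an existing Spark connection; "path-to-csv.csv" is a placeholder.
wide_data <- spark_read_csv(
  sc,
  name = "wide_data",
  path = "path-to-csv.csv",
  options = list(
    maxColumns = 100000L,        # raise the parser's column limit above the 20480 default
    maxCharsPerColumn = 100000L  # raise the per-cell character limit, if also needed
  )
)
```

With ~100,000 columns, schema inference itself can be slow, so supplying an explicit schema (or setting infer = FALSE where applicable) may also help.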
