I'd like to use the col_types parameter in read_csv to dynamically specify column types based on known column types (say from a separate metadata file).
Why doesn't the approach outlined below work and is there any easier/better way to do it?
I've been using the following approach:
- get a list of the column types from a .csv using spec_csv()
- test whether the column types/formatting match the external source (metadata)
- when mismatches occur, update the spec_csv() output to match the external source (e.g. metadata)
- pass the updated spec_csv() output to the col_types argument of read_csv().
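To make the steps above concrete, here is a minimal sketch of the workflow for a case that does work (an integer column that readr guesses as double). The `metadata` table, the `collectors` lookup, and the temporary file are hypothetical stand-ins for illustration only. The sketch also assumes the metadata has one row for every column in the file.

```r
library(readr)

# Hypothetical test file: readr guesses whole numbers as double, never integer.
path <- tempfile(fileext = ".csv")
write_csv(
  data.frame(
    id    = 1:3,
    score = c(1.5, 2.5, 3.5),
    day   = c("2020-01-01", "2020-01-02", "2020-01-03")
  ),
  path
)

# Hypothetical metadata: the type each column should have per the external source.
metadata <- data.frame(
  column = c("id", "score", "day"),
  type   = c("integer", "double", "date"),
  stringsAsFactors = FALSE
)

# Step 1: get the guessed column specification from the file.
spec <- spec_csv(path)

# Step 2: compare guessed types with the metadata.
# Collector classes look like "collector_double", so strip the prefix.
guessed <- vapply(
  spec$cols,
  function(x) sub("^collector_", "", class(x)[1]),
  character(1)
)
expected <- setNames(metadata$type, metadata$column)
mismatched <- names(guessed)[guessed != expected[names(guessed)]]

# Step 3: replace mismatched collectors using readr's own constructors.
# This lookup is an assumption of the sketch and covers only a few types.
collectors <- list(
  integer   = col_integer,
  double    = col_double,
  character = col_character,
  date      = col_date
)
for (nm in mismatched) {
  spec$cols[[nm]] <- collectors[[expected[[nm]]]]()
}

# Step 4: read the file with the updated specification.
dat <- read_csv(path, col_types = spec)
is.integer(dat$id)
```

Note that this sketch swaps in a whole collector object rather than editing the class and fields of an existing one, which is what the factor example below attempts.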
This seems to work well for dates and numerics, but not for factors.
For example:
# load readr and generate a test .csv file:
library(readr)
readr::write_csv(iris, "iris.csv")
# get col specs from csv
spec <- spec_csv("iris.csv")
# edit/update the spec_csv output:
# (for this example it's easy to do it by hand, but imagine there are hundreds or thousands of columns that need to be specified)
class(spec$cols$Species ) <- "col_factor"
spec$cols$Species$ordered <- FALSE
spec$cols$Species$include_na <- FALSE
factors <- c("virginica", "setosa", "versicolor")
spec$cols$Species$levels <- factors
# use updated spec_csv output to specify columns types:
# this does not generate any errors, but also does not change the column type to factor:
test <- readr::read_csv("iris.csv", col_type = spec)
is.factor(test$Species) # FALSE
# this generates an error:
test <- readr::read_csv("iris.csv", col_type = list(spec))
# Error: Some `col_types` are not S3 collector objects: 1
# this generates the same error as above (at this point it's just trial and error on my end, which is why I'm posting here):
test <- readr::read_csv("iris.csv", col_type = cols(spec))
# I also tried converting spec into a col_spec. As I suspected, it also failed:
spec2 <- as.col_spec(spec)
test <- readr::read_csv("iris.csv", col_type = spec2)
What does this error mean? How do I resolve it? Is there a better way to dynamically specify column types when using read_csv?