What are permissible column objects of the form "col_*()" used in readr?

jesse · June 1, 2018, 10:50pm

readr::read_csv is misreading some column types in a file I am loading, so I want to use cols() to set them manually. ?read_csv says the col_types argument should be "One of ‘NULL’, a ‘cols()’ specification, or a string. See ‘vignette("column-types")’ for more details." Well, running vignette("column-types") just reports that the vignette is not found, so I tried ?cols. It says it accepts "column objects created by ‘col_*()’ or their abbreviated character names." What are the acceptable functions or abbreviated character names, and where do I find that information? Readr 1.1.1 btw.

jcblum · June 2, 2018, 12:55am

Check out the Column Parsers section of the readr documentation site:
https://readr.tidyverse.org/reference/index.html#section-column-parsers

The broken vignette reference is definitely annoying. On the upside, the readr site’s documentation has been fixed in that regard (I guess since it reflects the latest development version?)
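A minimal sketch of the forms cols() accepts, assuming a small throwaway CSV (the file and its column names are made up for illustration): a full specification built from col_*() functions, the equivalent compact string with one letter per column, and a cols() call that mixes `.default` with an abbreviated character name.

```r
library(readr)

# Hypothetical example file with four columns.
path <- tempfile(fileext = ".csv")
writeLines(c("id,score,label,when",
             "001,3.5,a,2018-06-01",
             "002,4.0,b,2018-06-02"), path)

# Full specification: one col_*() object per named column.
df_full <- read_csv(path, col_types = cols(
  id    = col_character(),  # keeps the leading zeros
  score = col_double(),
  label = col_character(),
  when  = col_date()        # default format parses ISO dates like 2018-06-01
))

# Same thing as a compact string, one letter per column in file order:
# c = character, d = double, D = date. Other codes: i = integer,
# l = logical, n = number, T = date-time, t = time, ? = guess, _ or - = skip.
df_short <- read_csv(path, col_types = "cdcD")

# Abbreviated names also work inside cols(); .default covers every
# column not listed explicitly.
df_default <- read_csv(path, col_types = cols(.default = col_character(),
                                              score = "d"))
```

To get a starting point instead of writing the spec from scratch, spec() on a data frame returned by read_csv() gives the specification readr guessed, which can be pasted into col_types and edited.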

Leon · June 2, 2018, 7:39am

It doesn't misread the column types per se. It looks at the first 1,000 rows and, based on those, makes an educated guess at each column's type. If an entry after the first 1,000 rows implies a different type than the first 1,000 did, you can get some quirky behaviour. Take a look at guess_max, which sets how many rows are used for guessing and can be used to fix issues like this:

```
> ?read_csv
read_csv(file, col_names = TRUE, col_types = NULL,
  locale = default_locale(), na = c("", "NA"), quoted_na = TRUE,
  quote = "\"", comment = "", trim_ws = TRUE, skip = 0, n_max = Inf,
  guess_max = min(1000, n_max), progress = show_progress())
```
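A minimal sketch of the problem and the guess_max fix, assuming a made-up file where column `x` holds whole numbers for 1,000 rows and text in row 1,001. How the default guess turns out can depend on the readr version, so the first read is only there to inspect; the widened read is the point.

```r
library(readr)

# Hypothetical file: `x` is numeric-looking in rows 1-1,000, text in row 1,001.
path <- tempfile(fileext = ".csv")
writeLines(c("x", as.character(1:1000), "abc"), path)

# With the default guess_max (1,000 rows), readr 1.1.1 guesses from the
# numbers only, so "abc" becomes NA and shows up in problems().
guessed <- read_csv(path)
problems(guessed)

# Allowing more guessing rows than the file has means "abc" is seen
# while guessing, so `x` is read as character.
widened <- read_csv(path, guess_max = 2000)
```

Raising guess_max makes guessing slower on large files; when the types are known up front, an explicit spec such as col_types = cols(x = col_character()) skips guessing for that column entirely.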