readr include_spec = FALSE

brianstamper · April 10, 2020, 5:22pm

I frequently read delimited data that may have hundreds of columns and I am specifying that I only want a certain few using col_types = cols_only(foo = 'c', bar = 'i'). This leaves me with a tibble whose spec attribute includes a giant list of columns I didn't read labelled col_skip(). This is not particularly useful information, and on the Environment pane in RStudio this spec attribute information tends to get in the way.

Currently I just put a line like attr(my_tbl, 'spec') <- NULL after each read_*, but it looks and feels ungraceful to do so. Surely there is a better way?
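For reference, a minimal sketch of the workaround described above. The temporary file and its three columns are made up for illustration; only foo and bar are requested, so baz is the column that would otherwise show up as skipped in the spec.

```r
library(readr)

# Illustrative data only: a small file standing in for a wide delimited file
tmp <- tempfile(fileext = ".csv")
writeLines(c("foo,bar,baz", "x,1,2.5"), tmp)

# Read only the columns of interest
my_tbl <- read_csv(tmp, col_types = cols_only(foo = "c", bar = "i"))

# Drop the spec attribute by hand after the read
attr(my_tbl, "spec") <- NULL

is.null(attr(my_tbl, "spec"))
names(my_tbl)
```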

jimhester · April 10, 2020, 5:36pm

The spec attribute is removed as soon as the tibble is subset in any way, including an empty subset. Put a [] after your reading code if you are concerned about this.

x <- readr::read_csv("a,b,c\n1,2,3\n")
str(x)
#> tibble [1 × 3] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
#>  $ a: num 1
#>  $ b: num 2
#>  $ c: num 3
#>  - attr(*, "spec")=
#>   .. cols(
#>   ..   a = col_double(),
#>   ..   b = col_double(),
#>   ..   c = col_double()
#>   .. )
str(x[])
#> tibble [1 × 3] (S3: tbl_df/tbl/data.frame)
#>  $ a: num 1
#>  $ b: num 2
#>  $ c: num 3

Created on 2020-04-10 by the reprex package (v0.3.0)

brianstamper · April 10, 2020, 6:03pm

This doesn't fix the appearance in the Environment pane; one needs to actually remove the attributes. I usually avoid modifying objects, so the result of a subsetting operation would be assigned to some other object (i.e., I avoid foo <- foo %>% ...), and the original attributes from the read operation stick around.

If I am specifying the columns and there is no "guess" happening, why is the spec attribute included?

jimhester · April 10, 2020, 6:04pm

x <- readr::read_csv("a,b,c\n1,2,3\n")[]

brianstamper · April 13, 2020, 2:21pm

I see. I may just stick with the attr<- version since it makes it more clear what I am doing. I can imagine someone looking at my code and wondering why there's a [] at the end of my reads.

In my probably biased opinion, perhaps there should be an "include_spec" parameter, defaulting to include_spec = is.null(col_types).
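Until something like that exists, the idea can be approximated on the user side. What follows is only a sketch: read_csv_spec() is a hypothetical wrapper, not a readr function, and include_spec is not an actual readr argument. It relies on the empty-subset behaviour shown earlier in this thread to drop the attribute.

```r
library(readr)

# Hypothetical wrapper (not part of readr): keep the spec only when the
# column types were guessed, i.e. when col_types was not supplied.
read_csv_spec <- function(file, col_types = NULL, ...,
                          include_spec = is.null(col_types)) {
  out <- read_csv(file, col_types = col_types, ...)
  if (!include_spec) {
    # An empty subset drops the spec attribute, as shown earlier in the thread
    out <- out[]
  }
  out
}

# Illustrative data only
tmp <- tempfile(fileext = ".csv")
writeLines(c("foo,bar,baz", "x,1,2.5"), tmp)

with_types <- read_csv_spec(tmp, col_types = cols_only(foo = "c", bar = "i"))
guessed    <- read_csv_spec(tmp)

is.null(attr(with_types, "spec"))
is.null(attr(guessed, "spec"))
```

Keeping the subset inside the wrapper also avoids leaving an unexplained [] at the end of every read call.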
