I frequently read delimited data that may have hundreds of columns, and I specify that I only want a few of them using col_types = cols_only(foo = 'c', bar = 'i'). This leaves me with a tibble whose spec attribute includes a giant list of the columns I didn't read, each labelled col_skip(). This is not particularly useful information, and in the Environment pane in RStudio the spec attribute tends to get in the way.
Currently I just put a line like attr(my_tbl, 'spec') <- NULL after each read_*, but it looks and feels ungraceful to do so. Surely there is a better way?
The spec attribute is removed as soon as the tibble is subset in any way, including an empty subset. Put a [] after your reading code if you are concerned about this.
x <- readr::read_csv("a,b,c\n1,2,3\n")
str(x)
#> tibble [1 × 3] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
#> $ a: num 1
#> $ b: num 2
#> $ c: num 3
#> - attr(*, "spec")=
#> .. cols(
#> .. a = col_double(),
#> .. b = col_double(),
#> .. c = col_double()
#> .. )
str(x[])
#> tibble [1 × 3] (S3: tbl_df/tbl/data.frame)
#> $ a: num 1
#> $ b: num 2
#> $ c: num 3
This doesn't fix the appearance in the Environment pane; one needs to actually remove the attributes. I usually avoid modifying objects, so the result of a subsetting operation would be assigned to some other object (i.e., I avoid foo <- foo %>% ...), and the original attributes from the read operation stick around.
If I am specifying the columns and there is no "guess" happening, why is the spec attribute included?
I see. I may just stick with the attr<- version, since it makes it clearer what I am doing. I can imagine someone looking at my code and wondering why there's a [] at the end of my reads.
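For what it's worth, one way to keep the attr<- approach while making the intent obvious might be to wrap it in a small named helper. This is only a sketch: drop_spec is a hypothetical name, not a readr function, and it assumes the only things to remove are the "spec" attribute and the spec_tbl_df class seen in the str() output above.

```r
library(readr)

# Hypothetical helper (not part of readr): strip the column specification
# from a tibble returned by a read_* function.
drop_spec <- function(x) {
  attr(x, "spec") <- NULL                               # remove the spec attribute
  class(x) <- setdiff(class(x), "spec_tbl_df")          # drop the matching subclass
  x
}

# Usage: the original object is left untouched; the result is a plain tibble.
raw   <- read_csv("a,b,c\n1,2,3\n", col_types = cols_only(a = "c", c = "i"))
clean <- drop_spec(raw)
```

Because the helper returns a modified copy, it also fits a workflow that avoids reassigning objects in place.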
In my probably biased opinion, perhaps there should be an "include_spec" parameter, defaulting to include_spec = is.null(col_types).
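To illustrate the idea, here is a rough user-side sketch of what such a parameter could look like. read_csv_nospec is a hypothetical wrapper name and include_spec is the proposed argument, neither of which exists in readr; the sketch relies on the [] subsetting behaviour described earlier in this thread to drop the spec.

```r
library(readr)

# Hypothetical wrapper (not a readr API): keep the spec only when column
# types were guessed, i.e. when col_types was not supplied.
read_csv_nospec <- function(file, col_types = NULL, ...,
                            include_spec = is.null(col_types)) {
  out <- read_csv(file, col_types = col_types, ...)
  if (!include_spec) {
    out <- out[]  # an empty subset drops the spec attribute
  }
  out
}

# Column types given explicitly, so the spec is dropped by default.
explicit <- read_csv_nospec("a,b,c\n1,2,3\n",
                            col_types = cols_only(a = "c", c = "i"))

# Column types guessed, so the spec is kept by default.
guessed <- read_csv_nospec("a,b,c\n1,2,3\n")
```

Passing include_spec = TRUE or FALSE explicitly would override the default in either case.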