I was wondering if there is a way to apply a cols() object to an existing data_frame instead of just when using read_csv.
For example, for several reasons I'm using fread() to read some data (mostly because it automatically detects how many rows to skip when reading a csv). But down the line I'd rather use a data_frame instead of a data.table. And I want to make sure the data I read has the correct column types, which are also saved somewhere.
So I want some function apply_col_types() that could work like this:
dt <- data.table(A = c('1','2','3',NA,'5'), B = rep('a',5)) # this would actually be coming from fread
cl <- c('n', 'c') # this is also saved somewhere upstream
df <- dt %>% as_data_frame() %>% apply_col_types(cl)
library(data.table)
dt <- data.table(A = c('1','2','3',NA,'5'), B = rep('a',5))
cl <- c('n', 'c')
colnames(dt) <- cl
dt
#> n c
#> 1: 1 a
#> 2: 2 a
#> 3: 3 a
#> 4: <NA> a
#> 5: 5 a
@technocrat I think @leobarlach means checking / applying column types, not renaming columns.
@leobarlach in data.table, fread already guesses and applies column types; it is the same feature as in readr. There is also the colClasses argument to override the default guessing. Is that not enough? Are the classes determined by fread correct or not?
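For the function asked about in the original post, one possible sketch builds on readr::type_convert(), which re-parses the character columns of an existing data frame against a cols() specification. The name apply_col_types() comes from the question and is not a readr function; the named cols() spec below is an assumption about what is saved upstream, and the data is the toy example from the question rather than real fread output.

```r
library(readr)
library(tibble)

# Hypothetical helper (the name comes from the question, not from readr).
# type_convert() only re-parses character columns, so every column is
# coerced to character first and then parsed against the spec.
apply_col_types <- function(df, col_types) {
  df[] <- lapply(df, as.character)
  type_convert(df, col_types = col_types)
}

# Toy data standing in for what fread() would return
df <- tibble(A = c("1", "2", "3", NA, "5"), B = rep("a", 5))

# Assumed form of the saved specification: a named cols() object
spec <- cols(A = col_number(), B = col_character())

out <- apply_col_types(df, spec)
str(out)
```

Coercing everything to character first is a deliberate choice: it lets the spec override whatever types fread guessed, at the cost of a round trip through text.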
But as mentioned by others I think there are possibly better approaches, e.g. using the colClasses argument in fread.
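A minimal sketch of the colClasses route, assuming the saved types can be expressed as a named character vector of R classes. The inline text stands in for the real file, and the column names A and B are taken from the toy example in the question.

```r
library(data.table)
library(tibble)

# Assumed form of the saved types: column name -> R class
saved_classes <- c(A = "numeric", B = "character")

# Inline text stands in for the real csv; fread() still detects the layout
dt <- fread(
  text = "A,B\n1,a\n2,a\n3,a\nNA,a\n5,a",
  colClasses = saved_classes
)

# Convert to a tibble for downstream work
df <- as_tibble(dt)
str(df)
```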
Or, if you want to use readr for everything, could you read the first few rows with n_max = 10 or similar to determine which columns you want to keep, generate the cols() specification from that, and then read the rest with that specification?
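A rough sketch of that two-pass idea, assuming the file has no leading junk rows (otherwise skip would need to be set by hand). The temporary file and its contents are made up for illustration; spec() extracts the column specification that readr guessed on the preview.

```r
library(readr)

# Made-up file standing in for the real csv
path <- tempfile(fileext = ".csv")
writeLines(c("A,B", "1,a", "2,a", "3,a", "NA,a", "5,a"), path)

# Pass 1: read only a few rows and keep the guessed specification
preview <- read_csv(path, n_max = 3)
guessed <- spec(preview)

# Pass 2: read the whole file with that specification
# (edit `guessed`, or build a cols() object by hand, to override a guess)
full <- read_csv(path, col_types = guessed)
str(full)
```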
A more complete example would help us suggest better alternatives.
@leobarlach If your question's been answered, would you mind choosing a solution? It helps other people see which questions still need help, or find solutions if they have similar problems.