Which one? If you mean the second part (the one with `table(nb_cols)`), that's on purpose: I read each file into a list without making any assumptions about its columns, so that we can then inspect the list with `ncol()` or `colnames()`, see what is actually in the files, and decide on the right way to process them. From there, we could either reprocess this list before assembling it with `bind_rows()`, or write a new function that reads and pre-processes each file, and start over in a fresh script.
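For concreteness, here is a minimal sketch of that first, inspection-only step, assuming `file_list` holds the paths to your files and `read_dat()` is your reading function:

```r
# Read each file into a list, with no assumptions about its columns
measurement_list <- lapply(file_list, read_dat)

# How many files have each column count?
table(sapply(measurement_list, ncol))

# What columns does, e.g., the first file contain?
colnames(measurement_list[[1]])
```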
If you mean my alternate `read_dat()`, that's an example of such a function: it pre-processes each file as it is read, before we try to assemble anything. The assumption is that some files have 20 columns, with the names given by `mycols`, and some files have 29 columns, namely the 20 columns from `mycols` plus others we're not interested in. In that case, the function first reads the file: if it has 20 columns we're done; if it has 29 columns, we select the columns whose names are in `mycols` and return that subset data frame.
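In sketch form, that logic is roughly this (the `read.delim()` call is a placeholder for whatever reader fits your format, and `mycols` is assumed to be your character vector of the 20 wanted column names):

```r
read_dat <- function(path) {
  df <- read.delim(path)  # placeholder: use whatever reader fits your files
  if (ncol(df) == 29) {
    # 29-column file: keep only the 20 columns named in mycols
    df <- df[, mycols]
  }
  # 20-column files pass through unchanged
  df
}
```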
I might be misunderstanding what you want: I thought you had a bunch of files with 20 or 29 columns and wanted to assemble them into a single data frame. For that, you first need to ensure they all have the same columns, then assemble them with `dplyr::bind_rows()`, `do.call(rbind, ...)`, `purrr::list_rbind()`, or something similar.
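For example, once every element of the list has the same 20 columns, any of these calls will stack them into one data frame:

```r
combined <- dplyr::bind_rows(measurement_list)
# or, in base R:
combined <- do.call(rbind, measurement_list)
# or with purrr (>= 1.0.0):
combined <- purrr::list_rbind(measurement_list)
```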
If instead you want separate lists depending on the number of columns in each input file, I think the easiest approach is to first read all the files into a single list (whose elements have varying numbers of columns), then use `split()` to divide it into one sub-list per column count, and finally `bind_rows()` each of those sub-lists. Something like this (untested code):
```r
# Read every file into one list of data frames
measurement_list <- lapply(file_list, read_dat)
# Count the columns of each element
nb_cols <- sapply(measurement_list, ncol)
# One sub-list per distinct column count
list_of_list_of_dfs <- split(measurement_list, nb_cols)
# Row-bind within each sub-list
list_dfs_by_ncol <- lapply(list_of_list_of_dfs, dplyr::bind_rows)
```
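Note that `split()` names the resulting list by the column counts, so `list_dfs_by_ncol[["20"]]` would be the single data frame combining all the 20-column files.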
If I'm still misunderstanding, maybe it would help to re-explain what the input files look like and what the desired output is.