Adding a new column with filenames for the list of files in a for loop in R

I have time series data, stored as .txt files in daily sub-folders under monthly folders.

setwd(".../2018/Jan")
parent.folder <-".../2018/Jan"  
sub.folders <- list.dirs(parent.folder, recursive=TRUE)[-1] #To read the sub-folders under parent folder
r.scripts <- file.path(sub.folders)
A_2018 <- list()
for (j in seq_along(r.scripts)) {
  A_2018[[j]] <- dir(r.scripts[j],"\\.txt$")}

From these .txt files, I removed the ones I don't want to use in further analysis, using the following code.

trim_to_two <- function(x) {
  runs = rle(gsub("^L1_\\d{4}_\\d{4}_","",x))
  return(cumsum(runs$lengths)[which(runs$lengths > 2)] * -1)
}

A_2018_new <- list()
for (j in seq_along(A_2018)) {
  A_2018_new[[j]] <- A_2018[[j]][trim_to_two(A_2018[[j]])]
  }

Next, I want to row-bind all of the remaining .txt files in a for loop. Before that, I would like to remove some lines from each .txt file and add a new column containing the file name. The following is my code.

for (i in 1:length(A_2018_new)) {
  
  for (j in 1:length(A_2018_new[[i]])){
       
    filename <- paste(str_sub(A_2018_new[[i]][j], 1, 14))
        
    assign(filename, read_tsv(complete_file_name, skip = 14, col_names = FALSE), 
           )
    
    Y <- r.scripts %>% str_sub(46, 49)
    MD <- r.scripts %>% str_sub(58, 61)
    HM <- filename %>% str_sub(9, 12)
    Turn <- filename %>% str_sub(14, 14)
    time_minute <- paste(Y, MD, HM, sep="-")
    
    Map(cbind, filename, SampleID = names(filename))
    }
} 

But I didn't get my desired output. I tried adapting code from other examples. Could anyone help explain what my code is missing?

You could do something like this: Gerke Lab | Import a Directory of CSV Files at Once Using {purrr} and {readr}

especially the bit about adding a source indicator near the end.

Thanks, @williaml! When I tried similar code from the link you provided, it didn't work for folders that contain many sub-folders with many files in each. Anyway, thanks a lot for sharing.

It does work for sub-folders. You just need to define the appropriate file names with paths.

Which one do you mean @martin.R ?

I mean you don't need to do anything complicated by looping through directories.

Simply define which files you want by starting with this:

my_files <- list.files(path = ".../2018/Jan/", pattern = "\\.txt$", full.names = TRUE, recursive = TRUE)

Either adjust the regex pattern or remove elements from the vector as desired, then read them in together:

read_tsv(my_files, id = "file_name")

This will read all the files in my_files together regardless of sub-directories.
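For anyone who wants to see the whole idea end to end, here is a minimal base-R sketch of the same approach. It builds a small throwaway folder tree under tempdir(), so the folder names, file contents and the read_one() helper are made up for illustration and are not the asker's real data.

```r
# Build a small throwaway directory tree so the example is self-contained.
root <- file.path(tempdir(), "demo_2018_Jan")
dir.create(file.path(root, "01"), recursive = TRUE, showWarnings = FALSE)
dir.create(file.path(root, "02"), recursive = TRUE, showWarnings = FALSE)
writeLines(c("1\t10", "2\t20"), file.path(root, "01", "L1_0101_0601_A.txt"))
writeLines(c("3\t30"), file.path(root, "02", "L1_0102_0601_A.txt"))

# One recursive listing replaces the nested loops over sub-folders.
my_files <- list.files(root, pattern = "\\.txt$", full.names = TRUE, recursive = TRUE)

# Hypothetical helper: read one tab-separated file and tag every row
# with the name of the file it came from.
read_one <- function(path) {
  df <- read.delim(path, header = FALSE)
  df$file_name <- basename(path)
  df
}

# Read every file and stack the results into one data frame.
combined <- do.call(rbind, lapply(my_files, read_one))
print(combined)
```

With readr 2.0 or later, the read_tsv(my_files, id = "file_name") call above does the reading and tagging in one step. For the files in the question you would presumably also pass skip = 14 and col_names = FALSE, as in the original loop.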

Thank you, @martin.R. If I want to drop some of the files in each sub-folder using the following function, how can I integrate it with your code?

trim_to_two <- function(x) {
  runs = rle(gsub("^L1_\\d{4}_\\d{4}_","",x))
  return(cumsum(runs$lengths)[which(runs$lengths > 2)] * -1)
}

I am sorry I am new to using this function.

Just adjust the my_files vector however you see fit. You could use e.g. ?setdiff to subtract the files you don't want from the full set of files.

I don't know enough about your files to be able to advise further.
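One possible way to combine trim_to_two() with a single my_files vector is to group the paths by sub-folder and trim each group. The sketch below is only that, a sketch: the paths are made up, and keep_trimmed() is a hypothetical wrapper, not part of the original code. It also guards against a gotcha: when no run is longer than two, trim_to_two() returns an empty vector, and indexing with an empty vector returns no files at all.

```r
# The helper from the question: returns negative indices of the last file in
# every run of more than two consecutive files sharing the same suffix.
trim_to_two <- function(x) {
  runs <- rle(gsub("^L1_\\d{4}_\\d{4}_", "", x))
  cumsum(runs$lengths)[which(runs$lengths > 2)] * -1
}

# Hypothetical wrapper: apply the helper to the files of one sub-folder.
keep_trimmed <- function(paths) {
  drop <- trim_to_two(basename(paths))
  # Guard: paths[numeric(0)] would return nothing, so only index
  # when there is actually something to drop.
  if (length(drop) == 0) paths else paths[drop]
}

# Made-up paths standing in for list.files(..., full.names = TRUE, recursive = TRUE).
my_files <- c(
  "Jan/01/L1_0101_0601_A.txt",
  "Jan/01/L1_0101_0603_A.txt",
  "Jan/01/L1_0101_0605_A.txt",
  "Jan/01/L1_0101_0607_B.txt",
  "Jan/02/L1_0102_0601_A.txt",
  "Jan/02/L1_0102_0603_B.txt"
)

# Group by sub-folder, trim each group, then flatten back to one vector.
by_folder <- split(my_files, dirname(my_files))
my_files_new <- unlist(lapply(by_folder, keep_trimmed), use.names = FALSE)
print(my_files_new)
```

Here the third "A" file in Jan/01 is dropped, and Jan/02 is left untouched because it has no run longer than two. The trimmed vector can then be passed to read_tsv() in place of my_files.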

Hi @martin.R
I have time series data in a sub-folder under a big folder. The files are in .txt format. The directory structure is as follows:
C:/.../.../Astation/2018/L1_0101_0601_A.txt
C:/.../.../Astation/2018/L1_0101_0603_A.txt
C:/.../.../Astation/2018/L1_0101_0605_B.txt
and so on.
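Given names like these, the pieces that the original loop extracted with fixed character positions can be split out of the file name itself. The following is a rough sketch under some assumptions: the paths are shortened stand-ins for the elided ones above, the parent folder is taken to be the year, and the column names MD, HM and Turn simply reuse the variable names from the question.

```r
# Shortened stand-ins for the real (elided) paths.
paths <- c(
  "Astation/2018/L1_0101_0601_A.txt",
  "Astation/2018/L1_0101_0603_A.txt",
  "Astation/2018/L1_0101_0605_B.txt"
)

# Strip the directory and extension, then split the stem on underscores.
stem  <- tools::file_path_sans_ext(basename(paths))
parts <- do.call(rbind, strsplit(stem, "_", fixed = TRUE))

file_info <- data.frame(
  file_name = basename(paths),
  year      = basename(dirname(paths)),  # assumes the parent folder is the year
  MD        = parts[, 2],                # second field of the file name
  HM        = parts[, 3],                # third field of the file name
  Turn      = parts[, 4],                # trailing letter
  stringsAsFactors = FALSE
)

# Same "Y-MD-HM" label the original loop tried to build.
file_info$time_minute <- paste(file_info$year, file_info$MD, file_info$HM, sep = "-")
print(file_info)
```

A table like this can be joined to the combined data on file_name, which avoids depending on the exact length of the folder path.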
