read.table of "list.files()"

dario_gd · February 15, 2019, 7:30pm

Hi guys, I have a folder on my PC and I list the files in it with the function "list.files()".

So I have:

results <- list.files("../Tests/",recursive = T)

The class of results is "character", and each element is the path of a file in the folder.

If I run View(results), it looks something like this:

[2] 101/Testler Export/801-Yurume Ileri/Test_1/340506.txt
[3] 101/Testler Export/801-Yurume Ileri/Test_1/340527.txt
[4] 101/Testler Export/801-Yurume Ileri/Test_1/340535.txt
[5] 101/Testler Export/801-Yurume Ileri/Test_1/340537.txt
[6] 101/Testler Export/801-Yurume Ileri/Test_1/340539.txt
[7] 101/Testler Export/801-Yurume Ileri/Test_1/340540.txt
[8] 101/Testler Export/801-Yurume Ileri/Test_2/340506.txt

Each element is a .txt file.

I would like to write a "for" loop in which I say:

if (the file name ends with "340506" or "340527")

read the files (that works fine with "read.table") and merge them, for example with "rbind".

cderv · February 15, 2019, 7:50pm

You can use the pattern argument of list.files to select only the paths that match the pattern. fs is another option; see the example at https://fs.r-lib.org/reference/dir_ls.html

You can also consider purrr and its map_df function to read the files and row-bind them in one step.

This readxl example, which reads several worksheets into one data frame, can help illustrate the process:
https://readxl.tidyverse.org/articles/articles/readxl-workflows.html#concatenate-worksheets-into-one-data-frame

So something like this, to be completed for your case:

fs::dir_ls(folder, regexp = ...) %>%
   purrr::map_df(readr::read_csv)
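
For readers who prefer base R, here is a minimal sketch of the same select, read and row-bind idea. The folder layout, file contents and variable names below are made up for the demonstration (they are written to a temporary directory), so treat it as an illustration rather than code tailored to the real data.

```r
# Hypothetical demo data: three small files in a temporary folder
root <- tempfile("tests_demo_")
demo_dir <- file.path(root, "101", "Test_1")
dir.create(demo_dir, recursive = TRUE)
write.table(data.frame(x = 1:2, y = c(0.1, 0.2)),
            file.path(demo_dir, "340506.txt"), row.names = FALSE)
write.table(data.frame(x = 3:4, y = c(0.3, 0.4)),
            file.path(demo_dir, "340527.txt"), row.names = FALSE)
write.table(data.frame(x = 5:6, y = c(0.5, 0.6)),
            file.path(demo_dir, "340535.txt"), row.names = FALSE)

# Keep only the paths whose file name matches the pattern
wanted <- list.files(root,
                     pattern = "^(340506|340527)\\.txt$",
                     recursive = TRUE, full.names = TRUE)

# Read every selected file, then stack the tables row by row
tables <- lapply(wanted, read.table, header = TRUE)
combined <- do.call(rbind, tables)
```

Here lapply() plays the role of the map step and do.call(rbind, ...) plays the role of the row binding, so no explicit "for" loop or "if" is needed.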

andresrcs · February 15, 2019, 8:07pm

Applying what Christophe said to your own data would look like this:

library(tidyverse)
list_of_files <- list.files(path = "../Tests/",
                            recursive = TRUE,
                            pattern = "340506\\.txt$|340527\\.txt$",
                            full.names = TRUE)
df <- list_of_files %>%
  map_df(read_table)

And, as I said to you before, please ask your questions with a REPRoducible EXample (reprex).

dario_gd · February 16, 2019, 10:40am

Sorry, but I'm a beginner in R and I don't know how to use your answer without an example.

dario_gd · February 16, 2019, 10:45am

The first piece of code,

list_of_files <- .....

is OK.

The second one returns this on the console:

Parsed with column specification:
cols(
  `// Start Time: 0` = col_character()
)

I think it's because, given the file structure, I usually open the file with

df <- read.table (file="340506", header = T, fill = T, skip = 4 )

How can I do the same thing with "read_table" (the function you used)?

andresrcs · February 16, 2019, 12:17pm

It's pretty much the same, but you can also use read.table() if you feel more comfortable with that.

library(tidyverse)
list_of_files <- list.files(path = "../Tests/",
                            recursive = TRUE,
                            pattern = "340506\\.txt$|340527\\.txt$",
                            full.names = TRUE)
df <- list_of_files %>%
  map_df(read.table, header = T, fill = T, skip = 4)
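
To see what the three extra arguments do, here is a small self-contained sketch. Only the first line ("// Start Time: 0") comes from the console output quoted earlier in the thread; the other metadata lines, the column names and the values are assumptions invented for the demonstration.

```r
# Assumed file layout: four metadata lines, then a header row and the data
raw <- c("// Start Time: 0",
         "// Sample rate: 100",
         "// Scenario: demo",
         "// Firmware: 1",
         "Counter Acc_X Acc_Y",
         "1 0.10 0.20",
         "2 0.30")          # the last row is missing one field

tab <- read.table(text = raw,
                  skip = 4,        # ignore the four metadata lines
                  header = TRUE,   # the first remaining line holds column names
                  fill = TRUE)     # pad short rows with NA instead of failing

print(tab)
```

Any arguments placed after the function name in map_df() are forwarded to that function for every file, which is why header, fill and skip can simply be appended there.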

dario_gd · February 16, 2019, 3:43pm

Sorry if I keep asking, but I'm not good at this at all.

OK, now I have

list_of_files <- list.files(path = "../Tests/",
                            recursive = TRUE,
                            pattern = "340506\\.txt$|340527\\.txt$",
                            full.names = TRUE)
df <- list_of_files %>%
  map_df(read.table, header = T, fill = T, skip = 4)

The result of

list_of_files[1]

is

"../Tests/101/Testler Export/801-Yurume Ileri//Test_1/340535.txt"

I want to add new columns to df to identify the subject, act and test. I have tried with

  df$subject <- strsplit(list_of_files[1], "/")[[1]][3:3]     
  # return "101"
  df$act     <- strsplit(list_of_files[1], "/")[[1]][6:6]      
 # return "801-yureme ileri"
  df$test    <- strsplit(list_of_files[1], "/")[[1]][7:7]     
 # return " test_1"

So, if I want to add these new columns for all the files read with "read.table", where can I put a "for" loop like this?

list_of_files <- list.files(path = "../Tests/",
                            recursive = TRUE,
                            pattern = "340535\\.txt$",
                            full.names = TRUE)
df <- list_of_files %>%
  map_df(read.table, header = T, fill = T, skip = 4)

for(i in 1: length(list_of_files)){
  
  df$subject <- strsplit(list_of_files[i], "/")[[1]][3:3]     
  df$act     <- strsplit(list_of_files[i], "/")[[1]][6:6]    
  df$test    <- strsplit(list_of_files[i], "/")[[1]][7:7]   
  df$sensor  <- strsplit(list_of_files[i], "/")[[1]][8:8]

}

cderv · February 16, 2019, 5:56pm

One way would be to do that inside the map:

# list.files() has no header argument; recursive = TRUE is what finds files in subfolders
list_of_files <- list.files(path = "../Tests/",
                            recursive = TRUE,
                            pattern = "340535\\.txt$",
                            full.names = TRUE)
df <- list_of_files %>%
  map_df( ~ {
    file_path <- .x
    tab <- read.table(.x, header = T, fill = T, skip = 4)
 
    tab$subject <- strsplit(file_path, "/")[[1]][3:3]     
    tab$act     <- strsplit(file_path, "/")[[1]][6:6]    
    tab$test    <- strsplit(file_path, "/")[[1]][7:7]   
    tab$sensor  <- strsplit(file_path, "/")[[1]][8:8]
    tab
})

The other way would be to use dplyr to manipulate your resulting table. If you name list_of_files, the .id argument of map_df is useful for storing each name in a column. If the vector is not named, .id stores the index instead, which you can then use to subset list_of_files.
I would suggest reading
https://r4ds.had.co.nz/
to see how to manipulate some data with the tidyverse.
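
The index idea mentioned above can also be sketched in base R, without purrr. This is not how .id is implemented; it only mimics the result. The demo folder, the file contents and the column names file_index and file_name are assumptions made up for the example.

```r
# Hypothetical demo files: one file per subject in a temporary folder
root <- tempfile("tests_demo_")
for (subject in c("101", "102")) {
  dir.create(file.path(root, subject), recursive = TRUE)
  write.table(data.frame(value = 1:2),
              file.path(root, subject, "340535.txt"), row.names = FALSE)
}

files <- list.files(root, pattern = "\\.txt$",
                    recursive = TRUE, full.names = TRUE)

tables <- lapply(files, read.table, header = TRUE)
df <- do.call(rbind, tables)

# One index per row: which element of `files` the row came from
df$file_index <- rep(seq_along(tables),
                     times = vapply(tables, nrow, integer(1)))

# Subset the vector of paths with that index to recover the file name
df$file_name <- files[df$file_index]

# Further columns can then be derived from the stored path
df$subject <- basename(dirname(df$file_name))
```

Storing the path once per row keeps the reading step generic; everything specific to the folder layout happens afterwards on the file_name column.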

andresrcs · February 16, 2019, 6:08pm

Another way to do it, using regular expressions and mutate:

library(tidyverse)
library(stringr)

list_of_files <- list.files(path = "../Tests/",
                            recursive = TRUE,
                            pattern = "340506\\.txt$|340527\\.txt$",
                            full.names = TRUE)
df <- list_of_files %>%
  setNames(nm = .) %>% 
  map_df(read.table, header = T, fill = T, skip = 4, .id = "file_name") %>% 
  mutate(subject = str_extract(file_name, "(?<=Tests/)[0-9]+(?=/)"),
         act = str_extract(file_name, "(?<=Export/).+(?=//)")
         )

dario_gd · February 16, 2019, 6:51pm

It returns this error:

Error in mutate_impl(.data, dots) : 
  Evaluation error: object 'file_name' not found.

andresrcs · February 16, 2019, 6:53pm

Sorry, I forgot the .id argument

map_df(read.table, header = T, fill = T, skip = 4, .id = "file_name") %>%

This is exactly why we usually ask for a reproducible example: it is hard to test a solution beforehand without sample data.

dario_gd · February 16, 2019, 7:13pm

OK... excuse me, Andresrcs, but I'm in crisis.

I also posted another topic with the complete code ( https://forum.posit.co/t/error-in-data-frame-tmp-subject-value-replacement-has-1-row-data-has-0/24057 ). In the meantime I'm trying your code; I hope it works.

I think my code is not efficient at all, but I'm not experienced with R and I'm trying to get something working at least.

dario_gd · February 16, 2019, 7:33pm

Your solution is very, very fast at reading 12 million instances.
It still gives me problems, though. If it's not asking too much, could you look at the link I posted to get a general idea and, along the lines of what you wrote here, give me a tip on how to proceed?
I'm sorry, and thank you very much.

andresrcs · February 16, 2019, 7:46pm

What problems are you talking about? I sincerely don't understand what else you need; the solution that I already gave you is almost a direct substitute for your code in the other topic.
Could you elaborate a little more on your request?

dario_gd · February 16, 2019, 8:08pm

Can you please also write the code to add the "test" and "sensor" columns?

These are two possible values of list_of_files[i]:

"../Tests//101/Testler Export/801-Yurume Ileri/Test_1/340535.txt"

 "../Tests//102/Testler Export/811-Sandalye/Test_3/340535.txt"

In the first case I would like:

subject 101
act     "801-Yurume Ileri"
test    "Test_1"
sensor  "340535"

I have run the code you wrote for me, but the subject column is NA,
and the act column contains "801-Yurume Ileri/Test_1" rather than just "801-Yurume Ileri".

Also,
where can I study and learn how to use syntax like this one that you used?

(?<=Tests/)[0-9]+(?=/)

so that I can improve.

I do not want to be repetitive, but thank you very much

andresrcs · February 16, 2019, 9:13pm

The file names you gave last are different from the one you gave first; that is why the code was failing. It should work now, at least for the examples you are giving.

library(stringr)
file_name <- c("../Tests/101/Testler Export/801-Yurume Ileri//Test_1/340535.txt",
               "../Tests//101/Testler Export/801-Yurume Ileri/Test_1/340535.txt",
               "../Tests//102/Testler Export/811-Sandalye/Test_3/340535.txt")

str_extract(file_name, "(?<=Tests//?)[0-9]+(?=/)")
#> [1] "101" "101" "102"
str_extract(file_name, "(?<=Export/)[^/]+(?=/+T)")
#> [1] "801-Yurume Ileri" "801-Yurume Ileri" "811-Sandalye"
str_extract(file_name, "(?<=/)[^/]+(?=/[:digit:]+.txt)")
#> [1] "Test_1" "Test_1" "Test_3"
str_extract(file_name, "(?<=/)[:digit:]+(?=.txt)")
#> [1] "340535" "340535" "340535"

Created on 2019-02-16 by the reprex package (v0.2.1)

So, this code should work with your data:

library(tidyverse)
library(stringr)

list_of_files <- list.files(path = "../Tests/",
                            recursive = TRUE,
                            pattern = "340506\\.txt$|340527\\.txt$",
                            full.names = TRUE)
df <- list_of_files %>%
  setNames(nm = .) %>% 
  map_df(read.table, header = T, fill = T, skip = 4, .id = "file_name") %>% 
  mutate(subject = str_extract(file_name, "(?<=Tests//?)[0-9]+(?=/)"),
         act = str_extract(file_name, "(?<=Export/)[^/]+(?=/+T)"),
         test = str_extract(file_name, "(?<=/)[^/]+(?=/[:digit:]+.txt)"),
         sensor = str_extract(file_name, "(?<=/)[:digit:]+(?=.txt)")
  )
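
To unpack the lookaround syntax used in these patterns, here is a small base-R sketch. The helper name first_match is hypothetical, and the sample path uses single slashes only: as far as I know, the perl = TRUE engine in base R can reject lookbehinds whose length varies, such as (?<=Tests//?), whereas stringr accepts them.

```r
# Sample path with single slashes (an assumption for this demonstration)
path <- "../Tests/101/Testler Export/801-Yurume Ileri/Test_1/340535.txt"

# Hypothetical helper: first match of a Perl-style pattern.
# Unlike stringr::str_extract(), it returns character(0) when nothing matches.
first_match <- function(x, pattern) {
  regmatches(x, regexpr(pattern, x, perl = TRUE))
}

# (?<=Tests/) : lookbehind, "Tests/" must come right before the match
# [0-9]+      : the match itself, one or more digits
# (?=/)       : lookahead, a "/" must come right after the match
subject <- first_match(path, "(?<=Tests/)[0-9]+(?=/)")

# [^/]+ means "one or more characters that are not a slash"
act <- first_match(path, "(?<=Export/)[^/]+(?=/)")

# Digits that are immediately followed by ".txt" at the end of the path
sensor <- first_match(path, "[0-9]+(?=\\.txt$)")

print(c(subject = subject, act = act, sensor = sensor))
```

The lookbehind and lookahead parts only check the surrounding text; they are not part of the returned match, which is why the slashes and the "Tests" prefix do not appear in the results.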

What I'm using is called "Regular Expressions", if you're unfamiliar with them, I’d recommend starting at

dario_gd · February 16, 2019, 9:37pm

Really, thanks. Now I'll try to study it.

PS: the code works.

andresrcs · February 16, 2019, 9:40pm

If your question's been answered (even by you!), would you mind choosing a solution? It helps other people see which questions still need help, or find solutions if they have similar problems. Here’s how to do it:

system · February 23, 2019, 9:52pm

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.