Manipulating a list of files?

Hello!
I'm a beginner in R and I'm having some difficulty importing my data.

I'm trying to import 40 CSV files, all in one folder, and I would then like to manipulate them and analyse the data for my thesis.

I managed to make a list of my files, but I can't find how to manipulate them. They do not have exactly the same columns (some have one extra, useless column), so I can't bind them. I would really like to use a for loop or something similar after my code to select the columns I want to keep in each file (I tried select from dplyr with a for loop, but it doesn't seem to work).

I would also like to delete some useless rows from my files before binding them. Is it possible to do that for each file as well?

I didn't write much code because what I tried doesn't work, so I'm only showing the code that works.

Thanks for your answer


# forward slashes avoid R's backslash escapes; full.names = TRUE returns paths read.csv can open
files = list.files("C:/Users/Tomi-/OneDrive/Documents/Articles/Mémoire/Mémoire/Data/LengthEstimation_23833_2020-05-12_12h51.12_4265b370-944f-11ea-a73b-d05099d383f7", pattern="\\.csv$", full.names=TRUE)

data_list = lapply(files, read.csv, sep=",",header = TRUE, row.names=NULL, fill=TRUE)


You can write a function that takes in a data frame and selects the columns you want (or don't want)

library(dplyr)  # provides select() and %>%

clean_columns <- function(data_frame) {
  # add your code to select the right columns
  data_frame %>% select(i_want_this_one, and_this_one)
}

and then use one of the apply functions, probably lapply:

cleaned_data_list <- lapply(data_list, clean_columns)

Then you can try binding.
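To make the steps above concrete, here is a minimal sketch of the whole pipeline on made-up data. The two toy data frames and the column names `id`, `length` and `extra` are assumptions standing in for the real CSV files; swap in your own names.

```r
library(dplyr)

# Toy stand-ins for the imported files (made-up data);
# the second one has an extra column we do not want.
data_list <- list(
  data.frame(id = 1:2, length = c(10.5, 11.2)),
  data.frame(id = 3:4, length = c(9.8, 12.1), extra = c("a", "b"))
)

# Hypothetical names of the columns to keep.
keep_cols <- c("id", "length")

clean_columns <- function(data_frame) {
  # all_of() raises an error if a requested column is missing,
  # which helps spot a file with an unexpected layout
  data_frame %>% select(all_of(keep_cols))
}

# lapply() calls clean_columns() once per data frame in the list
cleaned_data_list <- lapply(data_list, clean_columns)

# Every element now has the same columns, so binding is safe
combined <- bind_rows(cleaned_data_list)
```

Because lapply already walks over every element of the list, there is no need to repeat anything 40 times or to write an explicit loop.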

Does that work?

Well, it should probably work, but I was wondering if I could do this for all my files at once, so that I don't have to do it 40 times with a for loop.

Thank you for your help :)

Hmm, it appears bind_rows should create matching columns if they share a name. Have you tried that, and did it give you an error?
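A small sketch of that behaviour, on made-up data: bind_rows matches columns by name and fills a column that is missing from one data frame with NA. The data frames and the list names `file_a` and `file_b` are assumptions for illustration only.

```r
library(dplyr)

# Made-up data: b has one more column than a
a <- data.frame(id = 1:2, length = c(10.5, 11.2))
b <- data.frame(id = 3:4, length = c(9.8, 12.1), extra = c("x", "y"))

# Columns are matched by name; `extra` is filled with NA for rows from a.
# .id adds a column recording which list element each row came from.
combined <- bind_rows(list(file_a = a, file_b = b), .id = "source")
```

If the extra column really is useless, it can be dropped after binding instead of before.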

Your first answer actually looks like it works like magic, I just tried it. Thank you for that. And do you know how I can delete rows using a function like this?

I'm surprised that it worked like magic, since select is not base R, you don't know how to select rows, and you seem fond of loops. Maybe going back to basics will help you.
For example you can "delete" a row with

x <- x[x$row != 0, ] 

I tried it like this:

cleaned_data_list <- cleaned_data_list[cleaned_data_list$row != 10, ]

But I get an error:

Error in cleaned_data_list[cleaned_data_list$row != 10, ] :
incorrect number of dimensions

Did I do this right? And how does this work? Does the number here mean that I will delete the 10th row?
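A short sketch that may help with both questions, using a made-up data frame. The error comes from the fact that cleaned_data_list is a list of data frames, not a data frame, so `[rows, columns]` indexing does not apply to it and `cleaned_data_list$row` is NULL. Also, `x$row != 10` compares the values stored in a column named `row`; it does not refer to row positions. Dropping a row by position uses a negative index.

```r
# Made-up data frame with a column that happens to be called `row`
x <- data.frame(row = c(1, 0, 2), value = c("a", "b", "c"))

# Filter by condition: keep rows whose `row` column is not 0
by_value <- x[x$row != 0, ]

# Filter by position: drop the first row, whatever it contains
by_position <- x[-1, ]

# A list of data frames has no `row` column of its own
cleaned_data_list <- list(x, x)
is.null(cleaned_data_list$row)  # TRUE
```

So the indexing has to be applied to each data frame inside the list, not to the list itself.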

Oh, it's fine, I found it! Thank you mikeR

Ncleaned_data_list <- list()
for (i in seq_along(cleaned_data_list)) {
  # drop the first row of each data frame
  Ncleaned_data_list[[i]] <- cleaned_data_list[[i]][-1, ]
}


Congrats! You should write a function doing this, and then call the function via lapply,map or something, it'll look much better.
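One way this suggestion could look, as a sketch on made-up data. The helper name `drop_rows` and its `rows` argument are hypothetical, not something from this thread.

```r
# Hypothetical helper: remove rows by position from one data frame.
# drop = FALSE keeps the result a data frame even with a single column.
drop_rows <- function(data_frame, rows = 1) {
  data_frame[-rows, , drop = FALSE]
}

# Toy stand-ins for the cleaned files (made-up data)
cleaned_data_list <- list(
  data.frame(id = 1:3, length = c(10.5, 11.2, 9.8)),
  data.frame(id = 4:6, length = c(12.1, 8.7, 10.0))
)

# Same effect as the for loop: drop the first row of every data frame
Ncleaned_data_list <- lapply(cleaned_data_list, drop_rows, rows = 1)
```

Passing `rows = c(1, 2)` would drop the first two rows instead, so the same function can be reused if the number of useless rows changes.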

