I have a table in excel containing 6-digit csv file IDs in 2 different columns (A, B).
I want to
- create dataframes for all column A files, all column B files.
- combine all the A dfs into an A_all df, all the B dfs into a B_all df.
- merge the A_all, B_all into one DATA df.
I have a function performing #3 but need help on #1 and #2.
I would appreciate some suggestions on how to approach this (preferably without for looping) to get me started
I have the following idea, but am in lack of which functions to use:
1 I start by making a df with 2 columns with file IDs
library (readxl)
file_ids <- read_excel("path/file_ids.xls")
my idea is then to append the file ID number from the file_ids df to the file path
A <- paste("csvfilepath/filename_", file_ids$colA, ".csv", sep="")
B <- paste("csvfilepath/filename_", file_ids$colB, ".csv", sep="")
and create dataframes for each and every A file and each and every B file
A_file_id <- read_csv(A)
B_file_id <- read_csv(B)
2 combine A datasets to an A_all dataset and B datasets to a B_all dataset
A_all <- apply( some full_join/merge function to all the A_file_id dfs
B_all <- apply( some full_join/merge function to all the B_file_id dfs
(Not all A dfs have same number of columns, not all B dfs have same number of columns)
3 merge into one df (due to some missing person_ID)
DATA <- merge(A_all, B_all, by.x="person_ID", by.y="person_ID", all.x = TRUE, all.y = TRUE)
Thanks in advance!