Hi! I am struggling with the import of a large and very chaotic excel database. Already tried many functions like readxl etc. but it seems like my database is too complicated.
i have 96 worksheets in the file (each is one month, database includes multiple years)
i would like to have them in one large dataframe/tibble
columns are not always the same. range is from 4 to 10 columns per sheet.
i would like to import 4 columns per sheet, but their headers are not precise. eg. "DOB" and "Date of Birth" etc.
Does someone know a workaround in readxl or similar to specify the columns it want to import?
Like providing a vector to look out for while importing?
E.g. column "DOB" in R should contain all variables from columns "DOB", "Date of Birth" or "Birth"
column "Operation" in R should contain all variables from columns "OP", "Operation", or "Procedure" and so on...
If you can't fix the problem using Excel Find & Replace, which is where I would start if I had Excel, can you import every sheet, fix the column names and then select the columns you want? The work flow would look something like the following except that I only fixed two columns.