As an environmental statistician I often have to work with multiple messy data files (usually Excel sheets). I can spend a lot of time writing code to read in each of these files/sheets and join them. There are often parsing errors, tables frequently have multi-row headers that R functions can't easily read, and I regularly find errors in the data itself.
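To give a sense of the boilerplate involved, here is a minimal sketch of the kind of code I end up writing for each workbook, assuming readxl, purrr, and dplyr are available, and assuming every sheet happens to share the same layout with a two-row header (which in practice is rarely true):

```r
library(readxl)
library(purrr)
library(dplyr)

# Hypothetical helper: read every sheet of one workbook and stack them.
# skip = 2 drops the two header rows; real files each need their own tweaks.
read_messy_workbook <- function(path) {
  excel_sheets(path) |>
    set_names() |>
    map(\(s) read_excel(path, sheet = s, skip = 2, col_names = FALSE)) |>
    list_rbind(names_to = "sheet")
}
```

Multiply that by a tweaked `skip`, hand-written column names, and type coercions per file, and it adds up quickly.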
I have thought about trying to write a general-purpose script for collecting and joining multiple tables of data, but I'm not sure it's feasible. I could also try to standardise the input files more, but it's hard to change other people's 20 years of Excel habits.
Does anyone have tips or tricks they've developed for handling this situation? Maybe you require clients to provide the data in CSV? Maybe you convert each input table to a more R-friendly format (e.g. RDS)? I know a colleague who tends to push all the data into a Postgres database. Any ideas?