Hi,
We process a lot of Excel files and currently do that using Excel macros. I'm looking at converting some of our processes to R as it should make them much more efficient.
The problem I am having is that the nature of our data is that the date column can contain a wide variety of data types. Some examples are simply a date (formatted as dd/mm/ccyy), a year (four digit integer), date ranges (e.g. 20/05/2020 - 07/07/2020), text descriptions ("Summer 2018"), ranges as text ("Mar 1978 - Jan 1979"), year ranges ("1935-1937").
Any of our data files may have dates in a single format or in various formats.
Ultimately we store the "Date" as a string in our database.
I'm trying to force R to treat the date column as text but all my attempts so far have resulted in dttm values or a conversion to an integer. I've tried various different packages including readxl, xlsx, gdata, openxlsx and I've tried specifying the column types on import.
I've tried creating an empty dataframe with the columns defined as character & reading into that but it converts the column when adding the data.
I've also tried converting the date column from a date format to character format, but can't get that to work reliably either.
I could save the Excel files as CSV and try processing them like that but I'd rather avoid extra steps in Excel if I can.
Can anyone help? Is there a particular package that it would be best to persevere with? Despite spending all morning searching I haven't found any examples of anyone trying to do that same thing (most people seem to have character dates that that want to convert to date formats).
Thanks!