Dear R experts
I am reading in a .csv file with readr, where the data is mostly dates and times in separate columns. The 'date' columns convert fine with col_types and col_date("%d/%m/%Y"), but the values in the 'time' columns are not consistently four digits, so col_time("%H%M") does not work properly: some entries have no leading zero (e.g. "931" representing 09:31).
I know that either the stringr function str_pad() or the base R function sprintf() can be used to pad the time values out to four digits, and that, from that point, col_time("%H%M") will correctly convert them to a time. However, I'm struggling to put these two things together in the readr process.
I have tried:
nesting both the functions within the readr process
a pipe to direct the padded 4-digit output of str_pad() to col_time()
padding the column with str_pad() before defining all the columns with col_types
...with no overall success.
I would be very grateful for any insight or suggestions on how to achieve this.
Many thanks
In effect, yes, but the specifics are a little different when you're not actively reading the data in, though the underlying functions are the same: you'll use parse_time(), for example, instead of col_time(). From the readr docs on column parsers:
Column parsers define how a single column is parsed, or parse a single vector. Each parser comes in two forms: parse_xxx() which is used to parse vectors that already exist in R and col_xxx() which is used to parse vectors as they are loaded by a read_xxx() function.
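As a rough illustration of the parse_xxx() form, here is a minimal sketch that pads a character vector to four digits and then hands it to parse_time(). The values in raw_times are made up for the example, and sprintf() is used for the padding only to avoid an extra package; str_pad() should do the same job.

```r
library(readr)

# Made-up example values, as they might come out of the file
raw_times <- c("931", "1245", "5", "2359")

# Pad to four characters with leading zeros.
# "%04d" expects an integer, so convert from character first.
padded <- sprintf("%04d", as.integer(raw_times))

# parse_time() is the vector counterpart of col_time()
times <- parse_time(padded, format = "%H%M")

print(padded)
print(times)
```

Note that the padding has to happen on a vector that already exists in R, which is why this uses parse_time() rather than col_time().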
Ah - I also hadn't appreciated that parse_xxx() will work on columns.
Managed to get it working in a two-stage process after the readr import, and the times are now converting correctly.
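For anyone finding this later, the two-stage approach looks roughly like the sketch below. The file contents and the column names date and time are hypothetical stand-ins for the real data: the time column is read in as character first, then padded and converted after the import.

```r
library(readr)
library(stringr)

# Small stand-in file; the real data and column names will differ
tmp <- tempfile(fileext = ".csv")
writeLines(c("date,time", "03/02/2020,931", "14/11/2021,1245"), tmp)

# Stage 1: parse the dates on import, but keep the times as character
df <- read_csv(
  tmp,
  col_types = cols(
    date = col_date(format = "%d/%m/%Y"),
    time = col_character()
  )
)

# Stage 2: pad to four digits, then convert with parse_time()
df$time <- parse_time(
  str_pad(df$time, width = 4, side = "left", pad = "0"),
  format = "%H%M"
)

print(df)
```

Reading the column as character in the first stage keeps the values untouched until they have been padded.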
Many thanks indeed for your help!