Parsing Dates in multiple Columns from characters.

Hi there,

Thank you for your time in advanced.

I am confronted with a Tibble imported with read_csv() which contains columns with dates that were considered as characters, but are in fact dates.

I believe R has to recognize these as dates to perform date range filters, therefore I must parse these columns into the correct data type which is date. (Is that correct or as characters R understands it? In all cases, this must be a "good practice" kind of thing right?)

The tibble dates are formated as such:

"16-Jan-17" "2-Mar-17" "4-Mar-17" "9-Feb-17" "8-Mar-17"

I could just repeat this for every column:

parse_date(df_all$`Origin ETD`, "%d-%b-%y")
   [1] "2017-01-16" "2017-03-02" "2017-03-04"

I was looking to create a more elegant solution, I thought lapply() could be used with parse_date().

First I thought of creating a list which I could store temporarily each column together, so after identifying these columns I would save them in a list.

x <- list(df_all$`Origin ETD`,df_all$`Destination ETA`, df_all$`Added`,
 df_all$`First Load`, df_all$`Last Discharge`, df_all$`ETD First Load`,
  df_all$`ETA Last Discharge` ,df_all$`Discharge`, 
df_all$`ETD Load`, df_all$`ETA Discharge`)

lapply(x, parse_date(x, "%d-%b-%y"))

My return was: :man_facepalming:

Error in parse_vector(x, col_date(format), na = na, locale = locale, trim_ws = trim_ws) :
is.character(x) is not TRUE

What would be the best way to do this without just repeating the code and changing the syntactical column name for each case?

I'm learning dplyr and would be interesting to have a base R approach and dplyr approach for learning purposes.

Another approach would be to make my own function, but it is not clear for me how to make a FOR loop which would cycle through syntactical column names.

If I could handle the syntactical column names I believe the for loop would look something like:

x <- c(`Origin ETD`,`Destination ETA`, `Added`, `First Load`, `Last Discharge`, 
`ETD First Load`,
          `ETA Last Discharge` ,`Discharge`, `ETD Load`, `ETA Discharge`)
parse_colums  <- function(x){
for (i in seq_along(x)) {  
parse_date(x[i], "%d-%b-%y")

I believe I am somewhat on the right track... ish... :man_shrugging:

I would greatly appreciate feedback as I try to sharpen my R skills.

Best regards,

1 Like
#> Attaching package: 'lubridate'
#> The following object is masked from 'package:base':
#>     date
my_dates <- c("16-Jan-17", "2-Mar-17", "4-Mar-17", "9-Feb-17", "8-Mar-17")
dmy(my_dates) -> good_dates
#> [1] "Date"
#> [1] "2017-01-16" "2017-03-02" "2017-03-04" "2017-02-09" "2017-03-08"

Created on 2020-03-26 by the reprex package (v0.3.0)

In conjunction with mutate, of course

1 Like

Hi @luisferlante, you can use the tidyverse command mutate_at() to take care of your date columns, and I think the following should help:

df %>% 
    vars(`Origin ETD`, `ETD First Load`), 
    format = "%d-%b-%y"
1 Like

Hi there,

Thank you for your response!

I tried your suggestion but unfortunately it did not work. This is the first time I've heard mutate_at(), took a look at the documentation but couldn't find anything I could emulate to understand.

> df_all %>% mutate_at(
+   vars(`Origin ETD`,
+        `Destination ETA`,
+        `Added`,
+        `ETD First Load`,
+        `ETA Last Discharge`,
+        `ETD Load`,
+        `ETA Discharge`),
+   as_date, 
+   format = "%d-%b-%y")
Error in strptime(x, format, tz) : invalid 'tz' value

Based on what I understood, the it is not like the regular mutate() that was suggested by technocrat where we have to do something like:

new column = dmy(original column))

I also thought it could be that the dates were not in the same format but I double checked and they are:

> df_all %>% select(`Origin ETD`,
+                   `Destination ETA`,
+                   `Added`,
+                   `ETD First Load`,
+                   `ETA Last Discharge`,
+                   `ETD Load`,
+                   `ETA Discharge`) %>% str()
Classes ‘spec_tbl_df’, ‘tbl_df’, ‘tbl’ and 'data.frame':	43856 obs. of  7 variables:
 $ Origin ETD        : chr  "16-Jan-17" "2-Mar-17" "4-Mar-17" "9-Feb-17" ...
 $ Destination ETA   : chr  "21-Feb-17" "2-Apr-17" "4-Apr-17" "20-Apr-17" ...
 $ Added             : chr  "27-Dec-16" "8-Feb-17" "8-Feb-17" "8-Feb-17" ...
 $ ETD First Load    : chr  NA "2-Mar-17" "9-Mar-17" "22-Mar-17" ...
 $ ETA Last Discharge: chr  NA "2-Apr-17" "28-Mar-17" "20-Apr-17" ...
 $ ETD Load          : chr  NA "2-Mar-17" "16-Mar-17" "22-Mar-17" ...
 $ ETA Discharge     : chr  NA "31-Mar-17" "28-Mar-17" "10-Apr-17" ...

Thanks for the assistance.

You may need a .fun = as_date instead of just as_date.

1 Like
format = "%d-%b-%y", tz = ""


He comes to the rescue! Awesome, now if you don't mind, what is the tz?

I looked at the mutate_at() documentation and hasn't seen anything mentioning tz.

It says "_at affects variables selected with a character or vars()"

# Determined columns that require date format changed.
dates_cols <-  c("Origin ETD",
                 "Destination ETA",
                 "Job Opened",
                 "ETD First Load",
                 "ETA Last Discharge",
                 "ETD Load",
                 "ETA Discharge")

# Columns that require proper dates:
# Checking visually dates if the same format %d-%b-%y
df_all %>% select(dates_cols) %>% str()

# Understanding mutate_at():
df_all <- df_all %>% mutate_at(
    vars(dates_cols), # Providing var containing the colnames.
    as_date, # setting the format or could be another function
    format = "%d-%b-%y", # format
    tz = "") %>% # ?? no idea

Is my interpretation of mutate_at() correct? If you could develop this a little I would appreciate it deeply.

Much appreciated, as always!

@luisferlante format and tz are parameters passed to as_date(). So you'll find them in the documentation for as_date() and not mutate_at().


This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.