General date manipulations

Hi, I have many files with dates which need converting to date format and fixed.
I have this sample file:

data.source <- data.frame(
         stringsAsFactors = FALSE,
              check.names = FALSE,
    `Registration Number` = c("aaa", "bbb", "ccc", "ddd", "eee"),
             `Event Date` = c("2023-04-30",
                              "2023-04-13","2023-04-27","2023-04-22",
                              "2023-04-27"),
             `Delivery Date` = c("2018-08-20", NA, "2022-05-11", NA, NA),
             `Appointment Date Time` = c(NA,
                              "1900-01-02 00:00:00","2023-04-28 08:30:00",
                              "1900-01-02 00:00:00","1900-01-02 00:00:00"),
          `Callback Date` = c(NA,
                              "2023-03-10 09:00:00","1900-01-02 00:00:00",
                              "1900-01-02 00:00:00","1900-01-02 00:00:00")
)

str(data.source)

I fix dates manually one by one using this code:

colnames(data.source) <- make.names(colnames(data.source), unique = TRUE)
str(data.source)

data.source$Event.Date <- as.Date(data.source$Event.Date)
data.source$Delivery.Date <- as.Date(data.source$Delivery.Date)
data.source$Appointment.Date.Time <- as.Date(data.source$Appointment.Date.Time)
data.source$Callback.Date <- as.Date(data.source$Callback.Date)

summary(data.source)

data.source$Appointment.Date.Time[data.source$Appointment.Date.Time < as.Date("1990-01-01")] <- NA
data.source$Callback.Date[data.source$Callback.Date < as.Date("1990-01-01")] <- NA

summary(data.source)

but this is manual as each file might contain date variables with different names.
I am sure there is a way of recoding all variables containing "date" in their names in one go.

Can we do that in R? Maybe in dplyr or tidyr?

We can use dplyr to mutate across the columns that satisfy ends_with("Date")

library(dplyr)
data.dates <- data.source %>% mutate(across(ends_with("Date"), as.Date))

Almost but same names include "Date" in the middle (or it might be in the beginning)

Also, some dates need fixing using this:

data.source$Appointment.Date.Time[data.source$Appointment.Date.Time < as.Date("1990-01-01")] <- NA
data.source$Callback.Date[data.source$Callback.Date < as.Date("1990-01-01")] <- NA

but I believe this fix might be included in the main code somehow

In that case I suggest we extend the nice solution offered by @salamanderSam somewhat, replacing ends_with by a more general contains

library(dplyr)
data.dates <- data.source %>% 
   mutate(across(contains("Date"), as.Date))

It will apply the as.Date() function to all columns containing the "Date" string, regardless of position.

Thank you, I applied a couple of codes. Now I am wondering how I could fix this type of errors "1990-01-01" globally. Is there any way to fix it somehow? Something like:
if (ends_with("Date") , and <"1990-01-01" replace by NA

in R you can write your own functions; so you can wrap as.Date in your own function and add additional code to replace by NA under conditions.

my_as_date <- function(x){
  y <- as.Date(x)
  y[y < as.Date("1990-01-01")] <- NA
  y
}

data.source %>% 
  mutate(across(contains("Date"), my_as_date))
2 Likes

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.