Convert to standard datetime from odd format

Dear Posit
I need your help to convert odd date formats to a standard date and datetime formats.
desired_date and desired_date2 are my target columns.
Repex is pasted here.
I hope you will help me in this.
Thank you.

mydata <- tibble::tribble(
            ~delivery_mode, ~referral_in_service_type, ~service_date, ~desired_date,        ~desired_date2,
                        1L,                      202L,      2012025L,  "20/01/2025", "2025-01-20 00:00:00",
                        1L,                      202L,      2012025L,  "20/01/2025", "2025-01-20 00:00:00",
                        2L,                      202L,     15012025L,  "15/01/2025", "2025-01-15 00:00:00",
                        1L,                      202L,     22012025L,  "22/01/2025", "2025-01-22 00:00:00"
            )
mydata
#> # A tibble: 4 x 5
#>   delivery_mode referral_in_service_type service_date desired_date desired_date2
#>           <int>                    <int>        <int> <chr>        <chr>        
#> 1             1                      202      2012025 20/01/2025   2025-01-20 0~
#> 2             1                      202      2012025 20/01/2025   2025-01-20 0~
#> 3             2                      202     15012025 15/01/2025   2025-01-15 0~
#> 4             1                      202     22012025 22/01/2025   2025-01-22 0~

Created on 2025-08-27 with reprex v2.1.1

Here is one approach. Note I converted your tibble to a data.table just because the syntax is simpler for me.
Then I converted the data back to a tibble---probably not needed but you started with a tibble.

# load packages -----------------------------------------------------------
suppressMessages(library(data.table));
suppressMessages(library(tidyverse))

# Load data and convert to data.table------------------------------------

mydata <- tibble::tribble(
            ~delivery_mode, ~referral_in_service_type, ~service_date, ~desired_date,        ~desired_date2,
                        1L,                      202L,      2012025L,  "20/01/2025", "2025-01-20 00:00:00",
                        1L,                      202L,      2012025L,  "20/01/2025", "2025-01-20 00:00:00",
                        2L,                      202L,     15012025L,  "15/01/2025", "2025-01-15 00:00:00",
                        1L,                      202L,     22012025L,  "22/01/2025", "2025-01-22 00:00:00"
            )


DT <- as.data.table(mydata) 

DT[ , desired_date := dmy(desired_date)]
DT[ , desired_date2 := ymd_hms(desired_date2)]
DT[ , desired_date2 := as_date(date(desired_date2))]

dat1 <- as_tibble(DT)

Unless you have specific rules regarding month and day of month priority and leading zeros in service_date, I'm afraid that format is ambiguous.

For example, what would be the valid encoding for the 1st of January?
112025? Or 1012025 ? Or is the latter used for 10th of January instead?
Or what would be the expected date for 1112025? 1st of November? 11th of January?

You'd probably want to trace back to the origin of service_dateand fix it or ask it to be fixed there, even if it's not part of a workflow or process that's directly under your own control.

1 Like

Hi, thanks for your comments.
Yes, i have no control over the source data.
Service date format = dropping a leading zero, so 1012025 is Jan 1, 2025
If you could help convert service date to desired formats then I would really appreciate.
Thank you so much.

@jrkrideau
I need to convert the service date to the desired columns as shown.
service date format = dropping a leading zero in a sense that 1012025 is Jan 1, 2025
Thank you.

If the shorter service dates are missing a leading zero, then why is the desired date for service date 2012025L "20/01/2025" and not "02/01/2025"?

1 Like

@ prubin
Ah, so we convert `service_date" to a character variable, add a 0 to anything that needs it and convert to date? It's too late here for me to try it but it looks feasible.

@prubin @jrkrideau @margusl sorry, repex reposted for correct desired dates to be converted from service date.

Thanks.

mydata <- tibble::tribble(
            ~delivery_mode, ~referral_in_service_type, ~service_date, ~desired_date,        ~desired_date2,
                        1L,                      202L,      2012025L,  "2/01/2025", "2025-01-02 00:00:00",
                        1L,                      202L,      2012025L,  "2/01/2025", "2025-01-02 00:00:00",
                        2L,                      202L,     15012025L,  "15/01/2025", "2025-01-15 00:00:00",
                        1L,                      202L,     22012025L,  "22/01/2025", "2025-01-22 00:00:00"
            )
mydata
#> # A tibble: 4 x 5
#>   delivery_mode referral_in_service_type service_date desired_date desired_date2
#>           <int>                    <int>        <int> <chr>        <chr>        
#> 1             1                      202      2012025 2/01/2025    2025-01-02 0~
#> 2             1                      202      2012025 2/01/2025    2025-01-02 0~
#> 3             2                      202     15012025 15/01/2025   2025-01-15 0~
#> 4             1                      202     22012025 22/01/2025   2025-01-22 0~

Created on 2025-08-28 with reprex v2.1.1

I think this will do what you want. I suspect there is a less verbose way to do it but this seems to work.

# load packages -----------------------------------------------------------
suppressMessages(library(data.table));
suppressMessages(library(tidyverse))

# Load data and convert to data.table------------------------------------

mydata <- tibble::tribble(
            ~delivery_mode, ~referral_in_service_type, ~service_date, ~desired_date,        ~desired_date2,
                        1L,                      202L,      2012025L,  "20/01/2025", "2025-01-20 00:00:00",
                        1L,                      202L,      2012025L,  "20/01/2025", "2025-01-20 00:00:00",
                        2L,                      202L,     15012025L,  "15/01/2025", "2025-01-15 00:00:00",
                        1L,                      202L,     22012025L,  "22/01/2025", "2025-01-22 00:00:00"
            )


DT <- as.data.table(mydata) 


# Reformat service_date ---------------------------------------------------
DT[, service_date := as.character(service_date)]

short <- DT[nchar(service_date) == 7]
long <-  DT[nchar(service_date) == 8]

short[, service_date := paste0("0", service_date)]

DT1 <- rbind(short, long) ; DT1

str_sub(DT1$service_date, 3, 3) <-  "-"
str_sub(DT1$service_date, 6, 6) <-  "-"
# Convert dates -----------------------------------------------------------


DT1[,  service_date := dmy(service_date)]
DT1[ , desired_date := dmy(desired_date)]
DT1[ , desired_date2 := ymd_hms(desired_date2)]
DT1[ , desired_date2 := as_date(desired_date2)]

dat1 <- as_tibble(DT)

@jrkrideau I think this reproduces the original (incorrect) results but not the corrected version. In particular, the target for the first row is now "2/01/2025" but I get "20/01/2025" running your code.

Try the following, which I think matches the revised target (with one minor difference: a leading 0 in service dates for January through September).

suppressMessages(library(tidyverse))

mydata <- tibble::tribble(
  ~delivery_mode, ~referral_in_service_type, ~service_date, ~desired_date,        ~desired_date2,
  1L,                      202L,      2012025L,  "20/01/2025", "2025-01-20 00:00:00",
  1L,                      202L,      2012025L,  "20/01/2025", "2025-01-20 00:00:00",
  2L,                      202L,     15012025L,  "15/01/2025", "2025-01-15 00:00:00",
  1L,                      202L,     22012025L,  "22/01/2025", "2025-01-22 00:00:00"
)

result <- mydata |> 
            select(delivery_mode, referral_in_service_type, service_date) |> 
            mutate(y = service_date %% 10000,
                   m = (service_date %/% 10000) %% 100,
                   d = service_date %/% 1000000) |>
            mutate(desired_date = sprintf("%02d/%02d/%4d", d, m, y),
                   desired_date2 = sprintf("%4d-%02d-%02d 00:00:00", y, m, d)) |>
            select(-c(d, m, y))

Created on 2025-08-28 with reprex v2.1.1

Looks like a stupid typo when I was "cleaning up" the code.

dat1 <- as_tibble(DT)

should read

dat1 <- as_tibble(DT1)

Problem is that such formats can be either engineered or happen by accident, in latter case leading zeros might as well be dropped from all other fields as well, (i.e. from moths)
For quick data check you could run all(nchar(mydata$service_date) > 6), if it's FALSE I'm afraid those expectations are not valid.


You can pad strings to desired length with stringr::str_pad(), coercion from integers happens atomagically; and you can find a list of date/time format specifiers for format() from ?strptime

library(tidyverse)
mydata |> 
  select(matches("date")) |> 
  mutate(
    d     = str_pad(service_date, 8, pad = "0") |> dmy(),
    date  = format(d, "%e/%m/%Y") |> trimws(),
    date2 = format(d, "%F 00:00:00"),
    d     = NULL
  )
#> # A tibble: 4 × 5
#>   service_date desired_date desired_date2       date       date2              
#>          <int> <chr>        <chr>               <chr>      <chr>              
#> 1      2012025 2/01/2025    2025-01-02 00:00:00 2/01/2025  2025-01-02 00:00:00
#> 2      2012025 2/01/2025    2025-01-02 00:00:00 2/01/2025  2025-01-02 00:00:00
#> 3     15012025 15/01/2025   2025-01-15 00:00:00 15/01/2025 2025-01-15 00:00:00
#> 4     22012025 22/01/2025   2025-01-22 00:00:00 22/01/2025 2025-01-22 00:00:00
1 Like

@margusl Thank you very much for your expertise.
I have no idea about regex.
All work.
See you soon.

@margusl Thank you very much. Appreciated

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.