I have quite a big and messy dataset, where all the dates have different formats:
either 01.03.2022 or 03/01/2022
I would like to transform them to the same format.
So far I managed to change all "/" into a ".", but now the month and the day are in the wrong order.
Does anyone know how to switch strings seperated by the first slash, but not by the first point?
Furthermore it need to applied to the whole dataframe, because all date variables have the same problem.
Thank you so much.
## create a tibble
data <- tibble(
name = c("Josh", "Jasi", "Sophie", "Leni"),
date = c("1/17/1990", "24.09.1865", "3/13/2000", "14.04.2000")
)
## change / to a . in the date
data <- data.frame(lapply(data, function(x) {
gsub("/", ".", x)
}))
## print new data
print(data)
The code throws warnings because of the way case_when evaluates the test conditions but the result is correct.
library(tibble)
#> Warning: package 'tibble' was built under R version 4.1.2
library(lubridate)
#>
#> Attaching package: 'lubridate'
#> The following objects are masked from 'package:base':
#>
#> date, intersect, setdiff, union
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
library(stringr)
data <- tibble(
name = c("Josh", "Jasi", "Sophie", "Leni"),
date = c("1/17/1990", "24.09.1865", "3/13/2000", "14.04.2000")
)
data <- data |> mutate(date = case_when(
str_detect(date, "/") ~ mdy(date),
TRUE ~ dmy(date)
))
#> Warning: 2 failed to parse.
#> Warning: 2 failed to parse.
data
#> # A tibble: 4 x 2
#> name date
#> <chr> <date>
#> 1 Josh 1990-01-17
#> 2 Jasi 1865-09-24
#> 3 Sophie 2000-03-13
#> 4 Leni 2000-04-14
If the date columns follow a naming convention, e.g. the column name always contains the text "date", you can use the across() function within mutate() to affect those columns. I wrote a function to handle the date conversion so that the call to mutate(across()) would be cleaner.
library(dplyr)
library(tibble)
library(stringr)
library(lubridate)
data <- tibble(
name = c("Josh", "Jasi", "Sophie", "Leni"),
date = c("1/17/1990", "24.09.1865", "3/13/2000", "14.04.2000"),
date2 = c("1/17/1990", "24.09.1865", "3/13/2000", "14.04.2000")
)
DateConvert <- function(V) {
case_when(
str_detect(V, "/") ~ mdy(V),
TRUE ~ dmy(V)
)
}
data <- data |> mutate(across(.cols = contains("date"), .fns = DateConvert))
data
# A tibble: 4 x 3
name date date2
<chr> <date> <date>
1 Josh 1990-01-17 1990-01-17
2 Jasi 1865-09-24 1865-09-24
3 Sophie 2000-03-13 2000-03-13
4 Leni 2000-04-14 2000-04-14