Date Adjustment from Character to Date format (Lubridate)

luisferlante · January 13, 2020, 10:55pm

Hi there,

I am very new to R and apologies for any mistakes, since this is my first post.

I have a dataset which I am trying to change the date column data type from Character to Date format.
The current format of the date is: "28-Mar-17 13:58" and categorized as character format.
I've tried:

df1$ETA <- as.POSIXlt(df1$ETA)
str(df1$ETA)
df1$ETA <- parse_date_time(df1$ETA, order = "dmy")
str(df1$ETA)

This is what I get:

 df1$ETA <- as.POSIXlt(df1$ETA)
Error in as.POSIXlt.character(df1$ETA) : 
  character string is not in a standard unambiguous format
> str(df1$ETA)
 chr [1:43187] "21-Feb-17 15:55" "02-Apr-17 11:19" "04-Apr-17 00:00" "20-Apr-17 16:17" "29-Mar-17 00:00" "29-Mar-17 00:00" "10-Apr-17 12:52" ...
> df1$ETA <- parse_date_time(df1$ETA, order = "dmy")
Warning message:
All formats failed to parse. No formats found. 
> str(df1$ETA)
 POSIXct[1:43187], format: NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA ...

Could you please assist me on shining a light on how to address this problem?

Thank you for your time and assistance.
Best regards,
LF.

mattwarkentin · January 13, 2020, 11:09pm

Hi @luisferlante,

The ludridate package has many functions that help with coercing character vectors into date or date-time vectors. Since your date strings are stored as day-month-year hour:minute, you can use the lubridate::dmy_hm() function. Here is an example:

library(lubridate)
library(dplyr)

df1 <- tibble(
  date = c('28-Mar-17 13:58')
)

df1 %>% 
  mutate(date = dmy_hm(date))
#> # A tibble: 1 x 1
#>   date               
#>   <dttm>             
#> 1 2017-03-28 13:58:00

^{Created on 2020-01-13 by the reprex package (v0.3.0)}

technocrat · January 13, 2020, 11:14pm

Beat me to it. All I can add is you can do it on a bare character

library(lubridate)
#> 
#> Attaching package: 'lubridate'
#> The following object is masked from 'package:base':
#> 
#>     date
tofix <-  "28-Mar-17 13:58"
fixed <- ymd_hm(tofix)
class(fixed)
#> [1] "POSIXct" "POSIXt"
fixed
#> [1] "2028-03-17 13:58:00 UTC"

^{Created on 2020-01-13 by the reprex package (v0.3.0)}

luisferlante · January 14, 2020, 1:39am

Thank you for your reply!

I've tried what you suggested, please bear with me since I'm just getting to understand the %>% function.

Trying to adapt what you suggested I got the below:

> #MERGING BOTH DF INTO ONE.
> df1 <- data.frame(merge(eta_dt,etd_dt,  all = TRUE))
> df1 %>% 
+   mutate(df1$ETA = dmy_hm(df1$ETA))
Error: unexpected '=' in:
"df1 %>% 
  mutate(df1$ETA ="

Please find below my full script at the moment, sorry but I have notes in them. I tried using %>% function as you can see to change the column names but could not make it work so had to try another approach.

library(tidyverse)
library(ggplot2)
library(lubridate)


### LOADING DATA
eta_dt <- read_csv(choose.files())
etd_dt <- read_csv(choose.files())
#ett_dt <- read_csv(choose.files())

#ENSURE THEY ARE DF
eta_dt <-  data.frame(eta_dt)
etd_dt <-  data.frame(etd_dt)
#ett_dt <-  data.frame(ett_dt)
class(eta_dt)

#SETTING SHIPMENT ID AS FACTORS for rbind (not sure if necessary)
eta_dt$Shipment.ID <- as.factor(eta_dt$Shipment.ID)
etd_dt$Shipment.ID <- as.factor(etd_dt$Shipment.ID)

#### DATA HAS COLS WITH SAME name FIXING THAT FOR EACH DT SET

#### renaming columns, was trying to use the RENAME FUNCTION but found in stackflow that could use "select" function. pipe does seem 
#to work but doesn't save it, it only works
#when pipped, forum submission query made.

#etd_dt %>% rename(
#  Shipment.Kind = Type_1,
#  Container.Type = Type_2
#)
#class(etd_dt)

colnames(eta_dt)[29] <- 'Shipment.Kind'
colnames(eta_dt)[215] <- 'Container.Type'
colnames(eta_dt[29])
colnames(eta_dt)[215]


######Found below example, of using base R code for this. Another way of doing above.
#names(etd_dt)[names(etd_dt) == 'Type_1'] <- 'Shipment.Kind'
#names(etd_dt)[names(etd_dt) == 'Type_2'] <- 'Container.Type'
#colnames(etd_dt)[29]
#colnames(etd_dt)[215]
#This code pretty much does the following:
#1. names(df) looks into all the names in the df
#2. [names(df) == old.var.name] extracts the variable name you want to check
#3. <- 'new.var.name' assigns the new variable name.

str(eta_dt)
str(etd_dt)

#MERGING BOTH DF INTO ONE.
df1 <- data.frame(merge(eta_dt,etd_dt,  all = TRUE))


#####################################################################################################################################
############################################ DATES MGMT #############################################################################
#####################################################################################################################################


df1 %>% 
  mutate(df1$ETA = dmy_hm(df1$ETA))




#####################################################################################################################################
############################################ DATES MGMT #############################################################################
#####################################################################################################################################





####################################################################################################################################
####################################################################################################################################
#################################  CORRECTING DATA TYPES FOR DF1 TO PROCEED TO ANALYSIS ############################################
####################################################################################################################################
####################################################################################################################################


df1$Consol.ID <- as.factor(df1$Consol.ID)
df1$Shipment.ID <- as.factor(df1$Shipment.ID)
df1$Trans. <- as.factor(df1$Trans.)
df1$Cont. <- as.factor(df1$Cont.)
df1$Origin <- as.factor(df1$Origin)
df1$Dest. <- as.factor(df1$Dest.)
df1$Creating.User.Login <- as.factor(df1$Creating.User.Login)

Thank you for your time once again and thank you so much for assisting me in this learning process.
Best regards,
LF.

andresrcs · January 14, 2020, 1:54am

Syntax is a little different when you work with the pipe operator (%>%) and tidyverse functions, try with this

df1 %>% 
    mutate(ETA = dmy_hm(ETA))

This would be equivalent to this line on a base R style syntax

mutate(.data = df1, ETA = dmy_hm(ETA))

luisferlante · January 14, 2020, 2:01am

Thank you for your reply.

I believe that your approach worked because I could use more familiar syntax of base R.

Please find below my script as reference of how it is at the moment.

library(tidyverse)
library(ggplot2)
library(lubridate)


### LOADING DATA
eta_dt <- read_csv(choose.files())
etd_dt <- read_csv(choose.files())
#ett_dt <- read_csv(choose.files())

#ENSURE THEY ARE DF
eta_dt <-  data.frame(eta_dt)
etd_dt <-  data.frame(etd_dt)
#ett_dt <-  data.frame(ett_dt)
class(eta_dt)

#SETTING SHIPMENT ID AS FACTORS for rbind (not sure if necessary)
eta_dt$Shipment.ID <- as.factor(eta_dt$Shipment.ID)
etd_dt$Shipment.ID <- as.factor(etd_dt$Shipment.ID)

#### DATA HAS COLS WITH SAME name FIXING THAT FOR EACH DT SET

#### renaming columns, was trying to use the RENAME FUNCTION but found in stackflow that could use "select" function. pipe does seem 
#to work but doesn't save it, it only works
#when pipped, forum submission query made.

#etd_dt %>% rename(
#  Shipment.Kind = Type_1,
#  Container.Type = Type_2
#)
#class(etd_dt)

colnames(eta_dt)[29] <- 'Shipment.Kind'
colnames(eta_dt)[215] <- 'Container.Type'
colnames(eta_dt[29])
colnames(eta_dt)[215]


######Found below example, of using base R code for this. Another way of doing above.
#names(etd_dt)[names(etd_dt) == 'Type_1'] <- 'Shipment.Kind'
#names(etd_dt)[names(etd_dt) == 'Type_2'] <- 'Container.Type'
#colnames(etd_dt)[29]
#colnames(etd_dt)[215]
#This code pretty much does the following:
#1. names(df) looks into all the names in the df
#2. [names(df) == old.var.name] extracts the variable name you want to check
#3. <- 'new.var.name' assigns the new variable name.

str(eta_dt)
str(etd_dt)

#MERGING BOTH DF INTO ONE.
df1 <- data.frame(merge(eta_dt,etd_dt,  all = TRUE))


#####################################################################################################################################
############################################ DATES MGMT #############################################################################
#####################################################################################################################################

class(df1$ETA)
df1$ETA <- dmy_hm(df1$ETA)
class(df1$ETA)
df1$ETA

class(df1$ETD)
df1$ETD <- dmy_hm(df1$ETD)
class(df1$ETD)
df1$ETD

class(df1$Created.Time..UTC.)
df1$Created.Time..UTC. <- dmy_hm(df1$Created.Time..UTC.)
class(df1$Created.Time..UTC.)
df1$Created.Time..UTC.

class(df1$Last.Edited.Time..UTC.)
df1$Last.Edited.Time..UTC. <- dmy_hm(df1$Last.Edited.Time..UTC.)
class(df1$Last.Edited.Time..UTC.)
df1$Last.Edited.Time..UTC.

class(df1$Issue.Date)
df1$Issue.Date <- dmy_hm(df1$Issue.Date)
class(df1$Issue.Date)
df1$Issue.Date


str(df1)

#####################################################################################################################################
############################################ DATES MGMT #############################################################################
#####################################################################################################################################





####################################################################################################################################
####################################################################################################################################
#################################  CORRECTING DATA TYPES FOR DF1 TO PROCEED TO ANALYSIS ############################################
####################################################################################################################################
####################################################################################################################################


df1$Consol.ID <- as.factor(df1$Consol.ID)
df1$Shipment.ID <- as.factor(df1$Shipment.ID)
df1$Trans. <- as.factor(df1$Trans.)
df1$Cont. <- as.factor(df1$Cont.)
df1$Origin <- as.factor(df1$Origin)
df1$Dest. <- as.factor(df1$Dest.)
df1$Creating.User.Login <- as.factor(df1$Creating.User.Login)

technocrat · January 14, 2020, 2:35am

Great. Please mark the solution for the benefit of those to follow.

Regarding the %>% operator, and the tidyverse it does take some getting use to. Here's a pattern that might provide some insight. (I've used intermediate variables, but often it's

my_df <- my_df %>% ...

suppressPackageStartupMessages(library(dplyr)) 
head(iris)
#>   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#> 1          5.1         3.5          1.4         0.2  setosa
#> 2          4.9         3.0          1.4         0.2  setosa
#> 3          4.7         3.2          1.3         0.2  setosa
#> 4          4.6         3.1          1.5         0.2  setosa
#> 5          5.0         3.6          1.4         0.2  setosa
#> 6          5.4         3.9          1.7         0.4  setosa
sepals <- iris %>% select(Sepal.Length, Sepal.Width, Species)
petals <- iris %>% select(Petal.Length, Petal.Width, Species)
sepals1 <- sepals %>% mutate(Sepal.Length = log(Sepal.Length))
petals1 <- petals %>% mutate(Petal.Width = Petal.Width^2)
iris1 <- inner_join(sepals1, petals1, by="Species") %>% rename(Flower = Species) %>% select(Flower, everything())
head(iris1)
#>   Flower Sepal.Length Sepal.Width Petal.Length Petal.Width
#> 1 setosa     1.629241         3.5          1.4        0.04
#> 2 setosa     1.629241         3.5          1.4        0.04
#> 3 setosa     1.629241         3.5          1.3        0.04
#> 4 setosa     1.629241         3.5          1.5        0.04
#> 5 setosa     1.629241         3.5          1.4        0.04
#> 6 setosa     1.629241         3.5          1.7        0.16

^{Created on 2020-01-13 by the reprex package (v0.3.0)}

system · January 21, 2020, 2:35am

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.