I am very new to R and apologies for any mistakes, since this is my first post.
I have a dataset which I am trying to change the date column data type from Character to Date format.
The current format of the date is: "28-Mar-17 13:58" and categorized as character format.
I've tried:
df1$ETA <- as.POSIXlt(df1$ETA)
Error in as.POSIXlt.character(df1$ETA) :
character string is not in a standard unambiguous format
> str(df1$ETA)
chr [1:43187] "21-Feb-17 15:55" "02-Apr-17 11:19" "04-Apr-17 00:00" "20-Apr-17 16:17" "29-Mar-17 00:00" "29-Mar-17 00:00" "10-Apr-17 12:52" ...
> df1$ETA <- parse_date_time(df1$ETA, order = "dmy")
Warning message:
All formats failed to parse. No formats found.
> str(df1$ETA)
POSIXct[1:43187], format: NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA ...
Could you please assist me on shining a light on how to address this problem?
Thank you for your time and assistance.
Best regards,
LF.
The ludridate package has many functions that help with coercing character vectors into date or date-time vectors. Since your date strings are stored as day-month-year hour:minute, you can use the lubridate::dmy_hm() function. Here is an example:
library(lubridate)
library(dplyr)
df1 <- tibble(
date = c('28-Mar-17 13:58')
)
df1 %>%
mutate(date = dmy_hm(date))
#> # A tibble: 1 x 1
#> date
#> <dttm>
#> 1 2017-03-28 13:58:00
I've tried what you suggested, please bear with me since I'm just getting to understand the %>% function.
Trying to adapt what you suggested I got the below:
> #MERGING BOTH DF INTO ONE.
> df1 <- data.frame(merge(eta_dt,etd_dt, all = TRUE))
> df1 %>%
+ mutate(df1$ETA = dmy_hm(df1$ETA))
Error: unexpected '=' in:
"df1 %>%
mutate(df1$ETA ="
Please find below my full script at the moment, sorry but I have notes in them. I tried using %>% function as you can see to change the column names but could not make it work so had to try another approach.
library(tidyverse)
library(ggplot2)
library(lubridate)
### LOADING DATA
eta_dt <- read_csv(choose.files())
etd_dt <- read_csv(choose.files())
#ett_dt <- read_csv(choose.files())
#ENSURE THEY ARE DF
eta_dt <- data.frame(eta_dt)
etd_dt <- data.frame(etd_dt)
#ett_dt <- data.frame(ett_dt)
class(eta_dt)
#SETTING SHIPMENT ID AS FACTORS for rbind (not sure if necessary)
eta_dt$Shipment.ID <- as.factor(eta_dt$Shipment.ID)
etd_dt$Shipment.ID <- as.factor(etd_dt$Shipment.ID)
#### DATA HAS COLS WITH SAME name FIXING THAT FOR EACH DT SET
#### renaming columns, was trying to use the RENAME FUNCTION but found in stackflow that could use "select" function. pipe does seem
#to work but doesn't save it, it only works
#when pipped, forum submission query made.
#etd_dt %>% rename(
# Shipment.Kind = Type_1,
# Container.Type = Type_2
#)
#class(etd_dt)
colnames(eta_dt)[29] <- 'Shipment.Kind'
colnames(eta_dt)[215] <- 'Container.Type'
colnames(eta_dt[29])
colnames(eta_dt)[215]
######Found below example, of using base R code for this. Another way of doing above.
#names(etd_dt)[names(etd_dt) == 'Type_1'] <- 'Shipment.Kind'
#names(etd_dt)[names(etd_dt) == 'Type_2'] <- 'Container.Type'
#colnames(etd_dt)[29]
#colnames(etd_dt)[215]
#This code pretty much does the following:
#1. names(df) looks into all the names in the df
#2. [names(df) == old.var.name] extracts the variable name you want to check
#3. <- 'new.var.name' assigns the new variable name.
str(eta_dt)
str(etd_dt)
#MERGING BOTH DF INTO ONE.
df1 <- data.frame(merge(eta_dt,etd_dt, all = TRUE))
#####################################################################################################################################
############################################ DATES MGMT #############################################################################
#####################################################################################################################################
df1 %>%
mutate(df1$ETA = dmy_hm(df1$ETA))
#####################################################################################################################################
############################################ DATES MGMT #############################################################################
#####################################################################################################################################
####################################################################################################################################
####################################################################################################################################
################################# CORRECTING DATA TYPES FOR DF1 TO PROCEED TO ANALYSIS ############################################
####################################################################################################################################
####################################################################################################################################
df1$Consol.ID <- as.factor(df1$Consol.ID)
df1$Shipment.ID <- as.factor(df1$Shipment.ID)
df1$Trans. <- as.factor(df1$Trans.)
df1$Cont. <- as.factor(df1$Cont.)
df1$Origin <- as.factor(df1$Origin)
df1$Dest. <- as.factor(df1$Dest.)
df1$Creating.User.Login <- as.factor(df1$Creating.User.Login)
Thank you for your time once again and thank you so much for assisting me in this learning process.
Best regards,
LF.
I believe that your approach worked because I could use more familiar syntax of base R.
Please find below my script as reference of how it is at the moment.
library(tidyverse)
library(ggplot2)
library(lubridate)
### LOADING DATA
eta_dt <- read_csv(choose.files())
etd_dt <- read_csv(choose.files())
#ett_dt <- read_csv(choose.files())
#ENSURE THEY ARE DF
eta_dt <- data.frame(eta_dt)
etd_dt <- data.frame(etd_dt)
#ett_dt <- data.frame(ett_dt)
class(eta_dt)
#SETTING SHIPMENT ID AS FACTORS for rbind (not sure if necessary)
eta_dt$Shipment.ID <- as.factor(eta_dt$Shipment.ID)
etd_dt$Shipment.ID <- as.factor(etd_dt$Shipment.ID)
#### DATA HAS COLS WITH SAME name FIXING THAT FOR EACH DT SET
#### renaming columns, was trying to use the RENAME FUNCTION but found in stackflow that could use "select" function. pipe does seem
#to work but doesn't save it, it only works
#when pipped, forum submission query made.
#etd_dt %>% rename(
# Shipment.Kind = Type_1,
# Container.Type = Type_2
#)
#class(etd_dt)
colnames(eta_dt)[29] <- 'Shipment.Kind'
colnames(eta_dt)[215] <- 'Container.Type'
colnames(eta_dt[29])
colnames(eta_dt)[215]
######Found below example, of using base R code for this. Another way of doing above.
#names(etd_dt)[names(etd_dt) == 'Type_1'] <- 'Shipment.Kind'
#names(etd_dt)[names(etd_dt) == 'Type_2'] <- 'Container.Type'
#colnames(etd_dt)[29]
#colnames(etd_dt)[215]
#This code pretty much does the following:
#1. names(df) looks into all the names in the df
#2. [names(df) == old.var.name] extracts the variable name you want to check
#3. <- 'new.var.name' assigns the new variable name.
str(eta_dt)
str(etd_dt)
#MERGING BOTH DF INTO ONE.
df1 <- data.frame(merge(eta_dt,etd_dt, all = TRUE))
#####################################################################################################################################
############################################ DATES MGMT #############################################################################
#####################################################################################################################################
class(df1$ETA)
df1$ETA <- dmy_hm(df1$ETA)
class(df1$ETA)
df1$ETA
class(df1$ETD)
df1$ETD <- dmy_hm(df1$ETD)
class(df1$ETD)
df1$ETD
class(df1$Created.Time..UTC.)
df1$Created.Time..UTC. <- dmy_hm(df1$Created.Time..UTC.)
class(df1$Created.Time..UTC.)
df1$Created.Time..UTC.
class(df1$Last.Edited.Time..UTC.)
df1$Last.Edited.Time..UTC. <- dmy_hm(df1$Last.Edited.Time..UTC.)
class(df1$Last.Edited.Time..UTC.)
df1$Last.Edited.Time..UTC.
class(df1$Issue.Date)
df1$Issue.Date <- dmy_hm(df1$Issue.Date)
class(df1$Issue.Date)
df1$Issue.Date
str(df1)
#####################################################################################################################################
############################################ DATES MGMT #############################################################################
#####################################################################################################################################
####################################################################################################################################
####################################################################################################################################
################################# CORRECTING DATA TYPES FOR DF1 TO PROCEED TO ANALYSIS ############################################
####################################################################################################################################
####################################################################################################################################
df1$Consol.ID <- as.factor(df1$Consol.ID)
df1$Shipment.ID <- as.factor(df1$Shipment.ID)
df1$Trans. <- as.factor(df1$Trans.)
df1$Cont. <- as.factor(df1$Cont.)
df1$Origin <- as.factor(df1$Origin)
df1$Dest. <- as.factor(df1$Dest.)
df1$Creating.User.Login <- as.factor(df1$Creating.User.Login)
Great. Please mark the solution for the benefit of those to follow.
Regarding the %>% operator, and the tidyverse it does take some getting use to. Here's a pattern that might provide some insight. (I've used intermediate variables, but often it's