How can I convert a column of a dataset from character to datetime?

Todd_Swan · May 24, 2023, 7:41pm

I am new to R and doing my first case study but stuck on my first task from the Google Data Analytics course. I have 2 columns in 1 of 12 datasets that is formatted as 'col_character()' that needs to be changed to the same format of the other datasets as 'col_datetime(format = "") before combining them into one dataset.

I've been testing different suggestions from this forum and other online sources, but none seem to work. Here are a few that I have tried:

m05_2022$started_at <- as_datetime(m05_2022$started_at, format="%m/%d/%Y %H,%M,%S")

m05_2022$started_at <- mdy_hms(m05_2022$started_at)
str(m05_2022)

m05_2022 <- mutate(m05_2022, started_at=as.datetime(started_at), ended_at= as.datetime(ended_at))

Below is the script that I have started with:

title: "R Notebook"
output:
html_document:
df_print: paged

Install required packages

tidyverse for data import and wrangling

lubridate for date functions

ggplot for visualization

install.packages("tidyverse")
library(tidyverse)
library(lubridate)
library(ggplot2)

#Upload datasets

setwd("/Users/ToddS/Desktop/Cyclistic_Data")
getwd()
m05_2022 <- read_csv("202205-divvy-tripdata.csv")
m06_2022 <- read_csv("202206-divvy-tripdata.csv")
m07_2022 <- read_csv("202207-divvy-tripdata.csv")
m08_2022 <- read_csv("202208-divvy-tripdata.csv")
m09_2022 <- read_csv("202209-divvy-tripdata.csv")
m10_2022 <- read_csv("202210-divvy-tripdata.csv")
m11_2022 <- read_csv("202211-divvy-tripdata.csv")
m12_2022 <- read_csv("202212-divvy-tripdata.csv")
m01_2023 <- read_csv("202301-divvy-tripdata.csv")
m02_2023 <- read_csv("202302-divvy-tripdata.csv")
m03_2023 <- read_csv("202303-divvy-tripdata.csv")
m04_2023 <- read_csv("202304-divvy-tripdata.csv")

Compare column names

colnames(m05_2022)
colnames(m06_2022)
colnames(m07_2022)
colnames(m08_2022)
colnames(m09_2022)
colnames(m10_2022)
colnames(m11_2022)
colnames(m12_2022)
colnames(m01_2023)
colnames(m02_2023)
colnames(m03_2023)
colnames(m04_2023)

Inspect the dataframes

str(m05_2022)
str(m06_2022)
str(m07_2022)
str(m08_2022)
str(m09_2022)
str(m10_2022)
str(m11_2022)
str(m12_2022)
str(m01_2023)
str(m02_2023)
str(m03_2023)
str(m04_2023)

Converting m05_2022 columns 'started_at' and 'ended_at' from character to datetime

???

Any insight would be greatly appreciated!

technocrat · May 24, 2023, 7:49pm

The {lubridate} package has functions to do this that are easier to use.

string <- “2023-05-25 12:45:30 PDT”
ymd_hms(string)

Functions for other common formats are included.

Todd_Swan · May 24, 2023, 9:20pm

My apologies, I'm not sure what I'm doing wrong. I used:

library(lubridate)
m05_2022$started_at <- mdy_hms(m05_2022$started_at)

but it's giving me the warning:
Warning: All formats failed to parse. No formats found.

...and the structure is still showing as:
started_at = col_character()

Is this not the correct structure or is there something else that I should be looking for in this column that's causing this not to categorize correctly?

FJCC · May 24, 2023, 10:02pm

I don't see anything wrong with your code. Please post the output of

dput(head(m05_2022))

EconProf · May 25, 2023, 3:45am

You are close, but a typical value for started_at is 11/20/2021 13:45. Notice that there are no seconds, only hour and minute. Try mdy_hm().

EconProf · May 28, 2023, 3:02am

It looks like I was wrong about the started_at and ended_at variables in the csv files. I downloaded the files for April and May of 2020 (202004-divvy-tripdata.csv and 202005-divvy-tripdata.csv) and a typical value is actually "2020-04-26 5:45:14 PM", which the read_csv() function will correctly convert to datetime format!

library(tidyverse) 
# which will load the core packages, including ggplot2, dplyr and lubridate
m04_2020 <- read_csv("202004-divvy-tripdata.csv")

Below is the result of dput(head(m04_2020)) and the structure of that data frame. Note the POSIXct format for both started_at and ended_at.

m04_2020 <- structure(list(ride_id = c("A847FADBBC638E45", "5405B80E996FF60D", 
"5DD24A79A4E006F4", "2A59BBDF5CDBA725", "27AD306C119C6158", "356216E875132F61"
), rideable_type = c("docked_bike", "docked_bike", "docked_bike", 
"docked_bike", "docked_bike", "docked_bike"), started_at = structure(c(1587923114, 
1587143334, 1585763653, 1586263819, 1587205379, 1588269347), tzone = "UTC", class = c("POSIXct", 
"POSIXt")), ended_at = structure(c(1587924723, 1587143823, 1585764516, 
1586264551, 1587208554, 1588269671), tzone = "UTC", class = c("POSIXct", 
"POSIXt")), start_station_name = c("Eckhart Park", "Drake Ave & Fullerton Ave", 
"McClurg Ct & Erie St", "California Ave & Division St", "Rush St & Hubbard St", 
"Mies van der Rohe Way & Chicago Ave"), start_station_id = c(86, 
503, 142, 216, 125, 173), end_station_name = c("Lincoln Ave & Diversey Pkwy", 
"Kosciuszko Park", "Indiana Ave & Roosevelt Rd", "Wood St & Augusta Blvd", 
"Sheridan Rd & Lawrence Ave", "Streeter Dr & Grand Ave"), end_station_id = c(152, 
499, 255, 657, 323, 35), start_lat = c(41.8964, 41.9244, 41.8945, 
41.903, 41.8902, 41.8969), start_lng = c(-87.661, -87.7154, -87.6179, 
-87.6975, -87.6262, -87.6217), end_lat = c(41.9322, 41.9306, 
41.8679, 41.8992, 41.9695, 41.8923), end_lng = c(-87.6586, -87.7238, 
-87.623, -87.6722, -87.6547, -87.612), member_casual = c("member", 
"member", "member", "member", "casual", "member")), row.names = c(NA, 
-6L), class = c("tbl_df", "tbl", "data.frame"))

str(m04_2020)
#> Classes 'tbl_df', 'tbl' and 'data.frame':    6 obs. of  13 variables:
#>  $ ride_id           : chr  "A847FADBBC638E45" "5405B80E996FF60D" "5DD24A79A4E006F4" "2A59BBDF5CDBA725" ...
#>  $ rideable_type     : chr  "docked_bike" "docked_bike" "docked_bike" "docked_bike" ...
#>  $ started_at        : POSIXct, format: "2020-04-26 17:45:14" "2020-04-17 17:08:54" ...
#>  $ ended_at          : POSIXct, format: "2020-04-26 18:12:03" "2020-04-17 17:17:03" ...
#>  $ start_station_name: chr  "Eckhart Park" "Drake Ave & Fullerton Ave" "McClurg Ct & Erie St" "California Ave & Division St" ...
#>  $ start_station_id  : num  86 503 142 216 125 173
#>  $ end_station_name  : chr  "Lincoln Ave & Diversey Pkwy" "Kosciuszko Park" "Indiana Ave & Roosevelt Rd" "Wood St & Augusta Blvd" ...
#>  $ end_station_id    : num  152 499 255 657 323 35
#>  $ start_lat         : num  41.9 41.9 41.9 41.9 41.9 ...
#>  $ start_lng         : num  -87.7 -87.7 -87.6 -87.7 -87.6 ...
#>  $ end_lat           : num  41.9 41.9 41.9 41.9 42 ...
#>  $ end_lng           : num  -87.7 -87.7 -87.6 -87.7 -87.7 ...
#>  $ member_casual     : chr  "member" "member" "member" "member" ...

^{Created on 2023-05-27 with reprex v2.0.2}

system · June 18, 2023, 3:03am

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.