How to convert date column which is in class "character'' into a date class.

str(flight_dataset)
tibble [2,201 × 13] (S3: tbl_df/tbl/data.frame)
schedtime : num [1:2201] 1455 1640 1245 1715 1039 ... carrier : chr [1:2201] "OH" "DH" "DH" "DH" ...
deptime : num [1:2201] 1455 1640 1245 1709 1035 ... dest : chr [1:2201] "JFK" "JFK" "LGA" "LGA" ...
distance : num [1:2201] 184 213 229 229 229 228 228 228 228 228 ... date : chr [1:2201] "37987" "37987" "37987" "37987" ...
flightnumber: num [1:2201] 5935 6155 7208 7215 7792 ... origin : chr [1:2201] "BWI" "DCA" "IAD" "IAD" ...
weather : num [1:2201] 0 0 0 0 0 0 0 0 0 0 ... dayweek : num [1:2201] 4 4 4 4 4 4 4 4 4 4 ...
daymonth : num [1:2201] 1 1 1 1 1 1 1 1 1 1 ... tailnu : chr [1:2201] "N940CA" "N405FJ" "N695BR" "N662BR" ...
$ delay : chr [1:2201] "ontime" "ontime" "ontime" "ontime" ...

format.Date(flight_dataset$date)
Error in as.POSIXlt.character(x) :
character string is not in a standard unambiguous format

class(flight_dataset$date)
[1] "character"

flight_dataset$date <- as.POSIXct.Date(flight_dataset$date,tz = "UTC")
Error in unclass(x) * 86400 : non-numeric argument to binary operator

I tried to use as.Date(datafarame$date, format = "%y%m%d").........the output was NA.

I am stuck here can someone guide me please. Thank you.

I don't see the date column in the output from str(), so I can't give you detailed advice. In the as.Date function, set the format argument to show how the character string represents the date, not to the format that you want for the resulting numeric date.

as.Date("3/21/23", format = "%m/%d/%y")
[1] "2023-03-21"

Hi there....

class(flight_dataset$date)
[1] "character"

str(flight_dataset)
tibble [2,201 × 13] (S3: tbl_df/tbl/data.frame)
schedtime : num [1:2201] 1455 1640 1245 1715 1039 ... carrier : chr [1:2201] "OH" "DH" "DH" "DH" ...
deptime : num [1:2201] 1455 1640 1245 1709 1035 ... dest : chr [1:2201] "JFK" "JFK" "LGA" "LGA" ...
distance : num [1:2201] 184 213 229 229 229 228 228 228 228 228 ... date : chr [1:2201] "37987" "37987" "37987" "37987" ...
flightnumber: num [1:2201] 5935 6155 7208 7215 7792 ... origin : chr [1:2201] "BWI" "DCA" "IAD" "IAD" ...
weather : num [1:2201] 0 0 0 0 0 0 0 0 0 0 ... dayweek : num [1:2201] 4 4 4 4 4 4 4 4 4 4 ...
daymonth : num [1:2201] 1 1 1 1 1 1 1 1 1 1 ... tailnu : chr [1:2201] "N940CA" "N405FJ" "N695BR" "N662BR" ...
$ delay : chr [1:2201] "ontime" "ontime" "ontime" "ontime" ...

I hope you can see the str() date column output

str(flight_dataset)
tibble [2,201 × 13] (S3: tbl_df/tbl/data.frame)
schedtime : num [1:2201] 1455 1640 1245 1715 1039 ... carrier : chr [1:2201] "OH" "DH" "DH" "DH" ...
deptime : num [1:2201] 1455 1640 1245 1709 1035 ... dest : chr [1:2201] "JFK" "JFK" "LGA" "LGA" ...
$ distance : num [1:2201] 184 213 229 229 229 228 228 228 228 228 ...

date : chr [1:2201] "37987" "37987" "37987" "37987" ... flightnumber: num [1:2201] 5935 6155 7208 7215 7792 ...
origin : chr [1:2201] "BWI" "DCA" "IAD" "IAD" ... weather : num [1:2201] 0 0 0 0 0 0 0 0 0 0 ...
dayweek : num [1:2201] 4 4 4 4 4 4 4 4 4 4 ... daymonth : num [1:2201] 1 1 1 1 1 1 1 1 1 1 ...
tailnu : chr [1:2201] "N940CA" "N405FJ" "N695BR" "N662BR" ... delay : chr [1:2201] "ontime" "ontime" "ontime" "ontime" ...

FAQ Asking Questions

A handy way to supply some sample data is the dput() function. In the case of a large dataset something like dput(head(mydata, 100)) should supply the data we need. Just do dput(mydata) where mydata is your data. Copy the output and paste it here between
```

```

Your date column consists of character representations of integers. I'm guessing those are dates from a spreadsheet. You can convert them with as.Date using the origin argument.

DATE <- c("37987", "37987", "37987", "37987")
> as.Date(as.numeric(DATE), origin = "1899-12-30")
[1] "2004-01-01" "2004-01-01" "2004-01-01" "2004-01-01"

Does the year 2004 make sense for you data?
So,

flight_dataset$date <- as.Date(as.numeric(flight_dataset$date), origin = "1899-12-30")

Dear Sir/Madam,
Greeting.

Honestly, I appreciate your help.

I tried my best to improvise your recent reply and this is the output.

A tibble: 2,201 × 13
schedtime carrier deptime dest distance date flightnumber origin weather dayweek daymonth

1 1455 OH 1455 JFK 184 37987 5935 BWI 0 4 1
2 1640 DH 1640 JFK 213 37987 6155 DCA 0 4 1
3 1245 DH 1245 LGA 229 37987 7208 IAD 0 4 1
4 1715 DH 1709 LGA 229 37987 7215 IAD 0 4 1
5 1039 DH 1035 LGA 229 37987 7792 IAD 0 4 1
6 840 DH 839 JFK 228 37987 7800 IAD 0 4 1
7 1240 DH 1243 JFK 228 37987 7806 IAD 0 4 1
8 1645 DH 1644 JFK 228 37987 7810 IAD 0 4 1
9 1715 DH 1710 JFK 228 37987 7812 IAD 0 4 1
10 2120 DH 2129 JFK 228 37987 7814 IAD 0 4 1

:information_source: 2,191 more rows

:information_source: 2 more variables: tailnu , delay

:information_source: Use print(n = ...) to see more rows

View(flightdelays)
Error in View : object 'flightdelays' not found

flight_data <- read_excel("~/Downloads/SIMPLILEARN/R/Final Project/Flight Delay/flightdelays.xlsx")

View(flight_data)

flight_data$schedtime <- as.POSIXct.numeric(flight_data$schedtime)

The output is as follows

class(flight_data$schedtime)
[1] "POSIXct" "POSIXt"

POSIXct[1:2201], format: "1969-12-31 19:24:15" "1969-12-31 19:27:20" "1969-12-31 19:20:45" ...

flight_dataset$date <- as.Date(as.numeric(flight_dataset$date), origin = "1969-12-31")
Error: object 'flight_dataset' not found
flight_data$date <- as.Date(as.numeric(flight_data$date), origin = "1969-12-31")
Warning message:
In as.Date(as.numeric(flight_data$date), origin = "1969-12-31") :
NAs introduced by coercion

I tried to use 1969-12-31 from the schedtime column.

Please correct me.

You need to understand the differences between R Date values, POSIXct, and spreadsheet dates.

  1. An R Date is the number of days since 1970-01-01. The value of January 16, 2024 is 19738. That is, just over 54 years since 1970-01-01
 TODAY <- as.Date("2024-01-16")
> as.numeric(TODAY)
[1] 19738
> 19738/365.25
[1] 54.0397
  1. POSIXct date-times are the number of seconds since 1970-01-01 00:00:00. It has been about 1.7 billion seconds since 1970-01-01 00:00:00, which is about 54 years
TODAY_Midnight <- as.POSIXct("2024-01-16 00:00:00")
> as.numeric(TODAY_Midnight)
[1] 1705388400
> 1705388400/60/60/24/365.25 #convert to years
[1] 54.0405

You can convert numeric values to dates with either as.Date or as.POSIXct

 as.Date(19738, origin = "1970-01-01")
[1] "2024-01-16"
> as.POSIXct(1705388400)
[1] "2024-01-16 MST"
  1. Spreadsheet dates or date-times are the number of days since 1899-12-30 (ignoring a problem with dates before 1900-03-01). That is why I used
as.Date(as.numeric(DATE), origin = "1899-12-30")

in my previous post. I think those integers are the underlying values of spreadsheet dates.

Your schedtime column has values like 1455 and 1640. If you pass that to as.POSIXct(), it translates 1455 to the date-time that is 1455 seconds after 1970-01-01 00:00:00 UTC and then adjusts for your time zone. That gives you a date-time of 1969-12-31 19:24:15.
I think it is more likely that 1455 represents the time of day 14:55 and the date information is stored separately in the date column as 37987, which is 2004-01-01 according to my previous calculation. Does a date-time of 2004-01-01 14:55 make sense for the first row of your data?

1 Like

Thank you so much for helping me to understand it. I have a few more doubts but I can't take advantage of your goodness. Where can I refer to know these all?

I was going through CRAN to understand it. I checked some books to figure it out but I wasn't able to understand it. I appreciate it.

If you have questions about dates and times, this tutorial looks good, though I have not looked at all of it.

Hi there...
To obtain a strong foundation in the R language

what should I do? Which books do I need to read? I am interested in pursuing data analytics. I am a cook to date. I am new to this field. Trying to learn R from various sources like books, especially and few websites.

Thank you for the R-blogger.

Here is an excellent book for learning R.
https://r4ds.hadley.nz/

Hi there...

I am not getting the date format (%Y%m%d) after using the strptime (). It's the same dataset.

$ date : chr [1:2201] "37987" "37987" "37987" "37987" ...

release_date <- strptime("flightdata$date", format = %Y%m%d)
Error: unexpected SPECIAL in "release_date <- strptime("flightdata$date", format = %Y%"

release_date <- strptime("flightdata$date", format = "%Y%m%d")
print(release_date)
[1] NA --------------------- Why am I getting NA.
class(release_date)
[1] "POSIXlt" "POSIXt"

release_date <- strptime(flightdata$date,format = "%Y%m%d")
class(release_date)
[1] "POSIXlt" "POSIXt"
View(release_date)..............output is NA

What is wrong with me? Please help me.

To convert the class character date column to the class date column I am using the strptime().

strptime() takes in a character and outputs a date-time but the format of the input character has to be provided.

strptime("21/3/23", format = "%d/%m/%y")
[1] "2023-03-21 MDT"

"37987" is not a representation of a date-time using that uses day, month, and year or any of the other formatting codes of strptime() (see the Details section of the help file for strptime()). "37987" is a character version of the number of days since 1899-12-30. strptime() does not have a method to deal with that.
Why are you using strptime() instead of as.Date()?

as.Date(as.numeric("37987"), origin = "1899-12-30")
[1] "2004-01-01"

Hi...

I tried the below code after posting the previous query. I just thought of sharing it with you and clear a few doubts about it...

flightdata$date <- as.Date(as.numeric(flightdata$date,origin = "1970-01-01"))
Warning message:
In as.Date(as.numeric(flightdata$date, origin = "1970-01-01")) :
NAs introduced by coercion

flightdata$date <- as.Date(as.numeric(flightdata$date,origin = "1899-12-30"))

print(flightdata$date)
[1] "2074-01-02" "2074-01-02" "2074-01-02" "2074-01-02" "2074-01-02" "2074-01-02"
[7] "2074-01-02" "2074-01-02" "2074-01-02" "2074-01-02" "2074-01-02" "2074-01-02"
[13] "2074-01-02" "2074-01-02" "2074-01-02" "2074-01-02" "2074-01-02" "2074-01-02"
[19] "2074-01-02" "2074-01-02" "2074-01-02" "2074-01-02" "2074-01-02" "2074-01-02"
[25] "2074-01-02" "2074-01-02" "2074-01-02" "2074-01-02" "2074-01-02" "2074-01-02"

flightdata$date <- as.Date(as.numeric(flightdata$date,origin = "1970-01-01")) ........How did I get this "2074-01-02"? I also get NA values in the last few columns.

  1. Can I use any arithmetic operations-oriented functions to get 2004-01-01? or am I asking an invalid question?
as.numeric(flightdata$date,origin = "1970-01-01")

this is incorrect... as.numeric() doesnt have an origin= param, rather it is as.Date() that does.

Hi there...

Okay....understood. Thank you.

@Pavan1
To be as clear as possible, you tried this:

flightdata$date <- as.Date(as.numeric(flightdata$date,origin = "1970-01-01"))

You should have used this:

flightdata$date <- as.Date(as.numeric(flightdata$date),origin = "1899-12-30")

The origin date and the placement of parentheses are different.

1 Like

Yes, I tried...

Now, I want to convert schedtime column which is in the class numeric to class time. I have used as.POSIXct () and the output is as follows...

flightdata$schedtime <- as.POSIXct(flightdata$schedtime)
output:- "1969-12-31 19:24:15" "1969-12-31 19:27:20"................I have a feeling it is wrong because of the year 1969 and the timing too.

class(flightdata$schedtime)
[1] "POSIXct" "POSIXt"

I have used as. data.table package to remove the year..... I don't know whether it is a right or wrong way to do it.
flightdata$schedtime <- as.ITime(flightdata$schedtime)

class(flightdata$schedtime)
[1] "ITime"

Kindly let me know my mistake. Thank you

I'm not aware of a class time in base R. The hms package allows converting numbers or characters to a class "hms", which is basically stored as seconds but displayed as a time. If your values in schedtime are really times stored as integers, that is, 1454 represents 14:54, you can write a simple function to convert those to hms values.

library(hms)
DF <- data.frame(schedtime = c(1454, 1640, 1245))
DF
#>   schedtime
#> 1      1454
#> 2      1640
#> 3      1245
TimeConvert <- function(x) {
  hms(seconds = rep(0, length(x)), 
      minutes = as.numeric(substr(x,3,4)), 
      hours = as.numeric(substr(x, 1,2)))
}
DF$schedtime <- TimeConvert(DF$schedtime) 
DF
#>   schedtime
#> 1  14:54:00
#> 2  16:40:00
#> 3  12:45:00

Created on 2024-01-18 with reprex v2.0.2