Splitting date/time variable - trouble with code - unstandard time format

EDresearcher · April 25, 2020, 3:56am

I'm a PhD student (not that experienced in R), and I'm trying to recode a string variable, called RecordedDate into two separate variables: a Date variable (stored in a standard date format) and a Time variable (stored in a standard time format...preferably). My dataframe is called edeq2.1.

An example my observation format for this variable is: 7/28/2018 6:43. See below for example dataset:

RecordedDate = c("8/6/2018 18:56", "7/26/2018 10:43", "7/28/2018 6:43", "8/4/2018 8:36")
edeq2.1<- data.frame(RecordedDate)

I tried two methods, as suggested to me:

I tried using the anytime package, as suggested to me, but when for values where the value part of the time is one digit (such as the 7/28/2018 6:43" entry), the time part of the variable is not coded correctly (recoded value = 2018-07-28 00:00). Adding a useR=TRUE argument didn't help.

This was the code:

RecordedDate = c("8/6/2018 18:56", "7/26/2018 10:43", "7/28/2018 6:43", "8/4/2018 8:36")
edeq2.1<- data.frame(RecordedDate)
edeq2.1$RecordedDate_recoded <- anytime::anytime(edeq2.1$RecordedDate))
edeq2.1$Time<- format(edeq2.1$pt, "%H:%M" )
edeq2.1$Date<- format(edeq2.1$pt, "%Y-%m-%d")

I also tried the split string function, strsplit:

edeq2.1$Recoded_split=strsplit(edeq2.1$RecordedDate, " ")

But I'm unsure of how to rename and save the split values after that into a Date and Time variable. Additionally, I could probably convert the Date variable into standard time, but would have trouble doing this for the Time variable because of the nonstandard formatting. Any suggestions to fix my problem would be greatly appreciated! Even a suggestion that helps me separate and name the date and time variable successfully (without reformatting) would still be helpful.

siddharthprabhu · April 25, 2020, 4:53am

Consider using the lubridate package for date and datetime objects. For times without date components, I like using the hms package.

library(tidyverse)

edeq2.1 <- data.frame(RecordedDate = c("8/6/2018 18:56", "7/26/2018 10:43", 
                                       "7/28/2018 6:43", "8/4/2018 8:36"))

edeq2.1 %>% 
  separate(RecordedDate, into = c("Date", "Time"), sep = " ", remove = FALSE) %>%
  mutate(Date = lubridate::as_date(Date, format = "%m/%d/%Y"),
         Time = hms::as_hms(str_c(Time, ":00")))
#>      RecordedDate       Date     Time
#> 1  8/6/2018 18:56 2018-08-06 18:56:00
#> 2 7/26/2018 10:43 2018-07-26 10:43:00
#> 3  7/28/2018 6:43 2018-07-28 06:43:00
#> 4   8/4/2018 8:36 2018-08-04 08:36:00

^{Created on 2020-04-25 by the reprex package (v0.3.0)}

EDresearcher · April 26, 2020, 2:33pm

Thank you so much! This was extremely helpful!

The only problem is there are a couple of weird observations that I want to treat as NA that qualtrics downloads into my dataset for the first two rows. The values are:
{"ImportId":"recordedDate","timeZone":"America/New_York"} & Recorded Date
So my actual dataset I guess is more like:

RecordedDate = c("Recorded Date","{ImportId:recordedDate,timeZone:America/New_York}", "8/6/2018 18:56", "7/26/2018 10:43", "7/28/2018 6:43", "8/4/2018 8:36")
edeq2.1<- data.frame(RecordedDate)

Is there anyway to modify this code to ignore values without deleting them from my dataset?

Thank you so much again though! This was extremely helpful!

EDresearcher · April 26, 2020, 3:03pm

Also, I am realizing this code separates in a display table, but doesn't seem store the variables in my dataset, so if I resave my dataset, the "date" and "time" variables were not there.

I still appreciate this code though. It was useful to know, but doesn't fully resolve my problem.

siddharthprabhu · April 26, 2020, 3:15pm

The second row in fact holds valuable information; it specifies the time zone that the timestamps are in.

Given this, I would suggest first converting it into a datetime object with the proper time zone and then doing the rest. That will also have the desired side effect of converting the first two observations into NA values since they do not follow the specified date format.

dplyr does not modify in-place so you need to assign the transformed data frame to a variable to store the results. I did not do this in the previous post. Please see below.

library(tidyverse)

edeq2.1 <- data.frame(RecordedDate = c("Recorded Date","{ImportId:recordedDate,timeZone:America/New_York}", 
                                       "8/6/2018 18:56", "7/26/2018 10:43", "7/28/2018 6:43", "8/4/2018 8:36"),
                      stringsAsFactors = FALSE)

edeq2.1 <- edeq2.1 %>% 
  mutate(RecordedDate = lubridate::as_datetime(RecordedDate, format = "%m/%d/%Y %H:%M", tz = "America/New_York")) %>% 
  separate(RecordedDate, into = c("Date", "Time"), sep = " ", remove = FALSE)

edeq2.1
#>          RecordedDate       Date     Time
#> 1                <NA>       <NA>     <NA>
#> 2                <NA>       <NA>     <NA>
#> 3 2018-08-06 18:56:00 2018-08-06 18:56:00
#> 4 2018-07-26 10:43:00 2018-07-26 10:43:00
#> 5 2018-07-28 06:43:00 2018-07-28 06:43:00
#> 6 2018-08-04 08:36:00 2018-08-04 08:36:00

^{Created on 2020-04-26 by the reprex package (v0.3.0)}

Note: I'm not using as_date() and as_hms() here so the resulting columns are just character vectors. If you need them as date and hms objects (useful if you're doing date and time-specific arithmetic), you can convert them using the mutate() call from my last post.

EDresearcher · April 26, 2020, 3:20pm

Thank you so much! This was great! I am new to R, so I appreciate you spending the extra time showing me how to do this. Not only is this useful for this problem, but helps me understand some of the process/packages that will be useful in the future! Thanks again

system · May 3, 2020, 3:20pm

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.