Using as.difftime() but getting 'NA secs' in resulting column

Hello, I am new to R and am working on a homework assignment dealing with cleaning and transforming a dataset. Here is a glimpse of my dataset currently:

> glimpse(q4_2016)
Rows: 683,832
Columns: 20
$ ride_id            <chr> "12979228", "12979227", "12979226", "12979225", "12…
$ started_at         <chr> "12/31/2016 23:57:52", "12/31/2016 23:53:18", "12/3…
$ ended_at           <chr> "1/1/2017 00:06:44", "1/1/2017 00:08:13", "1/1/2017…
$ rideable_type      <chr> "5076", "5114", "1026", "504", "4451", "5643", "48"…
$ tripduration       <dbl> 532, 895, 931, 970, 980, 179, 1863, 1867, 1656, 108…
$ start_staion_id    <dbl> 502, 195, 195, 199, 199, 47, 177, 177, 195, 264, 15…
$ start_station_name <chr> "California Ave & Altgeld St", "Columbus Dr & Rando…
$ end_station_id     <dbl> 258, 25, 25, 35, 35, 125, 140, 140, 195, 52, 42, 77…
$ end_station_name   <chr> "Logan Blvd & Elston Ave", "Michigan Ave & Pearson …
$ member_casual      <chr> "casual", "casual", "casual", "member", "member", "…
$ date               <date> 2016-12-31, 2016-12-31, 2016-12-31, 2016-12-31, 20…
$ month              <chr> "2016-12-31", "2016-12-31", "2016-12-31", "2016-12-…
$ day                <chr> "31", "31", "31", "31", "31", "31", "31", "31", "31…
$ year               <chr> "2016", "2016", "2016", "2016", "2016", "2016", "20…
$ day_of_week        <chr> "Saturday", "Saturday", "Saturday", "Saturday", "Sa…
$ start_date         <chr> "12/31/2016", "12/31/2016", "12/31/2016", "12/31/20…
$ start_time         <chr> "23:57:52", "23:53:18", "23:53:07", "23:51:31", "23…
$ end_date           <chr> "1/1/2017", "1/1/2017", "1/1/2017", "1/1/2017", "1/…
$ end_time           <chr> "00:06:44", "00:08:13", "00:08:38", "00:07:41", "00…
$ ride_length        <drtn> NA secs, NA secs, NA secs, NA secs, NA secs, NA se…

I am trying to change the 'ride_length' column to read as actual seconds and formatted as num. Here is the code chunk that got me to the glimpse up above:

q4_2016$ride_length <- as.difftime(q4_2016$ended_at,q4_2016$started_at)

I tried taking away the 'as.' in the 'as.difftime' function, but that returns an error saying the character string is not in a standard unambiguous format. Is there something I am missing in my code chunk to fix this issue? Any help would be appreciated.

as.difftime does not subtract two times. It takes values that already are time intervals and converts them to difftime values, like this:

as.difftime(c("01:55:22", "01:15:25"))
Time differences in hours
[1] 1.922778 1.256944
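For contrast, here is a minimal sketch of the job as.difftime() actually does: wrapping numbers that already are durations into difftime values. Treating 532 and 895 (the first two tripduration values in the glimpse) as seconds is an assumption made for illustration.

```r
# as.difftime() wraps existing duration values; it never subtracts two times.
# Numeric input requires an explicit unit (units = "auto" is not allowed here).
ride_secs <- as.difftime(c(532, 895), units = "secs")
print(ride_secs)

# units<- changes the display unit and rescales the numbers;
# the duration itself stays the same.
ride_mins <- ride_secs
units(ride_mins) <- "mins"
print(ride_mins)
```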

You should convert your started_at and ended_at columns to date-time values (class POSIXct, which R stores as numbers of seconds) and then subtract those two columns to get ride_length. The conversion can be done with as.POSIXct() or with the mdy_hms() function from the lubridate package. If you use as.POSIXct(), use the format argument to tell the function the layout of the incoming character strings.
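A minimal base R sketch of the as.POSIXct() route, using the first two rides from the glimpse. The tz = "UTC" argument is an assumption added here so that daylight-saving changes cannot shift the result; use your data's real time zone if it matters. When difftime() is given raw character strings, it has to guess their layout, and its default guesses are year-first (like "2016-12-31 23:57:52"), which is the likely source of the "standard unambiguous format" error mentioned above.

```r
# Parse month/day/year strings into POSIXct date-times.
# Leading zeros are optional on input, so "1/1/2017" matches %m/%d/%Y.
# tz = "UTC" is an assumption made here to sidestep daylight-saving shifts.
fmt <- "%m/%d/%Y %H:%M:%S"
started <- as.POSIXct(c("12/31/2016 23:57:52", "12/31/2016 23:53:18"),
                      format = fmt, tz = "UTC")
ended   <- as.POSIXct(c("1/1/2017 00:06:44", "1/1/2017 00:08:13"),
                      format = fmt, tz = "UTC")

# difftime() (no "as.") subtracts two date-times; units = "secs" fixes the unit.
ride_diff <- difftime(ended, started, units = "secs")
print(ride_diff)

# Plain numbers of seconds, with the difftime class dropped.
ride_seconds <- as.numeric(ride_diff)
print(ride_seconds)
```

If a parsed value comes back as NA, the format string does not match the text, so check that first.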

Thanks for your help so far. I have lubridate loaded, but when I type in

mdy_hms(started_at)

I get back 'all formats failed to parse'. Am I missing something in syntax? Like

mdy_hms(q4_2016$started_at)

mdy_hms(started_at, format(as.numeric))

?

Try

q4_2016$started_at <- mdy_hms(q4_2016$started_at)

Thanks again. That worked partially. I was able to convert both 'started_at' and 'ended_at' columns to numeric, and successfully subtracted them. My glimpse shows my 'ride_length' column in mins, not secs.
I tried multiplying the column by 60, which I hoped would automatically convert minutes to seconds. No luck. I then tried using the format function and setting the units = 'secs' but this added double quotes around the variables without converting to secs.

Here is my column in the wrong format:

$ ride_length        <drtn> 8.866667 mins, 14.916667 mins, 15.516667 mins, ...

What do I use to convert to seconds? The column type is throwing me off.

You can use as.numeric() to change those values into regular numbers and then do the multiplication.
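A sketch of a conversion that does not depend on which unit R picked for display: as.numeric() on a difftime accepts a units argument, so you can ask for seconds directly instead of multiplying. The two durations below are rebuilt by hand from the minute values in the glimpse, for illustration only.

```r
# Rebuild two of the durations shown above (8.866667 and 14.916667 minutes).
ride_diff <- as.difftime(c(532, 895) / 60, units = "mins")
print(ride_diff)

# Ask for seconds explicitly; R does the unit conversion, no "* 60" needed.
ride_seconds <- as.numeric(ride_diff, units = "secs")
print(ride_seconds)

# Alternatively, change the unit in place and keep the difftime class.
units(ride_diff) <- "secs"
print(ride_diff)
```

On the real data this would be q4_2016$ride_length <- as.numeric(q4_2016$ride_length, units = "secs").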

Thanks. I tried as.numeric and received 'NAs introduced by coercion'.

This is what I typed in:

q4_2016$ride_length <- as.numeric("ride_length")

The column type changed from 'drtn' to 'dbl', which I just learned is double precision, but still essentially the same as numeric. If it technically changed the column to numeric under a different name, why is the resulting column reading as NAs? Is there a way for it to simply read as num?

Here is the kind of calculation I was suggesting.

DF <- data.frame(started_at = c("12/31/2016 23:57:52", "12/31/2016 23:53:18"),
                 ended_at = c("1/1/2017 00:06:44", "1/1/2017 00:08:13"))
DF  
#>            started_at          ended_at
#> 1 12/31/2016 23:57:52 1/1/2017 00:06:44
#> 2 12/31/2016 23:53:18 1/1/2017 00:08:13

library(lubridate)  

DF$started_at <- mdy_hms(DF$started_at)  
DF$ended_at <- mdy_hms(DF$ended_at)  

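# Subtracting two POSIXct columns gives a difftime; here R displays it in minutes
DF$ride_length <- DF$ended_at - DF$started_at

# Drop the difftime class, then convert minutes to seconds
DF$ride_length <- as.numeric(DF$ride_length) * 60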

DF
#>            started_at            ended_at ride_length
#> 1 2016-12-31 23:57:52 2017-01-01 00:06:44         532
#> 2 2016-12-31 23:53:18 2017-01-01 00:08:13         895

Created on 2023-10-11 with reprex v2.0.2

In the first row, the ride lasts 8:52 or 532 seconds.
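These values also match the first two entries of the tripduration column (532 and 895), which is a handy sanity check. One caveat with the * 60 step, sketched below with made-up times: when two POSIXct values are subtracted with -, R picks the unit from the shortest absolute difference in the whole vector (seconds if it is under a minute, minutes if under an hour, and so on). The column in this thread happened to come out in minutes, but a single ride shorter than 60 seconds would make the whole result come back in seconds, and multiplying by 60 would then overstate every ride. The 45-second ride here is hypothetical.

```r
# Hypothetical start/end times: the second ride lasts only 45 seconds.
fmt <- "%m/%d/%Y %H:%M:%S"
started <- as.POSIXct(c("12/31/2016 23:57:52", "12/31/2016 23:50:00"),
                      format = fmt, tz = "UTC")
ended   <- as.POSIXct(c("1/1/2017 00:06:44", "12/31/2016 23:50:45"),
                      format = fmt, tz = "UTC")

auto_diff <- ended - started          # unit chosen automatically
print(units(auto_diff))               # "secs", because one ride is under a minute

# Multiplying by 60 is only right when the unit really is minutes.
wrong <- as.numeric(auto_diff) * 60
# Asking for seconds explicitly is correct whatever unit R chose.
right <- as.numeric(auto_diff, units = "secs")
print(right)
```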

That worked! Thank you so much. Before you go, would you mind explaining why my code wasn't working? Here is what I used before you gave me the working code:

q4_2016$ride_length <- (q4_2016$ended_at - q4_2016$started_at)
q4_2016$ride_length <- as.numeric("ride_length")

You don't have to answer, but it would help me to learn and not repeat this mistake and would be much appreciated.

The first line of your code is fine. The second line returns NA because "ride_length" is just a character value, a piece of text. Running as.numeric("ride_length") has the same effect as as.numeric("HHH"). You might have tried as.numeric(ride_length), with no quotes, and that would return an "object not found" error. What you want to achieve is running as.numeric() on the column named ride_length that is inside of the data frame q4_2016. The syntax for that is as.numeric(q4_2016$ride_length). The notation q4_2016$ride_length means "the thing named ride_length inside of q4_2016".
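A small sketch of the three cases, using a hypothetical stand-in for q4_2016. It also shows why the whole column turned into NA with type dbl: as.numeric("ride_length") produces a single NA_real_, and assigning one value to a column repeats it down every row.

```r
# Hypothetical stand-in for q4_2016 with a difftime column.
rides <- data.frame(
  day_of_week = c("Saturday", "Saturday", "Sunday"),
  ride_length = as.difftime(c(532, 895, 931), units = "secs")
)

# 1. Quoted: as.numeric() converts the text "ride_length" itself -> NA (with a warning).
from_text <- suppressWarnings(as.numeric("ride_length"))
print(from_text)

# 2. Bare name: R searches the Global Environment, not inside rides,
#    so in a fresh session this is an "object not found" error.
err <- tryCatch(as.numeric(ride_length), error = function(e) e)
print(conditionMessage(err))

# 3. With $: R looks inside rides and finds the column.
from_column <- as.numeric(rides$ride_length)
print(from_column)
```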

This has to do with how R handles the relationship between values and names. At the top level, you have what is called the Global Environment. That is where q4_2016 lives, and you can see that name displayed in the Environment tab at the top right of RStudio. If you run x <- 6, you will make an object (name) x with the value 6, and you will see it in the Environment pane.

Because q4_2016 is a data frame, it is a named collection (technically a list) of columns, and one of those columns is named ride_length. But you can't see that ride_length from the Global Environment unless you tell the code to look inside q4_2016 by writing q4_2016$ride_length. You can even make a variable named ride_length in the Global Environment without affecting the ride_length inside of q4_2016. Run ride_length <- 10 and you will see ride_length in your Environment tab, yet the ride_length column in q4_2016 will be unaffected. I would avoid having a variable in the Global Environment with the same name as a column in a data frame, just to avoid confusing myself, but R does not care.
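A minimal sketch of that experiment, with a hypothetical two-row data frame standing in for q4_2016:

```r
# Hypothetical stand-in for q4_2016.
rides <- data.frame(ride_length = c(532, 895))

# A separate object named ride_length in the Global Environment.
ride_length <- 10

print(ride_length)           # the global object
print(rides$ride_length)     # the column inside the data frame, untouched

# A data frame is a list of columns, so $ looks the name up inside that list.
print(is.list(rides))
print(names(rides))
```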

Here is a possible source of confusion. Inside tidyverse functions, you will often see column names used bare, without the data frame name and the $ symbol in front. For example,

library(dplyr)
RideStats <- q4_2016 |> group_by(day_of_week) |> summarize(AvgRide = mean(ride_length))

Don't worry if you are not familiar with those functions. Note that day_of_week and ride_length are bare, without quotes or q4_2016$ prefixed. That works because the functions "know" to look in q4_2016 because it has been passed to the functions by using q4_2016 |>. The group_by() function sees the name day_of_week and looks first in q4_2016. This works in functions written to handle that syntax but not in R functions in general.
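Base R has a few functions that work the same way. with() is one: it evaluates an expression with the data frame's columns visible as bare names. The small data frame below is hypothetical, and tapply() stands in for the dplyr group_by()/summarize() pipeline as a sketch, not an exact equivalent.

```r
rides <- data.frame(day_of_week = c("Saturday", "Saturday", "Sunday"),
                    ride_length = c(532, 895, 931))

# Ordinary function call: columns must be reached with $.
avg_dollar <- tapply(rides$ride_length, rides$day_of_week, mean)

# with() looks inside rides first, so bare column names work here.
avg_with <- with(rides, tapply(ride_length, day_of_week, mean))

print(avg_with)
```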


That really clears up a lot for me. I was taught a lot of this stuff inside of the tidyverse, but apparently I wasn't working with the tidyverse loaded for this project. Your explanation has made much more sense than what I was taught about the basics of R. Now I understand how $ and quotation marks work inside and outside of the tidyverse. Thanks!
