I got some help on SO this morning working with timezones in lubridate. My problem was that the lubridate functions appear to only accept a timezone of length 1, and I had a situation where I wanted to convert a bunch of strings in one data frame column to a timezone specified in another column.
The answerer suggested wrapping the mutate call in rowwise and ungroup, which works great! But it's not intuitive, and I'm wondering how performant it would be on large datasets. It would be great if the tz arguments in ymd_hms, with_tz, force_tz and others used tidyeval as well so that a column could be specified for them.
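For reference, here is a minimal sketch of what the rowwise approach might look like. The data frame `df` and its columns `when` and `zone` are made-up names for illustration, not taken from the SO answer.

```r
library(dplyr)
library(lubridate)

# Hypothetical example data: a datetime string and its time zone per row
df <- tibble(
  when = c("2018-01-01 12:00:00", "2018-01-01 12:00:00"),
  zone = c("America/New_York", "Europe/London")
)

out <- df %>%
  rowwise() %>%                               # each row becomes its own group
  mutate(parsed = ymd_hms(when, tz = zone)) %>% # zone has length 1 inside a row
  ungroup()

# The instants are parsed per row, but the combined column
# can only carry one tzone attribute
attr(out$parsed, "tzone")
```

Because every row is evaluated separately, `tz` only ever sees a single value, which is why this works even though `ymd_hms` itself is not vectorised over time zones.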
I also noticed while working on this problem that when tibbles with date-time columns are printed, they print using one of the time zones in the column—I think maybe the latest time zone in the column or the tibble? This was a little unintuitive to me: I expected that they would print using either UTC or my local time zone. Does anyone know if there's any way to configure this behaviour?
a) The core functions that lubridate is abstracting can't handle a vector of timezones, or
b) The timezone attribute is associated with the entire POSIXct vector, not the individual elements.
Or both. The second explanation could explain why I'm seeing a tibble print out with a single time zone: when I ungroup, vectors with different timezones are getting converted to a common one. Is that a fair guess?
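A quick way to check guess (b) in base R, as a sketch: the `tzone` attribute sits on the vector as a whole, so combining two date-times from different zones cannot keep both. What exactly `c()` does with the attribute depends on the R version, so I have not written the printed value here.

```r
# Two date-times with the same wall-clock time in different zones
ny  <- as.POSIXct("2018-01-01 12:00:00", tz = "America/New_York")
ldn <- as.POSIXct("2018-01-01 12:00:00", tz = "Europe/London")

attr(ny, "tzone")    # a single value describing the whole vector

both <- c(ny, ldn)
attr(both, "tzone")  # at most one value survives (it may be dropped entirely)

# The underlying instants (seconds since the epoch) are unchanged;
# only the display zone is shared
as.numeric(both)
```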
If it is, I guess there's nothing that can be done. Time zones suuuuuuuuuck.
I also really like this solution of using a list column! Although list columns can be a bit more complicated to deal with, I think it's important to prioritise predictable output, and using a list column means that you can have elements with different timezones.
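A minimal sketch of the list-column idea, assuming purrr is available and using made-up column names (`when`, `zone`). Each list element is its own length-1 POSIXct vector, so each one keeps its own time zone.

```r
library(dplyr)
library(purrr)
library(lubridate)

# Hypothetical example data
df <- tibble(
  when = c("2018-01-01 12:00:00", "2018-01-01 12:00:00"),
  zone = c("America/New_York", "Europe/London")
)

# map2() returns a list, so the elements are never combined into one vector
out <- df %>%
  mutate(parsed = map2(when, zone, ~ ymd_hms(.x, tz = .y)))

# Each element reports its own zone
map_chr(out$parsed, tz)
```

The trade-off is the one mentioned above: the column is a list, so anything downstream has to map over it rather than treat it as a plain date-time vector.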
That's a good workaround. Another one that I used is to group_by timezone and then convert each group using only one timezone. I've figured that at most there are about 600 timezones (you can see them all with OlsonNames()), so while that is not ideal, it still won't be a giant bottleneck.
But I agree, timezones suck.
There is no need to nest/unnest, I think. It can be costly, so group_by/ungroup is likely to be faster and simpler (but that's personal preference, of course).
This should work:
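For anyone following along, a minimal sketch of the group_by approach (not the original snippet; `df`, `when` and `zone` are made-up names for illustration):

```r
library(dplyr)
library(lubridate)

# Hypothetical example data
df <- tibble(
  when = c("2018-01-01 12:00:00", "2018-01-01 12:00:00", "2018-06-01 12:00:00"),
  zone = c("America/New_York", "Europe/London", "America/New_York")
)

out <- df %>%
  group_by(zone) %>%
  # Within a group every row shares one zone, so first(zone) has length 1
  mutate(parsed = ymd_hms(when, tz = first(zone))) %>%
  ungroup()

# Instants are parsed correctly per group; the combined column
# still has a single tzone attribute
out$parsed
```

The parsing runs once per distinct time zone rather than once per row, which is presumably why it scales better than rowwise on large data.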
It looks like you are correct. I ran some benchmarks comparing both methods and the accepted answer on SO: yours is the fastest, and they all give the same answer on my machine.