Best practices with time zones?

I have a large time series data set with time stamps in UTC. I have an auxiliary data set with times in a certain time zone, the local time when the data was recorded (note that this is not my local time zone). I am making plots by specifying time intervals of interest. These are normally things like the morning of a certain date, and they "make sense" in the local time for the auxiliary data.

I am wondering if there is a good way to avoid adding the time zone string to every date I type in. It seems somewhat excessive... but maybe it's the clearest approach? It's tempting to add the offset to the UTC dataset and then use the local times while the system "thinks" everything is UTC, but that seems like a bad idea...

I'm generally trying to use lubridate. Let me know if you have any tips on this subject. Thanks!

I am not in any way an expert on such things, but I would definitely not mislabel the data. It is too easy to forget you did that and end up in a tangle. You can enter the data without a time zone, so that it gets the default UTC, and then set the zone for all of the new data with the lubridate tz() function. You can also shift the original UTC data using the with_tz() function and just work in the local time. I prefer to make a new column of the shifted times and preserve the original values, but that is mostly a personal preference for not overwriting original data.
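To make the distinction concrete, here is a minimal sketch of two lubridate calls: with_tz() shows the same instant in another zone, while force_tz() (used in the code later in this thread) keeps the clock reading and changes the instant. The data frame, its column names, and the sample values are made up for illustration.

```r
library(lubridate)

# Hypothetical stand-in for the main data set: timestamps stored in UTC
df <- data.frame(
  t_stamp = ymd_hms(c("2019-08-01 18:00:00", "2019-08-01 19:30:00"), tz = "UTC")
)

# with_tz(): same instants, displayed in local time, kept in a new column
df$t_local <- with_tz(df$t_stamp, "US/Mountain")

# force_tz(): same clock reading, different instant
typed <- ymd_hms("2019-08-01 12:00:00")        # parsed as UTC by default
local_noon <- force_tz(typed, "US/Mountain")   # now means 12:00 in US/Mountain

all(df$t_stamp == df$t_local)   # the shifted column still holds the same instants
local_noon == df$t_stamp[1]     # local noon is the 18:00 UTC reading
```

Because t_local only changes the display, comparisons and filters give the same result whichever column is used.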

I believe a good practice would be to store the data as POSIXct, which records an absolute instant together with a time zone attribute. That way the time zone information is retained, and you avoid a nasty surprise if and when you forget about the time zone shift in later stages of your analysis.

PRG <- as.POSIXct("2020-01-06 12:00", tz = "CET") # midday in Prague
LON <- as.POSIXct("2020-01-06 13:00", tz = "UTC") # one hour + one timezone off from Prague 

print(PRG - LON)
#> Time difference of -2 hours
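As a small follow-up sketch using only base R: a POSIXct value is stored as a number of seconds since 1970-01-01 UTC, and the time zone is an attribute that controls how that number is printed. The variable names other than PRG are made up for this illustration.

```r
PRG <- as.POSIXct("2020-01-06 12:00", tz = "CET")      # midday in Prague
PRG_UTC <- as.POSIXct("2020-01-06 11:00", tz = "UTC")  # the same instant, labelled in UTC

# The underlying numbers are identical; only the label differs
as.numeric(PRG) == as.numeric(PRG_UTC)

# The zone lives in an attribute and is applied when formatting
attr(PRG, "tzone")
format(PRG, "%Y-%m-%d %H:%M", tz = "UTC")
```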

Thanks for the replies. Makes sense I think... but here's a concrete example of the sort of thing I have in my code:

library(dplyr)
library(ggplot2)
library(lubridate)

starttime <- force_tz(ymd_hms("2019-08-01T12:00:00"), "US/Mountain")
endtime <- force_tz(ymd_hms("2019-08-31T00:00:00"), "US/Mountain")
plotsubset <- df %>% filter(t_stamp %within% interval(starttime, endtime))
ggplot(plotsubset) + geom_line(aes(x = t_stamp, y = temperature))

And then I have a bunch of repeats of this sort of code with varying start/end times to plot different quantities at different times. So I have the string "US/Mountain" all over the place, along with "force_tz" etc. I guess one way of handling that is to put all of my start/end times into a data structure and apply the time zone to all of them in one step... Or maybe just something more like:

startraw <- "2019-08-01T12:00:00"
endraw <- "2019-08-31T00:00:00"
starttime <- force_tz(ymd_hms(startraw), "US/Mountain")
endtime <- force_tz(ymd_hms(endraw), "US/Mountain")

And then I can just copy and paste these blocks and separate the local times out onto separate lines. I guess this is really a pretty trivial thing, but it seems like there is clunkiness here. Maybe that's just the way it is with dates and times...
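For what it's worth, the "data structure" idea mentioned above could look roughly like this. It is only a sketch: the windows table, its labels, and the second window are invented for illustration, and it uses the tz argument of ymd_hms() to parse the strings directly as local times rather than calling force_tz() afterwards.

```r
library(lubridate)

local_tz <- "US/Mountain"   # the zone is written down exactly once

# Hypothetical table of plotting windows, typed as local clock times
windows <- data.frame(
  label = c("august", "aug_01_morning"),
  start = c("2019-08-01T12:00:00", "2019-08-01T06:00:00"),
  end   = c("2019-08-31T00:00:00", "2019-08-01T12:00:00"),
  stringsAsFactors = FALSE
)

# Apply the zone to every start and end in one step
windows$start <- ymd_hms(windows$start, tz = local_tz)
windows$end   <- ymd_hms(windows$end, tz = local_tz)

# Look up one window by its label when it is time to plot
w <- windows[windows$label == "aug_01_morning", ]
w$start
w$end
```

The start and end of a given window can then be passed to interval() in the filter step as before.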

It looks and feels clunky indeed. The copying and pasting is a chore, and it is also error-prone.

I suggest moving it into a function: declare it once, and then call it with whichever timestamp needs to be adjusted.

Would something like this do the trick?

mountainize <- function(timestamp) {
  lubridate::force_tz(lubridate::ymd_hms(timestamp), "US/Mountain")
}

starttime <- mountainize("2019-08-01T12:00:00")
endtime <- mountainize("2019-08-31T00:00:00")
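If the same pattern is needed for more than one zone, or the filtering step is repeated as well, the idea can be pushed a little further. The sketch below is one possible variation, not the only way to do it: local_time() and in_window() are hypothetical helper names, the sample data frame is invented, and the zone is passed through the tz argument of ymd_hms() instead of force_tz().

```r
library(lubridate)

# Hypothetical helper: parse a typed timestamp as local time in `tz`
local_time <- function(timestamp, tz = "US/Mountain") {
  ymd_hms(timestamp, tz = tz)
}

# Hypothetical helper: rows of `data` whose t_stamp lies within [start, end]
in_window <- function(data, start, end, tz = "US/Mountain") {
  span <- interval(local_time(start, tz), local_time(end, tz))
  data[data$t_stamp %within% span, , drop = FALSE]
}

# Invented sample data: hourly UTC readings starting at 17:00 UTC
df <- data.frame(
  t_stamp = ymd_hms("2019-08-01 17:00:00", tz = "UTC") + 3600 * (0:3),
  temperature = c(20.1, 21.4, 22.0, 22.7)
)

# Noon to 1 pm Mountain time, typed without any zone string
plotsubset <- in_window(df, "2019-08-01T12:00:00", "2019-08-01T13:00:00")
plotsubset
```

The result can go straight into ggplot() as in the original code, and the zone string appears only in the function defaults.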

Yeah, that's a good idea that now seems like it should have been obvious, thanks!

Glad to be of service!